feat: structured telemetry module for observability #2107
Open
Ayush10 wants to merge 1 commit into microsoft:main from
Introduce a pluggable telemetry framework (qlib/utils/telemetry.py) that provides metrics collection and workflow tracing with zero overhead when no backend is registered.

Core components:
- QlibMetrics: counter/gauge/histogram with pluggable backends
- QlibTracer: context-manager spans with parent-child tracking
- LoggingBackend: integrates with existing get_module_logger
- InMemoryBackend: for testing and programmatic access

Proof-of-concept instrumentation:
- DataHandlerLP.setup_data: span tracing + row/column gauges
- DataHandlerLP._run_proc_l: per-processor span tracing
- MemCacheUnit: cache hit counter

Includes 29 unit tests covering metrics, tracing, thread safety, nested spans, error recording, and backend isolation.
9fcbbd6 to b0f0e1e
Summary
Closes #2098
This PR introduces a pluggable telemetry framework for Qlib that provides structured metrics collection and workflow tracing. It ships as a foundational module with proof-of-concept instrumentation, designed to be extended incrementally across the codebase.
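To make the "pluggable" idea concrete, here is a minimal, self-contained sketch of how a metrics front end with swappable backends might be structured. It is not the code from this PR: `SketchMetrics`, the `emit` method, and the `MetricEvent` fields are assumptions chosen for illustration, and only the general shape (an abstract backend, an in-memory backend, and an early return when nothing is registered) follows the description.

```python
import threading
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetricEvent:
    """One recorded measurement (field names are assumptions)."""
    name: str
    kind: str  # "counter" or "gauge" in this sketch
    value: float
    tags: Dict[str, str] = field(default_factory=dict)


class MetricsBackend(ABC):
    """Interface every backend implements (hypothetical signature)."""

    @abstractmethod
    def emit(self, event: MetricEvent) -> None:
        ...


class InMemoryBackend(MetricsBackend):
    """Keeps events in a list so tests can inspect them."""

    def __init__(self) -> None:
        self.events: List[MetricEvent] = []
        self._lock = threading.Lock()

    def emit(self, event: MetricEvent) -> None:
        with self._lock:
            self.events.append(event)


class SketchMetrics:
    """Front end that fans events out to all registered backends."""

    def __init__(self) -> None:
        self._backends: List[MetricsBackend] = []

    def register(self, backend: MetricsBackend) -> None:
        self._backends.append(backend)

    def _emit(self, name: str, kind: str, value: float, tags: Dict[str, str]) -> None:
        if not self._backends:
            return  # nothing registered: skip building the event entirely
        event = MetricEvent(name, kind, value, dict(tags))
        for backend in self._backends:
            backend.emit(event)

    def counter(self, name: str, value: float = 1.0, **tags: str) -> None:
        self._emit(name, "counter", value, tags)

    def gauge(self, name: str, value: float, **tags: str) -> None:
        self._emit(name, "gauge", value, tags)


metrics = SketchMetrics()
metrics.counter("cache.hit")  # dropped: no backend registered yet

backend = InMemoryBackend()
metrics.register(backend)
metrics.counter("cache.hit", unit="MemCacheUnit")
metrics.gauge("handler.rows", 1000)
```

The early return in `_emit` is one way to keep the cost close to zero when no backend is registered, since no event object is allocated on that path.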
Architecture
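Core Components (qlib/utils/telemetry.py)
- MetricEvent / SpanEvent
- MetricsBackend (ABC)
- QlibMetrics: counter/gauge/histogram with pluggable backends
- QlibTracer: context-manager spans with parent-child tracking
- LoggingBackend: integrates with the existing get_module_logger infrastructure
- InMemoryBackend: for testing and programmatic access, with summary()
Design Principles
- TimeInspector and get_module_logger
Proof-of-Concept Instrumentation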
Three high-value instrumentation points demonstrate the pattern:
- DataHandlerLP.setup_data(): span tracing + row/column gauge metrics
- DataHandlerLP._run_proc_l(): per-processor span tracing with rows in/out
- MemCacheUnit.__getitem__(): cache hit counter
Usage
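As a rough illustration of the span-tracing pattern behind these instrumentation points, the sketch below implements a small context-manager tracer with parent-child tracking and error recording. `SketchTracer`, `SpanRecord`, and the span names are hypothetical and are not taken from `qlib/utils/telemetry.py`; the real `QlibTracer` API may differ.

```python
import threading
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class SpanRecord:
    """A finished span (field names are assumptions)."""
    name: str
    parent: Optional[str]
    duration_s: float
    error: Optional[str]


class SketchTracer:
    """Records nested spans; each thread keeps its own stack of open spans."""

    def __init__(self) -> None:
        self.finished: List[SpanRecord] = []
        self._local = threading.local()
        self._lock = threading.Lock()

    @contextmanager
    def span(self, name: str) -> Iterator[None]:
        stack = getattr(self._local, "stack", None)
        if stack is None:
            stack = self._local.stack = []
        parent = stack[-1] if stack else None  # innermost open span, if any
        stack.append(name)
        start = time.perf_counter()
        error: Optional[str] = None
        try:
            yield
        except Exception as exc:
            error = repr(exc)  # record the failure, then let it propagate
            raise
        finally:
            stack.pop()
            record = SpanRecord(name, parent, time.perf_counter() - start, error)
            with self._lock:
                self.finished.append(record)


tracer = SketchTracer()
with tracer.span("setup_data"):
    # "proc_a" and "proc_b" stand in for data-handler processors.
    for proc in ("proc_a", "proc_b"):
        with tracer.span(f"proc.{proc}"):
            pass  # the per-processor work would run here
```

Inner spans finish before the outer one, so `tracer.finished` lists the two processor spans first, each with `setup_data` as its parent, followed by `setup_data` itself with no parent.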
Suggested Follow-up Work
This PR is intentionally scoped as a foundation. Subsequent PRs could:
- Instrument fit()/predict() and backtesting workflows
- Add a FileBackend for JSON/CSV metric export
- Add an OpenTelemetryBackend for production observability
- Instrument ExpressionCache and DatasetCache for cache hit ratios
Test Plan
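- 29 unit tests in tests/test_telemetry.py covering: metrics, tracing, nested spans, error recording, backend isolation, the @traced decorator, and thread safety (10 concurrent threads)
- python -m pytest tests/test_telemetry.py -v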
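For readers unfamiliar with this style of thread-safety check, a test with 10 concurrent threads might look like the sketch below. It exercises a stand-in `LockedCounter` rather than the PR's own classes; the counter, the helper name, and the per-thread iteration count are all assumptions.

```python
import threading


class LockedCounter:
    """Stand-in for a thread-safe metric counter (hypothetical)."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount: int = 1) -> None:
        with self._lock:
            self._value += amount

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run_concurrent_increments(n_threads: int = 10, per_thread: int = 1000) -> int:
    """Increment one shared counter from several threads and return the total."""
    counter = LockedCounter()
    barrier = threading.Barrier(n_threads)  # release all workers at the same time

    def worker() -> None:
        barrier.wait()
        for _ in range(per_thread):
            counter.inc()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


def test_counter_is_thread_safe() -> None:
    # With the lock in place no increment is lost.
    assert run_concurrent_increments(10, 1000) == 10 * 1000
```

The barrier makes the workers start together, which increases contention and makes a missing lock more likely to surface as a lost update.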