feat: structured telemetry module for observability #2107
Summary
Closes #2098
This PR introduces a pluggable telemetry framework for Qlib that provides structured metrics collection and workflow tracing. It ships as a foundational module with proof-of-concept instrumentation, designed to be extended incrementally across the codebase.
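To make the pluggable design concrete, here is a minimal, self-contained sketch of how a metrics front end can fan events out to interchangeable backends. It is not the code in this PR: the names MetricEvent, MetricsBackend and InMemoryBackend are borrowed from the PR, while the event fields, the emit()/counter()/gauge() methods, the Metrics class and the aggregation rule in summary() are assumptions made for illustration.

```python
import threading
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetricEvent:
    """One structured metric sample (field names are assumptions)."""
    name: str
    value: float
    kind: str = "gauge"  # either "counter" or "gauge"
    tags: Dict[str, str] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


class MetricsBackend(ABC):
    """Pluggable sink: anything that can receive a MetricEvent."""

    @abstractmethod
    def emit(self, event: MetricEvent) -> None:
        ...


class InMemoryBackend(MetricsBackend):
    """Keeps events in a list, which is convenient for tests."""

    def __init__(self) -> None:
        self._events: List[MetricEvent] = []
        self._lock = threading.Lock()

    def emit(self, event: MetricEvent) -> None:
        with self._lock:
            self._events.append(event)

    def summary(self) -> Dict[str, float]:
        """Sum counters; keep the most recent value for gauges."""
        out: Dict[str, float] = {}
        with self._lock:
            for e in self._events:
                if e.kind == "counter":
                    out[e.name] = out.get(e.name, 0.0) + e.value
                else:
                    out[e.name] = e.value
        return out


class Metrics:
    """Hypothetical front end that forwards events to every backend."""

    def __init__(self) -> None:
        self._backends: List[MetricsBackend] = []

    def add_backend(self, backend: MetricsBackend) -> None:
        self._backends.append(backend)

    def _emit(self, event: MetricEvent) -> None:
        for backend in self._backends:
            backend.emit(event)

    def counter(self, name: str, value: float = 1.0, **tags: str) -> None:
        self._emit(MetricEvent(name, value, "counter", dict(tags)))

    def gauge(self, name: str, value: float, **tags: str) -> None:
        self._emit(MetricEvent(name, value, "gauge", dict(tags)))
```

Because call sites only talk to the front end, swapping a logging sink for an in-memory or export sink would not require touching instrumented code, which is the property a pluggable design is after.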
Architecture
Core Components (qlib/utils/telemetry.py)
- MetricEvent / SpanEvent
- MetricsBackend (ABC)
- QlibMetrics
- QlibTracer
- LoggingBackend (get_module_logger infrastructure)
- InMemoryBackend (summary())
Design Principles
- TimeInspector and get_module_logger
Proof-of-Concept Instrumentation
Three high-value instrumentation points demonstrate the pattern:
- DataHandlerLP.setup_data(): span tracing plus row/column gauge metrics
- DataHandlerLP._run_proc_l(): per-processor span tracing with rows in/out
- MemCacheUnit.__getitem__(): cache hit counter
Usage
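The instrumentation pattern above (a span around each processor with rows in/out, and a counter on cache hits) can be sketched with self-contained stand-ins. Everything here is hypothetical: span(), run_processors(), drop_missing() and CountingCache are illustrative helpers, the module-level spans and counters containers stand in for a real backend, and none of it reproduces DataHandlerLP, MemCacheUnit or the API in qlib/utils/telemetry.py.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, List

spans: List[Dict[str, Any]] = []  # stand-in for a tracing backend
counters: Dict[str, int] = {}     # stand-in for a metrics backend


@contextmanager
def span(name: str, **attrs: Any):
    """Record a timed span; the caller may attach attributes to the record."""
    record: Dict[str, Any] = {"name": name, **attrs}
    start = time.perf_counter()
    try:
        yield record
    finally:
        # Runs even if the wrapped block raises, so failed steps are still traced.
        record["duration_s"] = time.perf_counter() - start
        spans.append(record)


def drop_missing(rows: List[tuple]) -> List[tuple]:
    """Toy processor: drop rows that contain None."""
    return [r for r in rows if None not in r]


def run_processors(rows: List[tuple], processors: List[Callable]) -> List[tuple]:
    """Apply each processor inside its own span, recording rows in and out."""
    for proc in processors:
        with span("processor", processor=proc.__name__, rows_in=len(rows)) as rec:
            rows = proc(rows)
            rec["rows_out"] = len(rows)
    return rows


class CountingCache(dict):
    """Toy cache that counts successful lookups."""

    def __getitem__(self, key):
        value = super().__getitem__(key)  # raises KeyError on a miss
        counters["cache.hit"] = counters.get("cache.hit", 0) + 1
        return value


if __name__ == "__main__":
    run_processors([(1, 2), (None, 3), (4, 5)], [drop_missing])
    cache = CountingCache(a=1)
    cache["a"]
    print(spans[-1]["rows_in"], spans[-1]["rows_out"], counters["cache.hit"])
```

Attaching rows_out inside the with block keeps the measurement next to the work it describes, and the finally clause means the span is still recorded if a processor raises.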
Suggested Follow-up Work
This PR is intentionally scoped as a foundation. Subsequent PRs could:
- [...] (fit()/predict()) and backtesting workflows
- FileBackend for JSON/CSV metric export
- OpenTelemetryBackend for production observability
- ExpressionCache and DatasetCache for cache hit ratios
Test Plan
- tests/test_telemetry.py covering: @traced decorator, thread safety (10 concurrent threads)
- python -m pytest tests/test_telemetry.py -v
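For readers unfamiliar with this kind of check, a thread-safety test with 10 concurrent threads might look roughly like the sketch below. SafeCounter is a hypothetical stand-in defined inline so the example runs on its own; it is not a class from this PR, and the actual tests in tests/test_telemetry.py may be structured differently.

```python
import threading


class SafeCounter:
    """Hypothetical lock-protected counter used only for this sketch."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def inc(self, n: int = 1) -> None:
        # The read-modify-write is done under the lock so that
        # concurrent increments cannot be lost.
        with self._lock:
            self._value += n

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def test_counter_is_thread_safe() -> None:
    counter = SafeCounter()
    n_threads, n_incs = 10, 1000

    def worker() -> None:
        for _ in range(n_incs):
            counter.inc()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Every increment from every thread must be accounted for.
    assert counter.value == n_threads * n_incs
```

The function follows pytest's test_ naming convention, so it would be collected by the pytest command above if placed in a test module.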