Description
Summary
This feature request proposes adding optional OpenTelemetry integration to the Zarr-Python codebase. OpenTelemetry is a widely adopted, vendor-neutral standard for generating, collecting, and exporting telemetry data (traces, metrics, and logs), and it is supported by many modern observability platforms. The goal is to improve observability, facilitate performance tuning, and enable integration with full-stack monitoring systems, all while preserving a lightweight default behavior.
📌 Motivation
Zarr is widely used in performance-critical and production environments such as:
- Large-scale data processing
- Scientific computing
- Cloud-native workflows
- Backend data sources for web APIs (e.g., Xpublish)
Currently, Zarr provides limited visibility into internal operations like:
- Chunk reads/writes
- Compression and decompression
- Storage backend access
- Performance bottlenecks
By integrating OpenTelemetry (OTel), Zarr users and developers would benefit from:
- Enhanced observability into internal workflows
- Easier performance tuning via traces and profiling tools (e.g., Jaeger, Zipkin, Grafana Tempo)
- Seamless integration into modern observability pipelines
☝ Each of these is particularly important following Zarr's recent adoption of asyncio, where the execution of concurrent operations is increasingly hard to track explicitly.
🧩 Proposal
- Introduce optional support for OpenTelemetry instrumentation in key parts of the Zarr codebase:
  - Data access (inside stores)
  - Compression/decompression
  - Encoding/decoding
- Provide a clean interface or hooks to register and emit OpenTelemetry traces.
- Default behavior should be:
  - No-op (i.e., tracing is disabled unless explicitly enabled)
  - Optionally fall back to a standard Python logger for basic introspection
- Ensure zero overhead when OpenTelemetry is not enabled
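As a rough illustration of the hook interface and no-op default described above, here is a minimal sketch using only the standard library. Every name in it (`register_span_hook`, `span`, `logging_span`, the `zarr.tracing` logger name) is hypothetical and not part of any existing Zarr API; it only shows one way the pieces could fit together.

```python
import logging
import time
from contextlib import contextmanager, nullcontext
from typing import Callable, ContextManager, Optional

# Hypothetical hook type: receives a span name and an attribute dict,
# and returns a context manager that wraps the traced operation.
SpanHook = Callable[[str, dict], ContextManager]

_span_hook: Optional[SpanHook] = None  # None means tracing is disabled (default)


def register_span_hook(hook: Optional[SpanHook]) -> None:
    """Install a tracing hook; pass None to restore the no-op default."""
    global _span_hook
    _span_hook = hook


def span(name: str, **attributes) -> ContextManager:
    """Return a context manager wrapping one instrumented operation."""
    if _span_hook is None:
        # Disabled path: a single None check and a do-nothing context manager.
        return nullcontext()
    return _span_hook(name, attributes)


@contextmanager
def logging_span(name: str, attributes: dict):
    """Fallback hook that logs the duration of each operation at DEBUG level."""
    logger = logging.getLogger("zarr.tracing")  # hypothetical logger name
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.debug("%s took %.3f ms %s", name, elapsed_ms, attributes)
```

With this shape, instrumented code would only ever call `with span("chunk.read", key=...):`, and a user opts in by calling `register_span_hook(logging_span)` or by registering a hook backed by OpenTelemetry. The disabled path is not literally free, but it is limited to one comparison and a `nullcontext()` allocation per call.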
✅ Benefits
- Opt-in observability with minimal performance impact
- Compatibility with OpenTelemetry-native tools and frameworks
- Aids in debugging and performance analysis
- Foundation for future enhancements (e.g., metrics, structured logging)
🛠️ Implementation Notes
- Introduce a `tracing.py` module (or similar) to encapsulate OpenTelemetry usage
- Use `@contextmanager` or `Tracer.start_as_current_span()` decorators in key areas
- Conditional instrumentation based on config or environment variable(s)
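The notes above could be sketched roughly as follows. This is an assumption-laden illustration, not a proposed final design: the environment variable `ZARR_OTEL_ENABLED`, the helpers `span` and `traced`, and the example function `decode_chunk` are all made up for this sketch. Only `opentelemetry.trace.get_tracer` and `Tracer.start_as_current_span` come from the OpenTelemetry Python API, and the module degrades to a no-op if that package is not installed.

```python
# tracing.py (hypothetical module layout)
import os
from contextlib import contextmanager
from functools import wraps

# ZARR_OTEL_ENABLED is an invented variable name used only in this sketch.
_ENABLED = os.environ.get("ZARR_OTEL_ENABLED", "").lower() in {"1", "true", "yes"}

try:
    from opentelemetry import trace  # shipped in the opentelemetry-api package
except ImportError:
    trace = None  # OpenTelemetry stays an optional dependency

# A tracer exists only when tracing is both requested and importable.
_tracer = trace.get_tracer("zarr") if (_ENABLED and trace is not None) else None


@contextmanager
def span(name, **attributes):
    """Open an OpenTelemetry span if tracing is enabled, otherwise do nothing."""
    if _tracer is None:
        yield None
        return
    # Attribute values should be str, bool, int, float, or sequences of those.
    with _tracer.start_as_current_span(name, attributes=attributes) as current:
        yield current


def traced(name):
    """Decorator form of `span` for synchronous functions."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            with span(name):
                return func(*args, **kwargs)
        return wrapper
    return decorator


@traced("codec.decode")
def decode_chunk(raw: bytes) -> bytes:
    """Stand-in for a real decompression step (it just reverses the bytes)."""
    return raw[::-1]
```

In this sketch the enable flag is read once at import time, so flipping the variable later has no effect; a config-driven switch would need to be checked per call or re-initialised explicitly. The `traced` wrapper only covers synchronous functions, so coroutine functions on Zarr's asyncio paths would need an `async def` variant of the wrapper.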
🙋‍♂️ Call for Feedback
We would love to hear from maintainers and the community:
- Does OpenTelemetry seem like a good fit for Zarr?
- Are there specific areas of the codebase that would benefit most from tracing?
- Would a structured logger fallback be helpful in low-overhead environments?