Feature Request: Optional OpenTelemetry Integration for Observability and Performance Tuning #2958

Open
@jhamman

Summary

This feature request proposes adding optional OpenTelemetry integration to the Zarr-Python codebase. OpenTelemetry is a widely adopted, vendor-neutral standard for generating, collecting, and exporting telemetry data (traces, metrics, and logs), and it is supported by many modern observability platforms. The goal is to improve observability, facilitate performance tuning, and enable integration with full-stack monitoring systems, all while preserving lightweight default behavior.


📌 Motivation

Zarr is widely used in performance-critical and production environments such as:

  • Large-scale data processing
  • Scientific computing
  • Cloud-native workflows
  • Backend data source for web APIs (e.g. Xpublish)

Currently, Zarr provides limited visibility into internal operations like:

  • Chunk reads/writes
  • Compression and decompression
  • Storage backend access
  • Performance bottlenecks

By integrating OpenTelemetry (OTel), Zarr users and developers would benefit from:

  • Enhanced observability into internal workflows
  • Easier performance tuning via traces and profiling tools (e.g., Jaeger, Zipkin, Grafana Tempo)
  • Seamless integration into modern observability pipelines

☝ Each of these is particularly important following Zarr's recent adoption of asyncio, where the execution of concurrent operations is increasingly hard to track explicitly.


🧩 Proposal

  • Introduce optional support for OpenTelemetry instrumentation in key parts of the Zarr codebase:
    • Data access (inside stores)
    • Compression/decompression
    • Encoding/decoding
  • Provide a clean interface or hooks to register and emit OpenTelemetry traces.
  • Default behavior should be:
    • No-op (i.e. tracing is disabled unless explicitly enabled)
    • Optionally fall back to a standard Python logger for basic introspection
  • Ensure zero overhead when OpenTelemetry is not enabled

✅ Benefits

  • Opt-in observability with minimal performance impact
  • Compatibility with OpenTelemetry-native tools and frameworks
  • Aids in debugging and performance analysis
  • Foundation for future enhancements (e.g., metrics, structured logging)

🛠️ Implementation Notes

  • Introduce a tracing.py module (or similar) to encapsulate OpenTelemetry usage
  • Use context managers in key areas, for example a @contextmanager helper or Tracer.start_as_current_span() (which can also be applied as a decorator)
  • Conditional instrumentation based on config or environment variable(s)

🙋‍♂️ Call for Feedback

We would love to hear from maintainers and the community:

  • Does OpenTelemetry seem like a good fit for Zarr?
  • Are there specific areas of the codebase that would benefit most from tracing?
  • Would a structured logger fallback be helpful in low-overhead environments?
