
[v3] benchmarks and performance tools #2034

Open
@d-v-b

Description


It would be very useful to record and publish benchmarks of how zarr-python performs under various workloads. Especially with the addition of sharding, I think people working with Zarr will benefit from some guidance on how to avoid performance problems. And without benchmarks, we can't do performance optimization of zarr-python itself.

So we should write some benchmarking code, tracking things like duration and memory usage for a few core workloads, like:

  • writing chunks to an array
  • reading chunks from an array
  • creating arrays and groups
  • deleting chunks from an existing array
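
A rough sketch of what such a harness could look like, using only the standard library: `time.perf_counter` for duration and `tracemalloc` for peak Python-level memory. Everything named here (`benchmark`, `BenchResult`, the `write_chunks` / `read_chunks` / `delete_chunks` workloads, the `c/<i>` key layout) is hypothetical, and a plain `dict` stands in for the store so that the example runs without zarr-python installed. Real benchmarks would swap in actual zarr-python array and store operations.

```python
import time
import tracemalloc
from dataclasses import dataclass
from statistics import median
from typing import Any, Callable


@dataclass
class BenchResult:
    name: str
    durations: list[float]  # seconds, one entry per timed repetition
    peak_bytes: int         # peak traced allocation during one extra run

    @property
    def median_seconds(self) -> float:
        return median(self.durations)


def benchmark(
    name: str,
    workload: Callable[[Any], None],
    setup: Callable[[], Any],
    repeats: int = 5,
) -> BenchResult:
    """Time `workload` on fresh state from `setup`, then measure peak memory.

    Memory is measured in a separate run because tracemalloc slows execution
    and would distort the timings.
    """
    durations = []
    for _ in range(repeats):
        state = setup()  # untimed: build a fresh store for every repetition
        start = time.perf_counter()
        workload(state)
        durations.append(time.perf_counter() - start)

    state = setup()
    tracemalloc.start()
    workload(state)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return BenchResult(name, durations, peak)


# Placeholder workloads: a dict stands in for a chunk store (an assumption,
# not zarr-python's API).
def filled_store(n_chunks: int, chunk_bytes: int) -> dict[str, bytes]:
    return {f"c/{i}": bytes(chunk_bytes) for i in range(n_chunks)}


def write_chunks(store: dict[str, bytes], n_chunks: int, chunk_bytes: int) -> None:
    for i in range(n_chunks):
        store[f"c/{i}"] = bytes(chunk_bytes)


def read_chunks(store: dict[str, bytes]) -> None:
    for key in list(store):
        _ = bytes(store[key])  # touch the chunk contents


def delete_chunks(store: dict[str, bytes]) -> None:
    for key in list(store):
        del store[key]


if __name__ == "__main__":
    n, size = 64, 1 << 16
    results = [
        benchmark("write", lambda s: write_chunks(s, n, size), setup=dict),
        benchmark("read", read_chunks, setup=lambda: filled_store(n, size)),
        benchmark("delete", delete_chunks, setup=lambda: filled_store(n, size)),
    ]
    for r in results:
        print(f"{r.name}: median {r.median_seconds:.6f} s, peak {r.peak_bytes} bytes")
```

Note that `tracemalloc` only sees allocations made through Python's allocators, so memory used by native compression libraries or by the storage backend would need a different measurement (for example, process-level RSS).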

As a stretch goal, the benchmark code itself should be useful to people who want to check zarr-python performance on different compute / storage backends.

It looks like @JackKelly already started work in this direction at https://github.com/zarr-developers/zarr-benchmark. @JackKelly, does the direction I'm proposing align with your vision for that repo?


Labels: enhancement (New features or improvements), performance (Potential issues with Zarr performance (I/O, memory, etc.))
