Description
It would be very useful to record and publish benchmarks of how zarr-python performs across a range of workloads. Especially with the addition of sharding, I think people working with Zarr will benefit from guidance on how to avoid performance problems. And without benchmarks, we can't do performance optimization of zarr-python itself.
So we should write some benchmarking code that tracks metrics like duration and memory usage for a few core workloads (a rough sketch follows the list below), such as:
- writing chunks to an array
- reading chunks from an array
- creating arrays and groups
- deleting chunks from an existing array
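
To make this concrete, here is a minimal sketch of what such a harness might look like, measuring wall-clock time with `time.perf_counter` and peak memory with `tracemalloc`. The shapes, chunk sizes, the `benchmark` helper, and the use of an in-memory store are illustrative assumptions rather than a proposal for the final design, and the exact creation calls may need adjusting between zarr-python 2 and 3:

```python
import time
import tracemalloc

import numpy as np
import zarr


def benchmark(label, fn):
    """Run fn once, reporting wall-clock duration and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{label:>16}: {elapsed:7.3f} s, peak {peak / 1e6:8.1f} MB")


shape, chunks = (4096, 4096), (256, 256)
data = np.random.default_rng(0).random(shape)
store = zarr.storage.MemoryStore()  # illustrative; swap in another store to test other backends

# creating arrays and groups
benchmark("create group", lambda: zarr.open_group(store=store, path="bench", mode="w"))
arr = zarr.open(store=store, path="bench/a", mode="w", shape=shape, chunks=chunks, dtype=data.dtype)


# writing chunks to an array
def write_chunks():
    arr[:] = data


# reading chunks from an array
def read_chunks():
    _ = arr[:]


benchmark("write chunks", write_chunks)
benchmark("read chunks", read_chunks)
```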
As a reach goal, the benchmark code itself should be useful to people who want to check zarr-python
performance on different compute / storage backends.
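
For that reach goal, the storage backend could simply be a parameter of the suite so the same workloads can be re-run anywhere. A rough sketch, assuming zarr-python 3's `MemoryStore` and `LocalStore` (the `candidate_stores` helper and the store list are illustrative; a cloud-backed store could be added the same way):

```python
import tempfile

import zarr


def candidate_stores():
    """Yield (name, store) pairs; users would extend this with their own backends."""
    yield "memory", zarr.storage.MemoryStore()
    yield "local disk", zarr.storage.LocalStore(tempfile.mkdtemp())


for name, store in candidate_stores():
    print(f"--- benchmarking against {name} store ---")
    arr = zarr.open(store=store, mode="w", shape=(1024, 1024), chunks=(128, 128), dtype="f8")
    # ... run the same write/read/create benchmarks from the sketch above against `arr` ...
```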
It looks like @JackKelly has already started work in this direction at https://github.com/zarr-developers/zarr-benchmark. @JackKelly, does the direction I'm proposing align with your vision for that repo?