Discussion: zarr-developers/zarr-python#2904
This benchmark writes a (100, 1000, 1000) ndarray of float32 data split into 10 chunks along the first dimension.
| Component | Shape | nbytes |
|---|---|---|
| Chunk | (10, 1000, 1000) | 40,000,000 |
| Array | (100, 1000, 1000) | 400,000,000 |
Peak memory usage is about 1.1 GiB.
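For reference, the nbytes in the table follow directly from the shapes and the 4-byte float32 itemsize, and the array size converts to the MiB figure quoted for the uncompressed read below:

```python
import numpy as np

itemsize = np.dtype("float32").itemsize      # 4 bytes
chunk_nbytes = 10 * 1000 * 1000 * itemsize   # 40,000,000
array_nbytes = 100 * 1000 * 1000 * itemsize  # 400,000,000

print(array_nbytes / 2**20)                  # ~381.5 MiB, the read-uncompressed figure below
```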
Questions: what's the memory overhead of zstd? Do we know the uncompressed size? Can we tell zstd that?
Can we effectively readinto the decompression buffer? Maybe...
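On the zstd questions: with the python-zstandard package, the frame header can record the decompressed size (`zstandard.get_frame_parameters` exposes it as `content_size`), and the streaming decompressor's reader supports `readinto`, so decompressing into a preallocated buffer looks possible in principle. A rough sketch under those assumptions, not zarr's current codec path:

```python
import numpy as np
import zstandard

def zstd_decompress_into(compressed: bytes, out: np.ndarray) -> int:
    """Sketch: decompress a zstd frame straight into the memory backing `out`.

    Not what zarr's ZstdCodec does today; `out` must be C-contiguous and at
    least as large as the decompressed data.
    """
    # If the compressor knew the input size, the frame header records it,
    # which answers "do we know the uncompressed size?".
    params = zstandard.get_frame_parameters(compressed)
    print("content size recorded in frame:", params.content_size)

    dest = memoryview(out).cast("B")
    reader = zstandard.ZstdDecompressor().stream_reader(compressed)
    n = 0
    while n < len(dest):
        read = reader.readinto(dest[n:])
        if read == 0:  # end of frame
            break
        n += read
    return n
```

For frames that don't record their size, `ZstdDecompressor.decompress(data, max_output_size=...)` is a way to bound the allocation, though it still returns a freshly allocated bytes object rather than writing into caller-owned memory.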
Why does buf.as_numpy_array apparently allocate memory?
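One way to answer that empirically is to compare against a known zero-copy view; `np.shares_memory` shows whether a conversion reuses the source buffer or allocates a new one. This is a generic check on plain bytes, not an inspection of the Buffer internals:

```python
import numpy as np

raw = bytes(40_000_000)                             # stand-in for bytes read from the store

zero_copy = np.frombuffer(raw, dtype=np.uint8)      # wraps the existing buffer, no allocation
copied = np.frombuffer(raw, dtype=np.uint8).copy()  # allocates a second ~40 MB block

print(np.shares_memory(zero_copy, np.frombuffer(raw, dtype=np.uint8)))  # True
print(np.shares_memory(zero_copy, copied))                              # False
```

If `as_numpy_array` behaves like the `.copy()` line, that would explain the extra allocation.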
For this special case (uncompressed data), the peak memory usage ought to be about the size of the ndarray. Currently, it's about 2x.
This is probably because LocalStore uses path.read_bytes, and then we copy those bytes into an array using prototype.buffer.from_bytes. See here.
Optimally, we would use readinto to fill the memory backing the out ndarray. With enough effort that's probably doable; given how rare uncompressed data is in practice, it might not be worthwhile.
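As a sketch of what that readinto path could look like for an uncompressed chunk (the path and out array here are hypothetical, and this bypasses the Buffer prototype entirely):

```python
import numpy as np

def read_uncompressed_chunk_into(path: str, out: np.ndarray) -> None:
    """Fill a preallocated, C-contiguous ndarray directly from an uncompressed
    chunk file, skipping the intermediate bytes object from path.read_bytes()."""
    dest = memoryview(out).cast("B")
    with open(path, "rb") as f:
        n = 0
        while n < len(dest):
            read = f.readinto(dest[n:])
            if read == 0:
                raise EOFError(f"{path} is shorter than the destination buffer")
            n += read

# Hypothetical usage matching the benchmark's shapes:
# out = np.empty((100, 1000, 1000), dtype="float32")
# read_uncompressed_chunk_into("path/to/chunk-0", out[0:10])  # first chunk -> first 10 planes
```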
As a test for what's possible, sol.py implements basic reads for compressed and uncompressed data.
- read uncompressed: 381.5 MiB (~1x the size of the array. Best we can do.)
- read compressed: 734.1 MiB (size of the array + size of the compressed data. Best we can do.)