This repository was archived by the owner on Sep 11, 2023. It is now read-only.
Don't read entire chunks at a time #57
Closed
Description
Because we're reading entire chunks at a time, we're reading and decompressing a lot more data than we need, which is almost certainly a major bottleneck.
Some ideas:
- Try again to use `zarr.core.Array(partial_decompress=True)` with `FSStore` and Blosc compression (we're already using Blosc zstd level 5 for NWPs) to read just small parts of each chunk, especially for NWPs, where we're using tiny 2x2 images (see the first sketch below).
- Try with uncompressed Zarr. It's possible that the speed-up from compression is far smaller than the speed-up from being able to extract precisely the data we want from each chunk (see the second sketch below).
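A minimal sketch of the first idea, assuming zarr v2 (where `zarr.core.Array` accepts `partial_decompress`); the store path and array name here are hypothetical:

```python
import zarr
from zarr.storage import FSStore

# Hypothetical path; substitute the real NWP Zarr location.
store = FSStore("data/nwp.zarr")

# partial_decompress only kicks in when the compressor is Blosc
# (we already write Blosc zstd level 5) and the store supports
# partial reads, which FSStore does.
arr = zarr.core.Array(store, path="UKV", partial_decompress=True)

# A tiny 2x2 spatial patch: with partial decompression, only the
# Blosc blocks covering this region should be decompressed,
# rather than the whole chunk.
patch = arr[0, 0, 100:102, 200:202]
```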
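And a sketch of the uncompressed experiment, assuming the dataset opens with xarray; paths are again hypothetical:

```python
import xarray as xr

ds = xr.open_zarr("data/nwp.zarr")  # hypothetical path

# Re-write a copy with no compressor, so we can benchmark whether
# slicing tiny patches from uncompressed chunks beats Blosc.
encoding = {var: {"compressor": None} for var in ds.data_vars}
ds.to_zarr("data/nwp_uncompressed.zarr", encoding=encoding, mode="w")
```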
If it's possible to quickly load subsets of each chunk, then modify each Zarr DataSource so that it no longer pre-reads entire chunks into memory but instead, for each batch, loads each example separately on its own thread, as sketched below.
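A rough sketch of that per-example threaded loading; `load_example`, the coordinate tuples, and the patch size are all hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

import zarr


def load_example(arr: zarr.core.Array, t: int, y: int, x: int, size: int = 2):
    # One tiny patch per example; with partial decompression (or no
    # compression) this avoids pulling whole chunks into memory.
    return arr[t, :, y:y + size, x:x + size]


def load_batch(arr: zarr.core.Array, example_coords, max_workers: int = 8):
    # Blosc decompression and fsspec I/O should release the GIL,
    # so a thread pool ought to give real concurrency here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(load_example, arr, t, y, x)
            for t, y, x in example_coords
        ]
        return [f.result() for f in futures]
```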