Don't read entire chunks at a time #57
Comments
highly relevant: which includes this text (copy-and-pasted for convenience):
PR for
This issue will be made obsolete (i.e. won't matter) if #58 works :)
JackKelly added a commit that referenced this issue on Jul 20, 2021
No longer relevant. Made obsolete by #58
Because we're reading entire chunks at a time, we're reading and decompressing a lot more data than we need, which is almost certainly a major bottleneck.
Some ideas:
- zarr.core.Array(partial_decompress=True) using FSStore & Blosc compression (we're already using Blosc zstd level 5 for NWPs) to read small chunks, especially for NWPs where we're using tiny 2x2 images.
- If it's possible to quickly load subsets of each chunk, then modify each Zarr DataSource so it no longer pre-reads entire chunks into memory, but instead, for each batch, loads each example separately using a different thread. See the sketch after this list.
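A minimal sketch of how these two ideas could fit together, assuming zarr 2.x with zarr.storage.FSStore and Blosc-compressed chunks. The store URL, array path, dimension order, and batch indices below are hypothetical placeholders, not the project's real configuration.

```python
from concurrent.futures import ThreadPoolExecutor

import zarr
from zarr.storage import FSStore

# Hypothetical store URL and array path; substitute the real NWP Zarr dataset.
STORE_URL = "gs://example-bucket/nwp.zarr"
ARRAY_PATH = "UKV"

store = FSStore(STORE_URL, mode="r")

# partial_decompress=True asks zarr to decompress only the requested
# portion of each Blosc-compressed chunk (where the store supports
# partial reads), instead of decompressing the whole chunk.
nwp = zarr.core.Array(
    store,
    path=ARRAY_PATH,
    read_only=True,
    partial_decompress=True,
)


def load_example(t_idx: int, y_idx: int, x_idx: int):
    """Load one tiny 2x2 spatial window for a single example.

    Assumes a 4-D (time, variable, y, x) layout; adjust to the real dataset.
    """
    return nwp[t_idx, :, y_idx : y_idx + 2, x_idx : x_idx + 2]


# Instead of pre-reading entire chunks, load each example of a batch in
# its own thread; the GIL is released while zarr waits on IO.
batch_indices = [(t, 100 + t, 200 + t) for t in range(32)]  # placeholder indices
with ThreadPoolExecutor(max_workers=8) as executor:
    batch = list(executor.map(lambda idx: load_example(*idx), batch_indices))
```

Whether partial decompression actually avoids reading and decompressing the full chunk depends on the store and codec supporting partial reads, so this would need benchmarking against the current full-chunk approach.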