This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

Don't read entire chunks at a time #57

Closed
JackKelly opened this issue Jul 19, 2021 · 4 comments

Comments

@JackKelly
Member

Because we're reading entire chunks at a time, we're reading and decompressing a lot more data than we need, which is almost certainly a major bottleneck.

Some ideas:

  • Try again to use zarr.core.Array(partial_decompress=True) with FSStore and Blosc compression (we're already using Blosc zstd level 5 for NWPs) to read only small parts of each chunk, especially for NWPs, where we're using tiny 2x2 images.
  • Try with uncompressed Zarr. It's possible that the performance gain from compression is far smaller than the performance gain from being able to extract precisely the data we want from each chunk.
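
To make the trade-off in the second idea concrete, here is a minimal sketch using only the Python standard library. It is not Zarr code: `zlib` stands in for Blosc, an in-memory `bytes` buffer stands in for a chunk on disk, and the 256x256 chunk shape and both helper functions are made up for illustration. It only shows that an uncompressed chunk lets us seek straight to a 2x2 window, while a conventionally compressed chunk has to be decompressed in full first.

```python
import io
import struct
import zlib

# Hypothetical chunk layout: 256 x 256 float32 values, row-major.
ROWS, COLS = 256, 256
ITEMSIZE = 4  # bytes per float32

values = [float(i % 97) for i in range(ROWS * COLS)]
raw_chunk = struct.pack(f"<{ROWS * COLS}f", *values)
compressed_chunk = zlib.compress(raw_chunk, 5)  # zlib is a stand-in for Blosc


def read_window_uncompressed(buf, row, col, height, width):
    """Seek to each requested row and read only `width` values from it."""
    f = io.BytesIO(buf)
    out = []
    bytes_read = 0
    for r in range(row, row + height):
        f.seek((r * COLS + col) * ITEMSIZE)
        data = f.read(width * ITEMSIZE)
        bytes_read += len(data)
        out.append(struct.unpack(f"<{width}f", data))
    return out, bytes_read


def read_window_compressed(buf, row, col, height, width):
    """Decompress the whole chunk, then slice the window out of it."""
    raw = zlib.decompress(buf)
    out = []
    for r in range(row, row + height):
        start = (r * COLS + col) * ITEMSIZE
        out.append(struct.unpack(f"<{width}f", raw[start:start + width * ITEMSIZE]))
    # Report how many bytes had to be decompressed to serve the request.
    return out, len(raw)


window_a, touched_a = read_window_uncompressed(raw_chunk, 10, 20, 2, 2)
window_b, touched_b = read_window_compressed(compressed_chunk, 10, 20, 2, 2)
print(touched_a, touched_b)
```

Both calls return the same 2x2 window, but the uncompressed path touches 16 bytes while the compressed path has to materialise the full 262144-byte chunk. Whether that outweighs the smaller transfer size of compressed chunks on real storage is exactly what would need benchmarking.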

If it's possible to quickly load subsets of each chunk, then modify each Zarr DataSource so it no longer pre-reads entire chunks into memory, but instead, for each batch, loads each example separately using a different thread.
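
A rough sketch of the per-example threaded loading described above, using only the standard library. `load_example` and `load_batch` are hypothetical names, not functions from this repository, and the body of `load_example` is a placeholder for whatever slice each Zarr DataSource would actually read.

```python
from concurrent.futures import ThreadPoolExecutor


def load_example(example_index):
    """Hypothetical stand-in for reading one example's window from a Zarr store."""
    # Real code would slice the Zarr array here. Blocking I/O typically
    # releases the GIL, which is what lets the threads overlap their reads.
    return {"index": example_index, "data": [example_index] * 4}


def load_batch(example_indices, max_workers=8):
    """Load each example as a separate task; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_example, example_indices))


batch = load_batch([3, 1, 4, 1, 5])
print([example["index"] for example in batch])
```

`pool.map` returns results in the order of the inputs, so the batch layout does not depend on which thread finishes first.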

@JackKelly
Member Author

JackKelly commented Jul 19, 2021

highly relevant:

which includes this text (copy-and-pasted for convenience):

  • zarr-specs#59 (zarr specs for partial chunk reads)
  • zarr-python#40 (partial chunk reads feature request)
  • zarr-python#521 (partial chunk reads feature request)
  • zarr-python#584 (partial decompress draft PR)
  • It sounds like, even with partial decompress, we will still need to fetch the full compressed file.

@JackKelly
Member Author

PR for partial_decompress, including lots of performance graphs: zarr-developers/zarr-python#667

@JackKelly
Member Author

This issue will be made obsolete (i.e. won't matter) if #58 works :)

@JackKelly
Member Author

JackKelly commented Jul 23, 2021

No longer relevant. Made obsolete by #58
