This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Don't read entire chunks at a time #57

Closed
@JackKelly

Description

Because we're reading entire chunks at a time, we're reading and decompressing a lot more data than we need, which is almost certainly a major bottleneck.

Some ideas:

  • Try again to use zarr.core.Array(partial_decompress=True) with FSStore and Blosc compression (we're already using Blosc zstd level 5 for NWPs) to read small slices of each chunk, especially for NWPs, where we only need tiny 2x2 images. (See the sketch after this list.)
  • Try uncompressed Zarr. It's possible that the performance gain from compression is far smaller than the gain from being able to extract precisely the data we want from each chunk.
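
As a rough sketch of the first idea (not the repo's actual code): open the array through FSStore with partial_decompress=True so that, for Blosc-compressed chunks, only the blocks covering the requested selection are decompressed. The store path, array name, and indices below are made up for illustration.

```python
import zarr
from zarr.storage import FSStore

# Hypothetical store path and array name, for illustration only.
store = FSStore("gs://bucket/nwp.zarr")

# partial_decompress=True asks zarr to decompress only the Blosc blocks
# needed for the requested selection, rather than whole chunks.
nwp = zarr.core.Array(store, path="UKV", read_only=True, partial_decompress=True)

# Reading a tiny 2x2 spatial window should now touch far less data
# than decompressing the full chunk it lives in.
example = nwp[0, :, 100:102, 200:202]
```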

If it's possible to quickly load subsets of each chunk, then modify each Zarr DataSource so it no longer pre-reads entire chunks into memory, but instead, for each batch, loads each example separately using a different thread.
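
A minimal sketch of that change, assuming a hypothetical list of per-example (t, y, x) locations and using a thread pool to load each example concurrently (names here are illustrative, not the real DataSource API):

```python
from concurrent.futures import ThreadPoolExecutor

def load_batch(zarr_array, example_locations, size=2):
    """Load each example in a batch with its own thread, instead of
    pre-reading whole chunks. `example_locations` is a hypothetical list
    of (t, y, x) indices, one per example."""
    def load_example(loc):
        t, y, x = loc
        # Read just the small window this example needs (e.g. 2x2 for NWPs).
        return zarr_array[t, :, y:y + size, x:x + size]

    with ThreadPoolExecutor() as executor:
        return list(executor.map(load_example, example_locations))
```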
