Description
The USGS contracted The HDF Group to run a test:
Could we make the HDF5 format as performant on the cloud as the Zarr format by writing the HDF5 chunk locations into .zmetadata
and then having the Zarr library read directly from those chunks instead of from Zarr-format chunks?
From our first test, the answer appears to be YES: https://gist.github.com/rsignell-usgs/3cbe15670bc2be05980dec7c5947b540
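The idea rests on the fact that a chunked HDF5 dataset already stores each chunk as a contiguous byte range in the file, and h5py can report those ranges. The sketch below (an illustration, not the code used in the experiment; the `demo.h5` filename and the Zarr-style `row.col` keys are assumptions) shows how a chunk-location index of the kind written into .zmetadata could be built:

```python
import h5py
import numpy as np

# Create a small chunked HDF5 file to inspect.
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("x", data=np.arange(16, dtype="i8").reshape(4, 4),
                     chunks=(2, 2))

# Walk the chunk index and record each chunk's byte offset and size,
# keyed the way Zarr names chunks ("row.col"). This is the kind of
# information the experiment writes into .zmetadata.
index = {}
with h5py.File("demo.h5", "r") as f:
    dset = f["x"]
    for i in range(dset.id.get_num_chunks()):
        info = dset.id.get_chunk_info(i)
        key = ".".join(str(o // c) for o, c in
                       zip(info.chunk_offset, dset.chunks))
        index[f"x/{key}"] = {"offset": info.byte_offset,
                             "size": info.size}

print(index)
```

With this index in hand, a reader never needs the HDF5 library at all: a byte-range request per chunk is enough, which is what makes the approach cloud-friendly.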
We modified both the zarr
and xarray
libraries to make that notebook possible, adding the FileChunkStore
concept. The modified libraries are pinned here: https://github.com/rsignell-usgs/hurricane-ike-water-levels/blob/zarr-hdf5/binder/environment.yml#L20-L21
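The FileChunkStore concept can be sketched as a store that maps Zarr chunk keys to byte ranges in an existing file. The minimal class below is an assumption-laden illustration (the class name, index layout, and plain-`open` I/O are mine, not the modified zarr library's actual API), but it captures the core mapping:

```python
class ChunkByteRangeStore:
    """Sketch of the FileChunkStore idea: resolve Zarr chunk keys
    (e.g. "x/0.0") to byte ranges in an existing HDF5 file, so chunks
    are read in place rather than copied into Zarr-format chunks."""

    def __init__(self, path, chunk_index):
        # chunk_index maps chunk keys to {"offset": ..., "size": ...};
        # in the experiment this mapping lives in .zmetadata.
        self.path = path
        self.chunk_index = chunk_index

    def __getitem__(self, key):
        entry = self.chunk_index[key]
        with open(self.path, "rb") as f:
            f.seek(entry["offset"])
            return f.read(entry["size"])


# Tiny demonstration against a fake file holding two raw "chunks".
with open("demo.bin", "wb") as f:
    f.write(b"AAAA" + b"BBBBBB")

index = {"x/0": {"offset": 0, "size": 4},
         "x/1": {"offset": 4, "size": 6}}
store = ChunkByteRangeStore("demo.bin", index)
print(store["x/0"])  # b'AAAA'
print(store["x/1"])  # b'BBBBBB'
```

In the real setup the same lookup would be served by a range request against object storage, and the returned bytes would still pass through Zarr's usual decompression pipeline, since HDF5 chunks can carry their own filters.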
Feel free to try running the notebook yourself.
(If you run into a "stream is closed" error when computing the max of the zarr data, just run the cell again;
I'm still trying to figure out why that error occurs intermittently.)