Using the Zarr library to read HDF5

The USGS contracted The HDF Group to run a test:
Could we make the HDF5 format as performant in the cloud as the Zarr format by writing the HDF5 chunk locations into `.zmetadata` and then having the Zarr library read those chunks instead of Zarr-format chunks?

From our first test the answer appears to be YES: https://gist.github.com/rsignell-usgs/3cbe15670bc2be05980dec7c5947b540

We modified both the `zarr` and `xarray` libraries to make that notebook possible, adding the `FileChunkStore` concept. The modified libraries are listed here: https://github.com/rsignell-usgs/hurricane-ike-water-levels/blob/zarr-hdf5/binder/environment.yml#L20-L21
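
To make the idea concrete, here is a minimal sketch of a read-only store that resolves Zarr-style chunk keys to byte ranges inside a single file. It is not the `FileChunkStore` from the modified `zarr`; the class name `ByteRangeChunkStore`, the `"0.0"`-style keys, and the `offset`/`size` fields are assumptions made for illustration, and a plain binary file stands in for an HDF5 file. In the cloud, the local seek and read would presumably be replaced by a byte-range request against object storage.

```python
import os
import tempfile
from collections.abc import Mapping


class ByteRangeChunkStore(Mapping):
    """Read-only mapping from chunk keys to byte ranges in one file.

    Hypothetical illustration only; not the FileChunkStore from the
    modified zarr library.
    """

    def __init__(self, path, chunk_index):
        self._path = path
        # chunk_index (assumed layout): {"0.0": {"offset": int, "size": int}, ...}
        self._index = dict(chunk_index)

    def __getitem__(self, key):
        loc = self._index[key]  # raises KeyError for an unknown chunk key
        with open(self._path, "rb") as f:
            f.seek(loc["offset"])
            return f.read(loc["size"])

    def __iter__(self):
        return iter(self._index)

    def __len__(self):
        return len(self._index)


# Stand-in for an HDF5 file: an 8-byte header followed by two 8-byte "chunks".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER--" + b"chunk-A!" + b"chunk-B!")
    demo_path = tmp.name

store = ByteRangeChunkStore(
    demo_path,
    {"0.0": {"offset": 8, "size": 8}, "0.1": {"offset": 16, "size": 8}},
)
first, second = store["0.0"], store["0.1"]
os.remove(demo_path)
```

Because the store only needs a key-to-location index, the original file can stay untouched; all the extra information lives in the metadata.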

Feel free to try running the notebook yourself: [![Binder](https://aws-uswest2-binder.pangeo.io/badge_logo.svg)](https://aws-uswest2-binder.pangeo.io/v2/gh/rsignell-usgs/hurricane-ike-water-levels.git/zarr-hdf5?filepath=zarr_vs_zarr-hdf5.ipynb)
(If you run into a `stream is closed` error when computing the max of the Zarr data, just run the cell again.
 I'm still trying to figure out why that error sometimes occurs.)