VirtualiZarr

VirtualiZarr creates virtual Zarr stores for cloud-friendly access to archival data, using familiar xarray syntax.

VirtualiZarr (pronounced like "virtualize" but more piratey) grew out of discussions on the kerchunk repository. It is an attempt to provide the game-changing power of kerchunk in a Zarr-native way, with a familiar array-like API.

Please see the documentation.

Development Status and Roadmap

VirtualiZarr is ready to use for many of the tasks that kerchunk is commonly used for, but the most general and powerful vision of this library can only be implemented once certain upstream changes in Zarr have occurred.

VirtualiZarr is therefore evolving in tandem with developments in the Zarr Specification, which then need to be implemented in specific Zarr reader implementations (especially the Zarr-Python V3 implementation). There is an overall roadmap for this integration with Zarr, whose final completion requires acceptance of at least two new Zarr Enhancement Proposals (the "Chunk Manifest" and "Virtual Concatenation" ZEPs).

Whilst we wait for these upstream changes, VirtualiZarr aims to be useful in a significant subset of cases, for example by writing virtualized Zarr stores out to the existing kerchunk references format so that they can be read by fsspec today.
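
To make the idea of a references file concrete, here is a minimal sketch of what a kerchunk-style (version 1) references document looks like for one array. It is not VirtualiZarr's implementation and does not use its API. The helper `to_kerchunk_refs`, the `manifest` layout, and the path, offsets and lengths are all hypothetical, chosen only for illustration. A real references file also carries Zarr metadata entries such as `.zgroup` and `.zarray`, which are omitted here.

```python
import json

# Hypothetical chunk manifest: chunk key -> where that chunk's bytes live
# inside an archival file. All values below are invented for illustration.
manifest = {
    "0.0": {"path": "s3://bucket/air.nc", "offset": 15419, "length": 7738000},
    "0.1": {"path": "s3://bucket/air.nc", "offset": 7753419, "length": 7738000},
}


def to_kerchunk_refs(array_name, manifest):
    """Build a kerchunk-style (version 1) references dict for one array.

    Each chunk becomes "<array>/<chunk key>": [url, offset, length].
    Zarr metadata keys (.zgroup, .zarray, .zattrs) are left out of this sketch.
    """
    refs = {}
    for chunk_key, entry in manifest.items():
        refs[f"{array_name}/{chunk_key}"] = [
            entry["path"],
            entry["offset"],
            entry["length"],
        ]
    return {"version": 1, "refs": refs}


refs = to_kerchunk_refs("air", manifest)
print(json.dumps(refs, indent=2))
```

Each entry points at a byte range in the original file rather than copying the data, which is what lets a reader fetch individual chunks directly from archival storage.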

Credits

This package was originally developed by Tom Nicholas whilst working at [C]Worthy, who deserve credit for allowing him to prioritise a generalizable open-source solution to the dataset virtualization problem. VirtualiZarr is now a community-owned multi-stakeholder project.

Licence

Apache 2.0
