Description
Let's list all the file formats that could potentially be represented efficiently as "virtual zarr" - i.e. zarr + chunk manifests.
The important criterion here is that the format must store data in a small number of contiguous chunks, so that access using HTTP range requests to object storage is efficient. This rules out some formats. For example, I don't think we can efficiently access this format that @kmuehlbauer mentioned over in openradar/xradar#187 (comment):
> file formats where variables are written interleaved within one chunk of data (eg: 100 bytes v1, 100 bytes v2, 100 bytes v3, 100 bytes v1, 100 bytes v2, 100 bytes v3, ...)? Is there something like strides available?
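To make the cost of interleaving concrete, here is a minimal sketch that counts the byte ranges needed to read one variable under each layout. The function names and the sizes (3 variables, 100-byte records, 1000 records) are illustrative assumptions taken from the quoted example, not measurements of any real radar format.

```python
def byte_ranges_contiguous(var_index, var_nbytes):
    """Variable stored as one contiguous block: a single (offset, length) range."""
    return [(var_index * var_nbytes, var_nbytes)]


def byte_ranges_interleaved(var_index, n_vars, record_nbytes, n_records):
    """Variables alternate every `record_nbytes` bytes: one range per record."""
    stride = n_vars * record_nbytes  # distance between records of the same variable
    return [
        (r * stride + var_index * record_nbytes, record_nbytes)
        for r in range(n_records)
    ]


# Read variable v2 (index 1) from a hypothetical file holding 3 variables.
contiguous = byte_ranges_contiguous(var_index=1, var_nbytes=100 * 1000)
interleaved = byte_ranges_interleaved(
    var_index=1, n_vars=3, record_nbytes=100, n_records=1000
)

print(len(contiguous), len(interleaved))  # 1 1000
```

Each range corresponds to one manifest entry and, in the worst case, one HTTP request, so the interleaved layout needs 1000 tiny requests where the contiguous layout needs one.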
If we start thinking of Zarr as a "SuperFormat" (super as in superset, not as in super-duper), then this is the list of existing formats in that set, i.e. those that can be referenced using chunk manifests (see zarr-developers/zarr-specs#287).
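As a rough illustration of what "referenced using chunk manifests" means, the sketch below models a manifest as a plain mapping from chunk key to a byte range inside an existing file, and turns one entry into an HTTP `Range` header. The `path`/`offset`/`length` field names, the `s3://` URL, and the `range_header` helper are assumptions for illustration only; see the linked spec proposal for the actual format.

```python
# Hypothetical manifest: each Zarr chunk key points at a byte range in a legacy file.
manifest = {
    "0.0": {"path": "s3://example-bucket/data.nc", "offset": 4096, "length": 1000},
    "0.1": {"path": "s3://example-bucket/data.nc", "offset": 5096, "length": 1000},
}


def range_header(entry):
    """Build the HTTP Range header that fetches exactly one chunk."""
    start = entry["offset"]
    end = start + entry["length"] - 1  # HTTP byte ranges are inclusive
    return {"Range": f"bytes={start}-{end}"}


print(range_header(manifest["0.0"]))  # {'Range': 'bytes=4096-5095'}
```

A format fits this model well when each chunk maps to a single entry like these.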
Definitely can support:
- HDF5
- NetCDF4
- GRIB (see Non-kerchunk backend for GRIB files. #312)
- TIFF
- FITS
- DMR++ (see Reading from dmrcp index files? #85)
- Zarr itself! (a "Zarr of Zarrs"; see Add Zarr Reader(s) #262)
Probably can support:
- HDF4 (see Support HDF4? #216)
- arbitrary collections of `.npz` files
- HEC-RAS HDF5 data (see Nested HDF5 Data / HEC-RAS fsspec/kerchunk#490)
- Hugging Face safetensors (see Reader for Hugging Face's SafeTensor format #367)
Maybe can support?
- CSV (see CSV reader #200)
- Virtual Rasters (see GDAL Virtual Rasters #166)
- rainbow (see Kerchunk / VirtualiZarr way to open radar files openradar/xradar#187 (comment))
- MATLAB `.mat` files (specification documented here)
- Parquet? (see Virtualize Parquet? #441)
- VCF?
Probably can't support:
- furuno (see Kerchunk / VirtualiZarr way to open radar files openradar/xradar#187 (comment))
- sigmet/nexrad (see Kerchunk / VirtualiZarr way to open radar files openradar/xradar#187 (comment))
(The checkboxes indicate whether or not a working implementation already exists, either going through kerchunk's in-memory format as an intermediate or creating a `ManifestArray` directly.)