Zarr version
v3
Numcodecs version
v0.15.1
Python Version
3.12
Operating System
Linux
Installation
Mamba
Description
In zarr v2, passing a path (str) ending with *.zip to zarr.open or zarr.open_consolidated would work transparently, the same way as if a *.zarr path was given.
In zarr v3, this fails with FileNotFoundError: Unable to find group. The workaround is to open a ZipStore explicitly and then pass that to zarr.open.
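For reference, the explicit ZipStore route can be wrapped as in the sketch below. The helper name open_zarr_readonly is made up for illustration and is not part of zarr-python. It only covers read-only access, and it assumes that any path ending in .zip is a zipped zarr store.

```python
from pathlib import Path


def open_zarr_readonly(path):
    """Open `path` read-only, wrapping *.zip paths in a ZipStore first.

    Hypothetical helper (not part of zarr-python); assumes zarr v3.
    """
    # Imported lazily so the module can be imported without zarr installed.
    import zarr
    from zarr.storage import ZipStore

    if Path(path).suffix == ".zip":
        # Assumption: a ".zip" suffix always means a zipped zarr store.
        store = ZipStore(path, mode="r")
        return zarr.open(store, mode="r")
    # Anything else goes through zarr's own store guessing.
    return zarr.open(path, mode="r")
```

With the file from the reproduction steps, open_zarr_readonly("example-3.zip") would take the ZipStore branch, while a plain directory store would be passed to zarr.open unchanged.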
The zarr.open docstring says:
zarr-python/src/zarr/api/synchronous.py
Lines 166 to 169 in 99621ec
which seems to imply that passing a string path ending in zip should work.
Moreover, I found it more convenient when one didn't need to test for file extensions and explicitly handle storage objects. I use zarr datasets through xarray, and it seems to me that xr.open_dataset('example.zarr.zip', engine='zarr') should usually be how a normal user opens such a file?
I understand that the ZipStore is still "experimental" in v3, and I really hope you keep it in the official scheme because it is very useful, to us at least. Zipped zarrs have many of the benefits of zarr (over netCDF for example), but without the inode explosion that pure zarr folders create on Unix filesystems (slowing down disk operations).
I think the store guessing happens in zarr.storage._common.make_store_path? The zip case could be handled here:
zarr-python/src/zarr/storage/_common.py
Lines 298 to 309 in 99621ec
Would this convenience be welcomed back in zarr-python? I could do a PR if the team here agrees with adding this case handling. To avoid pure string checking, one could even use zipfile.is_zipfile from the standard library to check for zip stores?
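A minimal sketch of the zipfile.is_zipfile idea, using only the standard library. The function name looks_like_zip_store is hypothetical, and where such a check would sit inside make_store_path is left open here.

```python
import zipfile


def looks_like_zip_store(path):
    """Return True if `path` is an existing file that carries a zip signature.

    Hypothetical helper, not part of zarr-python.
    """
    # is_zipfile returns False (instead of raising) for missing paths
    # and for directories, so no separate existence check is needed.
    return zipfile.is_zipfile(path)
```

One limitation of this approach: it inspects file contents, so it only helps when reading an existing file. Creating a new zip store from a string path would still need a check on the file extension.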
Otherwise, I guess this could be done by xarray itself? Many of my scripts also go through intake-esm, so I guess we could fix it there too if the proposal gets refused here.
Steps to reproduce
Example adapted from the doc.
import numpy as np
import zarr

# Write a small array into a zip-backed store.
store = zarr.storage.ZipStore("example-3.zip", mode="w")
z = zarr.create_array(
    store=store,
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4",
)
z[:, :] = np.random.random((100, 100))
store.close()

# In zarr v3 this raises FileNotFoundError instead of opening the zip.
zarr.open("example-3.zip", mode="r")
Additional output
No response