Can't conveniently open zip store from path with zarr v3 #2831

Open
@aulemahal

Zarr version

v3

Numcodecs version

v0.15.1

Python Version

3.12

Operating System

Linux

Installation

Mamba

Description

In zarr v2, passing a path (str) ending with *.zip to zarr.open or zarr.open_consolidated worked transparently, the same way as if a *.zarr path had been given.

In zarr v3, this fails with FileNotFoundError: Unable to find group. The workaround is to open a ZipStore explicitly and then pass it to zarr.open.
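
For reference, the workaround could look roughly like the sketch below. The helper name opened_zip_store is made up for illustration and is not part of zarr-python; it only wraps the ZipStore and zarr.open calls mentioned above and closes the store afterwards.

```python
from contextlib import contextmanager


@contextmanager
def opened_zip_store(path):
    """Hypothetical helper: open a zipped zarr hierarchy read-only.

    zarr is imported inside the function so that defining the helper
    does not itself require zarr to be installed.
    """
    import zarr
    from zarr.storage import ZipStore

    # Build the store explicitly instead of passing the str path to zarr.open.
    store = ZipStore(path, mode="r")
    try:
        yield zarr.open(store, mode="r")
    finally:
        # Release the underlying zip file handle.
        store.close()
```

Usage would be something like `with opened_zip_store("example-3.zip") as z: print(z.shape)`, which is exactly the extra ceremony that v2 did not need.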

The zarr.open docstring says:

Parameters
----------
store : Store or str, optional
Store or path to directory in file system or name of zip file.

which seems to imply that passing a string path ending in zip should work.

Moreover, I found it more convenient when one didn't need to test for file extensions and explicitly handle storage objects. I use zarr datasets through xarray, and it seems to me that xr.open_dataset('example.zarr.zip', engine='zarr') should be how a normal user opens such a file?

I understand that the ZipStore is still "experimental" in v3, and I really hope you keep it in the official scheme because it is very useful, to us at least. Zipped zarrs have many of the benefits of zarr (over netCDF, for example), but without the inode explosion that plain zarr directories create on Unix filesystems (which slows down disk operations).

I think the store guessing happens in zarr.storage._common.make_store_path?
The zip case could be handled here:

elif isinstance(store_like, Path):
    store = await LocalStore.open(root=store_like, read_only=_read_only)
elif isinstance(store_like, str):
    storage_options = storage_options or {}
    if _is_fsspec_uri(store_like):
        used_storage_options = True
        store = FsspecStore.from_url(
            store_like, storage_options=storage_options, read_only=_read_only
        )
    else:
        store = await LocalStore.open(root=Path(store_like), read_only=_read_only)

Would this convenience be welcomed back in zarr-python? I could open a PR if the team agrees with adding this case handling. To avoid pure string checking, one could even use zipfile.is_zipfile from the standard library to detect zip stores.
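
A minimal sketch of the detection I have in mind, using only the standard library. The function guess_store_kind and its "zip"/"local" return values are hypothetical names for illustration; in zarr-python the result would instead select between ZipStore and LocalStore inside make_store_path.

```python
import zipfile
from pathlib import Path


def guess_store_kind(store_like):
    """Hypothetical helper: classify a local path as a zip or directory store.

    Returns "zip" when the path is an existing file that the standard
    library recognizes as a zip archive, and "local" otherwise.
    """
    path = Path(store_like)
    # is_zipfile inspects the file content, so the file name's extension
    # alone does not decide the outcome.
    if path.is_file() and zipfile.is_zipfile(path):
        return "zip"
    return "local"
```

One caveat with this design: a path that does not exist yet (for example when writing a new store) cannot be inspected, so it falls through to "local", and write modes would still need an extension check or an explicit ZipStore.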

Otherwise, I guess this could be done by xarray itself? Many of my scripts also go through intake-esm, so I guess we could fix it there too if the proposal is refused here.

Steps to reproduce

Example adapted from the doc.

import numpy as np
import zarr

store = zarr.storage.ZipStore("example-3.zip", mode='w')
z = zarr.create_array(
    store=store,
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4"
)
z[:, :] = np.random.random((100, 100))
store.close()

# In zarr v3 this raises FileNotFoundError: Unable to find group
zarr.open('example-3.zip', mode='r')

Additional output

No response


Labels

bug (Potential issues with the zarr-python library)
