zarr.core.group.Group does not allow access to nested groups using file path-like syntax (?) #2765
Zarr version
v3.0
Numcodecs version
v0.15
Python Version
3.11
Operating System
Linux
Installation
conda
Description
Hi everyone,
While working on Xarray issue #9960, I discovered that the new Zarr-Python version does not allow access to group members using file path-like syntax.
Steps to reproduce
This is an MCVE:
import xarray
import xarray.testing as xt
import numpy as np
import zarr

if __name__ == "__main__":
    # Datasets for the child groups /a, /b and /c/d
    ds_a = xarray.Dataset({
        "A": (("x", "y"), np.ones((128, 256))),
    })
    ds_b = xarray.Dataset({
        "B": (("y", "x"), np.ones((256, 128)) * 2)
    })
    ds_c = xarray.Dataset({
        "G": (("x", "y"), np.zeros((128, 256)))
    })
    # Dataset for the root group "/"
    ds_rt = xarray.Dataset({
        "z": (("x", "y"), np.zeros((128, 256))),
        "w": (("x"), np.random.rand(128))
    })
    dt = xarray.DataTree.from_dict(
        {
            "/": ds_rt,
            "/a": ds_a,
            "/b": ds_b,
            "/c/d": ds_c
        }
    )
    # Write the whole tree to a Zarr store
    path = "testv3_dt.zarr"
    dt.to_zarr(path, compute=True, mode="w")
The group paths in this DataTree are ['/', '/c', '/c/d', '/b', '/a']. However, when I open the Zarr store again, store.get() returns None for every one of these paths:
kwargs = {'mode': 'r', 'path': '/', 'storage_options': None, 'synchronizer': None, 'zarr_format': None}
store = zarr.open_consolidated(path, **kwargs)
print(store.get("/a"))
None
Digging a little further, I found that in zarr-python v3 the name of each member can be listed with the store.members() method, which gives us the keys needed to get the groups within the Zarr store:
print([path for path, _ in store.members()])
['a', 'b', 'w', 'c', 'z']
Using these names (without a leading "/"), the groups can then be accessed:
print(store.get("a"))
<Group file://testv3_dt.zarr/a>
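The listing above only contains the root's immediate members (c appears, but c/d does not), and it mixes groups with arrays such as w and z. As a minimal sketch, the full set of group paths in the same absolute form that DataTree reports can be rebuilt by recursing through members(). The helper walk_group_paths and its children/is_group parameters are hypothetical, not part of Zarr or Xarray, and the sketch assumes, based on the output above, that members() yields (name, member) pairs for immediate children only. It is duck-typed so it can be exercised without a Zarr store:

```python
from typing import Any, Callable, Iterable, List, Tuple


def walk_group_paths(
    root: Any,
    children: Callable[[Any], Iterable[Tuple[str, Any]]],
    is_group: Callable[[Any], bool],
) -> List[str]:
    """Hypothetical helper: list every group path under `root` as "/a", "/c/d", ...

    `children(node)` is assumed to yield (name, member) pairs for the node's
    immediate members only; `is_group(member)` decides whether to recurse.
    """
    paths: List[str] = ["/"]  # the object we start from is treated as the root group

    def visit(node: Any, prefix: str) -> None:
        for name, member in children(node):
            if is_group(member):  # arrays and other leaves are skipped
                path = f"{prefix}/{name}"
                paths.append(path)
                visit(member, path)  # descend into the nested group

    visit(root, "")
    return sorted(paths)
```

With the store from the MCVE this would be called as walk_group_paths(store, lambda g: g.members(), lambda m: isinstance(m, zarr.Group)), which should give ['/', '/a', '/b', '/c', '/c/d'] if the assumptions above hold.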
Should zarr-python v3 groups support file path-like syntax for accessing groups?
Another thing I noticed is that the dataset stored at the root level (ds_rt, which contains the z and w DataArrays) is not represented as a group (the root group "/"); instead, its variables appear directly as Zarr arrays:
print(store.get("z"))
<Array file://testv3_dt.zarr/z shape=(128, 256) dtype=float64>
How can we access the root group (store.get("/")) rather than the arrays directly (store.get("z"))?
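As a possible workaround sketch, a v2-style path can be split on "/" and resolved one component at a time with get(), so that "/" maps to the opened group itself (the root) and "/c/d" becomes get("c") followed by get("d"). The helper get_by_path and its is_group parameter are hypothetical; the only Zarr behavior assumed is that Group.get(name) returns the member or None, as in the outputs above. Resolving one level at a time avoids relying on whether get() accepts a nested key such as "c/d", which this issue does not show. The default is_group (anything with a get attribute) is a duck-typing assumption; with Zarr an explicit isinstance check is safer:

```python
from typing import Any, Callable, Optional


def get_by_path(
    root: Any,
    path: str,
    is_group: Callable[[Any], bool] = lambda node: hasattr(node, "get"),
) -> Optional[Any]:
    """Hypothetical helper: resolve a v2-style path ("/", "/a", "/c/d") from `root`.

    Each component is looked up with `node.get(name)`, one level at a time.
    Returns None if a component is missing or a non-group is reached mid-path.
    """
    node = root
    # Splitting "/c/d" gives ["", "c", "d"]; empty parts (leading, doubled or
    # trailing "/") are dropped, so "/" yields no parts and returns root itself.
    for part in (p for p in path.split("/") if p):
        if not is_group(node):  # e.g. an array reached before the path ended
            return None
        node = node.get(part)
        if node is None:  # missing member
            return None
    return node
```

For example, get_by_path(store, "/a", lambda n: isinstance(n, zarr.Group)) would be expected to return the same group as store.get("a"), while get_by_path(store, "/") simply returns store, the group opened by zarr.open_consolidated.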
Additional output
Zarr-Python v2 used to return a <class 'zarr.hierarchy.Group'>, which allowed us to access nested groups using file path-like syntax:
## this part of the code was executed using zarr-python v2
kwargs = {'mode': 'r', 'path': '/', 'storage_options': None, 'synchronizer': None}
store = zarr.open_consolidated(path, **kwargs)
print(store.get("/"))
<zarr.hierarchy.Group '/' read-only>
print(store.get("/a"))
<zarr.hierarchy.Group '/a' read-only>