zarr.core.group.Group does not allow to access nested groups using file path-like syntax (?) #2765

Open
@aladinor

Description

Zarr version

v3.0

Numcodecs version

v0.15

Python Version

3.11

Operating System

Linux

Installation

conda

Description

Hi everyone,

While working on Xarray issue #9960, I discovered that zarr-python v3 does not allow access to group members using file path-like syntax (keys with a leading slash, such as "/a").

Steps to reproduce

This is an MCVE:

import xarray
import xarray.testing as xt
import numpy as np
import zarr


if __name__ == "__main__":
    ds_a = xarray.Dataset({
        "A": (("x", "y"), np.ones((128, 256))),
    })
    ds_b = xarray.Dataset({
        "B": (("y", "x"), np.ones((256, 128)) * 2)
    })
    ds_c = xarray.Dataset({
        "G": (("x", "y"), np.zeros((128, 256)))
    })
    ds_rt = xarray.Dataset({
        "z": (("x", "y"), np.zeros((128, 256))),
        "w": (("x"), np.random.rand(128))
    })

    dt = xarray.DataTree.from_dict(
        {
            "/": ds_rt,
            "/a": ds_a,
            "/b": ds_b,
            "/c/d": ds_c
        }
    )
    path = "testv3_dt.zarr"
    dt.to_zarr(path, compute=True, mode="w")

The group paths in this DataTree are ['/', '/c', '/c/d', '/b', '/a']. However, after reopening the Zarr store, get() returns None for any of these paths:

kwargs = {'mode': 'r', 'path': '/', 'storage_options': None, 'synchronizer': None, 'zarr_format': None}
store = zarr.open_consolidated(path, **kwargs)
print(store.get("/a"))
None

Digging a little deeper, I found that zarr-python v3 exposes the name of each member of the group through the store.members() method. These names (without a leading slash) can then be used to get the groups within the Zarr store.

print([path for path, _ in store.members()])
['a', 'b', 'w', 'c', 'z']

Now, we can access the nested groups using these results

print(store.get("a"))
<Group file://testv3_dt.zarr/a>
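As a stopgap, the difference between the two key styles can be bridged by stripping the slashes before the lookup. The following is only a minimal sketch: normalize_group_path and get_member are hypothetical helper names, not part of zarr-python, and it assumes that the group object exposes get(key, default) as used above and that a slash-separated relative key such as "c/d" resolves to a nested member.

```python
def normalize_group_path(path: str) -> str:
    """Turn a v2-style path ("/a", "/c/d/") into a relative key ("a", "c/d").

    The root path "/" becomes the empty string.
    """
    return path.strip("/")


def get_member(group, path: str, default=None):
    """Look up a member of `group` using file path-like syntax.

    Hypothetical helper: `group` is anything with a get(key, default)
    method, for example the object returned by zarr.open_consolidated.
    """
    key = normalize_group_path(path)
    if key == "":
        # "/" refers to the group we already hold.
        return group
    return group.get(key, default)
```

With the store opened above, get_member(store, "/a") would then be expected to behave like store.get("a").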

Should zarr-python v3 groups support file path-like syntax to access groups?

Another thing I noticed is that the dataset stored at the root level (ds_rt, which contains the z and w DataArrays) is not represented as a group (the root group "/"); instead, its variables are returned as Zarr arrays.

print(store.get("z"))
<Array file://testv3_dt.zarr/z shape=(128, 256) dtype=float64>

How can we access the root group itself (store.get("/")) rather than only the arrays it contains (store.get("z"))?
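One possible reading is that the object returned by zarr.open_consolidated is already the root group, so "/" never appears among its members. Under that assumption, the v2-style list of group paths could be rebuilt from the members listing, as in the sketch below. group_paths is a hypothetical helper, and the is_group predicate is passed in so that the function does not depend on a particular zarr class.

```python
def group_paths(members, is_group):
    """Rebuild v2-style absolute group paths from (name, node) pairs.

    Hypothetical helper: `members` is an iterable of (name, node) pairs
    with names relative to the root, and `is_group(node)` tells groups
    apart from arrays. The root itself is always reported as "/".
    """
    paths = ["/"]
    for name, node in members:
        if is_group(node):
            paths.append("/" + name.strip("/"))
    return paths
```

With zarr-python v3 this might be called as group_paths(store.members(max_depth=None), lambda node: isinstance(node, zarr.Group)), assuming that members() accepts max_depth=None to descend into nested groups and returns names relative to the root.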

Additional output

zarr-python v2 used to return a <class 'zarr.hierarchy.Group'>, which allowed us to access nested groups using file path-like syntax.

# This part of the code was executed using zarr-python v2
kwargs = {'mode': 'r', 'path': '/', 'storage_options': None, 'synchronizer': None}
store = zarr.open_consolidated(path, **kwargs)

print(store.get("/"))
<zarr.hierarchy.Group '/' read-only>

print(store.get("/a"))
<zarr.hierarchy.Group '/a' read-only>

Metadata

Labels: bug (Potential issues with the zarr-python library)
