Opening a datatree from S3 bucket with zarr store #9197

Open
@vlevasseur073

What happened?

Trying to open a datatree with the Zarr backend from a Zarr store located in a private S3 bucket leads to the following error:

GroupNotFoundError: group not found at path ''

This issue was already reported in xarray-contrib/datatree, see xarray-contrib/datatree#322.
The fix could be more or less the same, but at that time I did not take the time to propose a PR.

What did you expect to happen?

I expected the datatree to open, since the open_datatree function in zarr.py has a storage_options argument. However, this argument is not passed on to ZarrStore.open_store.

Minimal Complete Verifiable Example

import xarray.backends.api as xr_api

# Replace the "<...>" placeholders with real credentials and endpoint.
storage_options = {
    "s3": {
        "key": "<access-key>",
        "secret": "<secret-key>",
        "endpoint_url": "<endpoint-url>",
    }
}
dt = xr_api.open_datatree(
    "s3://path/to/product", engine="zarr", storage_options=storage_options
)
dt

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

A possible fix could be, in xarray.backends.zarr.open_datatree:

filename_or_obj = _normalize_path(filename_or_obj)
if group:
    parent = NodePath("/") / NodePath(group)
    stores = ZarrStore.open_store(
        filename_or_obj, group=parent, storage_options=storage_options
    )
    if not stores:
        ds = open_dataset(
            filename_or_obj, group=parent, engine="zarr", **kwargs
        )
        return DataTree.from_dict({str(parent): ds})
else:
    parent = NodePath("/")
    stores = ZarrStore.open_store(
        filename_or_obj, group=parent, storage_options=storage_options
    )
if storage_options:
    kwargs["backend_kwargs"] = {"storage_options": storage_options}
ds = open_dataset(filename_or_obj, group=parent, engine="zarr", **kwargs)

In summary:

  • pass storage_options to ZarrStore.open_store
  • set backend_kwargs (containing storage_options) in the open_dataset call
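
To illustrate why forwarding the argument matters, here is a toy model of the bug that runs without xarray or S3 access. Everything in it is a hypothetical stand-in: fake_open_store, open_tree_buggy, open_tree_fixed and GroupNotFound are not xarray or zarr names, and the rule "an s3:// path needs storage_options" is an assumption made only for the sketch.

```python
from typing import Optional


class GroupNotFound(Exception):
    """Hypothetical stand-in for the error raised when a store cannot be read."""


def fake_open_store(
    path: str, group: str = "/", storage_options: Optional[dict] = None
) -> dict:
    # Assumption for this sketch: a private remote store is only readable
    # when credentials are supplied through storage_options.
    if path.startswith("s3://") and not storage_options:
        raise GroupNotFound(f"group not found at path {group.strip('/')!r}")
    return {group: f"store for {path}"}


def open_tree_buggy(path: str, group: str = "/", storage_options: Optional[dict] = None) -> dict:
    # Accepts storage_options but never forwards it (the reported behaviour).
    return fake_open_store(path, group=group)


def open_tree_fixed(path: str, group: str = "/", storage_options: Optional[dict] = None) -> dict:
    # Forwards storage_options to the store-opening call (the proposed fix).
    return fake_open_store(path, group=group, storage_options=storage_options)
```

With the same options, open_tree_buggy("s3://bucket/product", storage_options=opts) raises GroupNotFound, while open_tree_fixed returns the store mapping, which mirrors the difference the proposed change is meant to make.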

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-113-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.3-development

xarray: 2024.6.0
pandas: 2.2.2
numpy: 2.0.0
scipy: 1.13.1
netCDF4: 1.7.1
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.6.2
distributed: None
matplotlib: 3.9.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.26.0
sphinx: None

Labels: bug, topic-DataTree (Related to the implementation of a DataTree class)
