
Dataset.to_zarr() with mode='a' does not work with groups #3170

Closed
@VincentDehaye

Description

MCVE Code Sample

import xarray as xr
import numpy as np
from s3fs import S3FileSystem, S3Map

s3 = S3FileSystem()
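bucket_name = 'your-bucket-name'
# Join bucket and key with '/'; plain concatenation would give 'your-bucket-namesome_path.zarr'.
s3_path = bucket_name + '/some_path.zarr'
store = S3Map(s3_path, s3=s3)
for i in range(6):
    # Alternate between the two groups: even i -> Group1, odd i -> Group2.
    if i%2 == 0:
        group = 'Group1'
    else:
        group = 'Group2'
    # Each group receives lead_time 0, 1, 2 in turn.
    lead_time = i//2
    var1 = np.random.rand(1)
    ds = xr.Dataset({'var1': (['lead_time'], var1)},
                    coords={'lead_time': [lead_time]})
    # Append one lead_time step to the selected group.
    ds.to_zarr(store=store, mode='a', append_dim='lead_time', group=group)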
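To make the intended layout explicit, the sketch below replays the loop's index arithmetic in plain Python (no xarray or S3 needed): even `i` goes to `Group1`, odd `i` to `Group2`, and `i//2` is the `lead_time` value appended on that iteration. The helper name `write_schedule` is hypothetical and only illustrates the expected result.

```python
from collections import defaultdict


def write_schedule(n_iterations: int) -> dict:
    """Return {group: [lead_time, ...]} in the order the loop above appends them."""
    schedule = defaultdict(list)
    for i in range(n_iterations):
        # Same branching as the MCVE: even i -> Group1, odd i -> Group2.
        group = 'Group1' if i % 2 == 0 else 'Group2'
        # Integer division gives the lead_time value written on this iteration.
        schedule[group].append(i // 2)
    return dict(schedule)


print(write_schedule(6))
# {'Group1': [0, 1, 2], 'Group2': [0, 1, 2]}
```

If appending to groups worked, each group would therefore end up holding `var1` along a `lead_time` dimension of length 3, with coordinate values 0, 1 and 2.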

Output

This code raises the following error:

Traceback (most recent call last):
  File "/home/vincent/anaconda3/envs/hanover_backend/lib/python3.7/site-packages/xarray/core/dataset.py", line 1019, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'var1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vincent/Documents/Greenlytics/SiteForecast/debugging-script.py", line 201, in <module>
    ds.to_zarr(store=store, mode='a', append_dim='lead_time', group=group)
  File "/home/vincent/anaconda3/envs/hanover_backend/lib/python3.7/site-packages/xarray/core/dataset.py", line 1433, in to_zarr
    consolidated=consolidated, append_dim=append_dim)
  File "/home/vincent/anaconda3/envs/hanover_backend/lib/python3.7/site-packages/xarray/backends/api.py", line 1101, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/home/vincent/anaconda3/envs/hanover_backend/lib/python3.7/site-packages/xarray/backends/api.py", line 929, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/home/vincent/anaconda3/envs/hanover_backend/lib/python3.7/site-packages/xarray/backends/zarr.py", line 358, in store
    variables_with_encoding[vn].encoding = ds[vn].encoding
  File "/home/vincent/anaconda3/envs/hanover_backend/lib/python3.7/site-packages/xarray/core/dataset.py", line 1103, in __getitem__
    return self._construct_dataarray(key)
  File "/home/vincent/anaconda3/envs/hanover_backend/lib/python3.7/site-packages/xarray/core/dataset.py", line 1022, in _construct_dataarray
    self._variables, name, self._level_coords, self.dims)
  File "/home/vincent/anaconda3/envs/hanover_backend/lib/python3.7/site-packages/xarray/core/dataset.py", line 91, in _get_virtual_variable
    ref_var = variables[ref_name]
KeyError: 'var1'
The KeyError is raised sometimes on a variable name (var1) and sometimes on a dimension name (lead_time), depending on the run.
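One possible reading of the traceback, not confirmed here: on append, `ZarrStore.store` re-reads the existing data to copy encodings (`ds[vn].encoding`, zarr.py line 358), and if that re-read looks at the store root instead of the target group, `var1` is simply not there. The sketch below models a zarr v2-style key layout (`<group>/<array>/.zarray` metadata keys) as a plain dict to show how a root-level lookup misses an array that lives under `Group1/`. The store contents and the helper `list_arrays` are illustrative assumptions, not xarray or zarr code.

```python
def list_arrays(store: dict, group: str = '') -> set:
    """Names of arrays stored directly under `group` in a zarr-v2-style key layout."""
    prefix = f'{group}/' if group else ''
    names = set()
    for key in store:
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split('/')
        # An array's metadata lives at '<prefix><name>/.zarray'.
        if len(parts) == 2 and parts[1] == '.zarray':
            names.add(parts[0])
    return names


# Hypothetical store contents after the first write to Group1.
store = {
    '.zgroup': b'{"zarr_format": 2}',
    'Group1/.zgroup': b'{"zarr_format": 2}',
    'Group1/var1/.zarray': b'{}',
    'Group1/var1/0': b'',
    'Group1/lead_time/.zarray': b'{}',
    'Group1/lead_time/0': b'',
}

print(sorted(list_arrays(store, 'Group1')))  # ['lead_time', 'var1']
print(sorted(list_arrays(store)))            # []

# Looking up 'var1' among the root-level arrays fails, mirroring the KeyError above.
root_arrays = {name: None for name in list_arrays(store)}
try:
    root_arrays['var1']
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'var1'
```

If the names being checked are collected in a Python set before this lookup, string hash randomization between interpreter runs would also explain why the failing name alternates between `var1` and `lead_time`; that part is a guess.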

Problem Description

I am trying to use the append mode introduced in PR #2706 with zarr groups. This raises the KeyError shown in the traceback above. Is this a bug, or a feature that is not supported (yet)?

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.1 (default, Dec 14 2018, 19:28:38)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-20-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.12.3
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.3.1
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.6.1.post1
iris: None
bottleneck: None
dask: 1.1.5
distributed: None
matplotlib: 3.0.3
cartopy: None
seaborn: None
numbagg: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: None
IPython: 7.5.0
sphinx: None

Metadata

Assignees

No one assigned

    Labels

    topic-zarr (Related to zarr storage library)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
