Concatenating zarr groups with xr.open_datatree results in bad output #9912

Open
@lukegre

What happened?

I opened a zarr store with xr.open_datatree and then concatenated its groups along time. Each group holds one year of an ERA5 time series. The concatenated result looks more like a mean seasonal cycle than the actual time series.

What did you expect to happen?

When looping through the groups with xr.open_zarr, the data is as expected, showing interannual variability.
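
The difference between the two results can be put into a number: with real interannual variability the annual means differ from year to year, while a repeated seasonal cycle gives annual means that are all the same. The following is a minimal sketch of that check in plain Python. The helper name `annual_mean_spread` and the values are made up for illustration and are not the ERA5 data from this report.

```python
from statistics import mean, pstdev


def annual_mean_spread(series_by_year):
    """Population standard deviation of the per-year means.

    `series_by_year` maps a year label to a flat sequence of numbers.
    """
    yearly_means = [mean(values) for values in series_by_year.values()]
    return pstdev(yearly_means)


# Synthetic stand-in values (hypothetical, not taken from the report)
varying = {"1980": [1.0, 3.0], "1981": [2.0, 4.0], "1982": [3.0, 5.0]}
repeated = {"1980": [1.0, 3.0], "1981": [1.0, 3.0], "1982": [1.0, 3.0]}

print(annual_mean_spread(varying) > 0)  # True: annual means differ
print(annual_mean_spread(repeated))     # 0.0: every year has the same mean
```

With the objects from the example below, the annual means should correspond to what `da_hourly.resample(time='1YS').mean()` computes inside the plotting function.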

Minimal Complete Verifiable Example

# %pip install s3fs

import xarray as xr
from matplotlib import pyplot as plt

# data set up for bug report
s3_uri = 's3://spi-greenfjord-public/era5_t2m-test_data_for_bug_report.zarr'

## specifications for zarr + s3 bucket ####################################
kwargs = dict(
    consolidated=True, 
    chunks={},
    storage_options=dict(
        anon=True, 
        endpoint_url='https://os.zhdk.cloud.switch.ch'))

## xr.open_datatree #######################################################
datatree = xr.open_datatree(s3_uri, engine='zarr', **kwargs)
ds_treecat = xr.combine_nested([datatree[year].ds for year in datatree], concat_dim="time")  # same behaviour with xr.concat
ds_treecat = ds_treecat.compute()

## xr.open_zarr #######################################################
ds_zarrlist = [xr.open_zarr(s3_uri, group=year, **kwargs) for year in range(1980, 2023)]
ds_zarrcat = xr.combine_nested(ds_zarrlist, concat_dim='time')
ds_zarrcat = ds_zarrcat.compute()

## Plotting #######################################################
def plot_t2m_time_series(da_hourly, label='', **kwargs):
    # Reuse a caller-supplied axis, otherwise create a new figure
    ax = kwargs.get('ax')
    if ax is None:
        fig, ax = plt.subplots(figsize=(12, 3), dpi=140)
        kwargs['ax'] = ax
    fig = ax.figure

    da_daily = da_hourly.resample(time='1D').mean()
    da_yearly = da_hourly.resample(time='1YS').mean()

    # Thin line for the daily means
    props = dict(lw=0.2) | kwargs
    da_daily.plot(**props)

    # Thick line for the yearly means, in the same colour as the daily line
    props = props | dict(lw=5, label=label, c=ax.get_lines()[-1].get_color())
    da_yearly.plot(**props)
    return fig, ax


_, ax0 = plot_t2m_time_series(ds_treecat.t2m, label="xr.open_datatree(s3_uri, engine='zarr', consolidated=True, chunks={})")
_, ax1 = plot_t2m_time_series(ds_zarrcat.t2m, label="xr.open_zarr(s3_uri, group=year, consolidated=True) ...", c='C1')

for ax in [ax0, ax1]:
    ax.set_title('ERA5 2m temperature for area in Greenland (1980-2022)', loc='left')
    ax.legend(ncol=2, frameon=True, edgecolor='none')
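
Two quick checks might help narrow down where the two code paths diverge. This is only a sketch under two untested guesses that the report does not confirm: that several year groups return the same values, or that the groups are iterated out of chronological order. The helper names `duplicated_groups` and `is_chronological` and the sample values are hypothetical.

```python
from itertools import combinations


def duplicated_groups(groups, n=24):
    """Return pairs of group names whose first `n` values are exactly equal.

    `groups` maps a group name (e.g. "1980") to a flat sequence of numbers.
    """
    heads = {name: tuple(values[:n]) for name, values in groups.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(heads), 2)
        if heads[a] == heads[b]
    ]


def is_chronological(names):
    """True if the group names, read as integer years, are already sorted."""
    years = [int(name) for name in names]
    return years == sorted(years)


# Synthetic stand-in values (hypothetical, not the ERA5 data from the report)
groups = {
    "1980": [270.1, 270.4, 271.0],
    "1981": [268.9, 269.3, 270.2],
    "1982": [270.1, 270.4, 271.0],  # deliberately identical to "1980"
}

print(duplicated_groups(groups, n=3))  # [('1980', '1982')]
print(is_chronological(groups))        # True
```

With the objects from the example above, `groups` could presumably be built as `{name: datatree[name].ds.t2m.values.ravel() for name in datatree}`, and `is_chronological(list(datatree))` would check the iteration order used in the list comprehension.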

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.10 | packaged by conda-forge | (main, Sep 10 2024, 10:57:35) [Clang 17.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.2.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.4
libnetcdf: 4.9.2

xarray: 2024.11.0
pandas: 2.2.3
numpy: 2.2.0
scipy: 1.14.1
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.4.1
h5py: 3.12.1
zarr: 2.18.4
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.12.1
distributed: None
matplotlib: 3.10.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.12.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.6.0
pip: None
conda: None
pytest: None
mypy: None
IPython: 8.30.0
sphinx: None
/Users/luke/SDSC/CryoGrid/era5-downloader/.venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(

Labels

bug; topic-DataTree (Related to the implementation of a DataTree class); topic-combine (combine/concat/merge)