Description
What happened?
I opened a zarr store with xr.open_datatree and then concatenated its groups. Each group holds one year of an ERA5 time series. The concatenated result looks more like a mean seasonal cycle repeated every year than the actual time series.
What did you expect to happen?
I expected the same result as when looping through the groups with xr.open_zarr, where the data is as expected and shows interannual variability.
Minimal Complete Verifiable Example
# %pip install s3fs
import xarray as xr
from matplotlib import pyplot as plt
# data set up for bug report
s3_uri = 's3://spi-greenfjord-public/era5_t2m-test_data_for_bug_report.zarr'
## specifications for zarr + s3 bucket ####################################
kwargs = dict(
    consolidated=True,
    chunks={},
    storage_options=dict(
        anon=True,
        endpoint_url='https://os.zhdk.cloud.switch.ch'))
## xr.open_datatree #######################################################
datatree = xr.open_datatree(s3_uri, engine='zarr', **kwargs)
ds_treecat = xr.combine_nested([datatree[year].ds for year in datatree], concat_dim="time") # same behaviour with xr.concat
ds_treecat = ds_treecat.compute()
## xr.open_zarr #######################################################
ds_zarrlist = [xr.open_zarr(s3_uri, group=year, **kwargs) for year in range(1980, 2023)]
ds_zarrcat = xr.combine_nested(ds_zarrlist, concat_dim='time')
ds_zarrcat = ds_zarrcat.compute()
## Plotting #######################################################
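def plot_t2m_time_series(da_hourly, label='', **kwargs):
    # create a new figure unless the caller passes an existing axes via ax=...
    if 'ax' not in kwargs:
        fig, ax = plt.subplots(figsize=(12, 3), dpi=140)
        kwargs['ax'] = ax
    ax = kwargs['ax']
    fig = ax.get_figure()
    da_daily = da_hourly.resample(time='1D').mean()
    da_yearly = da_hourly.resample(time='1YS').mean()
    # thin line for daily means
    props = dict(lw=0.2) | kwargs
    da_daily.plot(**props)
    # thick line for yearly means, in the same colour as the daily line
    props = props | dict(lw=5, label=label, c=ax.get_lines()[-1].get_color())
    da_yearly.plot(**props)
    return fig, ax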
_, ax0 = plot_t2m_time_series(ds_treecat.t2m, label="xr.open_datatree(s3_uri, engine='zarr', consolidated=True, chunks={})")
_, ax1 = plot_t2m_time_series(ds_zarrcat.t2m, label="xr.open_zarr(s3_uri, group=year, consolidated=True) ...", c='C1')
for ax in [ax0, ax1]:
    ax.set_title('ERA5 2m temperature for area in Greenland (1980-2022)', loc='left')
    ax.legend(ncol=2, frameon=True, edgecolor='none')
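The difference can probably also be checked numerically, without relying on the plots. The following is only a sketch: `years_look_identical` is a hypothetical helper (not part of xarray), and the numbers are made up for illustration, not taken from the ERA5 file. It reports whether all yearly means are nearly the same value, which is what a repeated seasonal cycle would produce.

```python
import math
from typing import Mapping


def years_look_identical(yearly_means: Mapping[int, float], rel_tol: float = 1e-6) -> bool:
    """Return True if every yearly mean is (nearly) the same value.

    Hypothetical helper for this report: identical yearly means would
    indicate that the concatenated series has no interannual variability.
    """
    values = list(yearly_means.values())
    if len(values) < 2:
        return False  # nothing to compare against
    first = values[0]
    return all(math.isclose(v, first, rel_tol=rel_tol) for v in values[1:])


# Made-up yearly means (in K) for illustration only.
repeated = {1980: 265.2, 1981: 265.2, 1982: 265.2}
varying = {1980: 265.2, 1981: 266.0, 1982: 264.7}

print(years_look_identical(repeated))  # True
print(years_look_identical(varying))   # False
```

With the objects from the example above, the input mapping could presumably be built with something like `ds_treecat.t2m.groupby('time.year').mean(...).to_series().to_dict()`, and likewise for `ds_zarrcat`. I have not run this against the S3 data.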
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.11.10 | packaged by conda-forge | (main, Sep 10 2024, 10:57:35) [Clang 17.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.2.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.4
libnetcdf: 4.9.2
xarray: 2024.11.0
pandas: 2.2.3
numpy: 2.2.0
scipy: 1.14.1
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.4.1
h5py: 3.12.1
zarr: 2.18.4
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.12.1
distributed: None
matplotlib: 3.10.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.12.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.6.0
pip: None
conda: None
pytest: None
mypy: None
IPython: 8.30.0
sphinx: None
/Users/luke/SDSC/CryoGrid/era5-downloader/.venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(