Description
What happened?
Got a TypeError when resampling a dataset along a dimension, mapping a function to each group. The function returns a DataArray.
Failed with : TypeError: _overwrite_indexes() got an unexpected keyword argument 'variables'
What did you expect to happen?
This worked before the merging of #5692. A DataArray was returned as expected.
Minimal Complete Verifiable Example
import xarray as xr
ds = xr.tutorial.open_dataset("air_temperature")
ds.resample(time="YS").map(lambda grp: grp.air.mean("time"))
Relevant log output
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [37], in <module>
----> 1 ds.resample(time="YS").map(lambda grp: grp.air.mean("time"))
File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/xarray/core/resample.py:300, in DatasetResample.map(self, func, args, shortcut, **kwargs)
298 # ignore shortcut if set (for now)
299 applied = (func(ds, *args, **kwargs) for ds in self._iter_grouped())
--> 300 combined = self._combine(applied)
302 return combined.rename({self._resample_dim: self._dim})
File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/xarray/core/groupby.py:999, in DatasetGroupByBase._combine(self, applied)
997 index, index_vars = create_default_index_implicit(coord)
998 indexes = {k: index for k in index_vars}
--> 999 combined = combined._overwrite_indexes(indexes, variables=index_vars)
1000 combined = self._maybe_restore_empty_groups(combined)
1001 combined = self._maybe_unstack(combined)
TypeError: _overwrite_indexes() got an unexpected keyword argument 'variables'
Anything else we need to know?
In the docstring of DatasetGroupBy.map
it is not made clear that the passed function should return a dataset, but the opposite is also not said. This worked before and I think the issues comes from #5692, which introduced different signatures for DataArray._overwrite_indexes
(which is called in my case) and Dataset._overwrite_indexes
(which is expected by the new _combine
).
If the function passed to Dataset.resample(...).map
should only return Dataset
s then I believe a more explicit error is needed, as well as some notice in the docs and a breaking change entry in the changelog. If DataArray
s should be accepted, then we have a regression here.
I may have time to help on this.
Environment
INSTALLED VERSIONS
commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.16.13-arch1-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.utf8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 2022.3.1.dev16+g3ead17ea
pandas: 1.4.0
numpy: 1.20.3
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.0
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.08.0
distributed: 2021.08.0
matplotlib: 3.4.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.07.0
cupy: None
pint: 0.18
sparse: None
setuptools: 57.4.0
pip: 21.2.4
conda: None
pytest: 6.2.5
IPython: 8.0.1
sphinx: 4.1.2