What is your issue?
A pattern we used in xclim (and elsewhere) seems to be broken on the master.
See MWE:
import pandas as pd
import xarray as xr
da = xr.DataArray([1] * 730, coords={"time": xr.date_range('1900-01-01', periods=730, freq='D', calendar='noleap')})
mulind = pd.MultiIndex.from_arrays((da.time.dt.year.values, da.time.dt.dayofyear.values), names=('year', 'doy'))
# Override previous time axis with new MultiIndex
da.assign_coords(time=mulind).unstack('time')
Now this works ok with both the current master and the latest release. However, if we chunk da, the last line now fails:
da.chunk(time=50).assign_coords(time=mulind).unstack('time')
On the master, this gives: ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}
Full traceback:
ValueError Traceback (most recent call last)
Cell In[44], line 1
----> 1 da.chunk(time=50).assign_coords(time=mulind).unstack("time")
File ~/Projets/xarray/xarray/core/dataarray.py:2868, in DataArray.unstack(self, dim, fill_value, sparse)
2808 def unstack(
2809 self,
2810 dim: Dims = None,
2811 fill_value: Any = dtypes.NA,
2812 sparse: bool = False,
2813 ) -> DataArray:
2814 """
2815 Unstack existing dimensions corresponding to MultiIndexes into
2816 multiple new dimensions.
(...)
2866 DataArray.stack
2867 """
-> 2868 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
2869 return self._from_temp_dataset(ds)
File ~/Projets/xarray/xarray/core/dataset.py:5481, in Dataset.unstack(self, dim, fill_value, sparse)
5479 for d in dims:
5480 if needs_full_reindex:
-> 5481 result = result._unstack_full_reindex(
5482 d, stacked_indexes[d], fill_value, sparse
5483 )
5484 else:
5485 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)
File ~/Projets/xarray/xarray/core/dataset.py:5365, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse)
5362 else:
5363 # TODO: we may depreciate implicit re-indexing with a pandas.MultiIndex
5364 xr_full_idx = PandasMultiIndex(full_idx, dim)
-> 5365 indexers = Indexes(
5366 {k: xr_full_idx for k in index_vars},
5367 xr_full_idx.create_variables(index_vars),
5368 )
5369 obj = self._reindex(
5370 indexers, copy=False, fill_value=fill_value, sparse=sparse
5371 )
5373 for name, var in obj.variables.items():
File ~/Projets/xarray/xarray/core/indexes.py:1435, in Indexes.__init__(self, indexes, variables, index_type)
1433 unmatched_keys = set(indexes) ^ set(variables)
1434 if unmatched_keys:
-> 1435 raise ValueError(
1436 f"unmatched keys found in indexes and variables: {unmatched_keys}"
1437 )
1439 if any(not isinstance(idx, index_type) for idx in indexes.values()):
1440 index_type_str = f"{index_type.__module__}.{index_type.__name__}"
ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}
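For reference, the check that raises here is a symmetric difference between two key sets. Below is a minimal sketch in plain Python. The helper name `find_unmatched_keys` and the two key sets are hypothetical, chosen only so that the result matches the message above. The traceback does not show what `index_vars` actually contained.

```python
def find_unmatched_keys(index_keys, variable_keys):
    """Return the keys that appear in exactly one of the two collections.

    Mirrors the `set(indexes) ^ set(variables)` expression shown in the
    traceback; this helper itself is not part of xarray.
    """
    return set(index_keys) ^ set(variable_keys)


# Hypothetical inputs (an assumption, not taken from the traceback):
# an index registered only under "time", while the variables created
# from the MultiIndex also include its two levels.
index_keys = {"time"}
variable_keys = {"time", "year", "doy"}

unmatched = find_unmatched_keys(index_keys, variable_keys)
if unmatched:
    # Sorted only to make the printed order deterministic.
    print(f"unmatched keys: {sorted(unmatched)}")
```

Any key present on only one side triggers the error, so the message alone does not say which of the two mappings is missing 'year' and 'doy'.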
This seems related to PR #7368.
The reason for the title of this issue is that, in both versions, I now realize that da.assign_coords(time=mulind) prints as:
<xarray.DataArray (time: 730)>
dask.array<xarray-<this-array>, shape=(730,), dtype=int64, chunksize=(50,), chunktype=numpy.ndarray>
Coordinates:
* time (time) object MultiIndex
Something's fishy, because the two "sub" indexes are not showing.
And indeed, with the current master, I can get this to work by doing (again changing the last line):
da2 = xr.DataArray(da.data, coords=xr.Coordinates.from_pandas_multiindex(mulind, 'time'))
da2.chunk(time=50).unstack('time')
But it seems a bit odd to me that we need to reconstruct the DataArray to replace its coordinate with a "MultiIndex" one.
Thus, my questions are:
- How does one properly override a coordinate by a MultiIndex? Is there a way to use assign_coords? If not, then this issue would become a feature request.
- Is this a regression? Or was I just "lucky" before?
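To make the first question concrete, here is a minimal sketch of what an assign_coords-based override might look like. It assumes an xarray version in which assign_coords accepts an xr.Coordinates object. The small integer time axis, the array values and the variable names are made up for illustration, and the sketch uses neither dask chunking nor a cftime calendar, so it does not by itself show that the chunked case above is fixed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy data (illustrative only): 6 time steps standing in for 2 years x 3 days.
da = xr.DataArray(np.arange(6), dims="time", coords={"time": np.arange(6)})

# MultiIndex with the same level names as in the MWE above.
mulind = pd.MultiIndex.from_product([[1900, 1901], [1, 2, 3]], names=("year", "doy"))

# Wrap the MultiIndex explicitly, so that 'time', 'year' and 'doy' are all
# created as coordinates backed by one shared index.
mi_coords = xr.Coordinates.from_pandas_multiindex(mulind, "time")

# Assumption: assign_coords replaces the existing 'time' coordinate and index.
da_mi = da.assign_coords(mi_coords)

unstacked = da_mi.unstack("time")
print(unstacked.dims, unstacked.shape)
```

If this behaves as assumed, the repr of da_mi should list year and doy under Coordinates alongside time, unlike the output shown earlier.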