
Update assign_coords with a MultiIndex to match new Coordinates API #8039

Closed
@aulemahal

Description


What is your issue?

A pattern we use in xclim (and elsewhere) seems to be broken on master.

See this minimal working example (MWE):

import pandas as pd
import xarray as xr

da = xr.DataArray([1] * 730, coords={"time": xr.date_range('1900-01-01', periods=730, freq='D', calendar='noleap')})
mulind = pd.MultiIndex.from_arrays((da.time.dt.year.values, da.time.dt.dayofyear.values), names=('year', 'doy'))

# Override previous time axis with new MultiIndex
da.assign_coords(time=mulind).unstack('time')
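For reference, the 2-D layout that the final unstack('time') is expected to produce can be sketched with pandas alone. This sketch assumes the noleap calendar (365 days per year), so the year and doy arrays for the 730 daily steps are built directly with NumPy instead of through xr.date_range. The names mirror the MWE but are otherwise illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: noleap calendar, so 730 daily steps starting on 1900-01-01
# cover exactly two years of 365 days each.
years = np.repeat([1900, 1901], 365)
doys = np.tile(np.arange(1, 366), 2)
mulind = pd.MultiIndex.from_arrays((years, doys), names=("year", "doy"))

# Unstacking the "doy" level gives one row per year and one column per
# day of year, i.e. the (year, doy) grid the xarray unstack should produce.
s = pd.Series(np.ones(730, dtype=int), index=mulind)
table = s.unstack("doy")
print(table.shape)  # (2, 365)
```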

The assign_coords(time=mulind).unstack('time') call above works with both the current master and the latest release. However, if we chunk da first, the same call fails:

da.chunk(time=50).assign_coords(time=mulind).unstack('time')

On master, this raises: ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}

Full traceback:


ValueError Traceback (most recent call last)
Cell In[44], line 1
----> 1 da.chunk(time=50).assign_coords(time=mulind).unstack("time")

File ~/Projets/xarray/xarray/core/dataarray.py:2868, in DataArray.unstack(self, dim, fill_value, sparse)
2808 def unstack(
2809 self,
2810 dim: Dims = None,
2811 fill_value: Any = dtypes.NA,
2812 sparse: bool = False,
2813 ) -> DataArray:
2814 """
2815 Unstack existing dimensions corresponding to MultiIndexes into
2816 multiple new dimensions.
(...)
2866 DataArray.stack
2867 """
-> 2868 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
2869 return self._from_temp_dataset(ds)

File ~/Projets/xarray/xarray/core/dataset.py:5481, in Dataset.unstack(self, dim, fill_value, sparse)
5479 for d in dims:
5480 if needs_full_reindex:
-> 5481 result = result._unstack_full_reindex(
5482 d, stacked_indexes[d], fill_value, sparse
5483 )
5484 else:
5485 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)

File ~/Projets/xarray/xarray/core/dataset.py:5365, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse)
5362 else:
5363 # TODO: we may depreciate implicit re-indexing with a pandas.MultiIndex
5364 xr_full_idx = PandasMultiIndex(full_idx, dim)
-> 5365 indexers = Indexes(
5366 {k: xr_full_idx for k in index_vars},
5367 xr_full_idx.create_variables(index_vars),
5368 )
5369 obj = self._reindex(
5370 indexers, copy=False, fill_value=fill_value, sparse=sparse
5371 )
5373 for name, var in obj.variables.items():

File ~/Projets/xarray/xarray/core/indexes.py:1435, in Indexes.__init__(self, indexes, variables, index_type)
1433 unmatched_keys = set(indexes) ^ set(variables)
1434 if unmatched_keys:
-> 1435 raise ValueError(
1436 f"unmatched keys found in indexes and variables: {unmatched_keys}"
1437 )
1439 if any(not isinstance(idx, index_type) for idx in indexes.values()):
1440 index_type_str = f"{index_type.__module__}.{index_type.__name__}"

ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}

This seems related to PR #7368.
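The error itself comes from the key check in `Indexes.__init__` at the end of the traceback (`unmatched_keys = set(indexes) ^ set(variables)`). Below is a minimal pure-Python reconstruction of that check, not xarray code. It rests on an assumption that is consistent with the repr further down, where year and doy are missing: `index_vars` only held the time coordinate, while the rebuilt MultiIndex creates variables for time, year and doy. The values are placeholders; only the keys matter.

```python
# Hypothetical reconstruction of the key check that raises in the traceback.

# Assumption: only the "time" dimension coordinate survived
# assign_coords(time=mulind), so the index mapping is keyed by "time" alone.
index_vars = {"time": None}
indexes = {k: "full_multiindex" for k in index_vars}

# Assumption: creating variables from the rebuilt MultiIndex yields one
# variable for the dimension plus one per level.
variables = {"time": None, "year": None, "doy": None}

# Symmetric difference: keys present in exactly one of the two mappings.
unmatched_keys = set(indexes) ^ set(variables)
print(unmatched_keys)  # {'year', 'doy'} (set display order may vary)
```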

The reason for this issue's title is that, in both versions, I now realize that the result of assign_coords(time=mulind) (shown here for the chunked da) prints as:

<xarray.DataArray (time: 730)>
dask.array<xarray-<this-array>, shape=(730,), dtype=int64, chunksize=(50,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) object MultiIndex

Something seems off: the two "sub" indexes (year and doy) are not shown.

Indeed, with the current master, I can make this work by building the DataArray from explicit Coordinates instead (again, replacing the last line):

da2 = xr.DataArray(da.data, coords=xr.Coordinates.from_pandas_multiindex(mulind, 'time'))
da2.chunk(time=50).unstack('time')

However, it seems odd that we need to reconstruct the whole DataArray just to replace one of its coordinates with a MultiIndex-backed one.

Thus, my questions are:

  1. How does one properly override a coordinate with a MultiIndex? Is there a way to do it with assign_coords? If not, this issue becomes a feature request.
  2. Is this a regression, or was I just "lucky" before?
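One possible answer to question 1, sketched under the assumption of a recent xarray release in which `DataArray.assign_coords` accepts an `xr.Coordinates` object: wrap the MultiIndex with `xr.Coordinates.from_pandas_multiindex` (the same call used in the workaround above) and pass the result positionally instead of `time=mulind`. To avoid needing cftime, the sketch builds the (year, doy) levels directly with NumPy and uses an integer time coordinate; these are stand-ins, not the original data. The chunked case from the issue would additionally need dask and is not exercised here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for the MWE's MultiIndex (noleap calendar: 365 days per year).
years = np.repeat([1900, 1901], 365)
doys = np.tile(np.arange(1, 366), 2)
mulind = pd.MultiIndex.from_arrays((years, doys), names=("year", "doy"))

# Stand-in for the original array; integer "time" values replace the dates.
da = xr.DataArray(np.ones(730, dtype=int), dims="time", coords={"time": np.arange(730)})

# Explicit wrapping creates the "time" dimension coordinate plus one
# coordinate per level ("year", "doy"), all backed by one shared index.
mindex_coords = xr.Coordinates.from_pandas_multiindex(mulind, "time")

# Assumption: passing the Coordinates object replaces the existing "time"
# index and keeps the level coordinates visible.
stacked = da.assign_coords(mindex_coords)
print(sorted(stacked.coords))  # ['doy', 'time', 'year']

unstacked = stacked.unstack("time")
print(unstacked.dims)  # ('year', 'doy')
```

Unlike the DataArray(da.data, coords=...) workaround, this keeps the original object and only swaps its time coordinate.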

Metadata

Assignees: No one assigned
Type: No type
Projects: Status Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests