Skip to content

Dropping a MultiIndex variable raises an error after explicit indexes refactor #6505

@shoyer

Description

@shoyer

What happened?

With the latest released version of Xarray, it is possible to delete all variables corresponding to a MultiIndex by simply deleting the name of the MultiIndex.

After the explicit indexes refactor (i.e,. using the "main" development branch) this now raises error about how this would "corrupt" index state. This comes up when using drop() and assign_coords() and possibly some other methods.

This is not hard to work around, but we may want to consider this bug a blocker for the next Xarray release. I found the issue surfaced in several projects when attempting to use the new version of Xarray inside Google's codebase.

CC @benbovy in case you have any thoughts to share.

What did you expect to happen?

For now, we should preserve the behavior of deleting the variables corresponding to MultiIndex levels, but should issue a deprecation warning encouraging users to explicitly delete everything.

Minimal Complete Verifiable Example

import xarray

array = xarray.DataArray(
    [[1, 2], [3, 4]],
    dims=['x', 'y'],
    coords={'x': ['a', 'b']},
)
stacked = array.stack(z=['x', 'y'])
print(stacked.drop('z'))
print()
print(stacked.assign_coords(z=[1, 2, 3, 4]))

Relevant log output

ValueError                                Traceback (most recent call last)
Input In [1], in <cell line: 9>()
      3 array = xarray.DataArray(
      4     [[1, 2], [3, 4]],
      5     dims=['x', 'y'],
      6     coords={'x': ['a', 'b']},
      7 )
      8 stacked = array.stack(z=['x', 'y'])
----> 9 print(stacked.drop('z'))
     10 print()
     11 print(stacked.assign_coords(z=[1, 2, 3, 4]))

File ~/dev/xarray/xarray/core/dataarray.py:2425, in DataArray.drop(self, labels, dim, errors, **labels_kwargs)
   2408 def drop(
   2409     self,
   2410     labels: Mapping = None,
   (...)
   2414     **labels_kwargs,
   2415 ) -> DataArray:
   2416     """Backward compatible method based on `drop_vars` and `drop_sel`
   2417
   2418     Using either `drop_vars` or `drop_sel` is encouraged
   (...)
   2423     DataArray.drop_sel
   2424     """
-> 2425     ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
   2426     return self._from_temp_dataset(ds)

File ~/dev/xarray/xarray/core/dataset.py:4590, in Dataset.drop(self, labels, dim, errors, **labels_kwargs)
   4584 if dim is None and (is_scalar(labels) or isinstance(labels, Iterable)):
   4585     warnings.warn(
   4586         "dropping variables using `drop` will be deprecated; using drop_vars is encouraged.",
   4587         PendingDeprecationWarning,
   4588         stacklevel=2,
   4589     )
-> 4590     return self.drop_vars(labels, errors=errors)
   4591 if dim is not None:
   4592     warnings.warn(
   4593         "dropping labels using list-like labels is deprecated; using "
   4594         "dict-like arguments with `drop_sel`, e.g. `ds.drop_sel(dim=[labels]).",
   4595         DeprecationWarning,
   4596         stacklevel=2,
   4597     )

File ~/dev/xarray/xarray/core/dataset.py:4549, in Dataset.drop_vars(self, names, errors)
   4546 if errors == "raise":
   4547     self._assert_all_in_dataset(names)
-> 4549 assert_no_index_corrupted(self.xindexes, names)
   4551 variables = {k: v for k, v in self._variables.items() if k not in names}
   4552 coord_names = {k for k in self._coord_names if k in variables}

File ~/dev/xarray/xarray/core/indexes.py:1394, in assert_no_index_corrupted(indexes, coord_names)
   1392 common_names_str = ", ".join(f"{k!r}" for k in common_names)
   1393 index_names_str = ", ".join(f"{k!r}" for k in index_coords)
-> 1394 raise ValueError(
   1395     f"cannot remove coordinate(s) {common_names_str}, which would corrupt "
   1396     f"the following index built from coordinates {index_names_str}:\n"
   1397     f"{index}"
   1398 )

ValueError: cannot remove coordinate(s) 'z', which would corrupt the following index built from coordinates 'z', 'x', 'y':
<xarray.core.indexes.PandasMultiIndex object at 0x148c95150>

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: 33cdabd
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:42:03) [Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.18.3.dev137+g96c56836
pandas: 1.4.2
numpy: 1.22.3
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: 7.1.1
IPython: 8.2.0
sphinx: None

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions