Description
What happened?
With the latest released version of Xarray, it is possible to delete all variables corresponding to a MultiIndex by simply deleting the name of the MultiIndex.
After the explicit indexes refactor (i.e,. using the "main" development branch) this now raises error about how this would "corrupt" index state. This comes up when using drop()
and assign_coords()
and possibly some other methods.
This is not hard to work around, but we may want to consider this bug a blocker for the next Xarray release. I found the issue surfaced in several projects when attempting to use the new version of Xarray inside Google's codebase.
CC @benbovy in case you have any thoughts to share.
What did you expect to happen?
For now, we should preserve the behavior of deleting the variables corresponding to MultiIndex levels, but should issue a deprecation warning encouraging users to explicitly delete everything.
Minimal Complete Verifiable Example
import xarray
array = xarray.DataArray(
[[1, 2], [3, 4]],
dims=['x', 'y'],
coords={'x': ['a', 'b']},
)
stacked = array.stack(z=['x', 'y'])
print(stacked.drop('z'))
print()
print(stacked.assign_coords(z=[1, 2, 3, 4]))
Relevant log output
ValueError Traceback (most recent call last)
Input In [1], in <cell line: 9>()
3 array = xarray.DataArray(
4 [[1, 2], [3, 4]],
5 dims=['x', 'y'],
6 coords={'x': ['a', 'b']},
7 )
8 stacked = array.stack(z=['x', 'y'])
----> 9 print(stacked.drop('z'))
10 print()
11 print(stacked.assign_coords(z=[1, 2, 3, 4]))
File ~/dev/xarray/xarray/core/dataarray.py:2425, in DataArray.drop(self, labels, dim, errors, **labels_kwargs)
2408 def drop(
2409 self,
2410 labels: Mapping = None,
(...)
2414 **labels_kwargs,
2415 ) -> DataArray:
2416 """Backward compatible method based on `drop_vars` and `drop_sel`
2417
2418 Using either `drop_vars` or `drop_sel` is encouraged
(...)
2423 DataArray.drop_sel
2424 """
-> 2425 ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
2426 return self._from_temp_dataset(ds)
File ~/dev/xarray/xarray/core/dataset.py:4590, in Dataset.drop(self, labels, dim, errors, **labels_kwargs)
4584 if dim is None and (is_scalar(labels) or isinstance(labels, Iterable)):
4585 warnings.warn(
4586 "dropping variables using `drop` will be deprecated; using drop_vars is encouraged.",
4587 PendingDeprecationWarning,
4588 stacklevel=2,
4589 )
-> 4590 return self.drop_vars(labels, errors=errors)
4591 if dim is not None:
4592 warnings.warn(
4593 "dropping labels using list-like labels is deprecated; using "
4594 "dict-like arguments with `drop_sel`, e.g. `ds.drop_sel(dim=[labels]).",
4595 DeprecationWarning,
4596 stacklevel=2,
4597 )
File ~/dev/xarray/xarray/core/dataset.py:4549, in Dataset.drop_vars(self, names, errors)
4546 if errors == "raise":
4547 self._assert_all_in_dataset(names)
-> 4549 assert_no_index_corrupted(self.xindexes, names)
4551 variables = {k: v for k, v in self._variables.items() if k not in names}
4552 coord_names = {k for k in self._coord_names if k in variables}
File ~/dev/xarray/xarray/core/indexes.py:1394, in assert_no_index_corrupted(indexes, coord_names)
1392 common_names_str = ", ".join(f"{k!r}" for k in common_names)
1393 index_names_str = ", ".join(f"{k!r}" for k in index_coords)
-> 1394 raise ValueError(
1395 f"cannot remove coordinate(s) {common_names_str}, which would corrupt "
1396 f"the following index built from coordinates {index_names_str}:\n"
1397 f"{index}"
1398 )
ValueError: cannot remove coordinate(s) 'z', which would corrupt the following index built from coordinates 'z', 'x', 'y':
<xarray.core.indexes.PandasMultiIndex object at 0x148c95150>
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: 33cdabd
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:42:03) [Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.18.3.dev137+g96c56836
pandas: 1.4.2
numpy: 1.22.3
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: 7.1.1
IPython: 8.2.0
sphinx: None