groupby on array with multiindex renames indices #6313

Closed
@headtr1ck

What happened?

When grouping and reducing an array or dataset over a multi-index, the coordinates that make up the multi-index are renamed to "{name_of_multiindex}_level_{i}".

It only works correctly when the MultiIndex is a "homogeneous grid", i.e. as obtained by stacking.

What did you expect to happen?

I expect that all coordinates keep their initial names.

Minimal Complete Verifiable Example

import xarray as xr

# this works:

d = xr.DataArray(range(4), dims="t", coords={"x": ("t", [0, 0, 1, 1]), "y": ("t", [0, 1, 0, 1])})
dd = d.set_index({"t": ["x", "y"]})
# returns
# <xarray.DataArray (t: 4)>
# array([0, 1, 2, 3])
# Coordinates:
#   * t        (t) MultiIndex
#   - x        (t) int64 0 0 1 1
#   - y        (t) int64 0 1 0 1

dd.groupby("t").mean(...)
# returns
# <xarray.DataArray (t: 4)>
# array([0., 1., 2., 3.])
# Coordinates:
#   * t        (t) MultiIndex
#   - x        (t) int64 0 0 1 1
#   - y        (t) int64 0 1 0 1


# this does not work: the (x, y) pairs (0, 0) and (1, 0) each occur twice
d2 = xr.DataArray(range(6), dims="t", coords={"x": ("t", [0, 0, 1, 1, 0, 1]), "y": ("t", [0, 1, 0, 1, 0, 0])})
dd2 = d2.set_index({"t": ["x", "y"]})
# returns
# <xarray.DataArray (t: 6)>
# array([0, 1, 2, 3, 4, 5])
# Coordinates:
#   * t        (t) MultiIndex
#   - x        (t) int64 0 0 1 1 0 1
#   - y        (t) int64 0 1 0 1 0 0

dd2.groupby("t").mean(...)
# returns
# <xarray.DataArray (t: 4)>
# array([2. , 1. , 3.5, 3. ])
# Coordinates:
#   * t          (t) MultiIndex
#   - t_level_0  (t) int64 0 0 1 1
#   - t_level_1  (t) int64 0 1 0 1
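
Until this is fixed, one possible workaround is to put the original level names back after the reduction. The following is only a sketch and I have not checked it against other xarray versions. The helper names `placeholder_names` and `restore_level_names` are made up for illustration (they are not part of xarray), and the second helper assumes that `reset_index`, `rename` and `set_index` can be chained the way the comment describes.

```python
def placeholder_names(dim, level_names):
    """Map the placeholder level names to the original ones.

    For dim="t" and level_names=["x", "y"] this returns
    {"t_level_0": "x", "t_level_1": "y"}.
    """
    return {f"{dim}_level_{i}": name for i, name in enumerate(level_names)}


def restore_level_names(reduced, dim, level_names):
    """Return `reduced` with the levels of `dim` named `level_names` again.

    `reduced` is expected to be an xarray DataArray or Dataset whose `dim`
    coordinate is a MultiIndex (hypothetical helper, not an xarray API).
    """
    level_names = list(level_names)
    if list(reduced.indexes[dim].names) == level_names:
        return reduced  # the names survived the reduction, nothing to do
    mapping = placeholder_names(dim, level_names)
    # Assumption: reset_index turns the levels into plain coordinates that
    # can be renamed and then combined into a MultiIndex again.
    return reduced.reset_index(dim).rename(mapping).set_index({dim: level_names})
```

With the example above this would be called as `restore_level_names(dd2.groupby("t").mean(...), "t", ["x", "y"])`. It only hides the symptom; the grouped result should carry the original names in the first place.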

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.1 (default, Jan 13 2021, 15:21:08)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.21.1
pandas: 1.4.0
numpy: 1.21.5
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 49.2.1
pip: 22.0.3
conda: None
pytest: 6.2.5
IPython: 8.0.0
sphinx: None
