Deprecate the multi-index dimension coordinate #8143

Open: 4 commits to merge into base: main


benbovy (Member) commented Sep 4, 2023

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR adds a future_no_mindex_dim_coord=False option that, if set to True, enables the future behavior of PandasMultiIndex (i.e., no added dimension coordinate with tuple values):

import xarray as xr

ds = xr.Dataset(coords={"x": ["a", "b"], "y": [1, 2]})

ds.stack(z=["x", "y"])

# <xarray.Dataset>
# Dimensions:  (z: 4)
# Coordinates:
#   * z        (z) object MultiIndex
#   * x        (z) <U1 'a' 'a' 'b' 'b'
#   * y        (z) int64 1 2 1 2
# Data variables:
#     *empty*

with xr.set_options(future_no_mindex_dim_coord=True):
    ds.stack(z=["x", "y"])

# <xarray.Dataset>
# Dimensions:  (z: 4)
# Coordinates:
#   * x        (z) <U1 'a' 'a' 'b' 'b'
#   * y        (z) int64 1 2 1 2
# Dimensions without coordinates: z
# Data variables:
#     *empty*

There are a few other things that we'll need to adapt or deprecate:

  • Dropping the multi-index dimension coordinate de facto allows having several multi-indexes along the same dimension. stack should normally already take this into account, but there may be other places where this is not yet supported or where we should raise an explicit error.
  • Deprecate Dataset.reorder_levels: its API is not compatible with the absence of a dimension coordinate or with several multi-indexes along the same dimension. I think it is OK to deprecate such an edge case; the same result can alternatively be obtained by extracting the pandas index, updating it and then re-assigning it to the dataset with assign_coords(xr.Coordinates.from_pandas_multiindex(...))
  • The text-based repr: in the example above, Dimensions without coordinates: z doesn't make much sense
  • ... ?
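
The alternative to Dataset.reorder_levels mentioned above could look roughly like the sketch below. It assumes the current default behavior (the option left at False, so the "z" dimension coordinate still exists) and an xarray version that provides xr.Coordinates.from_pandas_multiindex; the variable names are illustrative only, and the old coordinates are dropped first as a cautious choice rather than a confirmed requirement.

```python
import xarray as xr

ds = xr.Dataset(coords={"x": ["a", "b"], "y": [1, 2]})
stacked = ds.stack(z=["x", "y"])

# Extract the underlying pandas.MultiIndex and update it with pandas itself.
midx = stacked.indexes["z"]
reordered = midx.reorder_levels(["y", "x"])

# Build fresh coordinates (and their index) from the updated pandas index.
new_coords = xr.Coordinates.from_pandas_multiindex(reordered, "z")

# Drop the old multi-index coordinates, then re-assign the new ones.
result = stacked.drop_vars(["z", "x", "y"]).assign_coords(new_coords)

print(result.indexes["z"].names)
```

Only the level order changes here; the order of the elements along z is left untouched.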

I started updating the tests, although this will be much easier once #8140 is merged. This is something that we could also easily split into multiple PRs. It is probably OK if some features are (temporarily) breaking badly when setting future_no_mindex_dim_coord=True.

- This is rather an edge case: the same result can be achieved with a
couple of extra steps (update the pandas index directly and re-assign it)

- The API won't be compatible with possibly several multi-indexes along
the same dimension with no dimension coordinate

max-sixty (Collaborator) commented

I've been trying to use .set_xindex more, and rely less on MultiIndexes. It's overall worked really well! I do think there's a better world just over the horizon...

One thing I haven't managed to do is .unstack without MultiIndexes — lmk if I'm missing something. There's a possible API like .unstack("foo", "bar") which asserts foo & bar are indexes along the same dimensions, and unstacks them, without needing a multiindex...
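
For reference, here is a minimal sketch of the route that still goes through a MultiIndex today: two coordinates along the same dimension are first combined with set_index and only then unstacked. The names foo, bar, z and v are made up for illustration, and the .unstack("foo", "bar") form suggested above is a hypothetical API that this sketch does not implement.

```python
import xarray as xr

# Two coordinates, "foo" and "bar", along the same dimension "z".
ds = xr.Dataset(
    {"v": ("z", [10, 20, 30, 40])},
    coords={
        "foo": ("z", ["a", "a", "b", "b"]),
        "bar": ("z", [1, 2, 1, 2]),
    },
)

# Current approach: build a multi-index over "z" first, then unstack it.
unstacked = ds.set_index(z=["foo", "bar"]).unstack("z")

print(unstacked["v"].dims)
```

The intermediate set_index step is exactly what the proposed API would make unnecessary.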
