Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: MultiIndex.names|codes|levels returns tuples #57042

Merged
merged 13 commits into from
Feb 7, 2024

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke added this to the 3.0 milestone Jan 26, 2024
@WillAyd
Copy link
Member

WillAyd commented Jan 30, 2024

Just playing devil's advocate but why would we prefer tuple here over Index? The latter is what we return from DataFrame.columns which maybe should harmonize with this

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in spite of my comments this lgtm

@@ -2925,7 +2931,9 @@ def _partial_tup_index(self, tup: tuple, side: Literal["left", "right"] = "left"
if lab not in lev and not isna(lab):
# short circuit
try:
loc = algos.searchsorted(lev, lab, side=side)
# Argument 1 to "searchsorted" has incompatible type "Index";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surprising to see?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think our typing isn't entirely correct somewhere

@mroeschke
Copy link
Member Author

mroeschke commented Jan 30, 2024

Just playing devil's advocate but why would we prefer tuple here over Index?

I picked tuple to align with the prior FrozenList which seemed to act like an immutable list. If we were to go with Index I think each would have to be `Index[object]

  • MultiIndex.codes would be an Index of ndarrays
  • MultiIndex.levels would be an Index of Indexes
  • MultiIndex.names would be an Index of object/inferred dtype which would be reasonable.

But yeah the biggest thing is that the returned object should not be able to be resized.

@@ -143,7 +143,7 @@ the columns except the one we specify:
.. ipython:: python

df2 = df.set_index(["A", "B"])
grouped = df2.groupby(level=df2.index.names.difference(["B"]))
grouped = df2.groupby(level="A")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line leading into this is:

If we also have a MultiIndex on columns A and B, we can group by all
the columns except the one we specify:

If we want to preserve this, one can do [e for e in df2.index.names if e not in {"A"}]. But I'd also be okay to remove the entire example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried taking union and difference here might be a common use case that we're breaking. I'm okay with removing the feature, but we could deprecate uses of these methods to make the breaking change a bit more soft. Of course, with our deprecation policy that would mean punting this off for some time.

Copy link
Member Author

@mroeschke mroeschke Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. If you feel strongly about it we can go through a deprecation, but the fact that FrozenList was never really publically documented (except indirectly here with this example) makes me OK with breaking this use case.

As a data point, in cudf uses FrozenList in cudf.MultiIndex to match return types but has no usage of FrozenList.union/difference

xref #44823

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not feel strongly here. Okay to go forward in my opinion.

@mroeschke
Copy link
Member Author

Will merge this week unless there's any objections

@rhshadrach
Copy link
Member

@mroeschke - just the request to fix the docs: #57042 (comment)

@mroeschke
Copy link
Member Author

@mroeschke - just the request to fix the docs: #57042 (comment)

Ah thanks for the reminder. Removed

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mroeschke mroeschke merged commit 99e3afe into pandas-dev:main Feb 7, 2024
47 checks passed
@mroeschke mroeschke deleted the ref/mi/tuples branch February 7, 2024 04:52
@jorisvandenbossche
Copy link
Member

This is quite a breaking change .. I know you knew this and therefore kept it for 3.0, but I still do wonder if this is worth the breakage, and if we really want to change it, whether we couldn't first deprecate some aspects of it.

The breaking change I ran into with geopandas, is the fact that a tuple cannot be concatenated like a list. We actually used the same pattern in pandas itself, as the diff in this PR has this change in the groupby code:

-        mi = MultiIndex(levels=levels, codes=codes, names=idx.names + [None])
+        mi = MultiIndex(levels=levels, codes=codes, names=list(idx.names) + [None])

I can imagine quite some external libraries use this when manipulating index / multi-index objects.

We could first deprecate those methods on FrozenList that won't work anymore for tuple (I see @rhshadrach mentioned the same on the issue for the difference/union methods)

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
* MultiIndex.names|codes|levels returns tuples

* Fix typing

* Add whatsnew note

* Fix stacking

* Fix doctest, test

* Fix other test

* Remove example
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: Change MultiIndex.levels/codes to use tuple instead of FrozenList?
4 participants