
GH203 Split groupby with as_index (tentative) #1014

Merged: 9 commits merged into pandas-dev:main from the gh203_groupby branch on Oct 31, 2024

loicdiridollou (Contributor, Author):

I am still having overlapping-overload issues, but I am looking into it; feel free to suggest something else.
Right now this is only done for the Scalar case, but it can (and probably should) be expanded to the other types.
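A minimal sketch of the idea under discussion, assuming the as_index flag is carried on the groupby object as a Literal[True]/Literal[False] type parameter so that overloads on size() can pick the return type. All names here (ToyFrame, ToyGroupBy, SeriesLike, FrameLike) are hypothetical placeholders, not the actual pandas-stubs declarations.

```python
from __future__ import annotations

from typing import Generic, Literal, TypeVar, overload


# Hypothetical placeholder result types (not the pandas classes).
class SeriesLike: ...


class FrameLike: ...


# Type parameter that records the as_index flag as a literal bool.
AsIndexT = TypeVar("AsIndexT", bound=Literal[True, False])


class ToyGroupBy(Generic[AsIndexT]):
    """Hypothetical groupby stand-in whose type parameter records as_index."""

    def __init__(self, as_index: bool) -> None:
        self.as_index = as_index

    @overload
    def size(self: ToyGroupBy[Literal[True]]) -> SeriesLike: ...
    @overload
    def size(self: ToyGroupBy[Literal[False]]) -> FrameLike: ...
    def size(self) -> SeriesLike | FrameLike:
        # Runtime mirrors pandas: indexed result vs. flat frame.
        return SeriesLike() if self.as_index else FrameLike()


class ToyFrame:
    # `by: str` stands in for the Scalar case mentioned above.
    @overload
    def groupby(
        self, by: str, as_index: Literal[True] = ...
    ) -> ToyGroupBy[Literal[True]]: ...
    @overload
    def groupby(
        self, by: str, as_index: Literal[False]
    ) -> ToyGroupBy[Literal[False]]: ...
    def groupby(
        self, by: str, as_index: bool = True
    ) -> ToyGroupBy[Literal[True]] | ToyGroupBy[Literal[False]]:
        if as_index:
            return ToyGroupBy[Literal[True]](True)
        return ToyGroupBy[Literal[False]](False)
```

With this split, a type checker resolves ToyFrame().groupby("col1", as_index=False).size() to FrameLike and the default call to SeriesLike, without either overload overlapping the other.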

Dr-Irv (Collaborator) left a comment:

Good start, but I think you will have to work on the following:

@@ -1048,7 +1066,7 @@ def test_types_groupby() -> None:

df1: pd.DataFrame = df.groupby(by="col1").agg("sum")
df2: pd.DataFrame = df.groupby(level="ind").aggregate("sum")
-df3: pd.DataFrame = df.groupby(by="col1", sort=False, as_index=True).transform(
+df3: pd.Series = df.groupby(by="col1", sort=False, as_index=True).transform(
Dr-Irv (Collaborator):

Given the nature of this change, can you change this test to use the check(assert_type(...)) pattern?

We have old tests that haven't been converted yet; this is a good opportunity to convert them.
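For reference, a sketch of the check(assert_type(...)) pattern applied to the size() case. assert_type is verified statically by mypy/pyright and simply returns its argument at runtime, while check verifies the runtime class. The check function below is a simplified, hypothetical stand-in for the helper in the pandas-stubs test suite (whose real signature may differ); the sketch assumes a recent pandas and Python 3.11+ (or typing_extensions).

```python
from typing import TypeVar

import pandas as pd

try:
    from typing import assert_type  # Python 3.11+
except ImportError:  # older Pythons
    from typing_extensions import assert_type

T = TypeVar("T")


def check(actual: T, klass: type) -> T:
    """Hypothetical, simplified helper: runtime isinstance check only."""
    if not isinstance(actual, klass):
        raise RuntimeError(f"Expected type '{klass}' but got '{type(actual)}'")
    return actual


df = pd.DataFrame({"col1": ["a", "b", "a"], "col2": [1, 2, 3]})

# Static type (assert_type) and runtime type (check) asserted together.
size_indexed = check(
    assert_type(df.groupby("col1", as_index=True).size(), "pd.Series[int]"),
    pd.Series,
)
size_flat = check(
    assert_type(df.groupby("col1", as_index=False).size(), pd.DataFrame),
    pd.DataFrame,
)
```

The static half only fails under a type checker, so a stub regression shows up in the mypy/pyright CI step rather than in pytest.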

loicdiridollou (Contributor, Author):

Agreed. I will tackle those once we agree on the solution, so that I can do one thing at a time and keep the amount of code to review manageable.

Review comment on pandas-stubs/core/frame.pyi (outdated, resolved)

loicdiridollou (Contributor, Author):

It seems this is going to take a little longer. I think I understand your suggestion in #203; let me try to come up with something.

loicdiridollou requested a review from Dr-Irv on October 16, 2024 01:13

Dr-Irv (Collaborator) left a comment:

I hope my comments help. Note: I will not be able to give feedback until after Wednesday, October 23.

Review comments (outdated, resolved) on pandas-stubs/core/frame.pyi (two threads) and pandas-stubs/core/groupby/groupby.pyi

Dr-Irv (Collaborator) left a comment:

I think this is looking good, but CI is failing due to overlapping overloads. You'll have to use # type: ignore comments to get around that.
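A toy illustration (not taken from this PR) of an unsafely overlapping overload and the suppression comments. Because str is itself Hashable, a str argument matches both signatures, which promise incompatible return types; mypy reports this under the overload-overlap error code and pyright under reportOverlappingOverload. The function name describe is hypothetical, and the exact error-code names and the line each checker reports on may vary between versions, so treat the comment placement as an assumption.

```python
from __future__ import annotations

from collections.abc import Hashable
from typing import overload


# Overload 1 is narrower (str) than overload 2 (Hashable) but returns an
# incompatible type, so checkers flag the pair; the comments silence that.
@overload
def describe(key: str) -> int: ...  # type: ignore[overload-overlap]  # pyright: ignore[reportOverlappingOverload]
@overload
def describe(key: Hashable) -> str: ...
def describe(key: Hashable) -> int | str:
    # Runtime implementation: length for strings, repr for anything else.
    if isinstance(key, str):
        return len(key)
    return repr(key)
```

Putting the ignore on the narrower overload keeps the suppression local, so any new, unrelated overlap elsewhere in the file still surfaces in CI.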

loicdiridollou requested a review from Dr-Irv on October 31, 2024 21:32

Dr-Irv (Collaborator) commented on Oct 31, 2024:

So I am fine with this. Do you want me to merge it so that size() is fixed? Or do you want to work on other methods where the type of the result depends on the as_index parameter?

loicdiridollou (Contributor, Author):

I will open another issue to track the other methods; I would rather merge small changes than one large PR.
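A quick runtime check of how as_index changes the result type of size() and of value_counts() (the method from the linked issue), which is the behaviour the stubs need to mirror. It assumes pandas 1.4 or later, where DataFrameGroupBy.value_counts exists; the sample data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "a"], "col2": [1, 2, 1]})

# Record the runtime return type of each method for both as_index settings.
results = {}
for as_index in (True, False):
    gb = df.groupby("col1", as_index=as_index)
    results[("size", as_index)] = type(gb.size())
    results[("value_counts", as_index)] = type(gb.value_counts())

for (method, as_index), klass in results.items():
    print(f"{method}(as_index={as_index}) -> {klass.__name__}")
```

Both methods return a Series with as_index=True and a DataFrame with as_index=False, which is why each needs its own pair of overloads.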

Dr-Irv (Collaborator) left a comment.

Dr-Irv merged commit 53c299f into pandas-dev:main on Oct 31, 2024
10 checks passed
loicdiridollou deleted the gh203_groupby branch on December 14, 2024 19:58
Linked issues (successfully merging this pull request may close these issues):

BUG: df.groupby(col, as_index=False).value_counts() returns a DataFrame but is annotated as Series
2 participants