-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: type up core.groupby.grouper.get_grouper #29458
CLN: type up core.groupby.grouper.get_grouper #29458
Conversation
a6f53cf
to
18f1516
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks for doing this
pandas/core/groupby/grouper.py
Outdated
@@ -438,9 +438,9 @@ def _get_grouper( | |||
observed=False, | |||
mutated=False, | |||
validate=True, | |||
): | |||
) -> Tuple[BaseGrouper, List[Hashable], NDFrame]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of NDFrame should use FrameOrSeries
which will ensure this function is generic; should also change obj
a few lines up from NDFrame
to FrameOrSeries` (unless this does for whatever reason interchange between Series and DataFrame objects)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure FrameOrSeries
works as intended? I can't get it to work in PyCharm, so I opted to use NDFrame
, so I could get the type hints from NDFrame. at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on the error you see in PyCharm but for sure we use it elsewhere. You would probably need to change the annotation for obj
in this func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed it to use FrameOrSeries
@@ -517,7 +517,7 @@ def _get_grouper( | |||
if key.key is None: | |||
return grouper, [], obj | |||
else: | |||
return grouper, {key.key}, obj | |||
return grouper, [key.key], obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is there no impact changing this from a set to a list? If not, does it even need to be a container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just makes the types consistent for all returns. A set or a list is not important when there's only 1 item in the container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW It sometimes has lists with several items, so needs to be a list/container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm not sure I follow. Maybe better to ask what the error was that mypy was giving? I assume this gets mutated outside of the function right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it wasn't a mypy error, I think the function should have a clear return type, so List
rather than Contaioner
.
(More general): I think function may have broad parameter types, but should strive to have clear return types, so there's less ambiguity over what you're looking at. Don't you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Yea I'm not necessarily saying the old container type was correct and this is wrong, I guess was just questioning why this even needs to be held in a container if it's simply one value being returned.
Can be refactored or reviewed in a follow up if that's a big effort; not a blocker as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm haven't checked if a single value could be returned, instead of the list.
My motivation for doing this is mostly wanting to pin down what I'm looking at in groupBy
, Grouper
etc. I think "real" clean-ups must come later. It's quite complex to me now, and I'm hoping these typeups will ease understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Yea this part of code is tricky and could use some refactor, so this is all moving in the right direction
pandas/core/groupby/grouper.py
Outdated
@@ -573,7 +571,7 @@ def _get_grouper( | |||
all_in_columns_index = all( | |||
g in obj.columns or g in obj.index.names for g in keys | |||
) | |||
elif isinstance(obj, Series): | |||
else: # Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm is there a reason to remove this isinstance check? Seems to loosen the type checker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the reverse problem where no else clause made the type unknown. An alternative would be to do:
elif isinstance(obj, Series):
....
else:
raise TypeError(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the next line accesses obj.index, so implicitly pins it down. the comment above is fine or an assertion if that helps mypy out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I read you comment clearly. The chosen solution os ok for you ( if not preferred:-))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think assert isinstance(obj, Series) within the branch should work then. Sounds like it doesn't make much of a difference with what is hitting this path now, but the original intent is definitely only for Series objects to make it down this path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not checked, but I assume that mypy can't perform type narrowing between a DataFrame and Series since we use NDFrame as the bound for FrameOrSeries.
so we know that if obj is not a DataFrame then it must be a Series, but mypy can't know that.
if it's a public method, and we don't have control of the calling types then #29458 (comment) makes sense
@@ -3,7 +3,7 @@ | |||
split-apply-combine paradigm. | |||
""" | |||
|
|||
from typing import Optional, Tuple | |||
from typing import Hashable, List, Optional, Tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some places in these modules where name
attrs are annotated as str
or Optional[str]
where Hashable would be more accurate I think
pandas/core/groupby/grouper.py
Outdated
@@ -573,7 +571,7 @@ def _get_grouper( | |||
all_in_columns_index = all( | |||
g in obj.columns or g in obj.index.names for g in keys | |||
) | |||
elif isinstance(obj, Series): | |||
else: # Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think assert isinstance(obj, Series) within the branch should work then. Sounds like it doesn't make much of a difference with what is hitting this path now, but the original intent is definitely only for Series objects to make it down this path
44f70c8
to
015a223
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @simonjayhawkins
Thanks @topper-123 |
Add types + make some minor cleanups.