Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: type up core.groupby.grouper.get_grouper #29458

Merged
merged 2 commits into from
Nov 8, 2019

Conversation

topper-123
Copy link
Contributor

Add types + make some minor cleanups.

@topper-123 topper-123 added Clean Typing type annotations, mypy/pyright type checking labels Nov 7, 2019
@topper-123 topper-123 added this to the 1.0 milestone Nov 7, 2019
@topper-123 topper-123 force-pushed the clean_get_grouper branch 3 times, most recently from a6f53cf to 18f1516 Compare November 7, 2019 15:53
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks for doing this

@@ -438,9 +438,9 @@ def _get_grouper(
observed=False,
mutated=False,
validate=True,
):
) -> Tuple[BaseGrouper, List[Hashable], NDFrame]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of NDFrame should use FrameOrSeries which will ensure this function is generic; should also change obj a few lines up from NDFrame to FrameOrSeries` (unless this does for whatever reason interchange between Series and DataFrame objects)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure FrameOrSeries works as intended? I can't get it to work in PyCharm, so I opted to use NDFrame, so I could get the type hints from NDFrame. at least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on the error you see in PyCharm but for sure we use it elsewhere. You would probably need to change the annotation for obj in this func

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to use FrameOrSeries

@@ -517,7 +517,7 @@ def _get_grouper(
if key.key is None:
return grouper, [], obj
else:
return grouper, {key.key}, obj
return grouper, [key.key], obj
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is there no impact changing this from a set to a list? If not, does it even need to be a container?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just makes the types consistent for all returns. A set or a list is not important when there's only 1 item in the container.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW It sometimes has lists with several items, so needs to be a list/container.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure I follow. Maybe better to ask what the error was that mypy was giving? I assume this gets mutated outside of the function right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it wasn't a mypy error, I think the function should have a clear return type, so List rather than Contaioner.

(More general): I think function may have broad parameter types, but should strive to have clear return types, so there's less ambiguity over what you're looking at. Don't you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Yea I'm not necessarily saying the old container type was correct and this is wrong, I guess was just questioning why this even needs to be held in a container if it's simply one value being returned.

Can be refactored or reviewed in a follow up if that's a big effort; not a blocker as is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm haven't checked if a single value could be returned, instead of the list.

My motivation for doing this is mostly wanting to pin down what I'm looking at in groupBy, Grouper etc. I think "real" clean-ups must come later. It's quite complex to me now, and I'm hoping these typeups will ease understanding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Yea this part of code is tricky and could use some refactor, so this is all moving in the right direction

@@ -573,7 +571,7 @@ def _get_grouper(
all_in_columns_index = all(
g in obj.columns or g in obj.index.names for g in keys
)
elif isinstance(obj, Series):
else: # Series
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm is there a reason to remove this isinstance check? Seems to loosen the type checker

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the reverse problem where no else clause made the type unknown. An alternative would be to do:

elif isinstance(obj, Series):
    ....
else:
  raise TypeError(...)

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the next line accesses obj.index, so implicitly pins it down. the comment above is fine or an assertion if that helps mypy out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I read you comment clearly. The chosen solution os ok for you ( if not preferred:-))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think assert isinstance(obj, Series) within the branch should work then. Sounds like it doesn't make much of a difference with what is hitting this path now, but the original intent is definitely only for Series objects to make it down this path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not checked, but I assume that mypy can't perform type narrowing between a DataFrame and Series since we use NDFrame as the bound for FrameOrSeries.

so we know that if obj is not a DataFrame then it must be a Series, but mypy can't know that.

if it's a public method, and we don't have control of the calling types then #29458 (comment) makes sense

@@ -3,7 +3,7 @@
split-apply-combine paradigm.
"""

from typing import Optional, Tuple
from typing import Hashable, List, Optional, Tuple
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some places in these modules where name attrs are annotated as str or Optional[str] where Hashable would be more accurate I think

@@ -573,7 +571,7 @@ def _get_grouper(
all_in_columns_index = all(
g in obj.columns or g in obj.index.names for g in keys
)
elif isinstance(obj, Series):
else: # Series
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think assert isinstance(obj, Series) within the branch should work then. Sounds like it doesn't make much of a difference with what is hitting this path now, but the original intent is definitely only for Series objects to make it down this path

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins simonjayhawkins merged commit 67ee16a into pandas-dev:master Nov 8, 2019
@simonjayhawkins
Copy link
Member

Thanks @topper-123

@topper-123 topper-123 deleted the clean_get_grouper branch November 8, 2019 14:58
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants