Type parts of c parser #44677

Merged (17 commits) on Dec 22, 2021
Changes from 1 commit
Improve callable
phofl committed Dec 1, 2021
commit 95a0de0cc11d242417c7b10f17cb94aae2c7db9d
4 changes: 2 additions & 2 deletions pandas/io/parsers/base_parser.py
@@ -939,7 +939,7 @@ def _check_data_length(

     @overload
     def _evaluate_usecols(
-        self, usecols: set[int] | Callable, names: Sequence[Hashable]
+        self, usecols: set[int] | Callable[[Hashable], int], names: Sequence[Hashable]
Member

can the callable return a bool?

Member Author

No, the return values are indices

Member

Sorry, I should have been clearer.

I'm curious as to why usecols is a callable with an int return type.

ParserBase.__init__ is not typed, so it is not clear to me why this is int. bool is type-compatible with int here, and surely any truthy value is valid. So the return type of the usecols callable is whatever the public API accepts. Is that any object that can be truthy?
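A small runtime illustration of the two points above, using made-up column names: `bool` subclasses `int`, and filtering column names only looks at truthiness, so predicates returning `bool`, `int`, or `str` select the same names here. This is a sketch of the documented behaviour, not pandas code.

```python
# bool is a subclass of int, so a bool-returning callable already satisfies
# an int return annotation (Callable is covariant in its return type).
assert issubclass(bool, int)
assert True + True == 2

# Hypothetical column names; only the truthiness of each predicate's result matters.
names = ["a", "b", "total"]
predicates = {
    "bool": lambda n: n == "total",
    "int": lambda n: len(n) - 1,  # 0 (falsy) for single-character names
    "str": lambda n: n if n.startswith("t") else "",  # "" is falsy
}
for kind, pred in predicates.items():
    print(kind, [n for n in names if pred(n)])
```

All three predicates select `['total']`, which is why the return annotation of the user callable is about what the public API accepts rather than about `bool` specifically.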

Member Author

Ah, got you. Yes, you are correct; this has to return a bool.

Member

In the docs for, say, read_csv (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.read_csv.html):

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True.

So it doesn't explicitly say it should be bool, but why was int chosen?

Member Author

phofl, Dec 17, 2021

Because I misinterpreted the return value of the callable as the return value of _evaluate_usecols; I confused the two with each other.
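To make the distinction concrete: the user's callable maps one column name to a truthy or falsy value, while `_evaluate_usecols` maps the whole `usecols` argument to a set of column positions. The function below is a simplified, hypothetical stand-in written for illustration (it only handles the callable case and passes sets through unchanged); it is not the pandas implementation.

```python
from __future__ import annotations

from collections.abc import Callable, Hashable, Sequence


def evaluate_usecols(
    usecols: Callable[[Hashable], object] | set[str] | set[int],
    names: Sequence[Hashable],
) -> set[str] | set[int]:
    # Hypothetical simplification of ParserBase._evaluate_usecols.
    if callable(usecols):
        # The callable's return value is only tested for truthiness,
        # but this function returns the *indices* of the selected names.
        return {i for i, name in enumerate(names) if usecols(name)}
    # Non-callable usecols are returned unchanged.
    return usecols


names = ["id", "Name", "AGE"]
print(evaluate_usecols(lambda c: str(c).isupper(), names))  # {2}
print(evaluate_usecols({"id"}, names))  # {'id'}
```

Only `"AGE"` is all upper case, so the callable is truthy for position 2 and the function returns `{2}`: an `int` appears in the output set, not as the callable's return value.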

Member

Sure. We do use bool in many places in the public API where the API probably accepts any object that can be truthy. This may become an issue once the types are public and users start getting false positives when type checking.

So changing to bool is fine for now.

Member Author

Technically this could return anything that evaluates to True/False. But since the docs say that this has to evaluate to True, I think we can type it like this?

Member

Based on the docstring (I didn't look at the code), it should probably be Callable[[Hashable], object].
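A sketch of what the `object` annotation buys: a static checker such as mypy treats `Callable[[Hashable], object]` as accepting a callable with any return type, so a predicate returning `re.Match[str] | None` is accepted, whereas a parameter typed `Callable[[Hashable], bool]` should be flagged even though the call works at runtime. The overloads mirror the shape of the diff, but `evaluate_usecols` and `starts_with_val` are hypothetical module-level stand-ins, not the pandas code.

```python
from __future__ import annotations

import re
from collections.abc import Callable, Hashable, Sequence
from typing import overload


@overload
def evaluate_usecols(
    usecols: set[int] | Callable[[Hashable], object], names: Sequence[Hashable]
) -> set[int]: ...


@overload
def evaluate_usecols(usecols: set[str], names: Sequence[Hashable]) -> set[str]: ...


def evaluate_usecols(
    usecols: Callable[[Hashable], object] | set[str] | set[int],
    names: Sequence[Hashable],
) -> set[str] | set[int]:
    # Hypothetical stand-in: a callable selects positions by truthiness.
    if callable(usecols):
        return {i for i, name in enumerate(names) if usecols(name)}
    return usecols


def starts_with_val(name: Hashable) -> re.Match[str] | None:
    # Returns a Match object or None rather than a bool; this is accepted by
    # Callable[[Hashable], object] but not by Callable[[Hashable], bool].
    return re.match(r"val_", str(name))


print(evaluate_usecols(starts_with_val, ["key", "val_a", "val_b"]))  # {1, 2}
```

Annotating the return as `object` keeps the type honest about what the code actually does (it only calls the predicate in a boolean context) without producing false positives for callers who pass such predicates.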

Member Author

Changed to object

     ) -> set[int]:
         ...

@@ -951,7 +951,7 @@ def _evaluate_usecols(

     def _evaluate_usecols(
         self,
-        usecols: Callable | set[str] | set[int],
+        usecols: Callable[[Hashable], int] | set[str] | set[int],
         names: Sequence[Hashable],
     ) -> set[str] | set[int]:
         """