Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeIs annotation for pands.isna is not quite accurate #1009

Open
steve-mavens opened this issue Sep 26, 2024 · 10 comments
Open

TypeIs annotation for pands.isna is not quite accurate #1009

steve-mavens opened this issue Sep 26, 2024 · 10 comments

Comments

@steve-mavens
Copy link

steve-mavens commented Sep 26, 2024

Describe the bug

Recently #972 introduced a TypeIs annotation for pd.isna. The annotation states that the function returns True if and only if the input value is one of the types NaTType | NAType | None. However, this is not correct, since the function also returns True for the value float('nan'), which is of type float.

To Reproduce

  1. Provide a minimal runnable pandas example that is not properly checked by the stubs.
from pandas import isna

def main() -> None:
   foo: float = float('nan')
   if isna(foo) and 1 == 1:
       print('not a number')
   else:
       print('is a number')

if __name__ == '__main__':
    main()

The output is not a number, but because of the TypeIs annotation, mypy thinks that line is unreachable

  1. Indicate which type checker you are using (mypy or pyright).

mypy 1.10.1

  1. Show the error message received from that type checker while checking your example.

type_test.py:6: error: Right operand of "and" is never evaluated [unreachable]
if isna(foo) and 1 == 1:
^~~~~~
type_test.py:7: error: Statement is unreachable [unreachable]
print('not a number')
^~~~~~~~~~~~~~~~~~~~~
Found 2 errors in 1 file (checked 1 source file)

Please complete the following information:

  • OS: [e.g. Windows, Linux, MacOS]: Windows
  • OS Version [e.g. 22]: 11
  • python version: 3.11.4
  • version of type checker: 1.10.1
  • version of installed pandas-stubs: 2.2.2.240909

Additional context
Add any other context about the problem here.

@gandhis1
Copy link
Contributor

gandhis1 commented Sep 26, 2024

Python's type system does not provide a way to discriminate between a regular float and a NaN float. I would argue that the current implementation does the most intuitive thing in most cases at the expense of a relatively rare use case. Anecdotally, I have found None, np.nan, and pd.NA to be more common.

We could update to include the Literal[float('NaN')] in the overloads, but (1) I'm not sure an instantiation like that is actually supported and (2) it may not even solve the issue, if type checkers infer it as a regular float.

@steve-mavens
Copy link
Author

Yes, I think the basic problem here is that TypeIs requires a clean division of types. It doesn't allow for a function that takes input type A | B | C and returns True for all A and some values of C, and False for all B and the remaining values of C.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Sep 26, 2024

Thanks for the report. I'm not sure what we can do here, because we can't specify float("nan") or np.nan as a Literal.

@steve-mavens
Copy link
Author

steve-mavens commented Sep 26, 2024

One workaround for users would be math.isnan(x) if isinstance(x, float) else pd.isna(x), but obviously there's some performance implications there. It's probably better to suppress the unreachable warning than take the hit.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Sep 26, 2024

We could update to include the Literal[float('NaN')] in the overloads, but (1) I'm not sure an instantiation like that is actually supported and (2) it may not even solve the issue, if type checkers infer it as a regular float.

I thought that might work, but the typing system doesn't support it. Found this discussion:
python/typing#1160

Yes, I think the basic problem here is that TypeIs requires a clean division of types. It doesn't allow for a function that takes input type A | B | C and returns True for all A and some values of C, and False for all B and the remaining values of C.

Well, it might work if we could do Literal[float("nan")], since that would "divide" the types, although I'm not sure if the type checkers are handling that.

@steve-mavens
Copy link
Author

steve-mavens commented Sep 26, 2024

Also, there are allowed to be float NaN values distinct from float('nan'), so it could still go wrong in even rarer cases.

Is there a way to import pandas.api.typing.NAType (or pandas.libs._missing.NAType) that the stubs approve of? The reason I found this quirk of the type hint is actually because of a bug in my real code. I have a variable taken from an unpacked row of a DataFrame, and I typed it as float, when it could also be pd.NA.

@steve-mavens
Copy link
Author

steve-mavens commented Sep 26, 2024

So, another workaround is to make sure variables/expressions that might contain a float NaN are typed as float | NAType, or float | None, so that mypy doesn't think the return value of pd.isna is guaranteed False by the input type.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Sep 26, 2024

Is there a way to import pandas.api.typing.NAType (or pandas.libs._missing.NAType) that the stubs approve of?

Right now, if you do from pandas._libs.missing import NAType , that should work.

We're behind on updating the stubs for all 2.x support, which means that pandas.api.typing doesn't exist yet in the stubs.

@steve-mavens
Copy link
Author

steve-mavens commented Sep 26, 2024

Great, thanks for your help. I'll probably use None until NAType is public in the stubs, since either way I just need to tell mypy that pd.isna might return True.

@gboeing
Copy link

gboeing commented Sep 27, 2024

So, another workaround is to make sure variables/expressions that might contain a float NaN are typed as float | NAType, or float | None, so that mypy doesn't think the return value of pd.isna is guaranteed False by the input type.

I ran into the same issue today, and resolved it by typing my variable like value: float | None

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants