
BUG: When done as part of pd.Series, NaN | True is False which contradicts logic #51267

Open
3 tasks done
corneliusroemer opened this issue Feb 9, 2023 · 2 comments
Labels: Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations)


corneliusroemer commented Feb 9, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Each dict lacks one key, so column "a" is [True, NaN] and column "b" is [NaN, True]
df = pd.DataFrame([{"a": True}, {"b": True}])
df["a"] | df["b"]
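The example above only evaluates `df["a"] | df["b"]`. A minimal sketch that puts both operand orders side by side, assuming the same frame, makes the order dependence easier to see. `compare_or_orders` is a hypothetical helper written for this illustration, not a pandas API, and the exact values it prints may depend on the pandas version (the report observed `True, False` for `a | b` on pandas 1.5.3).

```python
import pandas as pd

# Same frame as the reproducible example: both columns end up with object dtype
df = pd.DataFrame([{"a": True}, {"b": True}])


def compare_or_orders(left: pd.Series, right: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: evaluate left | right and right | left side by side."""
    return pd.DataFrame({
        "a|b": left | right,
        "b|a": right | left,
    })


result = compare_or_orders(df["a"], df["b"])
print(df.dtypes)
print(result)
```

If `|` were commutative here, the two columns of `result` would be identical.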

Issue Description

The result of the bitwise OR operator (| or pd.Series.__or__()) should not depend on the order of the operands; OR should be commutative.

In this case, however, when one of the operands is NaN, pandas violates this law, which seems odd. NumPy doesn't show this inconsistency: it raises an error when NaN is used with OR.

pandas instead silently casts and returns a bool result, without any warning.
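A small sketch of the NumPy behaviour described above, assuming NumPy is installed. `raises_type_error` is a hypothetical helper for this illustration. With a plain float array, `True` becomes `1.0` and the `bitwise_or` ufunc rejects floats; with an object array, NumPy applies Python's `|` element by element, so `True | nan` fails just as it does in plain Python.

```python
import numpy as np


def raises_type_error(fn) -> bool:
    """Hypothetical helper: return True if calling fn() raises TypeError."""
    try:
        fn()
    except TypeError:
        return True
    return False


# float64 arrays: True is stored as 1.0, and bitwise OR is not defined for floats
float_a = np.array([True, np.nan])
float_b = np.array([np.nan, True])

# object arrays: each element pair is combined with Python's own `|` operator
obj_a = np.array([True, np.nan], dtype=object)
obj_b = np.array([np.nan, True], dtype=object)

float_raises = raises_type_error(lambda: float_a | float_b)
object_raises = raises_type_error(lambda: obj_a | obj_b)
print(float_raises, object_raises)
```

In both cases NumPy refuses the operation rather than returning a result that depends on operand order.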

There has been a Stack Overflow question about this for a long time, but I couldn't find an issue here: https://stackoverflow.com/questions/39000907/pandas-column-selection-non-commutative-bitwise-or-when-selecting-on-str-and-na

Expected Behavior

The result of a bitwise OR should always be True if either operand is True.

It's OK if the user gets a warning or an error, but silently returning False is bad.
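One rule that gives the expected behaviour is three-valued (Kleene) logic, which is what pandas documents for its nullable "boolean" dtype: a True operand always wins, and otherwise a missing value propagates. A plain-Python sketch, assuming `None` stands in for a missing value; `kleene_or` is a hypothetical name for this illustration.

```python
from typing import Optional


# Hypothetical encoding for this sketch: None means "missing" (NaN / NA),
# and the other inputs are plain Python bools.
def kleene_or(x: Optional[bool], y: Optional[bool]) -> Optional[bool]:
    """Three-valued (Kleene) OR: True wins, otherwise missing propagates."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


values = [True, False, None]
# Print the full truth table; each row and its mirror image agree
for x in values:
    for y in values:
        print(f"{x!s:>5} | {y!s:>5} -> {kleene_or(x, y)}")
```

Under this rule `None | True` and `True | None` are both True, so the operation stays commutative.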

Actual behavior

>>> df
      a     b
0  True   NaN
1   NaN  True
>>> df["a"] | df["b"]
0     True
1    False
dtype: bool

Installed Versions

INSTALLED VERSIONS

commit : 2e218d1
python : 3.10.8.final.0
python-bits : 64
OS : Darwin
OS-release : 22.3.0
Version : Darwin Kernel Version 22.3.0: Thu Jan 5 20:48:54 PST 2023; root:xnu-8792.81.2~2/RELEASE_ARM64_T6000
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.5.3
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 67.1.0
pip : 23.0
Cython : None
pytest : 7.2.1
hypothesis : None
...
xlrd : None
xlwt : None
zstandard : 0.19.0
tzdata : None

corneliusroemer added the Bug and Needs Triage labels on Feb 9, 2023
corneliusroemer changed the title from "BUG: pd.Series.__or__() and | is non-commutative if one value is NaN" to "BUG: When done as part of pd.Series, NaN | True is False which contradicts logic" on Feb 9, 2023
corneliusroemer (Author)

Possible duplicate of #41604

topper-123 (Contributor)

This is a bit of an edge case for pandas, because in general (outside of pandas) we have:

>>> import numpy as np
>>> True | 1.5
TypeError: unsupported operand type(s) for |: 'bool' and 'float'
>>> True | np.nan
TypeError: unsupported operand type(s) for |: 'bool' and 'float'
>>> True | None
TypeError: unsupported operand type(s) for |: 'bool' and 'NoneType'

So in this case you are working with object-dtype columns holding incompatible data, which is never fun.

We do have some half-baked support for this, but I'd strongly recommend converting to the "boolean" dtype when working with booleans that contain NaN.
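Following that recommendation, a minimal sketch of converting the two columns from the report to pandas' nullable "boolean" dtype before combining them, assuming the same frame. With this dtype the NaN entries become `pd.NA`, and `|` follows Kleene logic, so `True | NA` is True in either operand order.

```python
import pandas as pd

df = pd.DataFrame([{"a": True}, {"b": True}])

# Convert the object columns to the nullable "boolean" dtype;
# NaN becomes pd.NA and the True values are kept
a = df["a"].astype("boolean")
b = df["b"].astype("boolean")

# Kleene logic: True | NA is True, so both orders give the same result
a_or_b = a | b
b_or_a = b | a
print(a_or_b)
print(b_or_a)
```

The conversion makes the missing values explicit instead of leaving them as floats inside an object column, which is what lets the operator apply a consistent rule.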

So IDK; it's probably close to hopeless to support all "expected" cases when using object dtype. You could try to see whether this can be patched to work, but I think the patch would have to be relatively simple to be accepted.

topper-123 added the Numeric Operations label and removed the Needs Triage label on May 7, 2023