BUG: binop methods with fill_value not being respected in pandas 2.2 #57447

Open
mroeschke opened this issue Feb 16, 2024 · 2 comments
Labels: Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); NA - MaskedArrays (related to pd.NA and nullable extension arrays); Numeric Operations (arithmetic, comparison, and logical operations)

Comments

@mroeschke (Member)

In [1]: import pandas as pd; import numpy as np

In [2]: pd.__version__
Out[2]: '2.1.4'

In [3]: ser = pd.Series(pd.array([1, pd.NA], dtype="Float64"))

In [4]: ser.eq(np.nan, fill_value=0.0)  # pandas 2.1.4
Out[4]: 
0    False
1    False
dtype: boolean

In [4]: ser.eq(np.nan, fill_value=0.0)  # pandas 2.2
Out[4]: 
0    False
1     <NA>
dtype: boolean

I would expect fill_value to replace the NA with 0.0 before the comparison with np.nan, as described in the docs.
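A minimal pure-Python sketch of the behavior the reporter expects, assuming fill_value is applied to missing values in the left operand first and the scalar is then compared as-is. None stands in for pd.NA and float("nan") for np.nan; the helper names are hypothetical and this is not pandas' actual implementation.

```python
import math


def is_missing(x):
    # Hypothetical helper: treat None (standing in for pd.NA) and NaN as missing.
    return x is None or (isinstance(x, float) and math.isnan(x))


def eq_fill_left_first(values, other, fill_value):
    """Hypothetical model: fill missing left values, then compare with `other`."""
    result = []
    for v in values:
        if is_missing(v):
            v = fill_value
        # Under IEEE 754, any equality comparison with NaN is False.
        result.append(v == other)
    return result


print(eq_fill_left_first([1.0, None], math.nan, 0.0))  # [False, False]
```

Under this reading both positions compare against NaN and come out False, which matches the 2.1.4 output.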

mroeschke added the Missing-data, Numeric Operations, and NA - MaskedArrays labels on Feb 16, 2024
@rohanjain101 (Contributor)

rohanjain101 commented Feb 18, 2024

I believe this behavior was changed in #55568, but isn't the 2.2.0 result the expected one? The documentation states that the fill value is only used when one side is missing. Since the left Series has a nullable dtype, shouldn't np.nan be treated as a missing value here? I would expect so even for nullable types, because NaN is converted to NA on construction:

>>> pd.Series([np.nan], dtype="Float64")
0    <NA>
dtype: Float64

In this example both the left and right values are missing, so fill_value should not replace anything.
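A minimal pure-Python sketch of the documented rule as described in this comment: substitute fill_value only where exactly one operand is missing, and leave the result missing where both are. None stands in for pd.NA, float("nan") for np.nan, and the scalar np.nan is assumed to be broadcast to every position. The helper names are hypothetical; this models the semantics, not pandas' internals.

```python
import math


def is_missing(x):
    # Hypothetical helper: treat None (standing in for pd.NA) and NaN as missing.
    return x is None or (isinstance(x, float) and math.isnan(x))


def eq_fill_one_side(left, right, fill_value):
    """Hypothetical model: fill only where exactly one side is missing."""
    result = []
    for a, b in zip(left, right):
        a_missing, b_missing = is_missing(a), is_missing(b)
        if a_missing and b_missing:
            # Both sides missing: no replacement, result stays missing.
            result.append(None)
            continue
        if a_missing:
            a = fill_value
        elif b_missing:
            b = fill_value
        result.append(a == b)
    return result


# The scalar np.nan is broadcast against each element of the left operand.
left = [1.0, None]
print(eq_fill_one_side(left, [math.nan] * len(left), 0.0))  # [False, None]
```

Position 0 compares 1.0 with the filled 0.0 (False), and position 1 has both sides missing, so it stays missing. That matches the 2.2 output of False and <NA>.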

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Feb 20, 2024
Two tests needed to be adjusted due to pandas behavior changes in pandas-dev/pandas#57447 and pandas-dev/pandas#57448

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #15078
@sfc-gh-jkew

The current behavior (fill only when one side is missing) makes sense because it follows the docs, but I can see why it is confusing. Without a close reading, it is entirely reasonable to assume that fill_value fills both sides, rather than only one, when np.nan is involved.
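To illustrate why the two readings are easy to confuse, here is a minimal pure-Python sketch of the "fill both sides" interpretation mentioned in this comment. It is a hypothetical model, not something pandas implements: every missing value on either side is replaced with fill_value before comparing. None stands in for pd.NA and float("nan") for np.nan; the helper names are assumptions.

```python
import math


def is_missing(x):
    # Hypothetical helper: treat None (standing in for pd.NA) and NaN as missing.
    return x is None or (isinstance(x, float) and math.isnan(x))


def eq_fill_both_sides(left, right, fill_value):
    """Hypothetical model of the misreading: fill every missing value on both sides."""
    result = []
    for a, b in zip(left, right):
        if is_missing(a):
            a = fill_value
        if is_missing(b):
            b = fill_value
        result.append(a == b)
    return result


print(eq_fill_both_sides([1.0, None], [math.nan, math.nan], 0.0))  # [False, True]
```

Under this reading the second position would compare 0.0 with 0.0 and return True, which matches neither the 2.1.4 nor the 2.2 output.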
