
DataFrameGroupBy.agg with NaN results in inf #59106

Open
@glaucouri

Description


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

pd.__version__
# '2.2.0'

df = pd.DataFrame({'A':['A','B'], 'B':[0,0]}).astype({'A':'string','B':'Float32'})
df['C'] = df['B']/df['B']  # 0/0 stores NaN (not <NA>) in the Float32 column

df.groupby('A')['C'].agg(['max','min'])

# 
#    max  min
# A          
# A -inf  inf
# B -inf  inf

df.groupby('A')['C'].max() # the same with .agg(max)

# A
# A   -inf
# B   -inf
# Name: C, dtype: Float32

df.groupby('A')['C'].min()  # the same with .agg(min)

# A
# A    inf
# B    inf
# Name: C, dtype: Float32

Issue Description

DataFrameGroupBy.agg handles NaN poorly.

Unfortunately, nullable Float columns can sometimes contain NaN values (distinct from pd.NA), for example as the result of 0/0 in the example above.
See #32265.

Combined with groupby, this leads to unexpected behavior.

In a nutshell:

Having NaN in a Float column makes the groupby() min/max results wrong: max returns -inf and min returns inf instead of NaN.
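The -inf/inf pattern is consistent with a masked reduction that seeds each group with a sentinel (-inf for max, +inf for min) and relies only on the validity mask to skip missing values. The following is a simplified, hypothetical pure-Python model of that mechanism, not pandas' actual Cython kernel; the function name `masked_group_minmax` and the values/mask/labels layout are assumptions for illustration.

```python
import math


def masked_group_minmax(values, mask, labels, ngroups, compute_max=True):
    """Simplified model of a masked groupby min/max (hypothetical, not pandas code).

    Each group's accumulator starts at a sentinel: -inf for max, +inf for min.
    Entries whose mask is True (pd.NA) are skipped; every other entry counts as
    an observation. Any comparison involving NaN is False, so a NaN value is
    counted but never replaces the sentinel.
    """
    sentinel = -math.inf if compute_max else math.inf
    acc = [sentinel] * ngroups
    nobs = [0] * ngroups
    for val, is_na, lab in zip(values, mask, labels):
        if is_na:
            continue  # missing per the mask: skipped entirely
        nobs[lab] += 1
        if compute_max:
            if val > acc[lab]:
                acc[lab] = val
        elif val < acc[lab]:
            acc[lab] = val
    # Groups with no observations become missing (None stands in for pd.NA).
    return [a if n > 0 else None for a, n in zip(acc, nobs)]


nan = float("nan")
values = [nan, nan]    # 0/0 leaves NaN in the data buffer...
mask = [False, False]  # ...without flagging it as missing in the mask
labels = [0, 1]        # group codes for 'A' and 'B'

print(masked_group_minmax(values, mask, labels, 2, compute_max=True))   # [-inf, -inf]
print(masked_group_minmax(values, mask, labels, 2, compute_max=False))  # [inf, inf]
```

In this model the NaN counts as an observation, so the group is not reported as missing, yet it never wins a comparison, which leaves the sentinel as the result.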

Expected Behavior

From my perspective, NaN should propagate: an aggregation of NaN values should again produce NaN.

Semantically: "An invalid value cannot be computed, so any transformation of it should again result in an invalid value."

In particular, a groupby aggregation of NaN should result in NaN.
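As a rough illustration of the expected semantics, here is a hypothetical pure-Python groupby max in which an unmasked NaN poisons its group, while entries flagged as missing in the mask (pd.NA) are still skipped. The function name `nan_propagating_group_max` and the values/mask/labels layout are assumptions for illustration; this is a sketch of the desired behavior, not a proposed pandas patch.

```python
import math


def nan_propagating_group_max(values, mask, labels, ngroups):
    """Hypothetical groupby max with NaN propagation (illustration only).

    Entries whose mask is True (pd.NA) are skipped. An unmasked NaN makes its
    group's result NaN. Groups with no unmasked entries return None, which
    stands in for pd.NA.
    """
    acc = [None] * ngroups
    for val, is_na, lab in zip(values, mask, labels):
        if is_na:
            continue  # missing per the mask: skipped
        current = acc[lab]
        if current is not None and math.isnan(current):
            continue  # group already holds NaN, which takes precedence
        if math.isnan(val) or current is None or val > current:
            acc[lab] = val
    return acc


nan = float("nan")
values = [nan, 3.0, 1.0, 2.0]
mask = [False, False, False, True]  # the 2.0 is flagged as missing
labels = [0, 0, 1, 1]

print(nan_propagating_group_max(values, mask, labels, 2))  # [nan, 1.0]
```

Group 0 contains an unmasked NaN, so its result is NaN; group 1 ignores its masked entry and returns 1.0. Applied to the data from the reproducible example, both groups would return NaN.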

Installed Versions

INSTALLED VERSIONS

commit : fd3f571
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-186-generic
Version : #206-Ubuntu SMP Fri Apr 26 12:31:10 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0
numpy : 1.23.5
pytz : 2024.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 24.1
Cython : 0.29.37
pytest : 7.4.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.1.9
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.22.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : 2024.2.0
fsspec : 2024.6.0
gcsfs : 2024.6.0
matplotlib : 3.7.5
numba : 0.59.1
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 12.0.1
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
sqlalchemy : 2.0.28
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Labels: Apply, Bug, Groupby