Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Adapted from "Automatic exclusion of nuisance columns" in the User Guide "Group by" docs:
from decimal import Decimal
import pandas as pd
df_dec = pd.DataFrame(
    {
        "id": [1, 2, 1, 2],
        "int_column": [1, 2, 3, 4],
        "dec_column": [
            Decimal("0.50"),
            Decimal("0.15"),
            Decimal("0.25"),
            Decimal("0.40"),
        ],
    }
)
print('\ndf_dec.groupby(["id"])[["dec_column"]].sum()')
print('According to docs this should sum correctly')
print('It works for pandas 1.2.x but generates an empty dataframe for 1.3.x')
print(df_dec.groupby(["id"])[["dec_column"]].sum())
print('\ndf_dec.groupby(["id"])[["int_column", "dec_column"]].sum()')
print('This drops `dec_column` as expected for both 1.2.x and 1.3.x')
print(df_dec.groupby(["id"])[["int_column", "dec_column"]].sum())
print('\ndf_dec.groupby(["id"]).agg({"int_column": "sum", "dec_column": "sum"})')
print('This aggregates everything correctly as expected for both 1.2.x and 1.3.x')
print(df_dec.groupby(["id"]).agg({"int_column": "sum", "dec_column": "sum"}))
Problem description
The User Guide "Group by" docs provide a code example that shows when nuisance columns will be excluded from aggregation. According to the docs, the case:
df_dec.groupby(["id"])[["dec_column"]].sum()
should produce a valid aggregation, but for pandas >= 1.3.0 it results in an empty dataframe.
The impact of the regression can even be seen in the published docs. If we look at an archive.org 2021-02-25 snapshot of the "Automatic exclusion of nuisance columns" section, we can see that the example produces correct output (see Out[170] in the code example):
https://web.archive.org/web/20210225195813/https://pandas.pydata.org/docs/user_guide/groupby.html#automatic-exclusion-of-nuisance-columns
By contrast, the 2021-08-24 snapshot displays an empty dataframe for the Out[170] example:
https://web.archive.org/web/20210824151314/https://pandas.pydata.org/docs/user_guide/groupby.html#automatic-exclusion-of-nuisance-columns
However, note that the docs in both cases indicate that the example should produce a correct aggregation.
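Until the regression is resolved, one possible workaround is to route the sum through the dict-based `agg` call, which the code sample above reports as working on both 1.2.x and 1.3.x. The following is only a sketch under that assumption; the name `workaround` is illustrative and not part of pandas.

```python
from decimal import Decimal

import pandas as pd

# Same frame as in the code sample above.
df_dec = pd.DataFrame(
    {
        "id": [1, 2, 1, 2],
        "int_column": [1, 2, 3, 4],
        "dec_column": [
            Decimal("0.50"),
            Decimal("0.15"),
            Decimal("0.25"),
            Decimal("0.40"),
        ],
    }
)

# Name the aggregation explicitly per column instead of calling .sum()
# on the column selection; this is the form reported as unaffected.
workaround = df_dec.groupby(["id"]).agg({"dec_column": "sum"})
print(workaround)
```

The result keeps `dec_column` as an object column holding `Decimal` values, so no precision is lost by converting to float.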
Expected Output
df_dec.groupby(["id"])[["dec_column"]].sum()
should produce the result:
   dec_column
id
1        0.75
2        0.55
For pandas 1.2.x it does so as expected. For pandas >= 1.3.0 it instead produces the incorrect output:
Empty DataFrame
Columns: []
Index: [1, 2]
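To tell the two outcomes apart programmatically, a small check along these lines may help. It is a sketch only: `sum_was_dropped` is a hypothetical helper introduced here for illustration, and it simply tests whether the aggregated column survived in the result.

```python
from decimal import Decimal

import pandas as pd


def sum_was_dropped(result: pd.DataFrame, column: str = "dec_column") -> bool:
    """Return True when `column` is missing from an aggregation result."""
    return column not in result.columns


df_dec = pd.DataFrame(
    {
        "id": [1, 2, 1, 2],
        "dec_column": [
            Decimal("0.50"),
            Decimal("0.15"),
            Decimal("0.25"),
            Decimal("0.40"),
        ],
    }
)

result = df_dec.groupby(["id"])[["dec_column"]].sum()

# Report which behaviour the installed pandas version shows.
if sum_was_dropped(result):
    print(f"pandas {pd.__version__}: dec_column was dropped")
else:
    print(f"pandas {pd.__version__}: dec_column was summed")
    print(result)
```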