
BUG: pd.crosstab(dropna=True, values=) drops rows and columns where any aggregation result is null. #60767

Open
@sfc-gh-mvashishtha

Description


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd


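df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 2], "B": [3, 3, 4, 4, 4], "C": [1, 1, 1, 1, 1]},
    dtype=float,
)

# skew is NaN for groups with fewer than 3 values, so only the (A=2, B=4) cell is non-NaN
with_dropna = pd.crosstab(index=df["A"], columns=df["B"], values=df["C"], aggfunc="skew", dropna=True)
without_dropna = pd.crosstab(index=df["A"], columns=df["B"], values=df["C"], aggfunc="skew", dropna=False)

# Fails: dropna=True removes the all-NaN row A=1 and the all-NaN column B=3
assert with_dropna.shape == without_dropna.shape, (with_dropna.shape, without_dropna.shape)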

Issue Description

When I pass dropna=True to crosstab together with a values array to aggregate, pandas drops every row and column in which all of the aggregation results are NaN. In this example, skew returns NaN for any group with fewer than 3 values, and the cells (A=1, B=4) and (A=2, B=3) have no observations at all, so the only non-NaN result is the cell index=2, column=4. That leaves row A=1 and column B=3 entirely NaN, and pandas drops both of them.
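The same pattern can be reproduced with a plain groupby. This is a minimal sketch that only mimics the observed crosstab output; it is not pandas' internal code path, and the variable names are illustrative. It counts observations per (A, B) cell, computes the per-cell skew, and then drops all-NaN rows and columns:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 2], "B": [3, 3, 4, 4, 4], "C": [1, 1, 1, 1, 1]},
    dtype=float,
)

# Observations per (A, B) cell: only (1, 3) and (2, 4) contain data.
counts = df.groupby(["A", "B"])["C"].size().unstack()

# Per-cell skew; a group with fewer than 3 values yields NaN,
# and cells with no observations become NaN after unstack.
skews = df.groupby(["A", "B"])["C"].skew().unstack()

# Dropping all-NaN rows, then all-NaN columns, gives the 1 x 1 shape seen with dropna=True.
trimmed = skews.dropna(how="all").dropna(how="all", axis=1)

print(counts, skews, trimmed, sep="\n\n")
```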

Expected Behavior

The documentation says that dropna=True means, "Do not include columns whose entries are all NaN." I expect pandas to follow that rule and to keep rows and columns where the aggregation result is NaN. In the example above, I expect to see 2 rows, one for each value of A (1 and 2), and 2 columns, one for each value of B (3 and 4).
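One possible workaround, sketched under the assumption that the desired output is every observed A/B label with NaN results kept, is to aggregate with groupby and unstack instead of calling crosstab. The helper `crosstab_keep_nan_results` is hypothetical, not a pandas API; rows whose keys are NaN are still excluded by groupby's default `dropna=True`, but a NaN aggregation result never removes its row or column.

```python
import pandas as pd


def crosstab_keep_nan_results(
    index: pd.Series, columns: pd.Series, values: pd.Series, aggfunc: str
) -> pd.DataFrame:
    """Hypothetical helper: per-cell aggregation that keeps all-NaN rows/columns."""
    # Aggregate the values for each observed (index, column) pair.
    grouped = values.groupby([index, columns]).agg(aggfunc)
    # Move the column key into columns; unobserved cells are filled with NaN.
    return grouped.unstack()


df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 2], "B": [3, 3, 4, 4, 4], "C": [1, 1, 1, 1, 1]},
    dtype=float,
)
table = crosstab_keep_nan_results(df["A"], df["B"], df["C"], "skew")
print(table)
```

For this data the result should have the same 2 x 2 shape as `pd.crosstab(..., dropna=False)`, with row labels 1 and 2 and column labels 3 and 4.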

Installed Versions

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.9.21
python-bits           : 64
OS                    : Darwin
OS-release            : 24.2.0
Version               : Darwin Kernel Version 24.2.0: Fri Dec  6 18:56:34 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T6020
machine               : arm64
processor             : arm
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.3
numpy                 : 2.0.2
pytz                  : 2024.2
dateutil              : 2.8.2
pip                   : 24.2
Cython                : None
sphinx                : None
IPython               : 8.12.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2025.1
qtpy                  : None
pyqt5                 : None

Metadata

    Labels

    Bug, Needs Discussion (Requires discussion from core team before further action), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
