BUG: pd.Series.describe ignores include, exclude arguments #54193

Open
3 tasks done
natalymr opened this issue Jul 19, 2023 · 6 comments
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Error Reporting Incorrect or improved errors from pandas

Comments

@natalymr

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

# A single complex128 column; describe() is asked to exclude that dtype.
df = pd.DataFrame({"M": [1 + 20j] * 2})
df["M"].describe(exclude=[np.complex128])

Issue Description

I explicitly exclude this dtype, but I still get a description.
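
The behavior described above can be checked without involving complex numbers. The following is a minimal sketch, assuming a float Series (a substitution for the original complex data, chosen so that percentile computation is not an issue) and a pandas version in which Series.describe silently ignores exclude, as the 2.0.1 build reported here does:

```python
import numpy as np
import pandas as pd

# Assumption: a float Series stands in for the complex one from the report.
s = pd.Series([1.0, 2.0, 3.0], name="M")

plain = s.describe()
# The Series dtype is float64, yet excluding float64 changes nothing.
with_exclude = s.describe(exclude=[np.float64])

same = plain.equals(with_exclude)
print(same)
```

If a later pandas release starts raising for these arguments, the second call would fail instead of returning the same result.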

In the describe.py module, in the describe_ndframe function, I saw this code:

    if obj.ndim == 1:
        describer = SeriesDescriber(
            obj=cast("Series", obj),
        )
    else:
        describer = DataFrameDescriber(
            obj=cast("DataFrame", obj),
            include=include,
            exclude=exclude,
        )

This is the root cause of the bug: include and exclude are never passed to SeriesDescriber.
If these arguments are ignored, why are they mentioned in the documentation?

Expected Behavior

The include and exclude arguments should not be ignored.

Installed Versions

INSTALLED VERSIONS

commit : 37ea63d
python : 3.9.6.final.0
python-bits : 64
OS : Darwin
OS-release : 22.3.0
Version : Darwin Kernel Version 22.3.0: Mon Jan 30 20:39:46 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.1
numpy : 1.24.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 65.5.1
pip : 23.1.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 12.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@natalymr natalymr added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 19, 2023
@natalymr natalymr changed the title BUG: pd.Series.describe ignore include, exclude arguments BUG: pd.Series.describe ignores include, exclude arguments Jul 19, 2023
@mroeschke
Member

Thanks for the report, but this is mentioned in the Notes section of the docstring:

https://pandas.pydata.org/docs/reference/api/pandas.Series.describe.html

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series

It probably would be more explicit to raise a ValueError though.
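
To illustrate what "raise a ValueError" could look like from the caller's side, here is a minimal sketch of a user-level wrapper. The name describe_series_strict and the error message are hypothetical; this is not pandas code and not a proposed patch, just one way the stricter behavior might feel:

```python
import pandas as pd


def describe_series_strict(series, percentiles=None, include=None, exclude=None):
    """Hypothetical wrapper: reject include/exclude instead of ignoring them."""
    if include is not None or exclude is not None:
        raise ValueError(
            "include and exclude are not supported when describing a Series"
        )
    return series.describe(percentiles=percentiles)


s = pd.Series([1.0, 2.0, 3.0])
summary = describe_series_strict(s)  # works: no include/exclude given

try:
    describe_series_strict(s, exclude=["float64"])
    raised = False
except ValueError:
    raised = True
print(raised)
```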

@mroeschke mroeschke added Error Reporting Incorrect or improved errors from pandas Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 19, 2023
@natalymr
Author

Thanks for the answer

But I'm sorry, I still don't understand why this is mentioned only in the last line of the Notes section.

And why is there no way to exclude some types in Series.describe?
If I process a bunch of Series and don't want to deal with, for example, percentile/min/max statistics for complex types, where they don't make sense, why can't I use the same logic and code as for DataFrames?
Why do I need to check types manually before calling the method?
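
For reference, the manual check being discussed might look like the sketch below. The helper name describe_unless_complex is hypothetical; it relies only on pandas.api.types.is_complex_dtype and returns None rather than calling describe on a complex Series:

```python
import pandas as pd
from pandas.api.types import is_complex_dtype


def describe_unless_complex(series):
    """Hypothetical helper: skip describe() for complex-valued Series."""
    if is_complex_dtype(series.dtype):
        return None  # nothing meaningful to report for min/max/percentiles
    return series.describe()


skipped = describe_unless_complex(pd.Series([1 + 20j] * 2))
described = describe_unless_complex(pd.Series([1.0, 2.0]))
print(skipped is None, described["mean"])
```

Returning None is only one possible convention; a caller processing many Series could just as well filter them before calling describe.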

@mroeschke
Member

Well, a Series has a single dtype, so include/exclude should only really have an impact where exclude matches the Series dtype, and it doesn't make sense to compute describe on an empty Series.
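
The point about a single dtype can be seen by applying the DataFrame-style selection to a one-column frame. This is a small sketch, assuming a float Series converted with to_frame and filtered with select_dtypes; it shows only the selection step, not what describe itself does:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0], name="M")

# Excluding the only dtype present leaves a frame with no columns at all.
remaining = s.to_frame().select_dtypes(exclude=[np.float64])
n_columns_left = remaining.shape[1]
print(n_columns_left)
```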

@natalymr
Author

Because of this commit in the numpy library (numpy/numpy@b3c0960), I need to check the type manually before calling the describe method anyway. In previous versions of numpy, computing percentiles for a complex type was not a problem. Now this code raises a TypeError.
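
Whether the installed numpy accepts complex input for percentiles can be probed directly. The sketch below assumes that np.percentile is the call that fails and that the failure is a TypeError, as described above; the function name complex_percentile_supported is hypothetical:

```python
import numpy as np


def complex_percentile_supported():
    """Hypothetical probe: does np.percentile accept complex input here?"""
    values = np.array([1 + 20j, 1 + 20j])
    try:
        np.percentile(values, 50)
    except TypeError:
        return False
    return True


supported = complex_percentile_supported()
print(supported)
```

The result depends on the installed numpy version, so no particular output is implied.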

Maybe you should point it out in the documentation or perhaps return the exclude argument for the describe method.
WDYT?

@mroeschke
Member

Maybe you should point it out in the documentation or perhaps return the exclude argument for the describe method.
WDYT?

Yeah either of these enhancements sounds good

@natalymr
Author

Frankly, the second option is preferable: returning the exclude parameter to the describe method for Series.


2 participants