Inappropriate result of expr skew and kurtosis for a magic constant inputs #15067

Closed
2 tasks done
j7168908jx opened this issue Mar 14, 2024 · 1 comment · Fixed by #15137
Assignees
Labels
bug Something isn't working P-high Priority: high python Related to Python Polars

Comments

@j7168908jx

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

I get an incorrect result for this exact float value:

import polars as pl
import scipy.stats as st
import pandas as pd

a = pl.DataFrame({"a": [1.0042855193121334] * 11})
b = pd.DataFrame({"a": [1.0042855193121334] * 11})

print(a.select(pl.col('a').kurtosis()).item())
print(b.a.kurtosis())
print(st.kurtosis([1.0042855193121334] * 11))


print(a.select(pl.col('a').skew()).item())
print(b.a.skew())
print(st.skew([1.0042855193121334] * 11))

and the output is

-2.0
0.0
nan
-1.0
0.0
nan

whereas the appropriate output should be either 0.0 or nan.

Changing this float to a different value produces the expected nan output:

>>> print(pl.DataFrame({"a": [1.0042834] * 11}).select(pl.col('a').kurtosis()).item())
nan

Log output

No log output was produced.

Issue description

When this exact number, 1.0042855193121334, is repeated exactly 11 times, the output of Polars' skew and kurtosis is unreasonable, regardless of the values passed for the bias parameter of skew or the bias and fisher parameters of kurtosis.
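
One possible explanation, which is an assumption and is not confirmed anywhere in this thread: if the computed mean of the column ends up a single rounding step away from the input value, every element has the same tiny nonzero deviation d. The moment ratios then collapse to sign(d) for skew and 1 - 3 for Fisher kurtosis, which would match the -1.0 and -2.0 above. The sketch below is pure Python, does not call Polars, and forces the off-by-one-ULP mean by hand with math.nextafter to show the effect:

```python
import math

x = 1.0042855193121334
n = 11

# Assumption (not taken from the Polars source): the computed mean is
# one ULP above x because of floating-point rounding in the summation.
mean = math.nextafter(x, math.inf)
d = x - mean  # the same tiny, nonzero deviation for every element

# Biased central moments of n identical deviations.
m2 = sum([d ** 2] * n) / n
m3 = sum([d ** 3] * n) / n
m4 = sum([d ** 4] * n) / n

skew = m3 / m2 ** 1.5           # reduces to sign(d)
kurtosis = m4 / m2 ** 2 - 3.0   # Fisher definition, reduces to 1 - 3

print(skew, kurtosis)
```

With a mean that is exactly equal to x, the same formulas give 0 / 0, which is where the nan for other inputs would come from.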

Expected behavior

The appropriate output should be either 0.0 or nan.
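
As a rough illustration of the nan variant of this expectation, here is a minimal pure-Python sketch. The helper name skew_or_nan is hypothetical; it is not a Polars function and not necessarily how the fix in #15137 works. It checks for constant input before computing any moments, so rounding in the mean cannot leak into the result:

```python
import math


def skew_or_nan(values):
    """Biased sample skewness; nan for empty or constant input.

    Hypothetical helper for illustration only.
    """
    if len(values) == 0 or min(values) == max(values):
        # All elements are identical: the variance is mathematically
        # zero, so skewness is undefined.
        return math.nan
    n = len(values)
    mean = math.fsum(values) / n
    m2 = math.fsum((v - mean) ** 2 for v in values) / n
    m3 = math.fsum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


print(skew_or_nan([1.0042855193121334] * 11))
```

Returning 0.0 instead of nan in the guarded branch would give the pandas-style result shown in the example output.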

Installed versions

--------Version info---------
Polars:               0.20.15
Index type:           UInt32
Platform:             Linux-5.19.0-1010-nvidia-lowlatency-x86_64-with-glibc2.35
Python:               3.10.13 (main, Feb 27 2024, 12:25:06) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.2.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.3
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.1
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
j7168908jx added the bug, needs triage, and python labels on Mar 14, 2024
deanm0000 added the P-high label and removed the needs triage label on Mar 18, 2024
deanm0000 self-assigned this on Mar 18, 2024
@deanm0000
Collaborator

This should be fixed with #15137
