Inappropriate results from expr skew and kurtosis for a magic constant input #18617

Open
2 tasks done
JackieJin1025 opened this issue Sep 9, 2024 · 1 comment
Labels
bug (Something isn't working), needs decision (Awaiting decision by a maintainer), P-low (Priority: low), python (Related to Python Polars)

Comments

JackieJin1025 commented Sep 9, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

# Constant column: the same float repeated 60 times.
a = pl.DataFrame({"a": [1.0042855193121334] * 60})
for n in range(60):
    print(n, a.slice(0, n).select(pl.col("a").kurtosis()).item())

output

(screenshot of the printed results; image not preserved)

Log output

No log output was captured.

Issue description

I expect kurtosis() to return NaN for a constant column of any size. Instead, it returns None when the size is 0 and -2 when the size is >= 22.
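A possible explanation for the -2, offered as an assumption rather than something confirmed from the Polars source: if rounding makes the computed mean differ slightly from the constant, every deviation x_i - mean is the same tiny nonzero number d. Then m2 = d^2 and m4 = d^4, so m4 / m2^2 - 3 = 1 - 3 = -2. The minimal sketch below uses a hypothetical helper, excess_kurtosis_from_deviations, and a made-up deviation of 2**-51 to illustrate the arithmetic only.

```python
import math


def excess_kurtosis_from_deviations(deviations):
    """Fisher (excess) kurtosis m4 / m2**2 - 3 from precomputed deviations x_i - mean."""
    n = len(deviations)
    m2 = sum(d ** 2 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    if m2 == 0.0:
        # 0/0: kurtosis is undefined for truly constant data.
        return math.nan
    return m4 / m2 ** 2 - 3.0


# Exact mean: every deviation is 0, so the result is NaN.
print(excess_kurtosis_from_deviations([0.0] * 22))  # nan

# Hypothetical rounding error: the mean is off by 2**-51,
# so every deviation is the same tiny nonzero value.
print(excess_kurtosis_from_deviations([-(2.0 ** -51)] * 22))  # -2.0
```

The size of the deviation does not matter here: any identical nonzero deviation gives a ratio m4 / m2^2 of 1, which is why the result lands on -2 rather than on some noisy value.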

Expected behavior

import polars as pl
from scipy import stats as st  # assumed: `st` was not imported in the original snippet

a = pl.DataFrame({"a": [1.0042855193121334] * 60})
for n in range(60):
    print(n, st.kurtosis(a.slice(0, n).select(pl.col("a")).to_numpy().flatten()))

This returns NaN consistently for every size.

Installed versions

--------Version info---------
Polars:               1.5.0
Index type:           UInt32
Platform:             Linux-3.10.107-1-tlinux2_kvm_guest-0056-x86_64-with-glibc2.28
Python:               3.9.9 (main, Apr 24 2023, 09:37:21) 
[GCC 10.2.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.4.0
gevent:               22.10.2
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           3.9.2
nest_asyncio:         1.5.6
numpy:                1.24.2
openpyxl:             3.1.2
pandas:               1.5.3
pyarrow:              15.0.2
pydantic:             1.10.18
pyiceberg:            <not installed>
sqlalchemy:           2.0.24
torch:                1.11.0+cu102
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

Related issue: #15067

JackieJin1025 added the bug, needs triage, and python labels on Sep 9, 2024
deanm0000 (Collaborator) commented

Hmm, I guess the difference between Rust's f64::EPSILON = 2.220446049250313e-16 and NumPy's np.finfo(np.float64).resolution = 1e-15 is relevant here. I'm not sure whether the right fix is to hard-code 1e-15 instead of using f64::EPSILON, to scale f64::EPSILON by the sample size (somehow), to check whether all inputs are nearly equal, or something else.
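To make the effect of the two constants concrete, here is a rough sketch in plain Python. It assumes the near-constant check has the form m2 <= (tol * mean)**2, which is a reading of the comment above and not confirmed from either code base. The helper kurtosis_with_tolerance and the two-ULP drift in the mean are hypothetical, chosen only to show a case where the two tolerances disagree.

```python
import math
import sys

# Same value as Rust's f64::EPSILON: 2.220446049250313e-16.
F64_EPSILON = sys.float_info.epsilon
# Value quoted above for np.finfo(np.float64).resolution.
NUMPY_RESOLUTION = 1e-15


def kurtosis_with_tolerance(values, mean, tol):
    """Excess kurtosis; returns NaN when m2 <= (tol * mean)**2 (assumed form of the check)."""
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 <= (tol * mean) ** 2:
        return math.nan
    return m4 / m2 ** 2 - 3.0


x = 1.0042855193121334
values = [x] * 22

# Hypothetical: pretend the computed mean drifted two ULPs (2**-51) above x.
drifted_mean = x + 2.0 ** -51

with_eps = kurtosis_with_tolerance(values, drifted_mean, F64_EPSILON)
with_res = kurtosis_with_tolerance(values, drifted_mean, NUMPY_RESOLUTION)
print(with_eps)  # -2.0
print(with_res)  # nan
```

Under these assumptions, the tighter tolerance lets a two-ULP drift through and reports -2, while the looser 1e-15 tolerance treats the data as constant and reports NaN. Whether hard-coding 1e-15 is the right fix is still the open question raised above.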

deanm0000 added the P-low label and removed the needs triage label on Sep 9, 2024
deanm0000 added the needs decision label on Sep 23, 2024