Inconsistent raise of OverflowError with big integers #14992

Open
2 tasks done
jonasbrami opened this issue Mar 11, 2024 · 2 comments
Labels
A-input-parsing (Area: parsing input arguments), bug (Something isn't working), P-medium (Priority: medium), python (Related to Python Polars)

Comments

@jonasbrami

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
BIG_INT = 25192779970113404096 # Does not fit into UInt64
pl.from_dicts([{'int':1, 'big_int': BIG_INT}])
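
The size claim in the comment above can be checked with plain Python integers, without Polars. This is only a sketch: every name except BIG_INT is illustrative, and it does not show how Polars itself classifies the value.

```python
BIG_INT = 25192779970113404096  # value from the reproducible example

INT64_MAX = 2**63 - 1   # largest value a signed 64-bit integer can hold
UINT64_MAX = 2**64 - 1  # largest value an unsigned 64-bit integer can hold

fits_int64 = BIG_INT <= INT64_MAX
fits_uint64 = BIG_INT <= UINT64_MAX
bits_needed = BIG_INT.bit_length()

# A Python float is an IEEE 754 double, the same layout as Float64.
# The conversion succeeds, but the result may be rounded, so compare
# the round trip before relying on the float value.
as_float = float(BIG_INT)
round_trips = int(as_float) == BIG_INT

print(fits_int64, fits_uint64, bits_needed, round_trips)
```

The value needs 65 bits, so neither Int64 nor UInt64 can hold it. Float64 can represent its magnitude, possibly with rounding.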

Log output

(crypto) quenouille@LAPTOP-G5F7OIM4:~/crypto/src$ POLARS_VERBOSE=1 python polars_repo.py 
Traceback (most recent call last):
  File "/home/quenouille/crypto/src/polars_repo.py", line 35, in <module>
    simple_repro()
  File "/home/quenouille/crypto/src/polars_repo.py", line 9, in simple_repro
    pl.from_dicts([{'int':1, 'big_int': BIG_INT}])
  File "/home/quenouille/miniconda3/envs/crypto/lib/python3.10/site-packages/polars/convert.py", line 171, in from_dicts
    return pl.DataFrame(
  File "/home/quenouille/miniconda3/envs/crypto/lib/python3.10/site-packages/polars/dataframe/frame.py", line 378, in __init__
    self._df = sequence_to_pydf(
  File "/home/quenouille/miniconda3/envs/crypto/lib/python3.10/site-packages/polars/_utils/construction.py", line 1024, in sequence_to_pydf
    return _sequence_to_pydf_dispatcher(
  File "/home/quenouille/miniconda3/envs/crypto/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/quenouille/miniconda3/envs/crypto/lib/python3.10/site-packages/polars/_utils/construction.py", line 1245, in _sequence_of_dict_to_pydf
    pydf = PyDataFrame.read_dicts(
OverflowError: int too big to convert

Issue description

The error does not occur when the first 'int' column is removed.
It also does not occur when pl.from_dicts([{'big_int': BIG_INT}]) is run before pl.from_dicts([{'int':1, 'big_int': BIG_INT}]).

Passing a schema does not help:
pl.from_dicts([{'int':1, 'big_int': BIG_INT}], schema={'int': pl.Int64, 'big_int': pl.Float64})

Expected behavior

Should construct a valid dataframe or try to cast to Float64 consistently.
@stinodego

Installed versions

In [4]: pl.show_versions()
--------Version info---------
Polars:               0.20.15
Index type:           UInt32
Platform:             Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31
Python:               3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               0.9.2
matplotlib:           <not installed>
numpy:                1.26.3
openpyxl:             <not installed>
pandas:               2.2.0
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

jonasbrami added the bug, needs triage, and python labels on Mar 11, 2024
jonasbrami changed the title from "Inconsistent raise of Overflow with big integers" to "Inconsistent raise of OverflowError with big integers" on Mar 11, 2024
stinodego added the A-input-parsing and P-medium labels and removed the needs triage label on Mar 11, 2024
@stinodego
Member

This is a bug. It works one way but not the other because the parsing code takes the first non-null value and uses that value's data type. BIG_INT gets the Float64 data type, and the small int can then be read as f64. It doesn't work the other way around: once the small int has set an integer type, BIG_INT cannot be converted to it.

This type of logic should not apply here as they are different columns. We should fix this.
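
The behaviour described above can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not Polars' actual implementation: all function names, the dtype labels "Int" and "Float64", and the single 64-bit range check are hypothetical simplifications. It also does not attempt to model the call-order dependence reported in the issue description.

```python
BIG_INT = 25192779970113404096  # value from the reproducible example
INT_MIN, INT_MAX = -(2**63), 2**64 - 1  # simplification: any 64-bit integer


def infer_dtype(value):
    """Hypothetical rule: ints outside the 64-bit range become Float64."""
    if isinstance(value, int) and INT_MIN <= value <= INT_MAX:
        return "Int"
    return "Float64"


def convert(value, dtype):
    """Convert one value to the toy dtype, raising on integer overflow."""
    if value is None:
        return None
    if dtype == "Int":
        if not INT_MIN <= value <= INT_MAX:
            raise OverflowError("int too big to convert")
        return value
    return float(value)


def read_row_shared_dtype(row):
    """Buggy model: the first non-null value sets the dtype for every column."""
    first = next(v for v in row.values() if v is not None)
    dtype = infer_dtype(first)
    return {key: convert(value, dtype) for key, value in row.items()}


def read_row_per_column(row):
    """Intended model: each column gets a dtype from its own value."""
    return {key: convert(value, infer_dtype(value)) for key, value in row.items()}


# Small int first: the shared integer dtype cannot hold BIG_INT.
try:
    read_row_shared_dtype({"int": 1, "big_int": BIG_INT})
except OverflowError as exc:
    print("shared dtype, int first:", exc)

# BIG_INT first: the shared dtype is Float64, and the small int fits.
print(read_row_shared_dtype({"big_int": BIG_INT, "int": 1}))

# Per-column inference does not depend on key order.
print(read_row_per_column({"int": 1, "big_int": BIG_INT}))
```

In this model the overflow comes only from reusing one inferred type across unrelated columns, which is why inferring per column removes the order dependence.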

@mcrumiller
Contributor

> The BIG_INT has the float64 data type

I hit this issue when looking into pl.Decimal earlier. I believe in pyo3 there are ways around this by using num_bigint, but allowing this in arbitrary series constructors would really slow things down. It might be worth looking into a special path for larger-than-u64 ints during construction.
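
As a rough user-side analogue of such a special path, the rows can be scanned once for integers outside the 64-bit range, and only the affected columns converted. This is a sketch, not the Rust-side change discussed above. The helper names oversized_int_columns and widen_oversized are hypothetical, and whether the converted rows then load cleanly in a given Polars version is not something this sketch verifies.

```python
from decimal import Decimal

BIG_INT = 25192779970113404096  # value from the reproducible example
INT64_MIN, UINT64_MAX = -(2**63), 2**64 - 1


def oversized_int_columns(rows):
    """Return the keys whose values include an int outside the 64-bit range."""
    keys = set()
    for row in rows:
        for key, value in row.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, int) and not isinstance(value, bool):
                if not INT64_MIN <= value <= UINT64_MAX:
                    keys.add(key)
    return keys


def widen_oversized(rows, to=float):
    """Convert every non-null value in an oversized column with `to`."""
    big = oversized_int_columns(rows)
    return [
        {k: (to(v) if k in big and v is not None else v) for k, v in row.items()}
        for row in rows
    ]


rows = [{"int": 1, "big_int": BIG_INT}]
print(oversized_int_columns(rows))
print(widen_oversized(rows))              # lossy: float may round
print(widen_oversized(rows, to=Decimal))  # exact: Decimal keeps every digit
```

Only the columns that actually contain an oversized value take the slower conversion, so ordinary integer columns are left untouched. Passing to=Decimal keeps the exact value, while the default float conversion may round.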
