Inconsistent raise of OverflowError with big integers #14992

jonasbrami · 2024-03-11T16:57:28Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
BIG_INT = 25192779970113404096 # Does not fit into UInt64
pl.from_dicts([{'int':1, 'big_int': BIG_INT}])

Log output

(crypto) quenouille@LAPTOP-G5F7OIM4:~/crypto/src$ POLARS_VERBOSE=1 python polars_repo.py 
Traceback (most recent call last):
  File "/home/quenouille/crypto/src/polars_repo.py", line 35, in <module>
    simple_repro()
  File "/home/quenouille/crypto/src/polars_repo.py", line 9, in simple_repro
    pl.from_dicts([{'int':1, 'big_int': BIG_INT}])
  File "/home/quenouille/miniconda3/envs/crypto/lib/python3.10/site-packages/polars/convert.py", line 171, in from_dicts
    return pl.DataFrame(
  File "/home/quenouille/miniconda3/envs/crypto/lib/python3.10/site-packages/polars/dataframe/frame.py", line 378, in __init__
    self._df = sequence_to_pydf(
  File "/home/quenouille/miniconda3/envs/crypto/lib/python3.10/site-packages/polars/_utils/construction.py", line 1024, in sequence_to_pydf
    return _sequence_to_pydf_dispatcher(
  File "/home/quenouille/miniconda3/envs/crypto/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/quenouille/miniconda3/envs/crypto/lib/python3.10/site-packages/polars/_utils/construction.py", line 1245, in _sequence_of_dict_to_pydf
    pydf = PyDataFrame.read_dicts(
OverflowError: int too big to convert

Issue description

The error does not occur when the first 'int' column is removed.
It also does not occur when pl.from_dicts([{'big_int': BIG_INT}]) is run before pl.from_dicts([{'int': 1, 'big_int': BIG_INT}]).

Passing a schema does not help:
pl.from_dicts([{'int':1, 'big_int': BIG_INT}], schema={'int': pl.Int64, 'big_int': pl.Float64})
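
For context on why the value overflows at all, a quick check in plain Python (no Polars needed) confirms that BIG_INT lies above the largest unsigned 64-bit integer. The names I64_MAX and U64_MAX are illustrative constants, not Polars identifiers.

```python
BIG_INT = 25192779970113404096

I64_MAX = 2**63 - 1  # largest signed 64-bit integer
U64_MAX = 2**64 - 1  # largest unsigned 64-bit integer

print(BIG_INT > I64_MAX)     # True: too big for Int64
print(BIG_INT > U64_MAX)     # True: too big for UInt64 as well
print(BIG_INT.bit_length())  # 65: one bit more than a 64-bit type can hold
```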

Expected behavior

Polars should either construct a valid DataFrame or consistently attempt to cast to Float64.
@stinodego

Installed versions

In [4]: pl.show_versions()
--------Version info---------
Polars:               0.20.15
Index type:           UInt32
Platform:             Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31
Python:               3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               0.9.2
matplotlib:           <not installed>
numpy:                1.26.3
openpyxl:             <not installed>
pandas:               2.2.0
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>


stinodego · 2024-03-11T17:09:34Z

This is a bug. It works one way but not the other because inference takes the first non-null value and uses its data type. BIG_INT is inferred as float64, so the small int that follows can be read as f64. The reverse does not work: once the small int sets an integer type, BIG_INT cannot be converted to it.

This type of logic should not apply here as they are different columns. We should fix this.
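
To illustrate the behavior described above as the intended one, here is a minimal pure-Python sketch of per-column inference, where each column's type comes from its own first non-null value and never from a neighboring column. This is not the Polars implementation: infer_value_dtype and infer_schema are hypothetical helpers, the dtype names are plain strings, and mapping out-of-range integers to "Float64" is an assumption based on the comment above.

```python
I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def infer_value_dtype(value):
    """Return a dtype name for one Python value (hypothetical helper)."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "Boolean"
    if isinstance(value, int):
        # Assumption: ints outside the signed 64-bit range fall back to Float64.
        return "Int64" if I64_MIN <= value <= I64_MAX else "Float64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, str):
        return "String"
    return "Object"


def infer_schema(rows):
    """Infer one dtype per column from that column's first non-null value."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                # Remember the column, but leave its dtype undecided.
                schema.setdefault(name, None)
            elif schema.get(name) is None:
                schema[name] = infer_value_dtype(value)
    return schema


BIG_INT = 25192779970113404096
print(infer_schema([{"int": 1, "big_int": BIG_INT}]))
print(infer_schema([{"big_int": BIG_INT, "int": 1}]))
```

With this scheme the column order in the input dict does not change the type assigned to either column, which is the consistency the report asks for.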

mcrumiller · 2024-03-11T17:44:59Z

> The BIG_INT has the float64 data type

I hit this issue when looking into pl.Decimal earlier. I believe in pyo3 there are ways around this by using num_bigint, but allowing this in arbitrary series constructors would really slow things down. It might be worth looking into a special path for larger-than-u64 ints during construction.
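
A rough sketch of what such a special path could look like, written in plain Python on the caller's side rather than in the Rust constructor: scan the rows once, and only when a column actually contains an out-of-range int convert that column's ints to float. The helper names are hypothetical, the conversion to float is lossy for large values, and whether passing the coerced rows to pl.from_dicts avoids the error on the reporter's version is an assumption that has not been tested here.

```python
U64_MAX = 2**64 - 1
I64_MIN = -(2**63)


def is_plain_int(value):
    """True for ints that are not bools (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def columns_needing_big_int_path(rows):
    """Return the names of columns holding at least one int outside the 64-bit range."""
    flagged = set()
    for row in rows:
        for name, value in row.items():
            if is_plain_int(value) and (value > U64_MAX or value < I64_MIN):
                flagged.add(name)
    return flagged


def coerce_big_int_columns(rows):
    """Convert ints in flagged columns to float; leave all other columns untouched."""
    flagged = columns_needing_big_int_path(rows)
    if not flagged:
        return rows  # fast path: no oversized ints, nothing is copied
    return [
        {
            name: float(value) if name in flagged and is_plain_int(value) else value
            for name, value in row.items()
        }
        for row in rows
    ]


BIG_INT = 25192779970113404096
rows = coerce_big_int_columns([{"int": 1, "big_int": BIG_INT}])
print(rows)
```

Only the flagged column changes type, so ordinary integer columns keep the cheap path and pay no conversion cost beyond the initial scan.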

jonasbrami added the bug, needs triage, and python labels on Mar 11, 2024

jonasbrami changed the title from "Inconsistent raise of Overflow with big integers" to "Inconsistent raise of OverflowError with big integers" on Mar 11, 2024

stinodego added the A-input-parsing and P-medium labels and removed the needs triage label on Mar 11, 2024
