[Python] Converting date32/64 to pandas using nanoseconds can silently overflow #36084

@jorisvandenbossche

Describe the bug, including details regarding any error messages, version, and platform.

If you convert a date32 or date64 field to numpy/pandas datetime64 (i.e. not datetime.date objects) by passing date_as_object=False, and a date is out of bounds for the target resolution (currently nanoseconds; with #35656 and recent pandas versions, this will become milliseconds), you silently get mangled values:

>>> pa.array([datetime.date(2400, 1, 1)]).to_pandas(date_as_object=False)
0   1815-06-13 00:25:26.290448384
dtype: datetime64[ns]
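
To see where the 1815 value comes from, the arithmetic can be reproduced in plain Python. This is only a sketch: it assumes the unchecked int64 multiplication wraps around as two's complement, and `wrap_int64` is a hypothetical helper written for this illustration, not part of pyarrow.

```python
import datetime

NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day


def wrap_int64(value):
    """Reduce a Python int to the value a wrapping int64 would hold (assumed behaviour)."""
    return (value + 2**63) % 2**64 - 2**63


epoch = datetime.date(1970, 1, 1)
days = (datetime.date(2400, 1, 1) - epoch).days  # date32 stores days since the epoch

exact_ns = days * NS_PER_DAY       # true result, too large for int64
wrapped_ns = wrap_int64(exact_ns)  # what ends up in the datetime64[ns] buffer

# datetime only has microsecond precision, so the last three digits are dropped.
mangled = datetime.datetime(1970, 1, 1) + datetime.timedelta(
    microseconds=wrapped_ns // 1000
)

print(days, exact_ns > 2**63 - 1, wrapped_ns, mangled)
```

The wrapped value lands on 1815-06-13 00:25:26.290448, which matches the mangled timestamp above up to the truncated nanosecond digits.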

This is because we currently simply multiply the values to get nanoseconds, without bounds / overflow checking:

if (type == Type::DATE32) {
  // Convert from days since epoch to datetime64[ns]
  ConvertDatetimeLikeNanos<int32_t, kNanosecondsInDay>(*data, out_values);

We could perhaps use a cast instead, which already has proper bounds checking:

>>> pa.array([datetime.date(2400, 1, 1)]).cast(pa.timestamp("ns"))
...
ArrowInvalid: Casting from date32[day] to timestamp[ns] would result in out of bounds timestamp: 157054
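
For illustration, the kind of bounds check the conversion is missing could look roughly like the following. This is a minimal sketch in plain Python, not Arrow's actual cast implementation; `date32_days_to_ns_checked` is a hypothetical name, and raising `OverflowError` is an assumption made for the example.

```python
NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def date32_days_to_ns_checked(days):
    """Convert days since the epoch to nanoseconds, refusing values that do not fit in int64."""
    ns = days * NS_PER_DAY  # Python ints are unbounded, so this product is exact
    if not INT64_MIN <= ns <= INT64_MAX:
        raise OverflowError(
            f"date32 value {days} is out of bounds for a nanosecond timestamp"
        )
    return ns


# Largest day count that still fits in int64 nanoseconds.
max_days = INT64_MAX // NS_PER_DAY
print(max_days, date32_days_to_ns_checked(max_days))

try:
    date32_days_to_ns_checked(157054)  # datetime.date(2400, 1, 1)
except OverflowError as exc:
    print("rejected:", exc)
```

With such a check, the date from the example is rejected with an error instead of being silently wrapped to a value in 1815.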

Component(s)

Python
