Description
Describe the bug, including details regarding any error messages, version, and platform.
If you convert a date32 or date64 field to numpy/pandas datetime64 (i.e. not datetime.date objects) by passing date_as_object=False, and the date is out of bounds for the target resolution (at the moment nanoseconds, but with #35656 and recent pandas versions this will become milliseconds), you silently get mangled values:
>>> import datetime
>>> import pyarrow as pa
>>> pa.array([datetime.date(2400, 1, 1)]).to_pandas(date_as_object=False)
0   1815-06-13 00:25:26.290448384
dtype: datetime64[ns]
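The mangled value is consistent with plain int64 wraparound: 2400-01-01 is 157054 days after the Unix epoch, and 157054 days expressed in nanoseconds exceeds INT64_MAX. A pure-Python sketch of the overflow (the wraparound simulation below is mine, not Arrow internals):

```python
import datetime

NS_PER_DAY = 86_400 * 10**9
INT64_MAX = 2**63 - 1

# days since the Unix epoch, as stored in a date32 array
days = (datetime.date(2400, 1, 1) - datetime.date(1970, 1, 1)).days  # 157054

ns = days * NS_PER_DAY   # exact in Python (arbitrary-precision ints)...
assert ns > INT64_MAX    # ...but this no longer fits in a C++ int64

# simulate two's-complement int64 wraparound of the unchecked multiplication
wrapped = (ns + 2**63) % 2**64 - 2**63
mangled = datetime.datetime(1970, 1, 1) + datetime.timedelta(
    microseconds=wrapped // 1000)
print(mangled.date())  # lands back in 1815, matching the output above
```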
This is because we currently simply multiply the values to get nanoseconds, without any bounds / overflow checking:
arrow/python/pyarrow/src/arrow/python/arrow_to_pandas.cc, lines 1592 to 1594 in b4ac585:

if (type == Type::DATE32) {
  // Convert from days since epoch to datetime64[ns]
  ConvertDatetimeLikeNanos<int32_t, kNanosecondsInDay>(*data, out_values);
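One option is to guard the multiplication itself. A minimal Python sketch of such a bounds check (the function name and error type are hypothetical, not Arrow API; Arrow's C++ code would need an overflow-checked multiply instead of Python's big ints):

```python
import datetime

NS_PER_DAY = 86_400 * 10**9
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def days_to_ns_checked(days: int) -> int:
    # Hypothetical bounds-checked counterpart of the unchecked
    # multiplication in ConvertDatetimeLikeNanos.
    ns = days * NS_PER_DAY  # exact: Python ints don't overflow
    if not (INT64_MIN <= ns <= INT64_MAX):
        raise OverflowError(
            f"date out of bounds for timestamp[ns]: {days} days")
    return ns

days = (datetime.date(2400, 1, 1) - datetime.date(1970, 1, 1)).days
try:
    days_to_ns_checked(days)
except OverflowError as exc:
    print(exc)  # raised instead of silently wrapping
```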
We could maybe use a cast instead, which already does proper bounds checking:
>>> pa.array([datetime.date(2400, 1, 1)]).cast(pa.timestamp("ns"))
...
ArrowInvalid: Casting from date32[day] to timestamp[ns] would result in out of bounds timestamp: 157054
Component(s)
Python