BUG: dt64 astype silent overflows #55979

Merged
merged 2 commits into from
Nov 16, 2023
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.2.0.rst
@@ -347,6 +347,7 @@ Datetimelike
 - Bug in :meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` always caching :meth:`Index.is_unique` as ``True`` when first value in index is ``NaT`` (:issue:`55755`)
 - Bug in :meth:`Index.view` to a datetime64 dtype with non-supported resolution incorrectly raising (:issue:`55710`)
 - Bug in :meth:`Tick.delta` with very large ticks raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
+- Bug in ``.astype`` converting from a higher-resolution ``datetime64`` dtype to a lower-resolution ``datetime64`` dtype (e.g. ``datetime64[us]->datetime64[ms]``) silently overflowing with values near the lower implementation bound (:issue:`55979`)
 - Bug in adding or subtracting a :class:`Week` offset to a ``datetime64`` :class:`Series`, :class:`Index`, or :class:`DataFrame` column with non-nanosecond resolution returning incorrect results (:issue:`55583`)
 - Bug in addition or subtraction of :class:`BusinessDay` offset with ``offset`` attribute to non-nanosecond :class:`Index`, :class:`Series`, or :class:`DataFrame` column giving incorrect results (:issue:`55608`)
 - Bug in addition or subtraction of :class:`DateOffset` objects with microsecond components to ``datetime64`` :class:`Index`, :class:`Series`, or :class:`DataFrame` columns with non-nanosecond resolution (:issue:`55595`)
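To make "near the lower implementation bound" concrete, here is a minimal sketch in plain Python. It uses the nanosecond value of the timestamp from the test added in this PR. The helper `fits_int64` and the subtract-then-divide shortcut are hypothetical illustrations of how a fixed-width floor division can leave the int64 range; they are assumptions for the example and are not taken from numpy or pandas source.

```python
# int64 range; the minimum itself is reserved as the NaT sentinel in datetime64.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NS_PER_MS = 1_000_000

# 1677-09-21 00:12:43.145225 expressed as nanoseconds since the Unix epoch.
value_ns = -9_223_372_036_854_775_000


def fits_int64(x: int) -> bool:
    """Hypothetical helper: would x be representable as a signed 64-bit int?"""
    return INT64_MIN <= x <= INT64_MAX


# Assumed C-style shortcut for flooring a negative value: shift first, then
# truncate. The shifted intermediate no longer fits in int64, so fixed-width
# arithmetic would wrap around without any error being raised.
intermediate = value_ns - (NS_PER_MS - 1)

# Python integers are unbounded, so divmod gives the mathematically correct
# floored quotient plus the remainder that the conversion discards.
value_ms, remainder_ns = divmod(value_ns, NS_PER_MS)

print(fits_int64(value_ns), fits_int64(intermediate))
print(value_ms, remainder_ns)
```

The input fits in int64 while the intermediate does not, which is why a check on the input alone would not catch the problem.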
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/conversion.pyx
@@ -716,7 +716,7 @@ cdef int64_t parse_pydatetime(
         result = _ts.value
     else:
         if isinstance(val, _Timestamp):
-            result = (<_Timestamp>val)._as_creso(creso, round_ok=False)._value
+            result = (<_Timestamp>val)._as_creso(creso, round_ok=True)._value
         else:
             result = pydatetime_to_dt64(val, dts, reso=creso)
     return result
28 changes: 16 additions & 12 deletions pandas/_libs/tslibs/np_datetime.pyx
@@ -365,13 +365,10 @@ cpdef ndarray astype_overflowsafe(
         return values

     elif from_unit > to_unit:
-        if round_ok:
-            # e.g. ns -> us, so there is no risk of overflow, so we can use
-            # numpy's astype safely. Note there _is_ risk of truncation.
-            return values.astype(dtype)
-        else:
-            iresult2 = astype_round_check(values.view("i8"), from_unit, to_unit)
-            return iresult2.view(dtype)
+        iresult2 = _astype_overflowsafe_to_smaller_unit(
+            values.view("i8"), from_unit, to_unit, round_ok=round_ok
+        )
+        return iresult2.view(dtype)

     if (<object>values).dtype.byteorder == ">":
         # GH#29684 we incorrectly get OutOfBoundsDatetime if we don't swap
@@ -502,13 +499,20 @@ cdef int op_to_op_code(op):
         return Py_GT


-cdef ndarray astype_round_check(
+cdef ndarray _astype_overflowsafe_to_smaller_unit(
     ndarray i8values,
     NPY_DATETIMEUNIT from_unit,
-    NPY_DATETIMEUNIT to_unit
+    NPY_DATETIMEUNIT to_unit,
+    bint round_ok,
 ):
-    # cases with from_unit > to_unit, e.g. ns->us, raise if the conversion
-    # involves truncation, e.g. 1500ns->1us
+    """
+    Overflow-safe conversion for cases with from_unit > to_unit, e.g. ns->us.
+    In addition to checking for overflows (which can occur near the lower
+    implementation bound, see numpy#22346), this checks for truncation,
+    e.g. 1500ns->1us.
+    """
+    # e.g. test_astype_ns_to_ms_near_bounds is a case with round_ok=True where
+    # just using numpy's astype silently fails
     cdef:
         Py_ssize_t i, N = i8values.size

@@ -531,7 +535,7 @@ cdef ndarray astype_round_check(
             new_value = NPY_DATETIME_NAT
         else:
             new_value, mod = divmod(value, mult)
-            if mod != 0:
+            if not round_ok and mod != 0:
                 # TODO: avoid runtime import
                 from pandas._libs.tslibs.dtypes import npy_unit_to_abbrev
                 from_abbrev = npy_unit_to_abbrev(from_unit)
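The per-element logic visible in the hunks above (pass NaT through, `divmod` by the unit multiplier, raise on a non-zero remainder only when `round_ok` is false) can be sketched in plain Python. This is an illustration under stated assumptions, not the Cython implementation: the function name `to_smaller_unit`, the list-based interface, the `ValueError` and its message are all hypothetical, and `mult` stands for the number of source units per target unit (e.g. 1000 for ns->us).

```python
# datetime64 stores NaT as the int64 minimum.
NAT = -(2**63)


def to_smaller_unit(i8values, mult, round_ok):
    """Convert integer timestamps to a coarser unit (hypothetical sketch).

    mult is the number of source units per target unit. With round_ok=True
    values are floored; with round_ok=False any lossy conversion raises.
    """
    out = []
    for value in i8values:
        if value == NAT:
            # NaT has no numeric meaning, so it is passed through unchanged.
            out.append(NAT)
            continue
        # divmod floors toward negative infinity, so negative (pre-epoch)
        # values round the same direction as positive ones.
        new_value, mod = divmod(value, mult)
        if not round_ok and mod != 0:
            raise ValueError(
                f"Cannot losslessly convert {value} with multiplier {mult}"
            )
        out.append(new_value)
    return out


print(to_smaller_unit([1500, NAT, -1500], 1000, round_ok=True))
```

Routing both `round_ok` settings through the same loop, as the PR does, means the floored path no longer depends on numpy's `astype` for values near the lower bound.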
16 changes: 16 additions & 0 deletions pandas/tests/arrays/test_datetimes.py
@@ -309,6 +309,22 @@ def test_cmp_dt64_arraylike_tznaive(self, comparison_op):


 class TestDatetimeArray:
+    def test_astype_ns_to_ms_near_bounds(self):
+        # GH#55979
+        ts = pd.Timestamp("1677-09-21 00:12:43.145225")
+        target = ts.as_unit("ms")
+
+        dta = DatetimeArray._from_sequence([ts], dtype="M8[ns]")
+        assert (dta.view("i8") == ts.as_unit("ns").value).all()
+
+        result = dta.astype("M8[ms]")
+        assert result[0] == target
+
+        expected = DatetimeArray._from_sequence([ts], dtype="M8[ms]")
+        assert (expected.view("i8") == target._value).all()
+
+        tm.assert_datetime_array_equal(result, expected)
+
     def test_astype_non_nano_tznaive(self):
         dti = pd.date_range("2016-01-01", periods=3)

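The integer values the new test compares can be cross-checked with the standard library alone. The sketch below derives them from the calendar date using `datetime` arithmetic; the variable names are illustrative and nothing here calls pandas, so it only confirms the expected arithmetic and does not exercise the fixed code path.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
ts = datetime(1677, 9, 21, 0, 12, 43, 145225)

# timedelta // timedelta is exact integer floor division, so this is the
# number of whole microseconds between the epoch and ts (negative here).
us = (ts - EPOCH) // timedelta(microseconds=1)

# Nanosecond representation: what the M8[ns] array stores.
ns = us * 1000

# Millisecond representation: floor toward negative infinity, matching the
# divmod-based conversion in the PR.
ms = us // 1000

# Converting the floored value back shows the sub-millisecond part dropped.
floored = EPOCH + timedelta(milliseconds=ms)

print(ns, ms, floored)
```

The nanosecond value sits only a few hundred units above the int64 minimum, which is what makes this timestamp a useful edge case for the conversion.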