Skip to content

BUG: Raise OutOfBoundsDatetime in DataFrame.replace when value exceeds datetime64[ns] bounds (GH#61671) #61717

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

iabhi4
Copy link
Contributor

@iabhi4 iabhi4 commented Jun 27, 2025

Fixes a bug where DataFrame.replace would raise a generic AssertionError when trying to replace np.nan in a datetime64[ns] column with an out-of-bounds datetime.datetime object (e.g., datetime(3000, 1, 1)).

This PR fixes that by explicitly raising OutOfBoundsDatetime when the replacement datetime can't safely fit into the datetime64[ns] dtype.

Let me know if you'd like to test other edge cases or if there's a more idiomatic way to handle this!

@iabhi4
Copy link
Contributor Author

iabhi4 commented Jun 27, 2025

looking into the CI failures

@iabhi4
Copy link
Contributor Author

iabhi4 commented Jun 28, 2025

Regarding CI failures —

So after the changes in find_result_type, we're now catching cases like Timestamp('1677-09-21 00:12:43.145224193') early and raising OutOfBoundsDatetime during coercion itself, which makes sense and is in line with what #56410 was aiming for (no silent truncation between datetime units).

Because of that, test_clip_with_timestamps_and_oob_datetimes_non_nano is now failing since it hits the error earlier with, Just wanted to confirm, should I go ahead and update the test to reflect this message? or is the earlier failure point problematic?

Happy to revert or gate the check if needed.

@mroeschke mroeschke requested a review from jbrockmendel June 30, 2025 17:37
if (
is_datetime64_dtype(new_dtype)
and lib.is_scalar(right)
and isinstance(right, (np.datetime64, dt.datetime, Timestamp))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Timestamp check is redundant bc it is a subclass of datetime

raise OutOfBoundsDatetime(
f"{right!r} overflows datetime64[ns] during dtype inference"
)
except (OverflowError, ValueError) as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick can you call this err instead of e. try to avoid 1-letter variable names

and isinstance(right, (np.datetime64, dt.datetime, Timestamp))
):
try:
ts = Timestamp(right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think ts.as_unit("ns") will be more robust to if there is a timezone

@@ -1358,6 +1359,24 @@ def find_result_type(left_dtype: DtypeObj, right: Any) -> DtypeObj:
dtype, _ = infer_dtype_from(right)
new_dtype = find_common_type([left_dtype, dtype])

# special case: datetime64[ns] inferred but the value may be out of bounds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add something like "# GH#61671" here

# GH 61671
df = DataFrame([np.nan], dtype="datetime64[ns]")
too_big = datetime(3000, 1, 1)
from pandas.errors import OutOfBoundsDatetime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this import go at the top of the file

@@ -1430,6 +1430,18 @@ def test_replace_with_nil_na(self):
result = ser.replace("nil", "anything else")
tm.assert_frame_equal(expected, result)

def test_replace_outofbounds_datetime_raises(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parametrize this over tz=[None, "US/Eastern"]?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: np.nan to datetime assertionerror when too large datetime given
2 participants