Commit

BUG: fixed OutOfBoundsDatetime exception when errors=coerce #45319 (#47794)

* BUG: fixed OutOfBoundsDatetime exception when errors=coerce #45319

* BUG: Added test and release note #45319

* BUG: Restructured test parameters #45319

* BUG: Restructured test #45319

* BUG: Restructured parameters for test #45319

* BUG: Renamed test and added raise and ignore cases #45319

* BUG: Changed exception case #45319

Co-authored-by: Steven Rotondo <steven.rotondo75@gmail.com>
srotondo and KingOfTheShow69 authored Aug 15, 2022
1 parent d0bd469 commit 60a2d56
Showing 3 changed files with 37 additions and 1 deletion.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -900,6 +900,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.resolution` incorrectly returning "day" instead of "nanosecond" for nanosecond-resolution indexes (:issue:`46903`)
 - Bug in :class:`Timestamp` with an integer or float value and ``unit="Y"`` or ``unit="M"`` giving slightly-wrong results (:issue:`47266`)
 - Bug in :class:`.DatetimeArray` construction when passed another :class:`.DatetimeArray` and ``freq=None`` incorrectly inferring the freq from the given array (:issue:`47296`)
+- Bug in :func:`to_datetime` where ``OutOfBoundsDatetime`` would be thrown even if ``errors=coerce`` if there were more than 50 rows (:issue:`45319`)
 - Bug when adding a :class:`DateOffset` to a :class:`Series` would not add the ``nanoseconds`` field (:issue:`47856`)
 -

6 changes: 5 additions & 1 deletion pandas/core/tools/datetimes.py
@@ -228,7 +228,11 @@ def _maybe_cache(
         unique_dates = unique(arg)
         if len(unique_dates) < len(arg):
             cache_dates = convert_listlike(unique_dates, format)
-            cache_array = Series(cache_dates, index=unique_dates)
+            # GH#45319
+            try:
+                cache_array = Series(cache_dates, index=unique_dates)
+            except OutOfBoundsDatetime:
+                return cache_array
             # GH#39882 and GH#35888 in case of None and NaT we get duplicates
             if not cache_array.index.is_unique:
                 cache_array = cache_array[~cache_array.index.duplicated()]
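
The hunk above wraps cache construction in a `try`/`except` and returns the still-empty cache when `OutOfBoundsDatetime` is raised. The pattern can be sketched in plain Python without pandas. Everything in this sketch is a hypothetical stand-in rather than pandas API: `OutOfBounds`, `convert_strict`, `build_cache`, `convert_all`, and the approximate `LOWER`/`UPPER` limits are illustrative names, `None` plays the role of `NaT`, and where exactly the exception originates inside pandas is not modeled.

```python
from datetime import datetime, timezone


class OutOfBounds(ValueError):
    """Hypothetical stand-in for pandas' OutOfBoundsDatetime."""


# Approximate limits of a nanosecond-resolution timestamp (assumed for illustration).
LOWER = datetime(1677, 9, 22, tzinfo=timezone.utc)
UPPER = datetime(2262, 4, 11, tzinfo=timezone.utc)


def convert_strict(value):
    """Return the value unchanged, or raise if it is outside the supported range."""
    if not (LOWER <= value <= UPPER):
        raise OutOfBounds(f"Out of bounds: {value}")
    return value


def build_cache(values):
    """Map each unique value to its converted form.

    Returns an empty cache when there are no duplicates, or when building
    the cache raises OutOfBounds (the fallback this commit introduces).
    """
    cache = {}
    unique_values = list(dict.fromkeys(values))  # order-preserving unique
    if len(unique_values) < len(values):
        try:
            cache = {v: convert_strict(v) for v in unique_values}
        except OutOfBounds:
            return {}
    return cache


def convert_all(values, errors="raise"):
    """Use the cache when available; otherwise convert element by element."""
    cache = build_cache(values)
    if cache:
        return [cache[v] for v in values]
    result = []
    for v in values:
        try:
            result.append(convert_strict(v))
        except OutOfBounds:
            if errors == "coerce":
                result.append(None)  # None stands in for NaT
            else:
                raise
    return result
```

With an empty cache the caller takes the uncached path, so the `errors` setting is applied per element instead of the exception escaping from cache construction.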
31 changes: 31 additions & 0 deletions pandas/tests/tools/test_to_datetime.py
@@ -2777,3 +2777,34 @@ def test_to_datetime_monotonic_increasing_index(cache):
     result = to_datetime(times.iloc[:, 0], cache=cache)
     expected = times.iloc[:, 0]
     tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize(
+    "series_length",
+    [40, start_caching_at, (start_caching_at + 1), (start_caching_at + 5)],
+)
+def test_to_datetime_cache_coerce_50_lines_outofbounds(series_length):
+    # GH#45319
+    s = Series(
+        [datetime.fromisoformat("1446-04-12 00:00:00+00:00")]
+        + ([datetime.fromisoformat("1991-10-20 00:00:00+00:00")] * series_length)
+    )
+    result1 = to_datetime(s, errors="coerce", utc=True)
+
+    expected1 = Series(
+        [NaT] + ([Timestamp("1991-10-20 00:00:00+00:00")] * series_length)
+    )
+
+    tm.assert_series_equal(result1, expected1)
+
+    result2 = to_datetime(s, errors="ignore", utc=True)
+
+    expected2 = Series(
+        [datetime.fromisoformat("1446-04-12 00:00:00+00:00")]
+        + ([datetime.fromisoformat("1991-10-20 00:00:00+00:00")] * series_length)
+    )
+
+    tm.assert_series_equal(result2, expected2)
+
+    with pytest.raises(OutOfBoundsDatetime, match="Out of bounds nanosecond timestamp"):
+        to_datetime(s, errors="raise", utc=True)
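
The test relies on 1446-04-12 falling outside the range of a nanosecond-resolution timestamp, which is what the "Out of bounds nanosecond timestamp" message refers to. As a rough check, assuming timestamps are stored as a signed 64-bit count of nanoseconds since 1970-01-01, the limits can be approximated with the standard library alone. The names `EPOCH`, `INT64_MAX`, `span`, `lower`, and `upper` are illustrative, and the values are truncated to microseconds because `datetime` has no nanosecond field.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest nanosecond count a signed 64-bit integer can hold

# datetime only has microsecond resolution, so truncate nanoseconds to microseconds.
span = timedelta(microseconds=INT64_MAX // 1000)
lower = EPOCH - span
upper = EPOCH + span

print(lower)  # 1677-09-21 00:12:43.145225
print(upper)  # 2262-04-11 23:47:16.854775
print(datetime(1446, 4, 12) < lower)  # True
```

The 1991-10-20 values used for the rest of the series sit well inside this range, so only the first element should be affected under `errors="coerce"`.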
