Description
I hit a hypothesis failure in a test run. I think the bug can be reproduced by running the command below (I've not used hypothesis before, so the seed may be machine-dependent):
pytest pandas/tests/frame/test_apply.py::TestDataFrameAggregate::test_frequency_is_original --hypothesis-seed=316520601087970019200180394352582921839
The error message is this:
2018-12-06T00:54:48.7685364Z
2018-12-06T00:54:48.7686382Z =================================== FAILURES ===================================
2018-12-06T00:54:48.7688390Z ______________ TestDataFrameAggregate.test_frequency_is_original _______________
2018-12-06T00:54:48.7689359Z [gw0] linux -- Python 3.7.0 /home/vsts/miniconda3/envs/pandas-dev/bin/python
2018-12-06T00:54:48.7689655Z
2018-12-06T00:54:48.7689936Z self = <pandas.tests.frame.test_apply.TestDataFrameAggregate object at 0x7ff32a5c3320>
2018-12-06T00:54:48.7690172Z
2018-12-06T00:54:48.7690377Z @given(index=indices(max_length=5), num_columns=integers(0, 5))
2018-12-06T00:54:48.7690609Z > @settings(deadline=1000)
2018-12-06T00:54:48.7694033Z def test_frequency_is_original(self, index, num_columns):
2018-12-06T00:54:48.7694602Z
2018-12-06T00:54:48.7694893Z pandas/tests/frame/test_apply.py:1160:
2018-12-06T00:54:48.7695224Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2018-12-06T00:54:48.7695479Z pandas/tests/frame/test_apply.py:836: in indices
2018-12-06T00:54:48.7695755Z dr = date_range(date, periods=periods, freq=freq)
2018-12-06T00:54:48.7696003Z pandas/core/indexes/datetimes.py:1479: in date_range
2018-12-06T00:54:48.7696239Z closed=closed, **kwargs)
2018-12-06T00:54:48.7696669Z pandas/core/arrays/datetimes.py:293: in _generate_range
2018-12-06T00:54:48.7696879Z index = _generate_regular_range(cls, start, end, periods, freq)
2018-12-06T00:54:48.7697124Z pandas/core/arrays/datetimes.py:1716: in _generate_regular_range
2018-12-06T00:54:48.7697338Z values = np.array([x.value for x in xdr], dtype=np.int64)
2018-12-06T00:54:48.7697545Z pandas/core/arrays/datetimes.py:1716: in <listcomp>
2018-12-06T00:54:48.7697786Z values = np.array([x.value for x in xdr], dtype=np.int64)
2018-12-06T00:54:48.7698865Z pandas/tseries/offsets.py:2508: in generate_range
2018-12-06T00:54:48.7699470Z end = start + (periods - 1) * offset
2018-12-06T00:54:48.7700435Z pandas/_libs/tslibs/offsets.pyx:489: in pandas._libs.tslibs.offsets.BaseOffset.__radd__
2018-12-06T00:54:48.7700701Z return self.__add__(other)
2018-12-06T00:54:48.7700985Z pandas/_libs/tslibs/offsets.pyx:362: in pandas._libs.tslibs.offsets._BaseOffset.__add__
2018-12-06T00:54:48.7701242Z return self.apply(other)
2018-12-06T00:54:48.7701830Z pandas/tseries/offsets.py:69: in wrapper
2018-12-06T00:54:48.7702065Z result = func(self, other)
2018-12-06T00:54:48.7702488Z pandas/tseries/offsets.py:527: in apply
2018-12-06T00:54:48.7702884Z result = other + timedelta(days=7 * weeks + days)
2018-12-06T00:54:48.7703610Z pandas/_libs/tslibs/timestamps.pyx:355: in pandas._libs.tslibs.timestamps._Timestamp.__add__
2018-12-06T00:54:48.7703892Z result = Timestamp(self.value + nanos,
2018-12-06T00:54:48.7704154Z pandas/_libs/tslibs/timestamps.pyx:736: in pandas._libs.tslibs.timestamps.Timestamp.__new__
2018-12-06T00:54:48.7704430Z ts = convert_to_tsobject(ts_input, tz, unit, 0, 0, nanosecond or 0)
2018-12-06T00:54:48.7704699Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2018-12-06T00:54:48.7704909Z
2018-12-06T00:54:48.7705172Z > obj.value = ts
2018-12-06T00:54:48.7705407Z E OverflowError: Python int too large to convert to C long
2018-12-06T00:54:48.7705613Z
2018-12-06T00:54:48.7705877Z pandas/_libs/tslibs/conversion.pyx:297: OverflowError
2018-12-06T00:54:48.7706477Z ---------------------------------- Hypothesis ----------------------------------
2018-12-06T00:54:48.7707486Z You can add @seed(316520601087970019200180394352582921839) to this test or run pytest with --hypothesis-seed=316520601087970019200180394352582921839 to reproduce this failure.
2018-12-06T00:54:48.8610169Z
Reproducing the error
The error stems from this function in pandas/tests/frame/test_apply.py:
@composite
def indices(draw, max_length=5):
    date = draw(
        dates(
            min_value=Timestamp.min.ceil("D").to_pydatetime().date(),
            max_value=Timestamp.max.floor("D").to_pydatetime().date(),
        ).map(Timestamp)
    )
    periods = draw(integers(0, max_length))
    freq = draw(sampled_from(list("BDHTS")))
    dr = date_range(date, periods=periods, freq=freq)
    return pd.DatetimeIndex(list(dr))
Hypothesis uses the function above to generate the index passed to the test. The failure occurs in the date_range call when date = Timestamp.max.floor("D").to_pydatetime().date() and freq in {'B', 'D'}.
For example:
>>> date = pd.Timestamp.max.floor("D").to_pydatetime().date() # datetime.date(2262, 4, 11)
>>> freq = 'B'
>>> pd.date_range(date, periods=1, freq=freq)
OverflowError: int too big to convert
>>> freq = 'D'
>>> pd.date_range(date, periods=1, freq=freq)
OutOfBoundsDatetime: Cannot generate range with start=9223286400000000000 and periods=1
Note that the two cases raise different error types. Presumably the first example should also have raised an OutOfBoundsDatetime.
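To show how little room is left at this start date, here is a small standard-library sketch of the int64 nanosecond arithmetic. It reproduces the start value quoted in the OutOfBoundsDatetime message and checks that one extra day no longer fits in a signed 64-bit integer. The helper name to_epoch_ns is made up for illustration, and the sketch does not claim to mirror the actual code path inside date_range; it only illustrates why any step of a full day past this date must overflow.

```python
from datetime import date

INT64_MAX = 2**63 - 1               # largest value a signed 64-bit integer can hold
NS_PER_DAY = 24 * 60 * 60 * 10**9   # nanoseconds in one day
EPOCH = date(1970, 1, 1)


def to_epoch_ns(d):
    """Nanoseconds from the Unix epoch to midnight of d (hypothetical helper)."""
    return (d - EPOCH).days * NS_PER_DAY


# Same date as pd.Timestamp.max.floor("D").to_pydatetime().date() in the example.
start_ns = to_epoch_ns(date(2262, 4, 11))
headroom_ns = INT64_MAX - start_ns

print(start_ns)                  # 9223286400000000000, as in the error message
print(headroom_ns < NS_PER_DAY)  # True: less than one day of headroom remains
```

Because the remaining headroom is under one day, adding a single daily or business-day step to this start cannot be represented as int64 nanoseconds, whichever exception type ends up being raised.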