TST: Hypothesis may draw a date outside of date_range's range #24242

Closed
@topper-123

Description

I hit an error from hypothesis when running the tests. I think it can be reproduced as follows (I've not used hypothesis before, so the seed may be machine-dependent):

pytest pandas/tests/frame/test_apply.py::TestDataFrameAggregate::test_frequency_is_original --hypothesis-seed=316520601087970019200180394352582921839

The error message is this:

2018-12-06T00:54:48.7685364Z 
2018-12-06T00:54:48.7686382Z =================================== FAILURES ===================================
2018-12-06T00:54:48.7688390Z ______________ TestDataFrameAggregate.test_frequency_is_original _______________
2018-12-06T00:54:48.7689359Z [gw0] linux -- Python 3.7.0 /home/vsts/miniconda3/envs/pandas-dev/bin/python
2018-12-06T00:54:48.7689655Z 
2018-12-06T00:54:48.7689936Z self = <pandas.tests.frame.test_apply.TestDataFrameAggregate object at 0x7ff32a5c3320>
2018-12-06T00:54:48.7690172Z 
2018-12-06T00:54:48.7690377Z     @given(index=indices(max_length=5), num_columns=integers(0, 5))
2018-12-06T00:54:48.7690609Z >   @settings(deadline=1000)
2018-12-06T00:54:48.7694033Z     def test_frequency_is_original(self, index, num_columns):
2018-12-06T00:54:48.7694602Z 
2018-12-06T00:54:48.7694893Z pandas/tests/frame/test_apply.py:1160: 
2018-12-06T00:54:48.7695224Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2018-12-06T00:54:48.7695479Z pandas/tests/frame/test_apply.py:836: in indices
2018-12-06T00:54:48.7695755Z     dr = date_range(date, periods=periods, freq=freq)
2018-12-06T00:54:48.7696003Z pandas/core/indexes/datetimes.py:1479: in date_range
2018-12-06T00:54:48.7696239Z     closed=closed, **kwargs)
2018-12-06T00:54:48.7696669Z pandas/core/arrays/datetimes.py:293: in _generate_range
2018-12-06T00:54:48.7696879Z     index = _generate_regular_range(cls, start, end, periods, freq)
2018-12-06T00:54:48.7697124Z pandas/core/arrays/datetimes.py:1716: in _generate_regular_range
2018-12-06T00:54:48.7697338Z     values = np.array([x.value for x in xdr], dtype=np.int64)
2018-12-06T00:54:48.7697545Z pandas/core/arrays/datetimes.py:1716: in <listcomp>
2018-12-06T00:54:48.7697786Z     values = np.array([x.value for x in xdr], dtype=np.int64)
2018-12-06T00:54:48.7698865Z pandas/tseries/offsets.py:2508: in generate_range
2018-12-06T00:54:48.7699470Z     end = start + (periods - 1) * offset
2018-12-06T00:54:48.7700435Z pandas/_libs/tslibs/offsets.pyx:489: in pandas._libs.tslibs.offsets.BaseOffset.__radd__
2018-12-06T00:54:48.7700701Z     return self.__add__(other)
2018-12-06T00:54:48.7700985Z pandas/_libs/tslibs/offsets.pyx:362: in pandas._libs.tslibs.offsets._BaseOffset.__add__
2018-12-06T00:54:48.7701242Z     return self.apply(other)
2018-12-06T00:54:48.7701830Z pandas/tseries/offsets.py:69: in wrapper
2018-12-06T00:54:48.7702065Z     result = func(self, other)
2018-12-06T00:54:48.7702488Z pandas/tseries/offsets.py:527: in apply
2018-12-06T00:54:48.7702884Z     result = other + timedelta(days=7 * weeks + days)
2018-12-06T00:54:48.7703610Z pandas/_libs/tslibs/timestamps.pyx:355: in pandas._libs.tslibs.timestamps._Timestamp.__add__
2018-12-06T00:54:48.7703892Z     result = Timestamp(self.value + nanos,
2018-12-06T00:54:48.7704154Z pandas/_libs/tslibs/timestamps.pyx:736: in pandas._libs.tslibs.timestamps.Timestamp.__new__
2018-12-06T00:54:48.7704430Z     ts = convert_to_tsobject(ts_input, tz, unit, 0, 0, nanosecond or 0)
2018-12-06T00:54:48.7704699Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2018-12-06T00:54:48.7704909Z 
2018-12-06T00:54:48.7705172Z >   obj.value = ts
2018-12-06T00:54:48.7705407Z E   OverflowError: Python int too large to convert to C long
2018-12-06T00:54:48.7705613Z 
2018-12-06T00:54:48.7705877Z pandas/_libs/tslibs/conversion.pyx:297: OverflowError
2018-12-06T00:54:48.7706477Z ---------------------------------- Hypothesis ----------------------------------
2018-12-06T00:54:48.7707486Z You can add @seed(316520601087970019200180394352582921839) to this test or run pytest with --hypothesis-seed=316520601087970019200180394352582921839 to reproduce this failure.
2018-12-06T00:54:48.8610169Z

Reproducing the error

The error stems from the indices function in pandas/tests/frame/test_apply.py:

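import pandas as pd
from hypothesis.strategies import composite, dates, integers, sampled_from
from pandas import Timestamp, date_range

@composite
def indices(draw, max_length=5):
    # Draw a start date anywhere in the range a Timestamp can represent.
    date = draw(
        dates(
            min_value=Timestamp.min.ceil("D").to_pydatetime().date(),
            max_value=Timestamp.max.floor("D").to_pydatetime().date(),
        ).map(Timestamp)
    )
    # Draw the number of periods and a frequency alias, then build the index.
    periods = draw(integers(0, max_length))
    freq = draw(sampled_from(list("BDHTS")))
    dr = date_range(date, periods=periods, freq=freq)
    return pd.DatetimeIndex(list(dr))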

Hypothesis uses the function above to generate the index for the test. It fails inside date_range when date = Timestamp.max.floor("D").to_pydatetime().date() and freq is 'B' or 'D'.

For example:

>>> date = pd.Timestamp.max.floor("D").to_pydatetime().date()  # datetime.date(2262, 4, 11)
>>> freq = 'B'
>>> pd.date_range(date, periods=1, freq=freq)
OverflowError: int too big to convert
>>> freq = 'D'
>>> pd.date_range(date, periods=1, freq=freq)
OutOfBoundsDatetime: Cannot generate range with start=9223286400000000000 and periods=1

Note that the error types differ between the two cases. Presumably the first example should also have raised an OutOfBoundsDatetime.
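
The arithmetic behind both failures can be checked with the standard library alone. The sketch below assumes that Timestamp stores nanoseconds since 1970-01-01 in a signed 64-bit integer, which is consistent with the start=9223286400000000000 value in the message above. It shows that midnight on 2262-04-11 is less than one day below that limit, so any computation that adds a full day to it cannot fit. Where exactly pandas performs such an addition for freq='D' is not shown in this report; for the seeded failure, the traceback shows an offset being added to the start timestamp. The helper latest_safe_start is hypothetical and only illustrates how a tighter max_value for the strategy might be derived.

```python
from datetime import date, timedelta

INT64_MAX = 2**63 - 1          # largest value a signed 64-bit integer can hold
NS_PER_DAY = 86_400 * 10**9    # nanoseconds in one day
EPOCH = date(1970, 1, 1)


def ns_since_epoch(d):
    """Midnight of `d` as nanoseconds since 1970-01-01 (a plain Python int)."""
    return (d - EPOCH).days * NS_PER_DAY


def latest_safe_start(extra_days):
    """Hypothetical helper: the latest date whose midnight, plus
    `extra_days` whole days, still fits in a signed 64-bit nanosecond count."""
    max_whole_days = INT64_MAX // NS_PER_DAY
    return EPOCH + timedelta(days=max_whole_days - extra_days)


start = date(2262, 4, 11)      # the date from the example above
start_ns = ns_since_epoch(start)

print(start_ns)                            # 9223286400000000000
print(INT64_MAX - start_ns < NS_PER_DAY)   # True: less than one day of headroom
print(start_ns + NS_PER_DAY > INT64_MAX)   # True: adding one day overflows int64
print(latest_safe_start(1))                # 2262-04-10
```

Whether leaving a one-day margin would be enough for every frequency in "BDHTS" is not verified here; a business-day step can span more than one calendar day, so a real fix would probably need a larger margin or an explicit bounds check.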

Metadata

Labels: Testing (pandas testing functions or related to the test suite)