ENH/BUG: pd.date_range() still defaults to nanosecond resolution #59031

jorisvandenbossche opened this issue Jun 17, 2024 · 2 comments
Status: Open
Labels: Non-Nano (datetime64/timedelta64 with non-nanosecond resolution)
Milestone: 3.0

@jorisvandenbossche (Member):

After #55901, to_datetime with strings now infers the resolution from the data, but the related pd.date_range for creating datetime data still returns nanosecond resolution:

In [5]: pd.date_range("2012-01-01", periods=3, freq="1min")
Out[5]: 
DatetimeIndex(['2012-01-01 00:00:00', '2012-01-01 00:01:00',
               '2012-01-01 00:02:00'],
              dtype='datetime64[ns]', freq='min')

In [6]: pd.to_datetime(['2012-01-01 00:00:00', '2012-01-01 00:01:00', '2012-01-01 00:02:00'])
Out[6]: 
DatetimeIndex(['2012-01-01 00:00:00', '2012-01-01 00:01:00',
               '2012-01-01 00:02:00'],
              dtype='datetime64[s]', freq=None)

Should we update pd.date_range as well to infer the resulting resolution from the start/end timestamps and freq?

(I encountered this inconsistency in the pyarrow tests, where we were essentially using both idioms to create the result and the expected data, and that started failing because of the differing dtypes. I also opened #58989 for that, but regardless of what the default resolution ends up being, pd.date_range would still need to follow it.)
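
For reference, a minimal sketch of the inconsistency and the existing explicit escape hatch, assuming a pandas 2.x build where pd.date_range already accepts a unit keyword (an inference-based default would presumably pick the same coarser unit here):

import pandas as pd

# Current default: nanosecond resolution, regardless of what the inputs require
idx_ns = pd.date_range("2012-01-01", periods=3, freq="1min")
print(idx_ns.dtype)    # datetime64[ns]

# Explicitly requesting a coarser unit already works via the unit keyword
idx_s = pd.date_range("2012-01-01", periods=3, freq="1min", unit="s")
print(idx_s.dtype)     # datetime64[s]

# to_datetime with strings infers second resolution for the same values (after #55901)
inferred = pd.to_datetime(["2012-01-01 00:00:00", "2012-01-01 00:01:00"])
print(inferred.dtype)  # datetime64[s]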

@jorisvandenbossche jorisvandenbossche added the Non-Nano datetime64/timedelta64 with non-nanosecond resolution label Jun 17, 2024
@jorisvandenbossche jorisvandenbossche added this to the 3.0 milestone Jun 17, 2024
@tilovashahrin (Contributor):

Hi @jorisvandenbossche, I made a few additions to the date_range function. Let me know what you think!

@ankit-dhokariya (Contributor):

Hi @jorisvandenbossche, @jbrockmendel. I was trying to work on this issue and had a question about how defaulting to 'second' resolution would impact the OutOfBoundsDatetime exception raised in pandas/pandas/core/arrays/_ranges.py.

When the default resolution is set to seconds, the stride value is reduced by a factor of 10**9, which affects the calculation

addend = np.uint64(periods) * np.uint64(np.abs(stride))

and no exception is raised.
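
The magnitudes involved are easy to check with plain arithmetic; a rough illustration (not the exact code path in _generate_range_overflow_safe, just the numbers):

import numpy as np

periods = 200_000_000
stride_ns = 30_000 * 24 * 3600 * 10**9   # the 30000D stride expressed in nanoseconds
stride_s = 30_000 * 24 * 3600            # the same stride expressed in seconds

u64_max = int(np.iinfo(np.uint64).max)   # upper bound for the uint64 product

print(periods * stride_ns > u64_max)     # True  -> overflow, OutOfBoundsDatetime today
print(periods * stride_s > u64_max)      # False -> the product fits, no exception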

Current functionality:

>>> pd.date_range("1969-05-04", periods=200000000, freq="30000D")
Traceback (most recent call last):
  File "/home/ronny/github/pandas/pandas/core/arrays/_ranges.py", line 127, in _generate_range_overflow_safe
    addend = np.uint64(periods) * np.uint64(np.abs(stride))
FloatingPointError: overflow encountered in scalar multiply

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ronny/github/pandas/pandas/core/indexes/datetimes.py", line 1007, in date_range
    dtarr = DatetimeArray._generate_range(
  File "/home/ronny/github/pandas/pandas/core/arrays/datetimes.py", line 470, in _generate_range
    i8values = generate_regular_range(start, end, periods, freq, unit=unit)
  File "/home/ronny/github/pandas/pandas/core/arrays/_ranges.py", line 78, in generate_regular_range
    e = _generate_range_overflow_safe(b, periods, stride, side="start")
  File "/home/ronny/github/pandas/pandas/core/arrays/_ranges.py", line 129, in _generate_range_overflow_safe
    raise OutOfBoundsDatetime(msg) from err
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Cannot generate range with start=-20908800000000000 and periods=200000000

With the default resolution set to seconds:

>>> pd.date_range("1969-05-04", periods=200000000, freq="30000D")
DatetimeIndex([       '1969-05-04',        '2051-06-23',        '2133-08-12',
                      '2215-10-02',        '2297-11-20',        '2380-01-10',
                      '2462-02-28',        '2544-04-19',        '2626-06-09',
                      '2708-07-29',
               ...
               '16427443189-11-24', '16427443272-01-13', '16427443354-03-04',
               '16427443436-04-23', '16427443518-06-13', '16427443600-08-01',
               '16427443682-09-20', '16427443764-11-09', '16427443846-12-30',
               '16427443929-02-18'],
              dtype='datetime64[s]', length=200000000, freq='30000D')

Is it expected that the bounds increase after defaulting to second resolution, or does the current functionality need to be retained?
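
For context on why the bounds grow: the epoch value is stored as int64 regardless of unit, so a coarser unit covers a much larger span. A quick numpy illustration (not the pandas code path):

import numpy as np

i64_max = np.iinfo(np.int64).max

# datetime64[ns] with the maximum int64 payload lands in the year 2262
print(np.datetime64(i64_max, "ns"))

# the same payload interpreted as seconds reaches hundreds of billions of years,
# which is why year 16427443929 in the output above is still representable
print(np.datetime64(i64_max, "s"))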
