Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: can't round-trip non-nano Timestamp #51060

Open
3 tasks done
MarcoGorelli opened this issue Jan 30, 2023 · 9 comments
Open
3 tasks done

BUG: can't round-trip non-nano Timestamp #51060

MarcoGorelli opened this issue Jan 30, 2023 · 9 comments
Labels
Bug Non-Nano datetime64/timedelta64 with non-nanosecond resolution

Comments

@MarcoGorelli
Copy link
Member

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

In [5]: ts = Timestamp('1900-01-01')

In [6]: Timestamp(ts.value, unit=ts.unit)
Out[6]: Timestamp('1900-01-01 00:00:00')

In [7]: ts = Timestamp('0300-01-01')

In [8]: Timestamp(ts.value, unit=ts.unit)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
File ~/pandas-dev/pandas/_libs/tslibs/conversion.pyx:141, in pandas._libs.tslibs.conversion.cast_from_unit()
    140 try:
--> 141     return <int64_t>(base * m) + <int64_t>(frac * m)
    142 except OverflowError as err:

OverflowError: Python int too large to convert to C long

The above exception was the direct cause of the following exception:

OutOfBoundsDatetime                       Traceback (most recent call last)
Cell In[8], line 1
----> 1 Timestamp(ts.value, unit=ts.unit)

File ~/pandas-dev/pandas/_libs/tslibs/timestamps.pyx:1637, in pandas._libs.tslibs.timestamps.Timestamp.__new__()
   1635     raise ValueError("nanosecond must be in 0..999")
   1636 
-> 1637 ts = convert_to_tsobject(ts_input, tzobj, unit, 0, 0, nanosecond)
   1638 
   1639 if ts.value == NPY_NAT:

File ~/pandas-dev/pandas/_libs/tslibs/conversion.pyx:300, in pandas._libs.tslibs.conversion.convert_to_tsobject()
    298     obj.value = NPY_NAT
    299 else:
--> 300     ts = cast_from_unit(ts, unit)
    301     obj.value = ts
    302     pandas_datetime_to_datetimestruct(ts, NPY_FR_ns, &obj.dts)

File ~/pandas-dev/pandas/_libs/tslibs/conversion.pyx:143, in pandas._libs.tslibs.conversion.cast_from_unit()
    141     return <int64_t>(base * m) + <int64_t>(frac * m)
    142 except OverflowError as err:
--> 143     raise OutOfBoundsDatetime(
    144         f"cannot convert input {ts} with the unit '{unit}'"
    145     ) from err

OutOfBoundsDatetime: cannot convert input -52700112000 with the unit 's'

Issue Description

The above throws an error when the timestamp is out-of-nano-bounds

Noticed while trying to address #51024, in

pandas/pandas/core/resample.py

Lines 2160 to 2161 in a82b8e2

fresult = Timestamp(fresult_int)
lresult = Timestamp(lresult_int)

Expected Behavior

Timestamp('0300-01-01')

Installed Versions

INSTALLED VERSIONS

commit : d6afc86
python : 3.8.16.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.0.0.dev0+1340.gd6afc862a7
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 66.1.1
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.1
hypothesis : 6.64.0
sphinx : 5.3.0
blosc : 1.11.1
feather : None
xlsxwriter : 3.0.7
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.5
jinja2 : 3.1.2
IPython : 8.8.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.6
brotli :
fastparquet : 2023.1.0
fsspec : 2022.11.0
gcsfs : 2022.11.0
matplotlib : 3.6.3
numba : 0.56.4
numexpr : 2.8.4
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2022.11.0
scipy : 1.10.0
snappy :
sqlalchemy : 1.4.45
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.1.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2022.7
qtpy : None
pyqt5 : None
None

@MarcoGorelli MarcoGorelli added Bug Needs Triage Issue that has not been reviewed by a pandas team member Non-Nano datetime64/timedelta64 with non-nanosecond resolution and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 30, 2023
@jbrockmendel
Copy link
Member

I think it would make sense to infer a Timestamp reso matching "unit" in this case

@MarcoGorelli
Copy link
Member Author

The issue is that when we get here

ts = convert_to_tsobject(ts_input, tzobj, unit, 0, 0, nanosecond)

then we have:

  • ts_input: -52700112000
  • unit: 's'

Then, we get to

elif is_integer_object(ts):
try:
ts = <int64_t>ts
except OverflowError:
# GH#26651 re-raise as OutOfBoundsDatetime
raise OutOfBoundsDatetime(f"Out of bounds nanosecond timestamp {ts}")
if ts == NPY_NAT:
obj.value = NPY_NAT
else:
ts = cast_from_unit(ts, unit)
obj.value = ts
pandas_datetime_to_datetimestruct(ts, NPY_FR_ns, &obj.dts)

and go to cast_from_unit, which tries to convert -52700112000 to nanoseconds. Problem is, that's out of bounds for i64

What do you think the solution is? That cast_from_unit not return nanoseconds, but rather the offset from 1970-01-01 in the resolution closest to unit?

@jbrockmendel
Copy link
Member

id start with something like

in_reso = abbrev_to_npy_unit(unit)
out_reso = get_supported_reso(in_reso)
value = convert_reso(ts, in_reso, out_reso)

would take some more effort to get this working for floats

@MarcoGorelli
Copy link
Member Author

thanks, that works

looks like I need to set unit though, otherwise Timestamp(20130101) would throw

Traceback (most recent call last):
  File "/home/marcogorelli/pandas-dev/t.py", line 19, in <module>
    Timestamp(20130101)
  File "pandas/_libs/tslibs/timestamps.pyx", line 1637, in pandas._libs.tslibs.timestamps.Timestamp.__new__
    ts = convert_to_tsobject(ts_input, tzobj, unit, 0, 0, nanosecond)
  File "pandas/_libs/tslibs/conversion.pyx", line 302, in pandas._libs.tslibs.conversion.convert_to_tsobject
    value = convert_reso(ts, in_reso, out_reso, False)
  File "pandas/_libs/tslibs/np_datetime.pyx", line 590, in pandas._libs.tslibs.np_datetime.convert_reso
    mult = get_conversion_factor(to_reso, from_reso)
  File "pandas/_libs/tslibs/np_datetime.pyx", line 547, in pandas._libs.tslibs.np_datetime.get_conversion_factor
    raise ValueError("unit-less resolutions are not supported")
ValueError: unit-less resolutions are not supported

would in_reso = abbrev_to_npy_unit(unit or 'ns') be OK?

@jbrockmendel
Copy link
Member

would in_reso = abbrev_to_npy_unit(unit or 'ns') be OK?

I think so, yes.

@MarcoGorelli
Copy link
Member Author

would take some more effort to get this working for floats

How do you think this could/should work?

For ints, unit ends up determining the resolution.

But for floats, a more finer resolution might be required - for example, currently, we have

In [15]: Timestamp(1.5, unit='s')
Out[15]: Timestamp('1970-01-01 00:00:01.500000')

but if the resolution were inferred from the unit, we would get

In [13]: Timestamp(1.5, unit='s')
Out[13]: Timestamp('1970-01-01 00:00:01')

And maybe that's OK? Or should non-round floats be disallowed here?

@jbrockmendel
Copy link
Member

If we deprecate allowing non-round floats here (and presumably also in to_datetime etc) is there a nice-ish alternative we can suggest? If so I'd be OK with that; I doubt it is used all that often.

The not-angering-anybody solution would be to add a separate keyword like out_unit but that gets ugly.

@jbrockmendel
Copy link
Member

im playing catch-up on non-nano issues. did we decide anything here?

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Mar 25, 2023

I don't think there's any decision here, no

is there a nice-ish alternative we can suggest?

all I can think of is to suggest that they add a timedelta with their desired offset?
I think it's quite ambiguous what 1.5, unit='s' exactly means anyway, and I'm not sure it should be supported

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Non-Nano datetime64/timedelta64 with non-nanosecond resolution
Projects
None yet
Development

No branches or pull requests

2 participants