Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: support non-nanos Timedelta objects in Python C API #54682

Open
jorisvandenbossche opened this issue Aug 22, 2023 · 5 comments
Open

BUG: support non-nanos Timedelta objects in Python C API #54682

jorisvandenbossche opened this issue Aug 22, 2023 · 5 comments
Assignees
Labels
Bug Closing Candidate May be closeable, needs more eyeballs Non-Nano datetime64/timedelta64 with non-nanosecond resolution Timedelta Timedelta data type

Comments

@jorisvandenbossche
Copy link
Member

See apache/arrow#37291 for context.

In the Timedelta constructor, we have:

# For millisecond and second resos, we cannot actually pass int(value) because
# many cases would fall outside of the pytimedelta implementation bounds.
# We pass 0 instead, and override seconds, microseconds, days.
# In principle we could pass 0 for ns and us too.
if reso == NPY_FR_ns:
td_base = _Timedelta.__new__(cls, microseconds=int(value) // 1000)
elif reso == NPY_DATETIMEUNIT.NPY_FR_us:
td_base = _Timedelta.__new__(cls, microseconds=int(value))
elif reso == NPY_DATETIMEUNIT.NPY_FR_ms:
td_base = _Timedelta.__new__(cls, milliseconds=0)
elif reso == NPY_DATETIMEUNIT.NPY_FR_s:
td_base = _Timedelta.__new__(cls, seconds=0)

and then the relevant note in the seconds attribute:

# NB: using the python C-API PyDateTime_DELTA_GET_SECONDS will fail
# (or be incorrect)

So it is a known issue that those C API functions no longer will work with Timedeltas of second or millisecond resolution.

Pyarrow is exactly using those C APIs, which are now silently incorrect for second and millisecond resolution Timedeltas. While pyarrow should certainly adapt to properly support pandas.Timedelta (especially for cases outside the range of datetime.timedelta), I think it might be safer for pandas to ensure those C APIs keep working (I think pyarrow will certainly not be the only one using those).

It seems possible to me to at least set those for the cases where they are within the range of datetime.timedelta (which will be by far the large majority of cases, since it goes up to 999999999 days, which is more than 2 million years if I did the math correctly), at the cost of making the constructor a bit more costly.

cc @jbrockmendel

@jbrockmendel
Copy link
Member

at the cost of making the constructor a bit more costly

No objection assuming this costliness (and added code complexity) is limited.

@BenjaminHelyer
Copy link

Able to reproduce the pyarrow bug & see what you mean in the Pandas code. I'll take a shot at fixing this unless you're already working on it @jorisvandenbossche .

@BenjaminHelyer
Copy link

take

@BenjaminHelyer
Copy link

Confirmed that the C-API functions themselves are incorrect for a subset of instances (as the comment in the code states), done by directly calling these functions on some interesting cases. This further demonstrates that this is Pandas side behavior with those C-API functions, since the C-API functions show the same bug as pyarrow.

Fix TBA.

@BenjaminHelyer
Copy link

Per the discussion in #55213 , it seems its best to deem this fix out of scope for Pandas, given that the ask isn't possible for the general case. @jorisvandenbossche , if you agree, could we close this issue?

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Closing Candidate May be closeable, needs more eyeballs Non-Nano datetime64/timedelta64 with non-nanosecond resolution Timedelta Timedelta data type
Projects
None yet
3 participants