Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POC/ENH: infer resolution in array_to_datetime #55741

Merged
merged 15 commits into from
Nov 15, 2023

Conversation

jbrockmendel
Copy link
Member

xref #55564 implements the relevant logic in array_to_datetime, does not change any user-facing behavior yet.

Have not yet done any profiling, but we have to expect a slowdown.

@@ -561,6 +600,24 @@ cpdef array_to_datetime(
else:
tz_offset = out_tzoffset_vals.pop()
tz_out = timezone(timedelta(seconds=tz_offset))

if infer_reso:
if state.creso_ever_changed:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to do here but my first read of state.creso_ever_changed was confusion over the scope of ever; I am inferring from the way the function is currently written that ever refers to the lifetime of this function, but if we have parse states that can persist across multiple function calls that terminology can get a little vague

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have parse states that can persist across multiple function calls

we do not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to keep it as part of the state then instead of just local to the function?

To be clear not a hold up on this PR for me. Just a consideration point as this continues to evolve

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bc it gets set inside a state-updating method that i dont want to duplicate in a bunch of places. also there are a couple of different places where we use DatetimeParseState and im trying to iron out the kinks in behavior differences between them.

pandas/_libs/tslib.pyx Show resolved Hide resolved
@jbrockmendel
Copy link
Member Author

@mroeschke ok here? this goes a long way toward #55901

def test_infer_heterogeneous(self):
dtstr = "2023-10-27 18:03:05.678000"

arr = np.array([dtstr, dtstr[:-3], dtstr[:-7], None], dtype=object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a follow up, it would be good to also add a test for different objects too e.g. np.array([dtstr, pydate, pydatetime, np.datetime64, int])

@mroeschke mroeschke added the Non-Nano datetime64/timedelta64 with non-nanosecond resolution label Nov 15, 2023
@mroeschke mroeschke added this to the 2.2 milestone Nov 15, 2023
@mroeschke mroeschke merged commit 800ae25 into pandas-dev:main Nov 15, 2023
44 checks passed
@mroeschke
Copy link
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the enh-infer-array_to_datetime branch November 15, 2023 21:36
jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Nov 15, 2023
* ENH: infer resolution in array_to_datetime

* post-merge fixup

* post-merge fixup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Non-Nano datetime64/timedelta64 with non-nanosecond resolution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants