PERF: Datetime/Timestamp.normalize for timezone naive datetimes #23634
Hello @mroeschke! Thanks for submitting the PR.
if tz is not None:
    tz = maybe_get_tz(tz)
    result = _normalize_local(stamps, tz)
else:
Is this case never reached?
Correct. This case (the naive case) is handled in these two places now:
https://github.com/pandas-dev/pandas/pull/23634/files#diff-231ac35d2116a12844a7cfed02730580R1289
https://github.com/pandas-dev/pandas/pull/23634/files#diff-960da60b33d858481ad8799b0e84764bR835
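For readers following the naive fast path: without a timezone, normalizing reduces to integer arithmetic on epoch nanoseconds. The sketch below is only an illustration of that idea in plain Python, not the code in this PR; `to_ns` and `normalize_naive_ns` are hypothetical helper names, and the two inputs mirror the parametrized test arguments.

```python
from datetime import datetime, timedelta

DAY_NS = 86_400 * 1_000_000_000  # nanoseconds per day
EPOCH = datetime(1970, 1, 1)


def to_ns(dt):
    """Convert a naive datetime to integer nanoseconds since the epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1) * 1_000


def normalize_naive_ns(value):
    """Floor an epoch-nanosecond value to midnight of the same day."""
    # For a positive divisor, Python's % yields a non-negative remainder,
    # so this also floors correctly for pre-1970 (negative) values.
    return value - value % DAY_NS


noon = to_ns(datetime.fromisoformat("2013-11-30 12:00:00"))
midnight = to_ns(datetime.fromisoformat("2013-11-30"))
normalized = normalize_naive_ns(noon)
```

Here `normalized` equals `midnight`, and a value that is already at midnight is returned unchanged.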
Codecov Report
@@ Coverage Diff @@
## master #23634 +/- ##
==========================================
+ Coverage 92.23% 92.23% +<.01%
==========================================
Files 161 161
Lines 51408 51414 +6
==========================================
+ Hits 47416 47422 +6
Misses 3992 3992
pandas/_libs/tslibs/timestamps.pyx
Outdated
@@ -40,6 +40,7 @@ from timezones cimport (
# Constants
_zero_time = datetime_time(0, 0)
_no_input = object()
cdef int64_t DAY_NS = 86400000000000
We have DAY_NS defined in lots of places; can you move it to one place?
(bamboo-dev) jreback@dev:~/pandas-dev$ grep -r 86400 pandas/_libs/ --include '*.pyx'
pandas/_libs/tslibs/period.pyx: {1, 24, 1440, 86400, 86400000, 86400000000, 86400000000000},
pandas/_libs/tslibs/period.pyx: seconds = unix_date * 86400 + dts.hour * 3600 + dts.min * 60 + dts.sec
pandas/_libs/tslibs/period.pyx: abstime += 86400
pandas/_libs/tslibs/period.pyx: while abstime >= 86400:
pandas/_libs/tslibs/period.pyx: abstime -= 86400
pandas/_libs/tslibs/period.pyx: # abstime >= 0.0 and abstime <= 86400
pandas/_libs/tslibs/conversion.pyx:cdef int64_t DAY_NS = 86400000000000LL
pandas/_libs/tslibs/timedeltas.pyx:cdef int64_t DAY_NS = 86400000000000LL
pandas/_libs/tslibs/timedeltas.pyx: m = 1000000000L * 86400 * 7
pandas/_libs/tslibs/timedeltas.pyx: m = 1000000000L * 86400
pandas/_libs/tslibs/timedeltas.pyx: 86400000000042
pandas/_libs/tslibs/fields.pyx: micros = np.mod(dtindex, 86400000000000, dtype=np.int64) // 1000LL
pandas/_libs/tslibs/src/datetime/np_datetime.c: npy_int64 DAY_NS = 86400000000000LL;
prob should be in np_datetime.pyx (and import from there)
can you move to the same place you have DAY_SECONDS
# --------------------------------------------------------------
# Timestamp.normalize

@pytest.mark.parametrize('arg', ['2013-11-30', '2013-11-30 12:00:00'])
is there a normalize_nat test as well?
We don't define normalize for NaT.
We could have one for Timestamp mirroring (another issue). Probably would just return NaT
@jreback gathered some of the
pandas/_libs/tslibs/conversion.pyx
Outdated
@@ -41,7 +42,6 @@ from nattype cimport NPY_NAT, checknull_with_nat
# ----------------------------------------------------------------------
# Constants

cdef int64_t DAY_NS = 86400000000000LL
cdef int64_t HOURS_NS = 3600000000000
should prob move this one too (future ok)
can you move this one as well
pandas/_libs/tslibs/conversion.pyx
Outdated
@@ -22,7 +22,8 @@ from np_datetime cimport (check_dts_bounds,
                          npy_datetime,
                          dt64_to_dtstruct, dtstruct_to_dt64,
                          get_datetime64_unit, get_datetime64_value,
                          pydatetime_to_dt64, NPY_DATETIMEUNIT, NPY_FR_ns)
                          pydatetime_to_dt64, NPY_DATETIMEUNIT, NPY_FR_ns,
what is DAY_S, you mean DAY_NS right? let's write out these constants.
pandas/_libs/tslibs/np_datetime.pyx
Outdated
# ----------------------------------------------------------------------
# time constants

cdef int64_t DAY_S = 86400
let's write this out to DAY_SECONDS
I think the place for these may be ccalendar
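To illustrate the "define it once" request, one option is to keep a single base constant and derive the nanosecond values from it, so the literals cannot drift apart between modules. This is a hedged sketch in plain Python rather than the Cython used in the PR; `NS_PER_SECOND` is a hypothetical name introduced here, and the module the constants end up in is whatever the thread settles on.

```python
# Sketch of a single shared definition; NS_PER_SECOND is an illustrative name.
DAY_SECONDS = 86_400
NS_PER_SECOND = 1_000_000_000

# Derived values, matching the literals that appear in the diffs above.
DAY_NS = DAY_SECONDS * NS_PER_SECOND
HOURS_NS = 3_600 * NS_PER_SECOND
```

Other modules would then import these names instead of repeating `86400000000000` locally.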
pandas/core/arrays/datetimes.py
Outdated
@@ -832,7 +832,14 @@ def normalize(self):
               '2014-08-01 00:00:00+05:30'],
              dtype='datetime64[ns, Asia/Calcutta]', freq=None)
        """
        new_values = conversion.normalize_i8_timestamps(self.asi8, self.tz)
        if self.tz is None:
            not_null = self.notnull()
same
should this be notna? (does DatetimeArray even have notna or notnull?)
It (DatetimeIndex) apparently has notnull, but not sure if I should be using notna or notnull
can you use notna
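As background on why the naive branch needs a not-null mask at all: NaT is stored as the minimum int64 value, so subtracting a remainder from it would step outside the int64 range. The sketch below uses plain Python lists as a stand-in for the int64 array; `normalize_i8_naive` is a hypothetical name, not a pandas function.

```python
NPY_NAT = -(2 ** 63)  # int64 minimum, the sentinel used for NaT
DAY_NS = 86_400 * 1_000_000_000


def normalize_i8_naive(values):
    """Floor each non-NaT epoch-nanosecond value to midnight; keep NaT as is."""
    return [v if v == NPY_NAT else v - v % DAY_NS for v in values]


stamps = [DAY_NS + 5, NPY_NAT, 3 * DAY_NS]
result = normalize_i8_naive(stamps)

# Without the mask, the NaT slot would fall below the int64 minimum
# (Python ints do not overflow, which makes the problem visible here).
unmasked_nat = NPY_NAT - NPY_NAT % DAY_NS
```

In this sketch the NaT entry passes through untouched, while `unmasked_nat` lands below `NPY_NAT`, which a fixed-width int64 array could not represent.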
All green after some flaky Azure tests.
tiny change. ping on green.
Ping all green @jreback
thanks @mroeschke
…fixed
* upstream/master: (46 commits)
  DEPS: bump xlrd min version to 1.0.0 (pandas-dev#23774)
  BUG: Don't warn if default conflicts with dialect (pandas-dev#23775)
  BUG: Fixing memory leaks in read_csv (pandas-dev#23072)
  TST: Extend datetime64 arith tests to array classes, fix several broken cases (pandas-dev#23771)
  STYLE: Specify bare exceptions in pandas/tests (pandas-dev#23370)
  ENH: between_time, at_time accept axis parameter (pandas-dev#21799)
  PERF: Use is_utc check to improve performance of dateutil UTC in DatetimeIndex methods (pandas-dev#23772)
  CLN: io/formats/html.py: refactor (pandas-dev#22726)
  API: Make Categorical.searchsorted returns a scalar when supplied a scalar (pandas-dev#23466)
  TST: Add test case for GH14080 for overflow exception (pandas-dev#23762)
  BUG: Don't extract header names if none specified (pandas-dev#23703)
  BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#23618)
  DEPR: tz_convert in the Timestamp constructor (pandas-dev#23621)
  PERF: Datetime/Timestamp.normalize for timezone naive datetimes (pandas-dev#23634)
  TST: Use new arithmetic fixtures, parametrize many more tests (pandas-dev#23757)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23761)
  DOC: Add ignore-deprecate argument to validate_docstrings.py (pandas-dev#23650)
  ENH: update pandas-gbq to 0.8.0, adds credentials arg (pandas-dev#23662)
  DOC: Improve error message to show correct order (pandas-dev#23652)
  ENH: Improve error message for empty object array (pandas-dev#23718)
  ...
git diff upstream/master -u -- "*.py" | flake8 --diff