-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move normalization funcs up to conversion #18086
Conversation
Minor |
Codecov Report
@@ Coverage Diff @@
## master #18086 +/- ##
==========================================
- Coverage 91.27% 91.26% -0.02%
==========================================
Files 163 163
Lines 50120 50120
==========================================
- Hits 45749 45740 -9
- Misses 4371 4380 +9
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18086 +/- ##
=========================================
Coverage ? 91.38%
=========================================
Files ? 163
Lines ? 50065
Branches ? 0
=========================================
Hits ? 45752
Misses ? 4313
Partials ? 0
Continue to review full report at Codecov.
|
@@ -782,3 +782,130 @@ cdef inline str _render_tstamp(int64_t val): | |||
""" Helper function to render exception messages""" | |||
from pandas._libs.tslib import Timestamp | |||
return str(Timestamp(val)) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason for the localization functions to exists here rather than in timezones.pyx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really want to limit the points of contact with pandas_datetimestruct
and the ensuing dependencies.
BTW, once this and #18059 are closed we can cut/paste all of Timestamp into its own file. |
rebase |
@@ -355,26 +365,16 @@ cdef inline void _localize_tso(_TSObject obj, object tz): | |||
# static/pytz/dateutil specific code | |||
if is_fixed_offset(tz): | |||
# statictzinfo | |||
if len(deltas) > 0 and obj.value != NPY_NAT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are eliminating the not len(deltas) case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. get_dst_info
never returns an empty array in the is_fixed_offset
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More specifically: it always returns a length-1 array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm can u add an assert to that effect then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. As long as we're asserting things, I think it is the case that in _localize_tso the input obj
will always have obj.tzinfo is None
; can we make that official? Even in the docstring would be a win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asserts in internal code are a good thing
happy to add; they are just laying out the assumptions explicitly
don’t go crazy but a reader of the code will appreciate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. That'll close out one of the tslibs-todo questions. The last one since we're on a role: at the moment all calls to tz_convert or tz_convert_single have at least one of tz1 or tz2 being utc. Will that always be the case? If so that'll close [ill have to look it up].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc i think so
we never convert from a time zone to another one directly
always pass thru utc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent. I'll ping you on the appropriate bug-issue to close that out; don't want to derail this.
pandas/_libs/tslibs/conversion.pyx
Outdated
return dtstruct_to_dt64(dts) | ||
|
||
|
||
def dates_normalized(ndarray[int64_t] stamps, tz=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have date_normalized AND date_normalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! One is a "normal this date" and the other is "is this date normalized?" As usual I'm not averse to changing that, but prefer to do so separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just change it to is_date_normalized then
much more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
pandas/_libs/tslib.pyx
Outdated
tz_convert_single) | ||
from tslibs.conversion import (tz_localize_to_utc, | ||
tz_convert_single, date_normalize) | ||
from tslibs.conversion import pydt_to_i8, tz_convert # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason these are noqa? simply import these where needed
pandas/_libs/tslibs/conversion.pyx
Outdated
return dtstruct_to_dt64(dts) | ||
|
||
|
||
def are_dates_normalized(ndarray[int64_t] stamps, tz=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_dates_normalized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plural
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn’t follow our convention
technically this could be called
is_date_array_normalized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
pandas/_libs/tslib.pyx
Outdated
|
||
return ts.value | ||
|
||
|
||
def i8_to_pydt(int64_t i8, object tzinfo=None): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should prob move this one to conversion
i8_to_pydt to be with the corresponding inverse (and import back here if needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving. AFAIK this isn't used anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok can clean that up later
pandas/_libs/tslibs/conversion.pyx
Outdated
""" | ||
Convert to int64 representation compatible with numpy datetime64; converts | ||
to UTC | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can type int64_t
pandas/_libs/tslibs/conversion.pyx
Outdated
@cython.wraparound(False) | ||
@cython.boundscheck(False) | ||
def date_normalize(ndarray[int64_t] stamps, tz=None): | ||
cdef: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u add a doc string
pandas/_libs/tslibs/conversion.pyx
Outdated
@cython.wraparound(False) | ||
@cython.boundscheck(False) | ||
cdef _normalize_local(ndarray[int64_t] stamps, object tz): | ||
cdef: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
pandas/_libs/tslibs/conversion.pyx
Outdated
return dtstruct_to_dt64(dts) | ||
|
||
|
||
def are_dates_normalized(ndarray[int64_t] stamps, tz=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
pandas/_libs/tslibs/conversion.pyx
Outdated
|
||
def i8_to_pydt(int64_t i8, object tzinfo=None): | ||
""" | ||
Inverse of pydt_to_i8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add parameters and returns to the doc-strings
pandas/_libs/tslibs/conversion.pyx
Outdated
""" | ||
Inverse of pydt_to_i8 | ||
""" | ||
from pandas import Timestamp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this return a datetime.datetime
? (exactly, and not a Timestamp)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess to make the name more accurate. This is never used, only moved because you asked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is not used, then just remove
""" | ||
Normalize each of the (nanosecond) timestamps in the given array by | ||
rounding down to the beginning of the day (i.e. midnight). If `tz` | ||
is not None, then this is midnight for this timezone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add parameters to doc-strring
Normalize each of the (nanosecond) timestamps in the given array by | ||
rounding down to the beginning of the day (i.e. midnight) for the | ||
given timezone `tz`. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
|
||
@cython.wraparound(False) | ||
@cython.boundscheck(False) | ||
cdef ndarray[int64_t] _normalize_local(ndarray[int64_t] stamps, object tz): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you changed the name? ok to de-privatize and rename to normalize_local_timestamps. I prefer more readable function names fyi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't change the name, no.
return result | ||
|
||
|
||
cdef inline int64_t _normalized_stamp(pandas_datetimestruct *dts) nogil: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc-string and de-privatize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is private. Not importable.
Check if all of the given (nanosecond) timestamps are normalized to | ||
midnight, i.e. hour == minute == second == 0. If the optional timezone | ||
`tz` is not None, then this is midnight for this timezone. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add parameters in doc-string
good name
On rebasing I noticed a bunch of old scripts removed. Nice! |
76866bd
to
e7456ce
Compare
…libs-conversion11
Just tried rebasing. It seemed to work locally, but pushing didn't trigger CI. So did "git pull upstream master" too. I don't think there's any problems, just a heads up. |
of course it does. you need to further you really really should squash / fixup these, no need for all of these commits |
@@ -1,3 +1,4 @@ | |||
# -*- coding: utf-8 -*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't add extraneous things, ok for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a dummy commit to force the CI on a TestClipboard failure.
pandas/_libs/tslibs/conversion.pyx
Outdated
""" | ||
Inverse of pydt_to_i8 | ||
""" | ||
from pandas import Timestamp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is not used, then just remove
@@ -90,6 +89,35 @@ cdef class _TSObject: | |||
return self.value | |||
|
|||
|
|||
cpdef int64_t pydt_to_i8(object pydt) except? -1: | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like it has exactly 2 uses, I would much rather
Timestamp(...).value
and remove this routine (and remove the tests for it)
(pandas) bash-3.2$ grep -r pydt_to_i8 pandas/ --include '*.py' --include '*.pyx'
pandas//_libs/tslib.pyx:cpdef pydt_to_i8(object pydt):
pandas//_libs/tslib.pyx: Inverse of pydt_to_i8
pandas//_libs/tslibs/offsets.pyx:from pandas._libs.tslib import pydt_to_i8, monthrange
pandas//_libs/tslibs/offsets.pyx: i8 = pydt_to_i8(dt)
pandas//core/indexes/datetimes.py: return np.int64(libts.pydt_to_i8(key)).view(_NS_DTYPE)
pandas//tests/scalar/test_timestamp.py: assert tslib.pydt_to_i8(result) == expected
pandas//tests/scalar/test_timestamp.py: assert tslib.pydt_to_i8(result) == expected
pandas//tests/scalar/test_timestamp.py: assert tslib.pydt_to_i8(result) == expected_tz
pandas//tests/scalar/test_timestamp.py: assert tslib.pydt_to_i8(result) == expected_tz
pandas//tests/scalar/test_timestamp.py: assert tslib.pydt_to_i8(result) == expected_utc
pandas//tests/scalar/test_timestamp.py: assert tslib.pydt_to_i8(result) == expected
pandas//tests/scalar/test_timestamp.py: assert tslib.pydt_to_i8(result) == expected
pandas//tests/scalar/test_timestamp.py: assert tslib.pydt_to_i8(result) == expected_tz
pandas//tests/scalar/test_timestamp.py: assert tslib.pydt_to_i8(result) == expected_tz
pandas//tests/scalar/test_timestamp.py: assert tslib.pydt_to_i8(result) == expected_utc
Removed pydt_to_i8 per request. Holding off on removing pydt_to_i8 pending profiling the effects in offsets.pyx. |
typo! but ok. |
ping on green. |
thanks! |
We're going to need
date_normalize
upstream oftslib
before long, so this moves it to conversion.Simplifies repeated checking in
localize_tso
, closes #17944