Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move normalization funcs up to conversion #18086

Merged
merged 16 commits into from
Nov 12, 2017

Conversation

jbrockmendel
Copy link
Member

We're going to need date_normalize upstream of tslib before long, so this moves it to conversion.

Simplifies repeated checking in localize_tso, closes #17944

@gfyoung gfyoung added the Clean label Nov 3, 2017
@gfyoung
Copy link
Member

gfyoung commented Nov 3, 2017

Minor flake8 error. Otherwise, seems reasonable to me.

@codecov
Copy link

codecov bot commented Nov 3, 2017

Codecov Report

Merging #18086 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18086      +/-   ##
==========================================
- Coverage   91.27%   91.26%   -0.02%     
==========================================
  Files         163      163              
  Lines       50120    50120              
==========================================
- Hits        45749    45740       -9     
- Misses       4371     4380       +9
Flag Coverage Δ
#multiple 89.07% <ø> (ø) ⬆️
#single 40.32% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4375bd...af55349. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 3, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@cd80f08). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #18086   +/-   ##
=========================================
  Coverage          ?   91.38%           
=========================================
  Files             ?      163           
  Lines             ?    50065           
  Branches          ?        0           
=========================================
  Hits              ?    45752           
  Misses            ?     4313           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.19% <100%> (?)
#single 40.36% <50%> (?)
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 95.48% <100%> (ø)
pandas/tseries/frequencies.py 96.01% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd80f08...3fa05bd. Read the comment docs.

@@ -782,3 +782,130 @@ cdef inline str _render_tstamp(int64_t val):
""" Helper function to render exception messages"""
from pandas._libs.tslib import Timestamp
return str(Timestamp(val))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason for the localization functions to exists here rather than in timezones.pyx?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really want to limit the points of contact with pandas_datetimestruct and the ensuing dependencies.

@jbrockmendel
Copy link
Member Author

BTW, once this and #18059 are closed we can cut/paste all of Timestamp into its own file.

@jreback
Copy link
Contributor

jreback commented Nov 7, 2017

rebase

@@ -355,26 +365,16 @@ cdef inline void _localize_tso(_TSObject obj, object tz):
# static/pytz/dateutil specific code
if is_fixed_offset(tz):
# statictzinfo
if len(deltas) > 0 and obj.value != NPY_NAT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are eliminating the not len(deltas) case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. get_dst_info never returns an empty array in the is_fixed_offset case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More specifically: it always returns a length-1 array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm can u add an assert to that effect then

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. As long as we're asserting things, I think it is the case that in _localize_tso the input obj will always have obj.tzinfo is None; can we make that official? Even in the docstring would be a win.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asserts in internal code are a good thing
happy to add; they are just laying out the assumptions explicitly
don’t go crazy but a reader of the code will appreciate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. That'll close out one of the tslibs-todo questions. The last one since we're on a role: at the moment all calls to tz_convert or tz_convert_single have at least one of tz1 or tz2 being utc. Will that always be the case? If so that'll close [ill have to look it up].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc i think so
we never convert from a time zone to another one directly
always pass thru utc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. I'll ping you on the appropriate bug-issue to close that out; don't want to derail this.

return dtstruct_to_dt64(dts)


def dates_normalized(ndarray[int64_t] stamps, tz=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have date_normalized AND date_normalized?

Copy link
Member Author

@jbrockmendel jbrockmendel Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! One is a "normal this date" and the other is "is this date normalized?" As usual I'm not averse to changing that, but prefer to do so separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just change it to is_date_normalized then
much more clear

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

tz_convert_single)
from tslibs.conversion import (tz_localize_to_utc,
tz_convert_single, date_normalize)
from tslibs.conversion import pydt_to_i8, tz_convert # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason these are noqa? simply import these where needed

return dtstruct_to_dt64(dts)


def are_dates_normalized(ndarray[int64_t] stamps, tz=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_dates_normalized

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plural

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn’t follow our convention

technically this could be called
is_date_array_normalized

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


return ts.value


def i8_to_pydt(int64_t i8, object tzinfo=None):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should prob move this one to conversion

i8_to_pydt to be with the corresponding inverse (and import back here if needed)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving. AFAIK this isn't used anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok can clean that up later

"""
Convert to int64 representation compatible with numpy datetime64; converts
to UTC
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can type int64_t

@cython.wraparound(False)
@cython.boundscheck(False)
def date_normalize(ndarray[int64_t] stamps, tz=None):
cdef:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u add a doc string

@cython.wraparound(False)
@cython.boundscheck(False)
cdef _normalize_local(ndarray[int64_t] stamps, object tz):
cdef:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

return dtstruct_to_dt64(dts)


def are_dates_normalized(ndarray[int64_t] stamps, tz=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


def i8_to_pydt(int64_t i8, object tzinfo=None):
"""
Inverse of pydt_to_i8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add parameters and returns to the doc-strings

"""
Inverse of pydt_to_i8
"""
from pandas import Timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this return a datetime.datetime? (exactly, and not a Timestamp)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess to make the name more accurate. This is never used, only moved because you asked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is not used, then just remove

"""
Normalize each of the (nanosecond) timestamps in the given array by
rounding down to the beginning of the day (i.e. midnight). If `tz`
is not None, then this is midnight for this timezone.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add parameters to doc-strring

Normalize each of the (nanosecond) timestamps in the given array by
rounding down to the beginning of the day (i.e. midnight) for the
given timezone `tz`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


@cython.wraparound(False)
@cython.boundscheck(False)
cdef ndarray[int64_t] _normalize_local(ndarray[int64_t] stamps, object tz):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you changed the name? ok to de-privatize and rename to normalize_local_timestamps. I prefer more readable function names fyi.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't change the name, no.

return result


cdef inline int64_t _normalized_stamp(pandas_datetimestruct *dts) nogil:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc-string and de-privatize

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is private. Not importable.

Check if all of the given (nanosecond) timestamps are normalized to
midnight, i.e. hour == minute == second == 0. If the optional timezone
`tz` is not None, then this is midnight for this timezone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add parameters in doc-string

good name

@jbrockmendel
Copy link
Member Author

On rebasing I noticed a bunch of old scripts removed. Nice!

@jbrockmendel jbrockmendel mentioned this pull request Nov 11, 2017
4 tasks
@jbrockmendel
Copy link
Member Author

Just tried rebasing. It seemed to work locally, but pushing didn't trigger CI. So did "git pull upstream master" too. I don't think there's any problems, just a heads up.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

Just tried rebasing. It seemed to work locally, but pushing didn't trigger CI. So did "git pull upstream master" too. I don't think there's any problems, just a heads up.

of course it does. you need to git push yourbranch yourupstream -f

further you really really should squash / fixup these, no need for all of these commits
git rebase -i yourupstream

@@ -1,3 +1,4 @@
# -*- coding: utf-8 -*-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't add extraneous things, ok for now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a dummy commit to force the CI on a TestClipboard failure.

"""
Inverse of pydt_to_i8
"""
from pandas import Timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is not used, then just remove

@@ -90,6 +89,35 @@ cdef class _TSObject:
return self.value


cpdef int64_t pydt_to_i8(object pydt) except? -1:
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like it has exactly 2 uses, I would much rather
Timestamp(...).value and remove this routine (and remove the tests for it)

(pandas) bash-3.2$ grep -r pydt_to_i8 pandas/ --include '*.py' --include '*.pyx'
pandas//_libs/tslib.pyx:cpdef pydt_to_i8(object pydt):
pandas//_libs/tslib.pyx:    Inverse of pydt_to_i8
pandas//_libs/tslibs/offsets.pyx:from pandas._libs.tslib import pydt_to_i8, monthrange
pandas//_libs/tslibs/offsets.pyx:        i8 = pydt_to_i8(dt)
pandas//core/indexes/datetimes.py:    return np.int64(libts.pydt_to_i8(key)).view(_NS_DTYPE)
pandas//tests/scalar/test_timestamp.py:                assert tslib.pydt_to_i8(result) == expected
pandas//tests/scalar/test_timestamp.py:                assert tslib.pydt_to_i8(result) == expected
pandas//tests/scalar/test_timestamp.py:                    assert tslib.pydt_to_i8(result) == expected_tz
pandas//tests/scalar/test_timestamp.py:                    assert tslib.pydt_to_i8(result) == expected_tz
pandas//tests/scalar/test_timestamp.py:                    assert tslib.pydt_to_i8(result) == expected_utc
pandas//tests/scalar/test_timestamp.py:                assert tslib.pydt_to_i8(result) == expected
pandas//tests/scalar/test_timestamp.py:                assert tslib.pydt_to_i8(result) == expected
pandas//tests/scalar/test_timestamp.py:                assert tslib.pydt_to_i8(result) == expected_tz
pandas//tests/scalar/test_timestamp.py:                assert tslib.pydt_to_i8(result) == expected_tz
pandas//tests/scalar/test_timestamp.py:                assert tslib.pydt_to_i8(result) == expected_utc

@jbrockmendel
Copy link
Member Author

Removed pydt_to_i8 per request. Holding off on removing pydt_to_i8 pending profiling the effects in offsets.pyx.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

Removed pydt_to_i8 per request. Holding off on removing pydt_to_i8 pending profiling the effects in offsets.pyx.

typo! but ok.

@jreback jreback added this to the 0.22.0 milestone Nov 12, 2017
@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

ping on green.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

thanks!

@jreback jreback merged commit 9e3ad63 into pandas-dev:master Nov 12, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
@jbrockmendel jbrockmendel deleted the tslibs-conversion11 branch December 8, 2017 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

localize_tso unnecessary calls?
3 participants