-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tslibs resolution #18034
Tslibs resolution #18034
Conversation
@jbrockmendel : I presume this is largely copy+paste? Also, a quick |
Yes.
Will do. |
Codecov Report
@@ Coverage Diff @@
## master #18034 +/- ##
==========================================
- Coverage 91.43% 91.36% -0.08%
==========================================
Files 164 164
Lines 50090 49869 -221
==========================================
- Hits 45798 45561 -237
- Misses 4292 4308 +16
Continue to review full report at Codecov.
|
|
your benchmarks look odd |
No argument there. I'll rebase and re-run. |
…libs-resolution
|
Clipboard again, is there a game plan for this error? |
pandas/tseries/frequencies.py
Outdated
from pandas._libs.tslib import Timedelta | ||
from pandas._libs.tslibs.frequencies import ( # noqa | ||
get_freq_code, _base_and_stride, _period_str_to_code, | ||
_INVALID_FREQ_ERROR, opattern, _lite_rule_alias, _dont_uppercase, | ||
_period_code_map, _reverse_period_code_map) | ||
from pandas._libs.tslibs.resolution import ( # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the noqa?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's public-facing so I didn't want to break anything downstream. Remove unused imports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, what exactly is public facing of these? I am ok with simply removing anything _leading, what is not here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the noqa is for DAYS
, _maybe_add_count
, _weekday_rule_aliases
, get_freq_group
. I think the private ones here can be removed without breaking anything. DAYS
and get_freq_group
are imported by and a couple other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok let's remove any non-used, you can put the non-private ones in a whatsnew note
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed. Wasn't sure about wording in whatsnew note for the list DAYS
.
return (freq // 1000) * 1000 | ||
|
||
|
||
class Resolution(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is because this is not a cdef class; make a _Resolution class then provide the public interface via Resolution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternately you can just move the constants here. note should this class be a Singleton pattern? do we even instantiate it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note should this class be a Singleton pattern?
Haven't given it any thought. Maybe the idea was to make it easily subclassable?
do we even instantiate it?
Nope, looks like everywhere it shows up its just Resolution.get_freq_group(...)
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so it was for grouping things / sharing code.
maybe add to the list to either make this a Singleton, or better yet use instances of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pandas/_libs/tslibs/resolution.pyx
Outdated
# Frequency Inference | ||
|
||
|
||
def unique_deltas(ndarray[int64_t] arr): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this actually used elsewhere? if so should fix that, this should be a cdef method of _FrequencyInferer (or at the very lease cdef only)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, just in resolution. Will cdef for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related: there's an import of pandas.core.algorithms.unique
. Do you know if there is a cython version of that we can use instead? The uses are all for fairly small integer indexes:
if len(unique(self.fields['M'])) > 1:
weekdays = unique(self.index.weekday)
week_of_months = unique((self.index.day - 1) // 7)
In fact, if these are all being called on Index
objects, then the Int64Index.unique
method might make sense.
pandas/tseries/frequencies.py
Outdated
from pandas._libs.tslib import Timedelta | ||
from pandas._libs.tslibs.frequencies import ( # noqa | ||
get_freq_code, _base_and_stride, _period_str_to_code, | ||
_INVALID_FREQ_ERROR, opattern, _lite_rule_alias, _dont_uppercase, | ||
_period_code_map, _reverse_period_code_map) | ||
from pandas._libs.tslibs.resolution import ( # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok let's remove any non-used, you can put the non-private ones in a whatsnew note
cdef int RESO_HR = 5 | ||
cdef int RESO_DAY = 6 | ||
|
||
_ONE_MICRO = 1000L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be de-privatized (add to list or do here ok)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to the list it goes
rebase |
…libs-resolution
…libs-resolution
Hello @jbrockmendel! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on November 13, 2017 at 00:01 Hours UTC |
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -45,7 +45,7 @@ Other API Changes | |||
- :class:`Timestamp` will no longer silently ignore unused or invalid `tz` or `tzinfo` keyword arguments (:issue:`17690`) | |||
- :class:`Timestamp` will no longer silently ignore invalid `freq` arguments (:issue:`5168`) | |||
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the `tseries.offsets` module (:issue:`17830`) | |||
|
|||
- :function:`tseries.frequencies.get_freq_group` and :list:`tseries.frequencies.DAYS` can now be accessed in `_libs.tslibs.resolution` (:issue:`18034`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just say they are removed from the private API, no need to point to a private file
pandas/tests/scalar/test_period.py
Outdated
@@ -13,7 +13,8 @@ | |||
from pandas._libs import tslib, period as libperiod | |||
from pandas._libs.tslibs.parsing import DateParseError | |||
from pandas import Period, Timestamp, offsets | |||
from pandas.tseries.frequencies import DAYS, MONTHS | |||
from pandas._libs.tslibs.resolution import DAYS | |||
from pandas.tseries.frequencies import MONTHS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why doesn't MONTHS simply exist in resolution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alias naming, will change
@@ -21,7 +21,8 @@ | |||
from pandas.core.base import SpecificationError, AbstractMethodError | |||
from pandas.errors import UnsupportedFunctionCall | |||
from pandas.core.groupby import DataError | |||
from pandas.tseries.frequencies import MONTHS, DAYS | |||
from pandas._libs.tslibs.resolution import DAYS | |||
from pandas.tseries.frequencies import MONTHS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
assert frequencies.get_freq_group(offsets.Week()) == 4000 | ||
assert frequencies.get_freq_group(offsets.Week(weekday=1)) == 4000 | ||
assert frequencies.get_freq_group(offsets.Week(weekday=5)) == 4000 | ||
assert resolution.get_freq_group('A') == 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
screaming for parametization (later is ok)
If we want to finish the day with a win, this and #18086 are probably about ready. |
rebase this |
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -45,7 +45,7 @@ Other API Changes | |||
- :class:`Timestamp` will no longer silently ignore unused or invalid `tz` or `tzinfo` keyword arguments (:issue:`17690`) | |||
- :class:`Timestamp` will no longer silently ignore invalid `freq` arguments (:issue:`5168`) | |||
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the `tseries.offsets` module (:issue:`17830`) | |||
|
|||
- :function:`tseries.frequencies.get_freq_group` and :list:`tseries.frequencies.DAYS` are removed from the private API (:issue:`18034`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private -> public
…libs-resolution
I think next to go in. |
Awesome. I'll exercise some restraint and not immediately replace. |
Travis failure is a timeout in TestClipboard. I can only imagine the maintainers find this even more annoying than I do. Still: argh |
actually I merged a commit, which may fix this (but you rebase after). don't push again. let me review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small changes
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -45,6 +45,7 @@ Other API Changes | |||
- :class:`Timestamp` will no longer silently ignore unused or invalid ``tz`` or ``tzinfo`` keyword arguments (:issue:`17690`) | |||
- :class:`Timestamp` will no longer silently ignore invalid ``freq`` arguments (:issue:`5168`) | |||
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the ``pandas.tseries.offsets`` module (:issue:`17830`) | |||
- :function:`tseries.frequencies.get_freq_group` and :list:`tseries.frequencies.DAYS` are removed from the public API (:issue:`18034`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tags are not going to work, nor are these in the api.rst so they won't reference anything, just use tseries.frequencies.get_freq_group()
etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Does this take precedence over "don't push again"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no actually push again with this fix
everything else ok
# ---------------------------------------------------------------------- | ||
|
||
cpdef resolution(ndarray[int64_t] stamps, tz=None): | ||
cdef: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add to list to doc-string things
ok ping on green. |
needs a rebase. |
note that if you actually rebase on top of master (as opposed to merge master), then this becomes really simple to see where you are. you prob want to make a single commit when you do this as the conflicts are easier to resolve. or you can use an ide. It is much much easier to see what is going on w/o all of these constant merges. I would highly recommend you follow this workflow. since you rebase fairly frequently this is actually pretty easy to do. you don't actually need to squash (though again would recommend having logical commits). |
…libs-resolution
lgtm. ping on green. |
Ping |
thanks @jbrockmendel |
Note: "here" refers to So I tried using unique_1d directly and performance was hurt noticeably. Not a huge shock since unique_1d is in py-land. So as an experiment I took unique_1d, cut out a bunch of type-checking because the use case here is always a ndarray[int64_t], and implemented what was left directly in tslibs.resolution. The surprise was that performance was still noticeably worse than under the status quo. So is there a chance that this is performant logic here after all? |
One more step towards getting
tseries.frequencies
into cython (and getting rid of non-cython dependencies of tslib/period). Also moves towards de-coupling period from tslib.git diff upstream/master -u -- "*.py" | flake8 --diff