Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: move range-generation functions to EA mixin classes #22016

Merged
merged 8 commits into from
Jul 26, 2018

Conversation

jbrockmendel
Copy link
Member

With the DatetimeArray range-generating functions moved, we are finally able to move shift, and in turn __add__, __sub__, etc.

Two non-trivial changes made during the move process, see inline comments.

Upcoming commits will add docstrings and port tests. ATM many tests are a PITA bc the Array contructors don't know how to handle lists of strings. Trying to find a way to implement that in __new__ without having it become a total mess.

@pep8speaks
Copy link

pep8speaks commented Jul 21, 2018

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 25, 2018 at 19:12 Hours UTC

start = start.tz_localize(tz)
end = end.tz_localize(tz)
arr = np.linspace(start.value, end.value, periods)
index = cls._simple_new(arr.astype('M8[ns]'), freq=None, tz=tz)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of two things changed from the existing version. ATM this is:

                index = tools.to_datetime(np.linspace(start.value,
                                                      end.value, periods),
                                          utc=True)
                index = index.tz_convert(tz)

Changed to avoid the dependency on tools.

dates = list(xdr)
# utc = len(dates) > 0 and dates[0].tzinfo is not None
values = np.array([x.value for x in dates])
data = cls._simple_new(values, freq=freq, tz=tz)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the second change from the existing version. ATM this reads:

        # utc = len(dates) > 0 and dates[0].tzinfo is not None
        data = tools.to_datetime(dates)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can go ahead and blow away the commented line of code above and just directly use xdr:

values = np.array([x.value for x in xdr])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will do. I'd like to track down the source of the comment and make sure it isn't something important.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this list conversion is problematic

@codecov
Copy link

codecov bot commented Jul 21, 2018

Codecov Report

Merging #22016 into master will increase coverage by <.01%.
The diff coverage is 94.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22016      +/-   ##
==========================================
+ Coverage   92.02%   92.03%   +<.01%     
==========================================
  Files         170      170              
  Lines       50707    50743      +36     
==========================================
+ Hits        46661    46699      +38     
+ Misses       4046     4044       -2
Flag Coverage Δ
#multiple 90.44% <94.64%> (+0.01%) ⬆️
#single 42.35% <56.85%> (+0.03%) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/timedeltas.py 91.16% <100%> (-0.23%) ⬇️
pandas/core/indexes/base.py 96.37% <100%> (ø) ⬆️
pandas/core/arrays/period.py 91.22% <100%> (+0.03%) ⬆️
pandas/core/indexes/datetimes.py 95.68% <100%> (+0.55%) ⬆️
pandas/core/indexes/datetimelike.py 97.77% <100%> (+0.42%) ⬆️
pandas/core/arrays/datetimes.py 95.46% <91.2%> (-0.85%) ⬇️
pandas/core/arrays/timedeltas.py 92.53% <95%> (+0.77%) ⬆️
pandas/core/arrays/datetimelike.py 95.06% <96.46%> (+0.57%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a6c7387...e33ff43. Read the comment docs.

@gfyoung gfyoung added Refactor Internal refactoring of code ExtensionArray Extending pandas with custom dtypes or arrays. labels Jul 23, 2018
"""
if is_period_dtype(cls):
# Frequency validation is not meaningful for Period Array/Index
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be explicit about the return

raise TypeError("cannot add {dtype}-dtype to {cls}"
.format(dtype=other.dtype,
cls=type(self).__name__))
elif is_categorical_dtype(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better is is_extension_array_dtype

raise TypeError("cannot subtract {dtype}-dtype from {cls}"
.format(dtype=other.dtype,
cls=type(self).__name__))
elif is_categorical_dtype(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

dates = list(xdr)
# utc = len(dates) > 0 and dates[0].tzinfo is not None
values = np.array([x.value for x in dates])
data = cls._simple_new(values, freq=freq, tz=tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this list conversion is problematic



def _infer_tz_from_endpoints(start, end, tz):
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sure add doc-strings in future

@jbrockmendel
Copy link
Member Author

hmm this list conversion is problematic

Why?

@jreback
Copy link
Contributor

jreback commented Jul 24, 2018

hmm this list conversion is problematic

you are turning an array into a list, only to turn it back to an array

@jbrockmendel
Copy link
Member Author

you are turning an array into a list, only to turn it back to an array

It's the same in the existing implementation; we can open an issue if there's a real perf impact.

@jreback
Copy link
Contributor

jreback commented Jul 24, 2018

it's the same in the existing implementation; we can open an issue if there's a real perf impact.

ok if that's true, then this is prob an overblow concern. pls give a quick test to see if this is actually an issue.

rebase in an y event.

@jreback jreback added this to the 0.24.0 milestone Jul 24, 2018
@@ -754,44 +593,6 @@ def isin(self, values):

return algorithms.isin(self.asi8, values.asi8)

def shift(self, n, freq=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep a stub calling the super method here as well? (as you did with the arithmetic methods or with _generate_range)
I know it is not strictly needed here, but I have the feeling that will make it more clear that then in the future we need to update this from calling super class method to calling the composited values method (although I suppose we have tests for it that will also be a reminder to do it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For _generate_range and a few others it is needed because the Index method needs to set result.name.

There are a handful of other things things in the same situation as shift (e.g. several properties that are overriden by cache_readonlys) that will need to be edited if/when the change from subclassing to composition is made. I'll make a note not to forget shift when the time comes.

@jorisvandenbossche jorisvandenbossche changed the title move range-generation functions to EA mixin classes REF: move range-generation functions to EA mixin classes Jul 26, 2018
@jreback jreback merged commit a1a2f66 into pandas-dev:master Jul 26, 2018
@jreback
Copy link
Contributor

jreback commented Jul 26, 2018

thanks @jbrockmendel, pls follow up with @jorisvandenbossche comments in next PR

@jbrockmendel jbrockmendel deleted the nocache2 branch July 26, 2018 16:22
@jbrockmendel
Copy link
Member Author

@TomAugspurger I think with this PeriodArrayMixin has everything needed for comparisons+arithmetic, so PeriodBlock should be viable.

@jorisvandenbossche
Copy link
Member

@jbrockmendel you are planning to do a PR for that?

@jbrockmendel
Copy link
Member Author

you are planning to do a PR for that?

IIRC Tom wanted to be informed when the arith/compare parts were ready so he could forge ahead with PeriodBlock.

My current focus is on fixing DataFrame/Series ops inconsistencies by having DataFrame dispatch to Series (#22068, #22019) and getting thorough tests for these new EA subclasses (#22033)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 26, 2018 via email

@jorisvandenbossche
Copy link
Member

@jbrockmendel since you are now very familiar with code, I would say, certainly go ahead (if you have time of course)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants