-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split fastpath IntegerArray constructor and general purpose constructor #22070
Split fastpath IntegerArray constructor and general purpose constructor #22070
Conversation
looks interesting - will look tomorrow |
+1 on having a fastpath-like constructor, +1 on it not being accessed via a We don't need to require it of downstream EA authors, but for pandas-internal EAs we should have some kind of consistent patterns for EA constructors. Since |
@jbrockmendel good questions.
For |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good to me, onboard with this change. may need some doc updates in whatsnew (and add this PR number to the original whatsnew).
@@ -76,7 +76,7 @@ def construct_from_string(cls, string): | |||
"'{}'".format(cls, string)) | |||
|
|||
|
|||
def to_integer_array(values, dtype=None): | |||
def integer_array(values, dtype=None, copy=False): | |||
""" | |||
Infer and return an integer array of the values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update doc-string
pandas/core/arrays/integer.py
Outdated
self._data, self._mask = coerce_to_array( | ||
values, dtype=dtype, mask=mask, copy=copy) | ||
if not (isinstance(values, np.ndarray) | ||
and np.issubdtype(values.dtype, np.integer)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is is_integer_dtype
pandas/core/arrays/integer.py
Outdated
if not (isinstance(values, np.ndarray) | ||
and np.issubdtype(values.dtype, np.integer)): | ||
raise TypeError("values should be integer numpy array") | ||
if not (isinstance(mask, np.ndarray) and mask.dtype == np.bool_): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_bool_dtype
@@ -94,7 +94,8 @@ def to_integer_array(values, dtype=None): | |||
------ | |||
TypeError if incompatible types | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there still a ref to to_integer_array? (I see it in the diff, but I also see that you changed it above)
return type(self)(self._data[item], | ||
mask=self._mask[item], | ||
dtype=self.dtype) | ||
return type(self)(self._data[item], self._mask[item]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback additional question: what do you find of writing IntegerArray(...)
instead of type(self)(...)
?
Python perfectly allows that (and is the same here, as we don't subclass this one further), and I personally find that easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like using type()
as I expect this to be subclasses for BooleanArray
, though it could be that this not needed for that, but want to keep open that possiblitiy.
Codecov Report
@@ Coverage Diff @@
## master #22070 +/- ##
==========================================
- Coverage 92.05% 92.05% -0.01%
==========================================
Files 169 169
Lines 50709 50719 +10
==========================================
+ Hits 46679 46688 +9
- Misses 4030 4031 +1
Continue to review full report at Codecov.
|
So the failures is because I now changed the code so that a list of numpy scalars is always coerced to int64 (before it took the type of the scalars). |
@jreback can you have a look at this again? |
yes |
ping |
thanks @jorisvandenbossche nice improvement |
Thanks! |
Related to the discussion we were having on the PR: #21160 (@jreback @shoyer).
Opening this PR for discussion, to show what I meant there, to hopefully make it more concrete discussion.
Briefly the changes / rationale:
coerce_to_array
:IntegerArray
is the fastpath if you know you have a propervalues
andmask
integer_array
as general purpose constructor function, that eg parses lists, detects NaNs etc (exact API to discuss of course)fastpath
keyword to the constructor, but IMO we should not repeat that pattern in new code (it is overloading the main constructor by combining things).Note that this is not yet a fully implemented PR:
coerce_to_array
can be simplified a little bit, as currently themask
argument is nowhere used (unless we foresee use cases for that?)