-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Series.map may raise TypeError in Categorical or DatetimeTz #12532
Conversation
if needs_i8_conversion(values.dtype): | ||
boxer = i8_boxer(values) | ||
values = lib.map_infer(values, boxer) | ||
not_ndarray = (is_categorical_dtype(self.dtype) or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_internal_type
(this gets sparse as well, but I think thats correct here as well), prob not tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually should prob rename is_internal_type
-> is_extension_type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, renamed.
2aa510a
to
7b6c178
Compare
Let me confirm the result of
and current impl are as below and has inconsistencies. I think option #1 is preferable. map (option 1)It raises
apply (option 2)no changes from current master
|
yeah I think if you can construct a |
5a3f5eb
to
b3c1caf
Compare
@jreback OK, both categorical |
else: | ||
map_f = lib.map_infer | ||
values = _values_from_object(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wish we could just make this simpler. IOW the difference between extension and non-extension types is glaring here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, how about moving boxing logic to Block
? At a glance, i8_boxer
is only used few times in series.py
and frame.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that would be better, you would have to dispatch to a new method on block, but it would be much cleaner. Ok with this (minor comment). and can do that later (or can refactor here). lmk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let me do it separately because it is likely to need API discussion (#12741).
categories=new_categories, | ||
ordered=self.ordered) | ||
except ValueError: | ||
return np.take(new_categories, self._codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be:
Index(new_categories).take(self._codes)
? maybe that's the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the result should be the same. I chose numpy logic to omit Index
creation, but Index.take
is preferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its fine, more of a style things, but yes should be the same
351786e
to
3abeb73
Compare
if is_extension_type(self.dtype): | ||
values = self._values | ||
if na_action is not None: | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, testing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -468,6 +468,24 @@ def take(self, indices, axis=0, allow_fill=True, fill_value=None): | |||
na_value=-1) | |||
return self._create_from_codes(taken) | |||
|
|||
def map(self, mapper): | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe in another PR can consolidate the doc-strings of map
into base (when we combine them all there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left it as it is, because Categorical.map
needs some supplementary info. Using Appender
doesn't make it very simple.
9b15940
to
031c8e6
Compare
this looks fine. maybe we can come back and try to simplify logic even more w.r.t. the extension types. we have lots of if/then's handing around extension/ndarrays. An idea (maybe not trivial), is to have an But that's for another day. @kawochen any comments on this code? |
this lgtm. pls rebase |
Rebased and now green. |
thanks! |
git diff upstream/master | flake8 --diff
Needs to decide what categorical dtype
map
should return. Even thoughapply
returnsobject
dtype, but returningcategory
is more consistent?