BUG: Series.map may raise TypeError in Categorical or DatetimeTz #12532

sinhrks · 2016-03-05T15:07:41Z

closes Pandas datetime64 series no longer has map function when localized #12473
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

We need to decide what map should return for categorical dtype. apply currently returns object dtype, but wouldn't returning category be more consistent?

jreback · 2016-03-12T17:57:42Z

pandas/core/series.py

-        if needs_i8_conversion(values.dtype):
-            boxer = i8_boxer(values)
-            values = lib.map_infer(values, boxer)
+        not_ndarray = (is_categorical_dtype(self.dtype) or


Use is_internal_type (this catches sparse as well, but I think that's correct here too); this is probably not tested.

Actually, we should probably rename is_internal_type -> is_extension_type.

OK, renamed.
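To illustrate why sparse also belongs on the extension path, here is a minimal sketch of dispatching on an is_extension_type-style predicate. `ToySparse`, `map_values`, and the duck-typed predicate are hypothetical stand-ins for illustration, not pandas internals: the point is only that an extension container handles its own storage layout (here, calling the function once for the fill value), while plain arrays are mapped element by element.

```python
from typing import Any, Callable, Dict, List


class ToySparse:
    """Hypothetical sparse container: a fill value plus the non-fill entries."""

    def __init__(self, length: int, fill_value: Any, data: Dict[int, Any]) -> None:
        self.length = length
        self.fill_value = fill_value
        self.data = data

    def map(self, func: Callable[[Any], Any]) -> List[Any]:
        # Call func once for the fill value instead of once per unstored slot.
        mapped_fill = func(self.fill_value)
        return [func(self.data[i]) if i in self.data else mapped_fill
                for i in range(self.length)]


def is_extension_type(values: Any) -> bool:
    # Hypothetical stand-in for the pandas predicate: anything that
    # implements its own map() is treated as an extension container.
    return callable(getattr(values, "map", None))


def map_values(values: Any, func: Callable[[Any], Any]) -> List[Any]:
    if is_extension_type(values):
        # Extension path: the container knows its own storage layout.
        return values.map(func)
    # Plain path: apply func element by element.
    return [func(v) for v in values]
```

For example, `map_values(ToySparse(4, 0, {1: 5}), lambda x: x + 1)` evaluates the function for the fill value only once and returns `[1, 6, 1, 1]`.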

sinhrks · 2016-03-15T07:41:46Z

Let me confirm what map/apply should return for categorical dtype. I think there are 3 options:

1. Keep categorical dtype. If the number of categories changes, raise an error.
2. Coerce to normal (object) dtype before map/apply.
3. Try option 1, then fall back to option 2 if an error is caught.

The current implementation is shown below and is inconsistent. I think option #1 is preferable.

map (option 1)

It raises AttributeError on current master.

s = pd.Series([1, 2, 3], dtype='category')
s.map(lambda x: x)
# 0    1
# 1    2
# 2    3
# dtype: category
# Categories (3, int64): [1, 2, 3]

apply (option 2)

no changes from current master

s.apply(lambda x: x)
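The three options can be compared on a toy model. This is a minimal sketch, assuming a categorical is represented as a `(codes, categories)` pair with non-negative codes (no missing values) and hashable mapped categories; the function names are hypothetical and this is not the pandas implementation.

```python
from typing import Any, Callable, List, Tuple, Union

# Toy categorical: codes index into categories.
Cat = Tuple[List[int], List[Any]]


def map_keep_category(cat: Cat, func: Callable[[Any], Any]) -> Cat:
    """Option 1: map the categories, keep the codes; raise if categories collapse."""
    codes, categories = cat
    new_categories = [func(c) for c in categories]
    if len(set(new_categories)) != len(new_categories):
        # Fewer distinct categories than before: codes would become ambiguous.
        raise ValueError("mapped categories are not unique")
    return codes, new_categories


def map_to_object(cat: Cat, func: Callable[[Any], Any]) -> List[Any]:
    """Option 2: materialise the values first, then map element by element."""
    codes, categories = cat
    return [func(categories[code]) for code in codes]


def map_with_fallback(cat: Cat, func: Callable[[Any], Any]) -> Union[Cat, List[Any]]:
    """Option 3: try option 1, fall back to option 2 on error."""
    try:
        return map_keep_category(cat, func)
    except ValueError:
        return map_to_object(cat, func)
```

With `cat = ([0, 1, 2, 0], [1, 2, 3])`, an injective function such as `lambda x: x * 10` keeps the categorical form, while `lambda x: x % 2` collapses two categories and option 3 falls back to plain values `[1, 0, 1, 1]`.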

jreback · 2016-03-15T13:49:28Z

Yeah, I think if you can construct a categorical with exactly the same dtype (IOW, categories & ordered), then do it. Otherwise, just coerce to object.

sinhrks · 2016-03-20T13:34:33Z

@jreback OK, categorical apply/map now both return category if possible, and otherwise coerce to the appropriate dtype.

jreback · 2016-03-22T13:16:33Z

pandas/core/series.py

        else:
-            map_f = lib.map_infer
+            values = _values_from_object(self)


Wish we could make this simpler. IOW, the difference between extension and non-extension types is glaring here.

Hmm, how about moving the boxing logic to Block? At a glance, i8_boxer is only used a few times in series.py and frame.py.

Yeah, that would be better. You would have to dispatch to a new method on Block, but it would be much cleaner. I'm OK with this as is (minor comment), and we can do that later (or you can refactor here); lmk.

OK, let me do it separately because it is likely to need API discussion (#12741).
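A minimal sketch of what moving the boxing into the block could look like. It assumes a hypothetical block class that stores tz-aware datetimes as integer nanoseconds since the epoch (mirroring the i8 representation) and exposes a hypothetical `boxed_values()` method; Python's `datetime` stands in for `Timestamp`, so sub-microsecond precision is truncated. None of these names are pandas APIs.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class ToyDatetimeTZBlock:
    """Hypothetical block holding datetimes as int nanoseconds since the epoch (UTC)."""

    def __init__(self, i8_values: List[int], tz: timezone) -> None:
        self.i8_values = i8_values
        self.tz = tz

    def boxed_values(self) -> List[datetime]:
        # Boxing: turn each raw integer into a tz-aware scalar object.
        # datetime has microsecond resolution, so nanoseconds are truncated.
        return [(EPOCH + timedelta(microseconds=ns // 1000)).astimezone(self.tz)
                for ns in self.i8_values]


def series_map(block: ToyDatetimeTZBlock, func: Callable[[Any], Any]) -> List[Any]:
    # The caller no longer needs its own i8 boxer: it asks the block
    # for boxed scalars, so func always sees tz-aware objects.
    return [func(v) for v in block.boxed_values()]
```

The design point is that the caller stays the same for every block type; only the block decides how its raw storage is turned into scalars.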

jreback · 2016-03-24T16:37:14Z

pandas/core/categorical.py

+                                          categories=new_categories,
+                                          ordered=self.ordered)
+        except ValueError:
+            return np.take(new_categories, self._codes)


Shouldn't this be Index(new_categories).take(self._codes)? Maybe that's the same.

Yes, the result should be the same. I used the numpy version to avoid creating an Index, but is Index.take preferable?

It's fine, more of a style thing, but yes, it should be the same.
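Why the fallback path gives the same values either way: both np.take and Index.take return `values[indices[i]]` for each index. This is a minimal pure-Python sketch of that contract, assuming non-negative codes; `take` and `categorical_map_fallback` are hypothetical names. It also shows that the mapper runs once per category, not once per element.

```python
from typing import Any, Callable, List, Sequence


def take(values: Sequence[Any], indices: Sequence[int]) -> List[Any]:
    # Same contract as numpy.take / Index.take for non-negative indices:
    # result[i] == values[indices[i]].
    return [values[i] for i in indices]


def categorical_map_fallback(codes: List[int], categories: List[Any],
                             func: Callable[[Any], Any]) -> List[Any]:
    # func is evaluated once per category...
    new_categories = [func(c) for c in categories]
    # ...and the codes then expand the mapped categories back to full length.
    return take(new_categories, codes)
```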

jreback · 2016-03-30T13:05:24Z

pandas/core/series.py

+        if is_extension_type(self.dtype):
+            values = self._values
+            if na_action is not None:
+                raise NotImplementedError


Hmm, is this tested?

It's tested here:

https://github.com/pydata/pandas/pull/12532/files#diff-3c2759e1313d1a5f367b4b73020a18e7R302
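The behaviour in the diff above can be modelled as follows. This is a minimal sketch, assuming a hypothetical `is_extension` flag in place of the real dtype check and treating `None` or float NaN as missing; only the `NotImplementedError` on the extension path comes from the reviewed diff, and the `'ignore'` handling on the plain path reflects the documented meaning of na_action (leave missing values untouched).

```python
import math
from typing import Any, Callable, List, Optional


def _is_missing(value: Any) -> bool:
    # Hypothetical NA check: None or a float NaN.
    return value is None or (isinstance(value, float) and math.isnan(value))


def map_with_na_action(values: List[Any], func: Callable[[Any], Any],
                       na_action: Optional[str] = None,
                       is_extension: bool = False) -> List[Any]:
    if is_extension:
        if na_action is not None:
            # Mirrors the reviewed diff: the extension path rejects na_action.
            raise NotImplementedError("na_action is not supported here")
        return [func(v) for v in values]
    if na_action == "ignore":
        # Propagate missing values instead of passing them to func.
        return [v if _is_missing(v) else func(v) for v in values]
    return [func(v) for v in values]
```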

jreback · 2016-03-31T14:06:32Z

pandas/indexes/category.py

@@ -468,6 +468,24 @@ def take(self, indices, axis=0, allow_fill=True, fill_value=None):
                                           na_value=-1)
        return self._create_from_codes(taken)

+    def map(self, mapper):
+        """


Maybe in another PR we can consolidate the docstrings of map into base (when we combine them all there).

I left it as is, because Categorical.map needs some supplementary info, and using Appender doesn't make that very simple.

sinhrks · 2016-04-08T22:18:04Z

@jreback Because #12798 relies on this, can we consider merging this before fixing the whole Index.map API (#12756)?

jreback · 2016-04-09T15:04:10Z

This looks fine. Maybe we can come back later and try to simplify the logic even more w.r.t. the extension types.

We have lots of if/thens hanging around for extension types vs ndarrays. An idea (maybe not trivial) is to have an internal Array, which is really an extension type that simply passes everything through; then we can define all of the methods on this type (e.g. map, .value_counts, etc.) and just pass through everything else. This gets into libpandas territory (well, I think it actually makes it simpler), as everything would then have a uniform API from the pandas perspective, while the implementations could differ.

But that's for another day.
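A minimal sketch of the pass-through Array idea, assuming a hypothetical `PassThroughArray` wrapper around a plain Python list: it defines the uniform methods (map, value_counts) itself and forwards every other attribute to the wrapped storage. This illustrates the proposal only; it is not an existing pandas class.

```python
from collections import Counter
from typing import Any, Callable, Dict, List


class PassThroughArray:
    """Hypothetical uniform wrapper: defines a few methods, forwards the rest."""

    def __init__(self, values: List[Any]) -> None:
        self._values = values

    def map(self, func: Callable[[Any], Any]) -> "PassThroughArray":
        # One uniform entry point; a specific backend could specialise this.
        return PassThroughArray([func(v) for v in self._values])

    def value_counts(self) -> Dict[Any, int]:
        return dict(Counter(self._values))

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal lookup fails; forward to the wrapped list.
        if name == "_values":
            # Guard against infinite recursion before __init__ has run.
            raise AttributeError(name)
        return getattr(self._values, name)

    def __len__(self) -> int:
        # Special methods bypass __getattr__, so len() needs an explicit definition.
        return len(self._values)
```

Callers then see the same `map`/`value_counts` surface regardless of what the wrapper holds, which is the uniform API the comment is after.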

@kawochen any comments on this code?

jreback · 2016-04-17T15:55:37Z

this lgtm. pls rebase

sinhrks · 2016-04-17T22:23:51Z

Rebased and now green.

jreback · 2016-04-18T17:17:48Z

thanks!

sinhrks added the Dtype Conversions (unexpected or buggy dtype conversions), Timezones (timezone data dtype), and Categorical (Categorical Data Type) labels Mar 5, 2016

sinhrks added this to the 0.18.1 milestone Mar 5, 2016

sinhrks force-pushed the datetime_map branch from 5c42d20 to 183f5b7 March 5, 2016 15:37

jreback reviewed Mar 12, 2016

sinhrks force-pushed the datetime_map branch 2 times, most recently from 2aa510a to 7b6c178 March 14, 2016 06:49

sinhrks force-pushed the datetime_map branch 3 times, most recently from 5a3f5eb to b3c1caf March 20, 2016 12:00

sinhrks force-pushed the datetime_map branch from b3c1caf to e0b4465 March 22, 2016 11:08

jreback reviewed Mar 22, 2016

sinhrks force-pushed the datetime_map branch from e0b4465 to f5b93e5 March 24, 2016 11:50

jreback reviewed Mar 24, 2016

sinhrks force-pushed the datetime_map branch 2 times, most recently from 351786e to 3abeb73 March 29, 2016 20:34

sinhrks mentioned this pull request Mar 30, 2016 in CLN: Move i8_boxer logic to BlockManager #12741 (closed)

jreback reviewed Mar 30, 2016

jreback mentioned this pull request Mar 31, 2016 in DataFrame indices can't map through dictionaries or series #12756 (closed)

sinhrks force-pushed the datetime_map branch from 3abeb73 to a320fcc March 31, 2016 13:56

jreback reviewed Mar 31, 2016

sinhrks force-pushed the datetime_map branch from a320fcc to 93ff9f1 April 3, 2016 16:08

sinhrks force-pushed the datetime_map branch 2 times, most recently from 9b15940 to 031c8e6 April 8, 2016 21:22

sinhrks mentioned this pull request Apr 8, 2016 in API: map() on Index returns an Index, not array #12798 (closed)

jreback mentioned this pull request Apr 11, 2016 in cross section coercion with output iterating #12859 (closed)

sinhrks force-pushed the datetime_map branch from 031c8e6 to d1b4933 April 16, 2016 11:33

Commit 027689e: BUG: Series.map may raise TypeError in Categorical or DatetimeTz

sinhrks force-pushed the datetime_map branch from d1b4933 to 027689e April 17, 2016 15:59

jreback closed this in 4c84f2d Apr 18, 2016

sinhrks deleted the datetime_map branch April 18, 2016 20:30
