Series.map on a Categorical with a function only processes each unique value once #15706

Open
@Kodiologist

Code Sample

>>> import pandas as pd
>>> pd.Series(list("cabaa")).map(print)
c
a
b
a
a
0    None
1    None
2    None
3    None
4    None
dtype: object
>>> pd.Series(list("cabaa"), dtype = "category").map(print)
a
b
c
/usr/local/lib/python3.6/site-packages/pandas/core/categorical.py:952: FutureWarning: 
Setting NaNs in `categories` is deprecated and will be removed in a future version of pandas.
  ordered=self.ordered)
0    None
1    None
2    None
3    None
4    None
dtype: object

Problem description

Either Categorical Series should act the same as other Series, or this special behavior for Categoricals should be mentioned in the documentation for Series.map.

A further limitation of the current behavior for Categoricals is that the function is never called for null entries (those whose code is -1).
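To make the difference easier to check without relying on `print`, here is a minimal sketch that records every argument the mapped function receives. The helper `record_calls` is a hypothetical name introduced only for this illustration, and the result of the null check may differ between pandas versions, so no output is asserted for it.

```python
import pandas as pd


def record_calls(series):
    """Map a recording function over `series` and return the arguments it received.

    Hypothetical helper for illustration; not part of pandas.
    """
    seen = []

    def recorder(value):
        seen.append(value)
        return value

    series.map(recorder)
    return seen


values = list("cabaa")
object_calls = record_calls(pd.Series(values))
category_calls = record_calls(pd.Series(values, dtype="category"))

# Object dtype: one call per element. Category dtype: one call per category.
print("object dtype calls:  ", object_calls)
print("category dtype calls:", category_calls)

# Same check with a missing value (stored as code -1 in the Categorical).
with_null = pd.Series(["c", "a", None, "a"], dtype="category")
null_calls = record_calls(with_null)
print("function saw a null: ", any(pd.isna(v) for v in null_calls))

# Casting to object first gives one call per element, nulls included.
print("after astype(object):", record_calls(with_null.astype(object)))
```

If per-element calls are needed in the meantime, casting with `astype(object)` before `map` appears to be a workable stopgap, at the cost of losing the categorical dtype.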

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.8.0
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
