.remove_category(np.nan) fails on Categorical with floats #10156

Closed
@wikiped

Description

Removing a NaN category from a Categorical Series fails when the categories are floats.
In the docs it says:

Note: As integer Series can’t include NaN, the categories were converted to object.

So the failure is probably linked to this: float categories remain float64 instead of being converted to object, and nan != nan.

If this is intended behavior, perhaps it would be useful to add it to the docs?
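
The `nan != nan` suspicion can be illustrated without pandas. The sketch below uses only plain Python floats and sets. It rests on an assumption: that iterating float64 categories yields NaN objects distinct from the `np.nan` passed as the removal, whereas object categories keep the original object. The variable names are illustrative only.

```python
nan_a = float("nan")
nan_b = float("nan")  # a second, distinct NaN object

# NaN never compares equal, not even to itself.
nan_equals_itself = (nan_a == nan_a)

# Set lookups check identity before equality, so the very same object is found...
same_object_left = {nan_a} - {nan_a}
# ...but a different NaN object is not, so the removal looks "missing".
other_object_left = {nan_a} - {nan_b}

print(nan_equals_itself)
print(len(same_object_left))
print(len(other_object_left))
```

If that assumption holds, it would explain why the object-dtype columns `a` and `b` succeed while the float64 column `c` fails.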

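import numpy as np
import pandas as pd

# Three categorical columns: integer, string, and float categories.
df = pd.DataFrame({'a': pd.Categorical([1,2,3]),
                   'b': pd.Categorical(list('abc')),
                   'c': pd.Categorical([1.1,2.1,3.1])})
for col in df.columns:
    df[col].cat.add_categories(np.nan, inplace=True)
    print(df[col])
    # Raises ValueError for column 'c' (float64 categories) on pandas 0.16.1.
    df[col].cat.remove_categories(np.nan)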

0    1
1    2
2    3
Name: a, dtype: category
Categories (4, object): [1, 2, 3, NaN]
0    a
1    b
2    c
Name: b, dtype: category
Categories (4, object): [a, b, c, NaN]
0    1.1
1    2.1
2    3.1
Name: c, dtype: category
Categories (4, float64): [1.1, 2.1, 3.1, NaN]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
d:\Anaconda\envs\py2k\lib\site-packages\pandas\core\categorical.pyc in _delegate_method(self, name, *args, **kwargs)
   1643         from pandas import Series
   1644         method = getattr(self.categorical, name)
-> 1645         res = method(*args, **kwargs)
   1646         if not res is None:
   1647             return Series(res, index=self.index)

d:\Anaconda\envs\py2k\lib\site-packages\pandas\core\categorical.pyc in remove_categories(self, removals, inplace)
    753         not_included = removals - set(self._categories)
    754         if len(not_included) != 0:
--> 755             raise ValueError("removals must all be in old categories: %s" % str(not_included))
    756         new_categories = [ c for c in self._categories if c not in removals ]
    757         return self.set_categories(new_categories, ordered=self.ordered, rename=False,

ValueError: removals must all be in old categories: set([nan])
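
The traceback shows that the membership check is a plain set difference, which cannot match one NaN against another. A NaN-aware check might look like the sketch below. It works on plain Python lists, and the names `_is_nan` and `remove_categories_nan_aware` are hypothetical; this is not pandas' implementation or its eventual fix.

```python
import math


def _is_nan(value):
    # Hypothetical helper: True only for float NaN values.
    return isinstance(value, float) and math.isnan(value)


def remove_categories_nan_aware(categories, removals):
    """Return categories without removals, treating every NaN as the same value."""
    categories = list(categories)
    removals = list(removals)
    nan_in_categories = any(_is_nan(c) for c in categories)
    nan_in_removals = any(_is_nan(r) for r in removals)

    def is_present(value):
        # NaN is matched with isnan instead of == or identity.
        if _is_nan(value):
            return nan_in_categories
        return value in categories

    not_included = [r for r in removals if not is_present(r)]
    if not_included:
        raise ValueError(
            "removals must all be in old categories: %s" % not_included)

    def should_remove(category):
        if _is_nan(category):
            return nan_in_removals
        return category in removals

    return [c for c in categories if not should_remove(c)]


remaining = remove_categories_nan_aware(
    [1.1, 2.1, 3.1, float("nan")], [float("nan")])
print(remaining)
```

Here the NaN in the categories and the NaN in the removals are different objects, yet the removal succeeds because both are detected with `math.isnan`.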

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
pandas: 0.16.1
numpy: 1.9.2
    ...
