DataFrame.fillna() fails with categorical columns present when trying to fill missing values in numeric columns with NaNs #24079

Open
@dragoljub

Description

Code Sample, a copy-pastable example if possible

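import numpy as np
import pandas as pd

# 3x3 frame of float32 values in columns A, B, C
df = pd.DataFrame(np.random.randint(0, 3, (3, 3)), columns=list('ABC')).astype(np.float32)
# Put a single missing value in numeric column B (avoids chained assignment)
df.loc[1, 'B'] = np.nan
# Column A becomes categorical and has no missing values
df['A'] = df['A'].astype('category')

# Raises ValueError: -9999 is not one of A's categories
df.fillna(-9999)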

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-dcdff14fcba0> in <module>
----> 1 df.fillna(-9999)

D:\Python37\lib\site-packages\pandas\core\frame.py in fillna(self, value, method, axis, inplace, limit, downcast, **kwargs)
   3788                      self).fillna(value=value, method=method, axis=axis,
   3789                                   inplace=inplace, limit=limit,
-> 3790                                   downcast=downcast, **kwargs)
   3791 
   3792     @Appender(_shared_docs['replace'] % _shared_doc_kwargs)

D:\Python37\lib\site-packages\pandas\core\generic.py in fillna(self, value, method, axis, inplace, limit, downcast)
   5425                 new_data = self._data.fillna(value=value, limit=limit,
   5426                                              inplace=inplace,
-> 5427                                              downcast=downcast)
   5428             elif isinstance(value, DataFrame) and self.ndim == 2:
   5429                 new_data = self.where(self.notna(), value)

D:\Python37\lib\site-packages\pandas\core\internals.py in fillna(self, **kwargs)
   3706 
   3707     def fillna(self, **kwargs):
-> 3708         return self.apply('fillna', **kwargs)
   3709 
   3710     def downcast(self, **kwargs):

D:\Python37\lib\site-packages\pandas\core\internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
   3579 
   3580             kwargs['mgr'] = self
-> 3581             applied = getattr(b, f)(**kwargs)
   3582             result_blocks = _extend_blocks(applied, result_blocks)
   3583 

D:\Python37\lib\site-packages\pandas\core\internals.py in fillna(self, value, limit, inplace, downcast, mgr)
   2004                mgr=None):
   2005         values = self.values if inplace else self.values.copy()
-> 2006         values = values.fillna(value=value, limit=limit)
   2007         return [self.make_block_same_class(values=values,
   2008                                            placement=self.mgr_locs,

D:\Python37\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    176                 else:
    177                     kwargs[new_arg_name] = new_arg_value
--> 178             return func(*args, **kwargs)
    179         return wrapper
    180     return _deprecate_kwarg

D:\Python37\lib\site-packages\pandas\core\arrays\categorical.py in fillna(self, value, method, limit)
   1754             elif is_hashable(value):
   1755                 if not isna(value) and value not in self.categories:
-> 1756                     raise ValueError("fill value must be in categories")
   1757 
   1758                 mask = values == -1

ValueError: fill value must be in categories

Problem description

fillna() fails on a DataFrame that has categorical columns with no missing values alongside numeric columns that do have missing values. It appears that fillna still tries to fill the categorical columns even though they have nothing to fill: the traceback ends in Categorical.fillna, which raises because -9999 is not one of the column's categories.
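
To illustrate the membership check that the traceback points at, here is a minimal sketch on a single categorical Series. The names s, fill_value, widened and result are made up for this example, and adding the fill value as a category is only one possible way around the check, not necessarily how pandas should resolve the issue.

```python
import pandas as pd

# Hypothetical categorical column with no missing values
s = pd.Series([0.0, 1.0, 2.0], dtype="category")
fill_value = -9999

# The fill value is not among the categories, which is what the
# check in the traceback tests for
print(fill_value in s.cat.categories)

# Registering the fill value as a category first lets fillna proceed;
# with nothing missing, the values come back unchanged
widened = s.cat.add_categories([fill_value])
result = widened.fillna(fill_value)
print(result.tolist())
```

The cost of this route is that the column's categories now include a sentinel that never appears in the data.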

Expected Output

I expected the missing value in the numeric column to be filled but the categorical columns without any missing values to be untouched.
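
Until that is the default behaviour, one possible workaround is to restrict the fill to the numeric columns by passing a per-column dict to fillna. This is a sketch under assumptions: the frame below uses fixed values instead of the random ones in the report, and numeric_cols and filled are names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same shape of data as the report: categorical A without missing
# values, numeric B with one missing value
df = pd.DataFrame(
    {"A": [0.0, 1.0, 2.0], "B": [1.0, np.nan, 2.0], "C": [0.0, 1.0, 1.0]},
    dtype=np.float32,
)
df["A"] = df["A"].astype("category")

# Select only the numeric columns; the category dtype is not included
numeric_cols = df.select_dtypes(include="number").columns

# A dict restricts fillna to the listed columns, so A is never touched
filled = df.fillna({col: -9999 for col in numeric_cols})
print(filled)
```

The categorical column keeps its dtype and categories, and only B receives the sentinel value.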

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 2008ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 4.0.0
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.11.1
xarray: 0.11.0
IPython: 7.1.1
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: 1.6.2
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None

Labels: Bug; Categorical (Categorical Data Type); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)