REGR: missing vals not replaceable in categorical #40657

Closed
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.4.rst
@@ -19,6 +19,7 @@ Fixed regressions
 - Fixed regression in :meth:`DataFrame.to_json` raising ``AttributeError`` when run on PyPy (:issue:`39837`)
 - Fixed regression in :meth:`DataFrame.where` not returning a copy in the case of an all True condition (:issue:`39595`)
 - Fixed regression in :meth:`DataFrame.replace` raising ``IndexError`` when ``regex`` was a multi-key dictionary (:issue:`39338`)
+- Fixed regression in :meth:`Series.replace` and :meth:`DataFrame.replace` not replacing missing values for :class:`CategoricalDtype` (:issue:`40472`)
 -

 .. ---------------------------------------------------------------------------
14 changes: 13 additions & 1 deletion pandas/core/arrays/categorical.py
@@ -2350,7 +2350,7 @@ def replace(self, to_replace, value, inplace: bool = False):
         # other cases, like if both to_replace and value are list-like or if
         # to_replace is a dict, are handled separately in NDFrame
         for replace_value, new_value in replace_dict.items():
-            if new_value == replace_value:
+            if new_value == replace_value or (isna(replace_value) and isna(new_value)):
                 continue
             if replace_value in cat.categories:
                 if isna(new_value):
@@ -2365,6 +2365,18 @@ def replace(self, to_replace, value, inplace: bool = False):
                 else:
                     categories[index] = new_value
                     cat.rename_categories(categories, inplace=True)
+
+            # GH-40472: make sure missing values can also be replaced
+            elif isna(replace_value) and (cat._codes == -1).any():
+                if new_value in cat.categories:
+                    categories = cat.categories.tolist()
+                    value_index = categories.index(new_value)
+                else:
+                    cat.add_categories(new_value, inplace=True)
+                    value_index = len(cat.categories) - 1
+
+                cat._codes[cat._codes == -1] = value_index
+
         if not inplace:
             return cat
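
The new `elif` branch above works on the integer codes rather than on the values: entries whose code is -1 are treated as missing, and the branch points them at the index of `new_value`, appending a category first if needed. A minimal pure-Python sketch of that idea follows. It is a toy model, not the pandas implementation: `MISSING_CODE`, `replace_missing`, and `decode` are hypothetical names introduced here for illustration, and plain lists stand in for the categories index and the codes array.

```python
from typing import Hashable, List, Optional, Tuple

# Assumption for this toy model: -1 marks a missing entry, mirroring the
# `cat._codes == -1` check in the diff above.
MISSING_CODE = -1


def replace_missing(
    categories: List[Hashable], codes: List[int], new_value: Hashable
) -> Tuple[List[Hashable], List[int]]:
    """Return new (categories, codes) with every missing entry set to new_value."""
    new_categories = list(categories)
    if MISSING_CODE not in codes:
        # Nothing is missing, so there is nothing to replace.
        return new_categories, list(codes)

    if new_value in new_categories:
        # Reuse the existing category.
        value_index = new_categories.index(new_value)
    else:
        # Append a new category; its index is the last position.
        new_categories.append(new_value)
        value_index = len(new_categories) - 1

    new_codes = [value_index if c == MISSING_CODE else c for c in codes]
    return new_categories, new_codes


def decode(categories: List[Hashable], codes: List[int]) -> List[Optional[Hashable]]:
    """Map codes back to values, using None for missing entries."""
    return [None if c == MISSING_CODE else categories[c] for c in codes]


# A two-element categorical [missing, "b"] with the single category "b".
cats_new, codes_new = replace_missing(["b"], [MISSING_CODE, 0], "c")
cats_old, codes_old = replace_missing(["b"], [MISSING_CODE, 0], "b")
print(decode(cats_new, codes_new))
print(decode(cats_old, codes_old))
```

The two calls correspond to the two cases the new branch distinguishes: a replacement value that has to be added as a category, and one that is already a category. Unlike the diff, the sketch returns copies instead of mutating in place, which keeps it easy to test.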

13 changes: 13 additions & 0 deletions pandas/tests/frame/methods/test_replace.py
@@ -1652,6 +1652,19 @@ def test_replace_dict_category_type(self, input_category_df, expected_category_d

         tm.assert_frame_equal(result, expected)
 
+    # Replace with an existing category and one which will add a new category
+    @pytest.mark.parametrize("new_value", ["c", "b"])
+    def test_replace_categorical_missing_vals(
+        self, frame_or_series, unique_nulls_fixture, new_value
+    ):
+        # GH-40472
+        obj = frame_or_series([unique_nulls_fixture, "b"], dtype="category")
+
+        result = obj.replace({unique_nulls_fixture: new_value})
+        expected = frame_or_series([new_value, "b"], dtype="category")
+
+        tm.assert_equal(result, expected)
+
     def test_replace_with_compiled_regex(self):
         # https://github.com/pandas-dev/pandas/issues/35680
         df = DataFrame(["a", "b", "c"])
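
The `test_replace_categorical_missing_vals` test above passes a null value as the key of the replacement dict, which is also why the no-op guard in `categorical.py` gained an explicit `isna` check: plain `==` does not reliably match one missing value against another. The sketch below illustrates this with float NaN in pure Python. `is_missing` and `is_noop` are hypothetical helpers written for this example. They only handle `None` and float NaN, and the exact values supplied by `unique_nulls_fixture` are not shown in this diff.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Hypothetical stand-in for a missing-value check: None or a float NaN."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


def is_noop(replace_value: Any, new_value: Any) -> bool:
    """True when replacing replace_value with new_value would change nothing."""
    return new_value == replace_value or (
        is_missing(replace_value) and is_missing(new_value)
    )


nan_a, nan_b = float("nan"), float("nan")
print(nan_a == nan_b)         # equality alone does not pair up two NaNs
print(is_noop(nan_a, nan_b))  # the extra check recognises the no-op
print(is_noop(nan_a, "c"))    # a real replacement is not skipped
```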