Backport PR pandas-dev#57329: REGR: CategoricalIndex.difference with null values
lukemanley authored and meeseeksmachine committed Feb 10, 2024
1 parent 0443427 commit 6982e0d
Showing 3 changed files with 24 additions and 2 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.2.1.rst
@@ -20,6 +20,7 @@ Fixed regressions
 - Fixed regression in :func:`wide_to_long` raising an ``AttributeError`` for string columns (:issue:`57066`)
 - Fixed regression in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.SeriesGroupBy.idxmax` ignoring the ``skipna`` argument (:issue:`57040`)
 - Fixed regression in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.SeriesGroupBy.idxmax` where values containing the minimum or maximum value for the dtype could produce incorrect results (:issue:`57040`)
+- Fixed regression in :meth:`CategoricalIndex.difference` raising ``KeyError`` when other contains null values other than NaN (:issue:`57318`)
 - Fixed regression in :meth:`DataFrame.loc` raising ``IndexError`` for non-unique, masked dtype indexes where result has more than 10,000 rows (:issue:`57027`)
 - Fixed regression in :meth:`DataFrame.sort_index` not producing a stable sort for a index with duplicates (:issue:`57151`)
 - Fixed regression in :meth:`DataFrame.to_dict` with ``orient='list'`` and datetime or timedelta types returning integers (:issue:`54824`)
7 changes: 5 additions & 2 deletions pandas/core/indexes/base.py
@@ -3663,9 +3663,12 @@ def difference(self, other, sort=None):

     def _difference(self, other, sort):
         # overridden by RangeIndex
+        this = self
+        if isinstance(self, ABCCategoricalIndex) and self.hasnans and other.hasnans:
+            this = this.dropna()
         other = other.unique()
-        the_diff = self[other.get_indexer_for(self) == -1]
-        the_diff = the_diff if self.is_unique else the_diff.unique()
+        the_diff = this[other.get_indexer_for(this) == -1]
+        the_diff = the_diff if this.is_unique else the_diff.unique()
         the_diff = _maybe_try_sort(the_diff, sort)
         return the_diff

18 changes: 18 additions & 0 deletions pandas/tests/indexes/categorical/test_setops.py
@@ -0,0 +1,18 @@
+import numpy as np
+import pytest
+
+from pandas import (
+    CategoricalIndex,
+    Index,
+)
+import pandas._testing as tm
+
+
+@pytest.mark.parametrize("na_value", [None, np.nan])
+def test_difference_with_na(na_value):
+    # GH 57318
+    ci = CategoricalIndex(["a", "b", "c", None])
+    other = Index(["c", na_value])
+    result = ci.difference(other)
+    expected = CategoricalIndex(["a", "b"], categories=["a", "b", "c"])
+    tm.assert_index_equal(result, expected)
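
For readers who want to trace the patched logic outside of pandas internals, here is a minimal standalone sketch of the steps in `_difference`. The helper name `difference_sketch` is hypothetical and not part of pandas. It uses the public `pd.CategoricalIndex` class in place of the internal `ABCCategoricalIndex` check and leaves out the final `_maybe_try_sort` step, so it approximates the change rather than reproducing the library code.

```python
import numpy as np
import pandas as pd


def difference_sketch(this: pd.Index, other: pd.Index) -> pd.Index:
    """Hypothetical helper that mirrors the patched steps (sorting omitted)."""
    # When both sides contain nulls, drop them from the categorical side
    # first, so the indexer lookup never has to match a null value.
    if isinstance(this, pd.CategoricalIndex) and this.hasnans and other.hasnans:
        this = this.dropna()
    other = other.unique()
    # get_indexer_for returns -1 for every label of `this` missing from `other`.
    mask = other.get_indexer_for(this) == -1
    result = this[mask]
    return result if this.is_unique else result.unique()


ci = pd.CategoricalIndex(["a", "b", "c", None])
result_none = difference_sketch(ci, pd.Index(["c", None]))
result_nan = difference_sketch(ci, pd.Index(["c", np.nan]))
print(result_none)
print(result_nan)
```

Both calls should keep only the labels absent from `other` while preserving the original categories, which is the behavior the new test asserts for `CategoricalIndex.difference`.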
