Commit

make CategoricalIndex.__contains__ compatible with np<1.13
tp committed Jun 13, 2018
1 parent f856075 commit e616770
Showing 3 changed files with 22 additions and 10 deletions.
4 changes: 3 additions & 1 deletion doc/source/whatsnew/v0.23.2.txt
@@ -24,7 +24,9 @@ Fixed Regressions
 Performance Improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~

--
+- Improved performance of membership checks in :class:`CategoricalIndex`
+  (i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains`
+  is likewise much faster (:issue:`21369`)
 -

 Documentation Changes
3 changes: 0 additions & 3 deletions doc/source/whatsnew/v0.24.0.txt
@@ -65,9 +65,6 @@ Performance Improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~

 - Improved performance of :func:`Series.describe` in case of numeric dtpyes (:issue:`21274`)
-- Improved performance of membership checks in :class:`CategoricalIndex`
-  (i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains`
-  is likewise much faster (:issue:`21369`)

 .. _whatsnew_0240.docs:

25 changes: 19 additions & 6 deletions pandas/core/indexes/category.py
@@ -324,18 +324,31 @@ def _reverse_indexer(self):
     @Appender(_index_shared_docs['__contains__'] % _index_doc_kwargs)
     def __contains__(self, key):
         hash(key)
-        if isna(key):
+
+        if isna(key):  # is key NaN?
             return self.isna().any()
-        elif self.categories._defer_to_indexing:  # e.g. Interval values
+
+        # is key in self.categories? Then get its location.
+        # If not (i.e. KeyError), it logically can't be in self either
+        try:
             loc = self.categories.get_loc(key)
-            return np.isin(self.codes, loc).any()
-        elif key in self.categories:
-            return self.categories.get_loc(key) in self._engine
-        else:
+        except KeyError:
             return False

+        # loc is the location of key in self.categories, but also the value
+        # for key in self.codes and in self._engine. key may be in categories,
+        # but still not in self, check this. Example:
+        # 'b' in CategoricalIndex(['a'], categories=['a', 'b'])  # False
+        if is_scalar(loc):
+            return loc in self._engine
+        else:
+            # if self.categories is IntervalIndex, loc is an array
+            # check if any scalar of the array is in self._engine
+            return any(loc_ in self._engine for loc_ in loc)
+
     @Appender(_index_shared_docs['contains'] % _index_doc_kwargs)
     def contains(self, key):
+        hash(key)
         return key in self

     def __array__(self, dtype=None):
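
The two-step lookup in the `__contains__` change above (find the key's position in the categories, then check whether that position occurs among the codes) can be illustrated without pandas. This is a minimal sketch, not the pandas implementation. `TinyCategoricalIndex` and `_is_missing` are hypothetical names, a plain `set` stands in for `self._engine`, and `list.index` (which raises `ValueError`) stands in for `get_loc` (which raises `KeyError`). Only scalar categories are handled, so the IntervalIndex branch is left out.

```python
import math


def _is_missing(value):
    """Hypothetical helper: treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class TinyCategoricalIndex:
    """Simplified stand-in for a categorical index (not the pandas class)."""

    def __init__(self, values, categories):
        self.categories = list(categories)
        # Each value is stored as the position of its category.
        # -1 marks a missing value or a value outside the categories.
        self.codes = [
            self.categories.index(v)
            if not _is_missing(v) and v in self.categories
            else -1
            for v in values
        ]
        # Assumed stand-in for self._engine: hash-based lookup over the codes.
        self._engine = set(self.codes)

    def __contains__(self, key):
        hash(key)  # raise TypeError early for unhashable keys
        if _is_missing(key):
            return -1 in self._engine
        try:
            loc = self.categories.index(key)
        except ValueError:
            # Not a category at all, so it cannot be a value either.
            return False
        # key is a category; it is a member only if its code is actually used.
        return loc in self._engine


ci = TinyCategoricalIndex(["a"], categories=["a", "b"])
print("a" in ci)  # True: 'a' is a category and appears in the values
print("b" in ci)  # False: 'b' is a category but no value uses it
print("c" in ci)  # False: 'c' is not a category
```

The `'b'` case is the one the commit's code comment calls out: being listed in the categories is necessary for membership but not sufficient.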
