Commit

improve performance of Series.searchsorted
tp committed Jul 24, 2018
1 parent b975455 commit 96b79cc
Showing 3 changed files with 26 additions and 3 deletions.
19 changes: 19 additions & 0 deletions asv_bench/benchmarks/series_methods.py
@@ -72,6 +72,25 @@ def time_dropna(self, dtype):
         self.s.dropna()


+class SearchSorted(object):
+
+    goal_time = 0.2
+    params = ['int8', 'int16', 'int32', 'int64',
+              'uint8', 'uint16', 'uint32', 'uint64',
+              'float16', 'float32', 'float64',
+              'str']
+    param_names = ['dtype']
+
+    def setup(self, dtype):
+        N = 10**5
+        data = np.array([1] * N + [2] * N + [3] * N).astype(dtype)
+        self.s = Series(data)
+
+    def time_searchsorted(self, dtype):
+        key = '2' if dtype == 'str' else 2
+        self.s.searchsorted(key)
+
+
 class Map(object):

     goal_time = 0.2
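
Note on what the new benchmark looks up: the data is a sorted array made of three equal-length runs (1s, 2s, 3s), and the key is the middle value. A minimal standard-library sketch of the same lookup follows, assuming that the default `side='left'` of `searchsorted` matches the semantics of `bisect_left`. The small `N` is a stand-in for the benchmark's `10**5` and is not part of the commit.

```python
from bisect import bisect_left, bisect_right

N = 5  # small stand-in for the benchmark's N = 10**5 (illustrative only)
data = [1] * N + [2] * N + [3] * N  # already sorted, with long runs of ties

# Leftmost position where 2 can be inserted while keeping the list sorted;
# this is the analogue of searchsorted(2) with side='left'.
left = bisect_left(data, 2)

# Rightmost such position; the analogue of side='right'.
right = bisect_right(data, 2)

print(left, right)  # 5 10
```

With the benchmark's real size, the left insertion point would be `N` itself, since exactly `N` elements are smaller than the key.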
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v0.24.0.txt
@@ -401,7 +401,8 @@ Performance Improvements
 - Very large improvement in performance of slicing when the index is a :class:`CategoricalIndex`,
   both when indexing by label (using .loc) and position(.iloc).
   Likewise, slicing a ``CategoricalIndex`` itself (i.e. ``ci[100:200]``) shows similar speed improvements (:issue:`21659`)
-- Improved performance of :func:`Series.describe` in case of numeric dtpyes (:issue:`21274`)
+- Improved performance of :func:`Series.searchsorted` (:issue:`22034`)
+- Improved performance of :func:`Series.describe` in case of numeric dtypes (:issue:`21274`)
 - Improved performance of :func:`pandas.core.groupby.GroupBy.rank` when dealing with tied rankings (:issue:`21237`)
 - Improved performance of :func:`DataFrame.set_index` with columns consisting of :class:`Period` objects (:issue:`21582`,:issue:`21606`)
 - Improved performance of membership checks in :class:`Categorical` and :class:`CategoricalIndex`
7 changes: 5 additions & 2 deletions pandas/core/series.py
@@ -2077,8 +2077,11 @@ def __rmatmul__(self, other):
     def searchsorted(self, value, side='left', sorter=None):
         if sorter is not None:
             sorter = ensure_platform_int(sorter)
-        return self._values.searchsorted(Series(value)._values,
-                                         side=side, sorter=sorter)
+        if not is_extension_type(self._values):
+            value = np.asarray(value, dtype=self._values.dtype)
+            value = value[..., np.newaxis] if value.ndim == 0 else value
+
+        return self._values.searchsorted(value, side=side, sorter=sorter)

     # -------------------------------------------------------------------
     # Combination
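
Note on the added lines in `searchsorted`: for non-extension values the key is no longer wrapped in a `Series`. It is cast to the dtype of the searched array, and a 0-dimensional key is promoted to one dimension so the result stays an array. The sketch below reproduces just that step on a plain NumPy array. The helper name `coerce_search_value` is hypothetical and does not exist in pandas. The motivation (presumably avoiding a dtype conversion inside NumPy when the key and array dtypes differ) is an assumption, not something the commit states.

```python
import numpy as np


def coerce_search_value(values, value):
    """Hypothetical helper mirroring the two added lines for NumPy arrays."""
    # Cast the search key to the dtype of the searched array so that
    # searchsorted compares values of the same dtype.
    value = np.asarray(value, dtype=values.dtype)
    # Promote a 0-dimensional key to shape (1,) so the result is an array.
    return value[..., np.newaxis] if value.ndim == 0 else value


values = np.array([1, 1, 2, 2, 3, 3], dtype="int8")

# Scalar key: becomes a one-element int8 array.
scalar_key = coerce_search_value(values, 2)
print(scalar_key.dtype, scalar_key.shape)   # int8 (1,)
print(values.searchsorted(scalar_key))      # [2]

# List key: cast to int8, shape unchanged.
list_key = coerce_search_value(values, [1, 3])
print(values.searchsorted(list_key, side="right"))  # [2 6]
```

Note that a key which cannot be represented in the array's dtype would be affected by the cast. The sketch does not handle that case, and the diff shown here does not address it either.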
