
[PERF] Get rid of MultiIndex conversion in IntervalIndex.intersection #26225


Merged
merged 33 commits into from
Jun 6, 2019
Commits
a5a1272
Gid rid of MultiIndex conversion in IntervalIndex.intersection
makbigc Apr 21, 2019
3cd095a
Add benchmark for IntervalIndex.intersection
makbigc Apr 27, 2019
0486a4e
clear code
makbigc Apr 27, 2019
09c89f1
Add whatsnew note
makbigc Apr 27, 2019
841a0b7
Modity the case for duplicate index
makbigc May 1, 2019
8b22623
Combine the set operation to find indexer into one
makbigc May 1, 2019
32d4005
Move setops tests to test_setops.py and add two tests
makbigc May 1, 2019
d502fcb
Remove relundant line
makbigc May 1, 2019
8ec6366
Remove duplicate line in whatsnew note
makbigc May 1, 2019
6000904
Isort interval/test_setops.py
makbigc May 1, 2019
7cb7d2c
Split the intersection into two sub-functions
makbigc May 1, 2019
bcf36bb
Functionalize some indexes
makbigc May 5, 2019
745c0bb
Remove relundant lines in whatsnew
makbigc May 5, 2019
ff8bb97
Fixturize the sort parameter
makbigc May 6, 2019
17d775f
Factor out the check and decorate the setops
makbigc May 7, 2019
03a989a
Add docstring to two subfunction
makbigc May 8, 2019
b36cbc8
Add intersection into _index_shared_docs
makbigc May 8, 2019
1cdb170
Isort and change the decorator's name
makbigc May 10, 2019
18c2d37
Remove object inheritance
makbigc May 11, 2019
d229677
merge master
makbigc May 14, 2019
35594b0
Add docstring to setop_check
makbigc May 16, 2019
0834206
Merge master again
makbigc May 16, 2019
3cf5be8
merge again
makbigc May 23, 2019
9cf9b7e
complete merge
makbigc May 23, 2019
ab67edd
2nd approach
makbigc May 25, 2019
402b09c
Add a new benchmark
makbigc May 25, 2019
b4f130d
Fix linting issue
makbigc May 25, 2019
3ff4c64
Change the decorator name to SetopCheck
makbigc May 26, 2019
3db3130
Amend and add test for a more corner case
makbigc May 28, 2019
1f25adb
Merge commit to resolve conflict
makbigc May 28, 2019
4a9cd29
merge master
makbigc May 29, 2019
1467e94
merge
makbigc Jun 4, 2019
ea2550a
merge again
makbigc Jun 6, 2019
Gid rid of MultiIndex conversion in IntervalIndex.intersection
makbigc committed May 10, 2019
commit a5a1272b7b5e9f0eb774ce746e729dedd4862c89
59 changes: 58 additions & 1 deletion pandas/core/indexes/interval.py
@@ -19,6 +19,7 @@
    is_interval_dtype, is_list_like, is_number, is_object_dtype, is_scalar)
from pandas.core.dtypes.missing import isna

import pandas.core.algorithms as algos
from pandas.core.arrays.interval import IntervalArray, _interval_shared_docs
import pandas.core.common as com
import pandas.core.indexes.base as ibase
@@ -1090,6 +1091,63 @@ def equals(self, other):
    def overlaps(self, other):
        return self._data.overlaps(other)

    def intersection2(self, other, sort=False):
        other = self._as_like_interval_index(other)

        # GH 19016: ensure set op will not return a prohibited dtype
Contributor

Is it worth factoring this check out of here and _setop into a helper routine?

Contributor Author

May I add a decorator for the check once more setops are implemented in IntervalIndex's own form?

Contributor

If you can do it here, that would be great; this is pretty common code.

        subtypes = [self.dtype.subtype, other.dtype.subtype]
        common_subtype = find_common_type(subtypes)
        if is_object_dtype(common_subtype):
            msg = ('can only do intersection between two IntervalIndex '
                   'objects that have compatible dtypes')
            raise TypeError(msg)

        try:
            lindexer = other.left.get_indexer(self.left)
            rindexer = other.right.get_indexer(self.right)
        except Exception:
            # duplicates
            lindexer = algos.unique1d(
                other.left.get_indexer_non_unique(self.left)[0])
            rindexer = algos.unique1d(
                other.right.get_indexer_non_unique(self.right)[0])

        match = (lindexer == rindexer) & (lindexer != -1)
        indexer = lindexer.take(match.nonzero()[0])
        taken = other.take(indexer)

        return taken
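
The review thread above asks for the dtype check to be factored out of the individual set operations. A minimal sketch of what such a helper could look like is given here as a decorator. The commit list mentions a decorator called setop_check, but the body below is an assumption for illustration, not the code from this PR: ToyIntervalIndex is a hypothetical stand-in class, and the plain subtype equality test is a simplification of the find_common_type / is_object_dtype check in the diff.

```python
import functools


def setop_check(op_name):
    """Decorator factory: validate subtypes before running a set operation."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, other, sort=False):
            # Simplified stand-in for the find_common_type/is_object_dtype
            # check in the diff: here subtypes must simply be equal.
            if self.subtype != other.subtype:
                msg = ('can only do {op} between two IntervalIndex '
                       'objects that have compatible dtypes')
                raise TypeError(msg.format(op=op_name))
            return method(self, other, sort=sort)
        return wrapper
    return decorator


class ToyIntervalIndex:
    """Hypothetical stand-in for illustration; not the pandas class."""

    def __init__(self, intervals, subtype):
        self.intervals = list(intervals)  # list of (left, right) tuples
        self.subtype = subtype            # e.g. 'int64' or 'datetime64[ns]'

    @setop_check('intersection')
    def intersection(self, other, sort=False):
        theirs = set(other.intervals)
        kept = [iv for iv in self.intervals if iv in theirs]
        if sort is None:
            kept = sorted(kept)
        return ToyIntervalIndex(kept, self.subtype)
```

With this shape, each set operation keeps only its own logic, and the error message is built from the operation name passed to the decorator.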

    def intersection(self, other, sort=False):
        other = self._as_like_interval_index(other)

        # GH 19016: ensure set op will not return a prohibited dtype
        subtypes = [self.dtype.subtype, other.dtype.subtype]
        common_subtype = find_common_type(subtypes)
        if is_object_dtype(common_subtype):
            msg = ('can only do intersection between two IntervalIndex '
                   'objects that have compatible dtypes')
            raise TypeError(msg)

        try:
            lindexer = self.left.get_indexer(other.left)
            rindexer = self.right.get_indexer(other.right)
        except Exception:
Contributor

I would prefer to check for duplicates rather than catch an exception.

            # duplicates
            lindexer = algos.unique1d(
                self.left.get_indexer_non_unique(other.left)[0])
            rindexer = algos.unique1d(
                self.right.get_indexer_non_unique(other.right)[0])

        match = (lindexer == rindexer) & (lindexer != -1)
        indexer = lindexer.take(match.nonzero()[0])
        taken = self.take(indexer)

        if sort is None:
            taken = taken.sort_values()

        return taken
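
The reviewer's suggestion to check for duplicates up front, rather than catch an exception, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: intervals are modelled as (left, right) tuples, is_unique and get_indexer are hypothetical helpers that mimic the idea of the index methods used in the diff, other_ivs is assumed to contain no repeated intervals, and the fallback branch is a simple whole-interval comparison chosen for this sketch rather than the PR's get_indexer_non_unique logic.

```python
def is_unique(values):
    """True when no value repeats (stand-in for an index uniqueness check)."""
    return len(set(values)) == len(values)


def get_indexer(index, targets):
    """Position of each target in a unique `index`, or -1 when absent."""
    positions = {value: i for i, value in enumerate(index)}
    return [positions.get(target, -1) for target in targets]


def intersection(self_ivs, other_ivs, sort=False):
    """Intersect two lists of (left, right) tuples."""
    self_left = [left for left, _ in self_ivs]
    self_right = [right for _, right in self_ivs]

    if is_unique(self_left) and is_unique(self_right):
        # Fast path, mirroring the diff: look up each endpoint separately and
        # keep positions where both lookups agree and are not -1.
        lindexer = get_indexer(self_left, [left for left, _ in other_ivs])
        rindexer = get_indexer(self_right, [right for _, right in other_ivs])
        taken = [self_ivs[lpos] for lpos, rpos in zip(lindexer, rindexer)
                 if lpos == rpos and lpos != -1]
    else:
        # Fallback for this sketch only: compare whole intervals and keep
        # the first occurrence of each match.
        theirs = set(other_ivs)
        seen = set()
        taken = []
        for iv in self_ivs:
            if iv in theirs and iv not in seen:
                seen.add(iv)
                taken.append(iv)

    # As in the diff, sort=None requests sorted output; sort=False does not.
    return sorted(taken) if sort is None else taken
```

Branching on uniqueness makes the slow path an explicit decision instead of a side effect of a broad except clause, which would also hide unrelated errors.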

    def _setop(op_name, sort=None):
        def func(self, other, sort=sort):
            other = self._as_like_interval_index(other)
@@ -1125,7 +1183,6 @@ def is_all_dates(self):
        return False

    union = _setop('union')
    intersection = _setop('intersection', sort=False)
    difference = _setop('difference')
    symmetric_difference = _setop('symmetric_difference')
