[PERF/ENH] `Index.intersection` does more hashing work than necessary

`Index.intersection` currently performs an inner merge of the unique values of the left and right indices (the uniques are taken first so that indices with repeated values don't blow up the memory footprint). This fully hashes both indices, and the merge then hashes again. Finally, if requested, the result is sorted.

I think this could be replaced, with a positive effect on performance, by either:

- `leftsemi` join + `drop_duplicates`
- `libcudf.search.contains` + `apply_boolean_mask` + `drop_duplicates`

One would have to think through the consequences of either option with respect to any ordering guarantees we might want when `sort=False` (possibly gated behind pandas-compat mode).
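To make the comparison concrete, here is a rough CPU-side model of the two shapes, using plain Python lists, sets, and dicts as stand-ins for GPU columns and hash tables. The function names are made up for illustration and nothing here calls cuDF or libcudf; it only shows the semantics (which values come out, and in what order), not the performance.

```python
def intersection_unique_then_merge(left, right, sort=False):
    """Model of the current shape: unique both sides, then inner-join."""
    left_unique = list(dict.fromkeys(left))   # hash pass over all of left
    right_unique = set(right)                 # hash pass over all of right
    # The inner join of the two unique sets is a further hashing step.
    result = [x for x in left_unique if x in right_unique]
    return sorted(result) if sort else result


def intersection_contains_then_mask(left, right, sort=False):
    """Model of the proposed shape: contains + boolean mask + drop_duplicates."""
    lookup = set(right)                                # one hash build on right
    mask = [x in lookup for x in left]                 # stands in for "contains"
    kept = [x for x, keep in zip(left, mask) if keep]  # stands in for "apply_boolean_mask"
    result = list(dict.fromkeys(kept))                 # stands in for "drop_duplicates" (keep first)
    return sorted(result) if sort else result


left = [3, 1, 3, 2, 5]
right = [2, 3, 3, 7]
print(intersection_unique_then_merge(left, right))              # [3, 2]
print(intersection_contains_then_mask(left, right))             # [3, 2]
print(intersection_contains_then_mask(left, right, sort=True))  # [2, 3]
```

In this list-based model both variants happen to return values in order of first appearance in `left`. Whether a GPU join or `drop_duplicates` preserves that order is exactly the open question for `sort=False`.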

This applies _mutatis mutandis_ to `MultiIndex.intersection` too.

[PERF/ENH] `Index.intersection` does more hashing work than necessary #14487
