Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Index.join different sorting behavior in libjoin faspath #54646

Open
jbrockmendel opened this issue Aug 20, 2023 · 6 comments
Open

BUG: Index.join different sorting behavior in libjoin faspath #54646

jbrockmendel opened this issue Aug 20, 2023 · 6 comments
Labels
Bug Index Related to the Index class or subclasses Internals Related to non-user accessible pandas implementation Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Aug 20, 2023

In Index we have a _can_use_libjoin check that determined if we can use the libjoin fastpaths (AFAICT these are just fastpaths and the behavior is supposed to be identical to the non-fastpaths). There are a few cases where _can_use_libjoin is not strict enough and we end up making copies in order to use the fastpaths, which negates the benefits (havent actually done any measurements, just assuming). In particular MultiIndex and RangeIndex cases.

Patching can_use_libjoin to return False in MultiIndex and RangeIndex cases breaks a bunch of tests bc the join results cease to be ordered. Needs further investigation.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 20, 2023
@lukemanley
Copy link
Member

Any idea if #54611 helps here?

@jbrockmendel
Copy link
Member Author

Based on a quick look I’m optimistic

@lithomas1 lithomas1 added Internals Related to non-user accessible pandas implementation Reshaping Concat, Merge/Join, Stack/Unstack, Explode Index Related to the Index class or subclasses and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 23, 2023
@jbrockmendel
Copy link
Member Author

@lukemanley was this closed by #54765?

@lukemanley
Copy link
Member

@lukemanley was this closed by #54765?

#54765 addressed the MultiIndex case. The RangeIndex case is still outstanding. I don't recall the specifics, but I think the RangeIndex case required a bit more refactoring.

@rhshadrach
Copy link
Member

Is this related to this comment?

elif how == "outer":
# TODO: sort=True here for backwards compat. It may
# be better to use the sort parameter passed into join
join_index = self.union(other)

Bumped into the comment above and will link to this issue if it is.

@jbrockmendel
Copy link
Member Author

definitely looks related, yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Index Related to the Index class or subclasses Internals Related to non-user accessible pandas implementation Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

4 participants