BUG: merge not always following documented sort behavior #54611

Merged
merged 7 commits into from
Aug 23, 2023

Member

@lukemanley lukemanley commented Aug 18, 2023

When merging DataFrames, a number of different code paths are hit depending on the arguments passed (e.g. how, sort, on index vs. columns) as well as on characteristics of the left/right inputs (e.g. unique, monotonic).

The resulting sort behavior is not always consistent and does not always align with documented behavior.

The docs state:

sort: bool, default False
    Sort the join keys lexicographically in the result DataFrame. If False, the order 
    of the join keys depends on the join type (how keyword).

...

left: preserve the order of the left keys
right: preserve the order of the right keys
outer: sort keys lexicographically
inner: preserve the order of the left keys
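
To make the documented rules concrete, here is a minimal pure-Python sketch of the key order they imply. It is only a model of the quoted docs, not the code pandas runs: `expected_key_order` is a hypothetical helper, and it assumes the join keys are unique on each side so that each key yields at most one row.

```python
def expected_key_order(left_keys, right_keys, how, sort=False):
    """Model the documented join-key order (hypothetical helper, not pandas code).

    Assumes keys are unique within each input.
    """
    right_set = set(right_keys)
    if how == "left":
        # left: preserve the order of the left keys
        keys = list(left_keys)
    elif how == "right":
        # right: preserve the order of the right keys
        keys = list(right_keys)
    elif how == "inner":
        # inner: keep only shared keys, in left order
        keys = [k for k in left_keys if k in right_set]
    elif how == "outer":
        # outer: union of keys, sorted lexicographically
        keys = sorted(set(left_keys) | right_set)
    else:
        raise ValueError(f"unsupported how: {how!r}")
    # sort=True sorts the join keys regardless of the join type
    return sorted(keys) if sort else keys


left_keys = ["b", "c", "a"]
right_keys = ["a", "d", "b"]

left_order = expected_key_order(left_keys, right_keys, "left")
right_order = expected_key_order(left_keys, right_keys, "right")
inner_order = expected_key_order(left_keys, right_keys, "inner")
outer_order = expected_key_order(left_keys, right_keys, "outer")
sorted_left_order = expected_key_order(left_keys, right_keys, "left", sort=True)
```

A test for a given merge code path can then compare the key column of the merged result against the order this model predicts for the same how and sort arguments.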

This PR aims to fix the sort behavior in the cases where it does not follow the documented behavior, and adds tests to validate sort behavior across a wide range of arguments.

NOTE: a few existing tests that relied on incorrect sort behavior were updated.

@lukemanley lukemanley added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Aug 18, 2023
@lukemanley lukemanley added this to the 2.2 milestone Aug 18, 2023
@@ -189,7 +189,7 @@ Groupby/resample/rolling

Reshaping
^^^^^^^^^
-
- Bug in :func:`merge` not following documented sort behavior in certain cases (:issue:`54611`)
Member

Might be good to have this in the "notable bug fix" section since it fixes a lot of issues!

Member Author

Moved to the notable section

@mroeschke mroeschke merged commit f9f1643 into pandas-dev:main Aug 23, 2023
33 checks passed
@mroeschke
Member

Nice! Thanks @lukemanley
