changing order of keys? #24730

Closed
epierson9 opened this issue Jan 11, 2019 · 5 comments · Fixed by #54611
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@epierson9

Code Sample, a copy-pastable example if possible

```python
import pandas as pd
import sys

print("pandas version is", pd.__version__)
print("Python version is", sys.version)

# Left frame: keys alternate 1, 2, 1, 2, ...; 'b' records the original row order
x = pd.DataFrame({'a':[1, 2, 1, 2, 1, 2, 1, 2], 'b':range(8)})
# Right frame: each key appears exactly once
y = pd.DataFrame({'a':[1, 2]})

print("original x")
print(x)
print("\n\nmerged dataframe after inner merge")
print(pd.merge(x, y, how='inner', on=['a']))
print("\n\n***merged dataframe after left merge")
print(pd.merge(x, y, how='left', on=['a']))
```

Problem description

Based on the pandas documentation, I was expecting the order of the keys in the left dataframe (x) to be preserved in both cases. The documentation says:

left: use only keys from left frame, similar to a SQL left outer join; preserve key order
inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys

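Instead, the output is as follows. The left merge keeps the original row order, but the inner merge groups the rows by key: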

```
pandas version is 0.23.4
Python version is 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]
original x
   a  b
0  1  0
1  2  1
2  1  2
3  2  3
4  1  4
5  2  5
6  1  6
7  2  7


merged dataframe after inner merge
   a  b
0  1  0
1  1  2
2  1  4
3  1  6
4  2  1
5  2  3
6  2  5
7  2  7


***merged dataframe after left merge
   a  b
0  1  0
1  2  1
2  1  2
3  2  3
4  1  4
5  2  5
6  1  6
7  2  7
```

Is this intended behavior? If so, the documentation seems a bit confusing.
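To make the documented guarantee ("preserve the order of the left keys") concrete, here is a minimal sketch of a check. `preserves_left_key_order` is a hypothetical helper, not a pandas function, and it assumes the right frame's keys are unique (a many-to-one merge), so each matching left row appears exactly once in the result. The two outputs reported above are rebuilt by hand so the check does not depend on the installed pandas version.

```python
import pandas as pd

def preserves_left_key_order(left, merged, key):
    """Hypothetical check: do the merged keys appear in the same order as the
    matching rows of `left`? Assumes the right frame's keys are unique."""
    # Keys of the left rows that survived the merge, in original left order
    expected = left.loc[left[key].isin(merged[key]), key].tolist()
    return merged[key].tolist() == expected

x = pd.DataFrame({'a': [1, 2, 1, 2, 1, 2, 1, 2], 'b': range(8)})

# Rebuilt copies of the outputs reported with pandas 0.23.4
left_output = x.copy()
inner_output = x.sort_values('a', kind='mergesort').reset_index(drop=True)

print(preserves_left_key_order(x, left_output, 'a'))   # True
print(preserves_left_key_order(x, inner_output, 'a'))  # False
```

By this check, the reported left merge satisfies the documented ordering and the reported inner merge does not.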

@gfyoung gfyoung added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff DataFrame DataFrame data structure Docs labels Jan 11, 2019
@gfyoung
Member

gfyoung commented Jan 11, 2019

cc @jreback

@jreback
Contributor

jreback commented Jan 13, 2019

The inner result does look a bit odd; it could be sorting.

We do have sort=False (the default). It controls the ordering in the left case but not in the inner case, so maybe it is not being respected there:

```
In [14]: pd.merge(x, y, how='left', on=['a'], sort=True)
Out[14]:
   a  b
0  1  0
1  1  2
2  1  4
3  1  6
4  2  1
5  2  3
6  2  5
7  2  7

In [15]: pd.merge(x, y, how='left', on=['a'], sort=False)
Out[15]:
   a  b
0  1  0
1  2  1
2  1  2
3  2  3
4  1  4
5  2  5
6  1  6
7  2  7

In [17]: pd.merge(x, y, how='inner', on=['a'], sort=True)
Out[17]:
   a  b
0  1  0
1  1  2
2  1  4
3  1  6
4  2  1
5  2  3
6  2  5
7  2  7

In [18]: pd.merge(x, y, how='inner', on=['a'], sort=False)
Out[18]:
   a  b
0  1  0
1  1  2
2  1  4
3  1  6
4  2  1
5  2  3
6  2  5
7  2  7
```

@epierson9 you're welcome to take a deeper look.

@ischurov
Contributor

ischurov commented Mar 5, 2019

Is this the same issue as #18776?

@gfyoung
Copy link
Member

gfyoung commented Mar 6, 2019

@ischurov : Good question! It looks related.

@mroeschke mroeschke added the Bug label Apr 20, 2020
@DanielFEvans
Contributor

Does pandas have a policy on noting long-standing bugs like this one in the docs? It would save a slight amount of developer anguish to know from the start that a workaround is needed!

However, remembering to remove any "known bug" references from the documentation would be quite a lot of overhead, so I'd guess it's avoided.
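One common workaround is to record each left row's position before merging and sort on it afterwards. The sketch below assumes that approach. `merge_preserving_left_order` and the `_left_pos` column are hypothetical names, not part of pandas, and the temporary column name is assumed not to clash with existing columns.

```python
import pandas as pd

def merge_preserving_left_order(left, right, **merge_kwargs):
    """Hypothetical helper (not a pandas API): merge, then restore the
    row order of `left`."""
    pos_col = "_left_pos"  # hypothetical temporary column; assumed not to clash
    tagged = left.copy()
    tagged[pos_col] = range(len(left))  # remember each left row's position
    merged = pd.merge(tagged, right, **merge_kwargs)
    # mergesort is stable, so rows produced by the same left row keep their order
    merged = merged.sort_values(pos_col, kind="mergesort")
    return merged.drop(columns=pos_col).reset_index(drop=True)

x = pd.DataFrame({'a': [1, 2, 1, 2, 1, 2, 1, 2], 'b': range(8)})
y = pd.DataFrame({'a': [1, 2]})

result = merge_preserving_left_order(x, y, how='inner', on=['a'])
print(result)
```

For how='right' or how='outer', rows with no left match get a missing position and, with the default na_position='last', end up at the bottom of the result.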

@mroeschke mroeschke added Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff DataFrame DataFrame data structure Docs labels Jun 25, 2021
6 participants