Closed
Description
It seems to me that the join operation fails if the index values are not unique. The particular circumstance in which I observed this was with a multi-index:
df1.set_index(['col1', 'col2', 'col3'], inplace=True)
df2.join(df1, on=['cola', 'colb', 'colc'], how='left')
I understand that the above join operation is not well-defined for non-unique index values, but pandas gives wrong values even for keys that match uniquely, with no warnings or error messages whatsoever.
If checking index integrity has a heavy performance cost, it should at least be documented that this method fails when the index is not unique. Alternatively, an optional argument could enforce the integrity check.
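To illustrate what such an integrity check could look like from the user side, here is a minimal sketch. The helper name `checked_join` and the toy data are my own assumptions, not part of pandas; it only relies on `Index.is_unique` and `Index.duplicated`, which do exist.

```python
import pandas as pd


def checked_join(left, right, on, how="left"):
    """Hypothetical helper (not a pandas API): refuse to join when the
    right-hand index contains duplicate keys."""
    if not right.index.is_unique:
        # Collect each duplicated key once for the error message.
        dupes = right.index[right.index.duplicated()].unique().tolist()
        raise ValueError(f"right index has duplicate keys: {dupes}")
    return left.join(right, on=on, how=how)


# Toy data (assumed): the key (1, "a", 10) appears twice in df1.
df1 = pd.DataFrame(
    {
        "col1": [1, 1, 2],
        "col2": ["a", "a", "b"],
        "col3": [10, 10, 20],
        "val": [0.1, 0.2, 0.3],
    }
).set_index(["col1", "col2", "col3"])
df2 = pd.DataFrame({"cola": [2], "colb": ["b"], "colc": [20]})

try:
    checked_join(df2, df1, on=["cola", "colb", "colc"])
    join_error = None
except ValueError as exc:
    # The duplicate key is reported instead of silently joining.
    join_error = str(exc)
```

The uniqueness test runs once per call, so whether its cost matters depends on the size of the index. Newer pandas versions also accept a `validate` argument on `DataFrame.merge` (for example `validate="many_to_one"`), which may cover the same need if it is available in your version.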
I could get the correct join by doing the following:
df1.drop_duplicates(subset=['col1', 'col2', 'col3'], inplace=True)
df1.set_index(['col1', 'col2', 'col3'], inplace=True)
df2.join(df1, on=['cola', 'colb', 'colc'], how='left')
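For anyone who wants to try the workaround end to end, here is a self-contained sketch with made-up toy data (the values and the `val` column are assumptions for illustration only). It uses the `subset` keyword, which is what current pandas versions call the argument that older releases named `cols`, and it avoids `inplace=True` so the original `df1` is left untouched.

```python
import pandas as pd

# Toy data (assumed): the key (1, "a", 10) appears twice in df1.
df1 = pd.DataFrame(
    {
        "col1": [1, 1, 2],
        "col2": ["a", "a", "b"],
        "col3": [10, 10, 20],
        "val": [0.1, 0.2, 0.3],
    }
)
df2 = pd.DataFrame(
    {"cola": [1, 2, 3], "colb": ["a", "b", "c"], "colc": [10, 20, 30]}
)

keys = ["col1", "col2", "col3"]

# Keep the first row for each key, then index by the key columns.
deduped = df1.drop_duplicates(subset=keys).set_index(keys)

# Left join: every row of df2 is kept, unmatched keys get NaN.
result = df2.join(deduped, on=["cola", "colb", "colc"], how="left")
```

Note that `drop_duplicates` keeps the first occurrence of each key by default, so which of the duplicated rows survives is a choice the caller has to be comfortable with.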