left join fails in case of non-unique indices #5391

Closed
@behzadnouri

Description

It seems to me that the join operation fails if the index values are not unique. The particular circumstance in which I observed this was with a multi-index:

df1.set_index(['col1', 'col2', 'col3'], inplace=True)
df2.join(df1, on=['cola', 'colb', 'colc'], how='left')

I understand that the above join operation is not well-defined for non-unique index values, but pandas gives wrong values even for keys that match uniquely, with no warnings or error messages whatsoever.

If checking index integrity has a heavy performance cost, it should at least be documented that this method fails when the index is not unique. Alternatively, an optional argument could enforce the integrity check.
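The kind of integrity check asked for above can be approximated in user code. The following is only a sketch: `checked_left_join` is a hypothetical helper, not a pandas function, and the toy frames and the `val` column are made up for illustration. It relies on `Index.is_unique` and `Index.duplicated()` to refuse the join before any wrong values can be produced.

```python
import pandas as pd


def checked_left_join(left, right, on):
    """Left-join the indexed frame `right` onto `left`.

    Hypothetical helper: raises instead of joining when the index of
    `right` contains duplicate keys.
    """
    if not right.index.is_unique:
        # Collect the offending keys so the error message is actionable.
        dupes = right.index[right.index.duplicated()].unique().tolist()
        raise ValueError(f"right index has duplicate keys: {dupes}")
    return left.join(right, on=on, how="left")


# Toy data (assumed for illustration); the key (1, "a", 10) occurs twice.
df1 = pd.DataFrame(
    {
        "col1": [1, 1, 2],
        "col2": ["a", "a", "b"],
        "col3": [10, 10, 20],
        "val": [0.1, 0.2, 0.3],
    }
).set_index(["col1", "col2", "col3"])
df2 = pd.DataFrame({"cola": [2, 1], "colb": ["b", "a"], "colc": [20, 10]})

try:
    checked_left_join(df2, df1, on=["cola", "colb", "colc"])
    raised = False
except ValueError:
    raised = True

# With the duplicated keys removed, the same call goes through.
df1_unique = df1[~df1.index.duplicated()]
joined = checked_left_join(df2, df1_unique, on=["cola", "colb", "colc"])
```

The uniqueness test runs once per join on the right-hand index, so whether its cost is acceptable depends on the size of that index.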

I could get a correct join by dropping the duplicates first:

df1.drop_duplicates(cols=['col1', 'col2', 'col3'], inplace=True)
df1.set_index(['col1', 'col2', 'col3'], inplace=True)
df2.join(df1, on=['cola', 'colb', 'colc'], how='left')
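For readers on current pandas, here is a self-contained sketch of the same workaround. Two things in it are not from the original report: the toy data (including the `val` column) is invented for illustration, and the keyword is `subset=` rather than `cols=`, since `drop_duplicates` no longer accepts `cols`. By default `drop_duplicates` keeps the first row for each key.

```python
import pandas as pd

# Toy data (assumed for illustration); the key (1, "a", 10) occurs twice in df1.
df1 = pd.DataFrame(
    {
        "col1": [1, 1, 2],
        "col2": ["a", "a", "b"],
        "col3": [10, 10, 20],
        "val": [0.1, 0.2, 0.3],
    }
)
df2 = pd.DataFrame(
    {"cola": [2, 1, 3], "colb": ["b", "a", "c"], "colc": [20, 10, 30]}
)

keys = ["col1", "col2", "col3"]

# Drop duplicate key rows first, then index by the key columns.
df1_unique = df1.drop_duplicates(subset=keys).set_index(keys)

# Left join: every row of df2 is kept; unmatched keys get NaN in "val".
result = df2.join(df1_unique, on=["cola", "colb", "colc"], how="left")
```

Because the right-hand index is unique after deduplication, the result has exactly one row per row of `df2`.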


    Labels

    Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
