ENH: allow preserving one of the indexes when merging two DataFrames #46882

Open
@multimeric


Is your feature request related to a problem?

I want to be able to merge two DataFrames, but keep the index of the left one in the final result:

>>> import pandas as pd
>>> import string
>>> df1 = pd.DataFrame({"a": range(5), "b": range(10, 15)}, index=list(string.ascii_lowercase[:5]))
>>> df2 = pd.DataFrame({"a": range(5), "c": list(string.ascii_uppercase[:5])})
>>> df1
   a   b
a  0  10
b  1  11
c  2  12
d  3  13
e  4  14
>>> df2
   a  c
0  0  A
1  1  B
2  2  C
3  3  D
4  4  E

Currently, when merging on columns, merge discards both indexes and returns a new default integer index:

>>> df1.merge(df2, on="a")
   a   b  c
0  0  10  A
1  1  11  B
2  2  12  C
3  3  13  D
4  4  14  E

Describe the solution you'd like

Add a new parameter, preserve_index, to merge. It would accept "left", "right", or None (the default, which keeps the current behaviour):

DataFrame.merge(preserve_index="left")

Applied to the example above, this would give:

>>> df1.merge(df2, on="a", preserve_index="left")
   a   b  c
a  0  10  A
b  1  11  B
c  2  12  C
d  3  13  D
e  4  14  E
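Until such a parameter exists, the proposed semantics can be approximated in user code. The sketch below is a hypothetical helper, merge_preserve_index (not part of pandas), that copies the chosen frame's index into a temporary column, merges, and then promotes that column back to the index. The column name __preserved_index__ is an assumption and must not clash with existing columns; the sketch also assumes a single-level index and a column-on-column merge (on or left_on/right_on, not left_index/right_index).

```python
import string
from typing import Optional

import pandas as pd

# Hypothetical name for the temporary column that carries the preserved index
# through the merge; assumed not to clash with any existing column.
_TMP_INDEX_COL = "__preserved_index__"


def merge_preserve_index(
    left: pd.DataFrame,
    right: pd.DataFrame,
    preserve_index: Optional[str] = None,
    **merge_kwargs,
) -> pd.DataFrame:
    """Hypothetical emulation of the proposed preserve_index option.

    Assumes a single-level index and a column-on-column merge.
    """
    if preserve_index is None:
        # Current behaviour: both indexes are discarded.
        return left.merge(right, **merge_kwargs)
    if preserve_index not in ("left", "right"):
        raise ValueError('preserve_index must be "left", "right" or None')

    source = left if preserve_index == "left" else right
    index_name = source.index.name
    # Copy the chosen index into an ordinary column so merge carries it along.
    tagged = source.assign(**{_TMP_INDEX_COL: source.index})
    if preserve_index == "left":
        merged = tagged.merge(right, **merge_kwargs)
    else:
        merged = left.merge(tagged, **merge_kwargs)

    # Promote the carried column back to the index and restore its original name.
    merged = merged.set_index(_TMP_INDEX_COL)
    merged.index.name = index_name
    return merged


# Usage with the frames from the example above.
df1 = pd.DataFrame(
    {"a": range(5), "b": range(10, 15)}, index=list(string.ascii_lowercase[:5])
)
df2 = pd.DataFrame({"a": range(5), "c": list(string.ascii_uppercase[:5])})
result = merge_preserve_index(df1, df2, on="a", preserve_index="left")
```

With one-to-many matches the preserved labels repeat, and with how="outer" rows that come only from the other frame get NaN as their index label, so a built-in implementation would have to decide how to treat those cases.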

API breaking implications

None. This is a new, optional parameter; if it is omitted, merge behaves exactly as it does today.

Describe alternatives you've considered

It is already possible to work around this by resetting the index and then setting it as the index again, as described here, but this is:

  • More verbose
  • Not intuitive or clear to users (hence the popularity of the Stack Overflow question)
  • Probably less efficient
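For the example at the top of the issue, the reset-and-restore workaround looks roughly like this. This is a minimal sketch that assumes the left index is unnamed, so reset_index() stores it in a column called "index":

```python
import string

import pandas as pd

df1 = pd.DataFrame(
    {"a": range(5), "b": range(10, 15)}, index=list(string.ascii_lowercase[:5])
)
df2 = pd.DataFrame({"a": range(5), "c": list(string.ascii_uppercase[:5])})

# reset_index() turns the unnamed index into a column called "index",
# merge() carries that column through, and set_index() promotes it back.
result = df1.reset_index().merge(df2, on="a").set_index("index")
# Drop the leftover "index" label so the result matches the desired output.
result.index.name = None
```

The extra column and the extra set_index step are what make this more verbose than a single merge call.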

Labels: Enhancement, Needs Discussion (requires discussion from core team before further action), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)