
Merged index is different if left-hand dataframe is empty #22921

@carlfischerjba

Description


Code Sample

import pandas as pd

pd.__version__
# u'0.23.4'

a = pd.DataFrame(columns=['refA', 'dataA'], data=[[1, 'bla']])
b = pd.DataFrame(columns=['refB', 'dataB'], data=[[2, 'pff']]).set_index('refB')
m = a.merge(b, left_on='refA', right_index=True)

# merged dataframe contains only the join column `refA` from dataframe `a`; the index of `b` is not carried through
m.index
# Index([], dtype='object')
m.columns
# Index([u'refA', u'dataA', u'dataB'], dtype='object')
m
# Empty DataFrame
# Columns: [refA, dataA, dataB]
# Index: []


a = pd.DataFrame(columns=['refA', 'dataA'])
b = pd.DataFrame(columns=['refB', 'dataB'], data=[[1, 'pff']]).set_index('refB')
m = a.merge(b, left_on='refA', right_index=True)

# merged dataframe contains the join column `refA` from dataframe `a` and the index (named `refB`) from dataframe `b`
m.index
# Index([], dtype='object', name=u'refB')
m.columns
# Index([u'refA', u'dataA', u'dataB'], dtype='object')
m
# Empty DataFrame
# Columns: [refA, dataA, dataB]
# Index: []
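The two cases above can be compared side by side in one script. This is a sketch, not part of the original report: it builds the frames with explicit dtypes (an assumption, so that the empty `refA` column is `int64` and matches the `refB` index instead of being `object`) and uses a matching key so the non-empty merge returns one row. It prints each merged result's columns and index name; the index name printed for the empty case depends on the installed pandas version.

```python
import pandas as pd

# Right-hand frame: its index is named 'refB', as in the report.
right = pd.DataFrame({'refB': [1], 'dataB': ['pff']}).set_index('refB')

# Left-hand frames: one with a matching row, one with zero rows.
# Explicit dtypes keep the empty join column int64, like the 'refB' index.
full_left = pd.DataFrame({'refA': [1], 'dataA': ['bla']})
empty_left = pd.DataFrame({'refA': pd.Series([], dtype='int64'),
                           'dataA': pd.Series([], dtype='object')})

m_full = full_left.merge(right, left_on='refA', right_index=True)
m_empty = empty_left.merge(right, left_on='refA', right_index=True)

# The columns are the same either way; only the index metadata may differ.
print(list(m_full.columns), repr(m_full.index.name))
print(list(m_empty.columns), repr(m_empty.index.name))
```

If the bug is present, the second line shows `'refB'` where the first shows `None`.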

Problem description

When an empty left-hand dataframe is merged with another dataframe, joining a column of the left (`left_on`) against the index of the right (`right_index=True`), the output contains both the join column and the right-hand index (named `refB`). If the left dataframe is non-empty, only the join column from the left dataframe is carried through, and the index from the right dataframe disappears.

Expected Output

Consistency between the case where the left dataframe is empty and the case where it is non-empty, i.e. do not include the index from the right-hand dataframe in the merged dataframe. This avoids having to handle the empty-dataframe case explicitly in the calling code.
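Until the behaviour is consistent, calling code can normalise the result itself. A minimal sketch, assuming the desired outcome is that the merged index always carries the left-hand frame's index name; `merge_on_right_index` is a hypothetical helper, not a pandas API, and it only fixes the index name, not the index dtype or type.

```python
import pandas as pd


def merge_on_right_index(left, right, left_on, **kwargs):
    """Hypothetical wrapper: merge a column of `left` against the index of
    `right`, then give the result the left frame's index name so that an
    empty `left` does not leak the right frame's index name (e.g. 'refB')."""
    merged = left.merge(right, left_on=left_on, right_index=True, **kwargs)
    # Index.rename returns a new Index, so neither input frame is modified.
    merged.index = merged.index.rename(left.index.name)
    return merged


right = pd.DataFrame({'refB': [1], 'dataB': ['pff']}).set_index('refB')
full_left = pd.DataFrame({'refA': [1], 'dataA': ['bla']})
empty_left = pd.DataFrame({'refA': pd.Series([], dtype='int64'),
                           'dataA': pd.Series([], dtype='object')})

for left in (full_left, empty_left):
    m = merge_on_right_index(left, right, left_on='refA')
    print(len(m), list(m.columns), repr(m.index.name))
```

Renaming rather than calling `reset_index(drop=True)` keeps the left frame's row labels in the non-empty case, which is what the non-empty merge already returns.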

Output of pd.show_versions()

INSTALLED VERSIONS
------------------

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.0.5
pip: 18.0
setuptools: 39.2.0
Cython: 0.25.2
numpy: 1.14.2
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.7.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.9.4
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata


Assignees

No one assigned

Labels

Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
