Description
Code Sample
import pandas as pd
pd.__version__
# u'0.23.4'
a = pd.DataFrame(columns=['refA', 'dataA'], data=[[1, 'bla']])
b = pd.DataFrame(columns=['refB', 'dataB'], data=[[2, 'pff']]).set_index('refB')
m = a.merge(b, left_on='refA', right_index=True)
# non-empty left dataframe: the result keeps key column `refA` from `a`; the index name `refB` from `b` is dropped
m.index
# Index([], dtype='object')
m.columns
# Index([u'refA', u'dataA', u'dataB'], dtype='object')
m
# Empty DataFrame
# Columns: [refA, dataA, dataB]
# Index: []
a = pd.DataFrame(columns=['refA', 'dataA'])
b = pd.DataFrame(columns=['refB', 'dataB'], data=[[1, 'pff']]).set_index('refB')
m = a.merge(b, left_on='refA', right_index=True)
# empty left dataframe: the result keeps key column `refA` from `a` and also the index name `refB` from `b`
m.index
# Index([], dtype='object', name=u'refB')
m.columns
# Index([u'refA', u'dataA', u'dataB'], dtype='object')
m
# Empty DataFrame
# Columns: [refA, dataA, dataB]
# Index: []
Problem description
When an empty dataframe is merged with another dataframe using a column on the left and the index on the right (`left_on='refA', right_index=True`), the result carries both the key column from the left dataframe and the index name from the right dataframe (`refB`). If the left dataframe is non-empty, only the key column from the left dataframe is carried through, and the index name from the right dataframe disappears.
Expected Output
The cases where the left dataframe is empty and where it is non-empty should behave consistently, e.g. by not including the index from the right-hand dataframe in the merged dataframe in either case. This would avoid having to handle empty dataframes explicitly in the calling code.
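Until the two cases agree, calling code can normalise the result itself. The following is only a sketch of such a workaround, not a fix in pandas: `merge_on_right_index` is a hypothetical helper name, and it assumes that dropping the index name is the desired outcome in both cases.

```python
import pandas as pd


def merge_on_right_index(left, right, left_on):
    """Hypothetical helper: merge a left column against the right index
    and clear the index name of the result."""
    merged = left.merge(right, left_on=left_on, right_index=True)
    # rename_axis(None) returns a new frame whose index has no name,
    # so the input frames are left untouched.
    return merged.rename_axis(None)


b = pd.DataFrame(columns=['refB', 'dataB'], data=[[1, 'pff']]).set_index('refB')
non_empty = pd.DataFrame(columns=['refA', 'dataA'], data=[[2, 'bla']])
empty = pd.DataFrame(columns=['refA', 'dataA'])

m1 = merge_on_right_index(non_empty, b, 'refA')  # no matching keys
m2 = merge_on_right_index(empty, b, 'refA')      # empty left dataframe
```

With this wrapper, `m1.index.name` and `m2.index.name` are both `None`, so downstream code does not need a separate branch for an empty left dataframe.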
Output of pd.show_versions()
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 3.0.5
pip: 18.0
setuptools: 39.2.0
Cython: 0.25.2
numpy: 1.14.2
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.7.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.9.4
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None