Description
Code Sample, a copy-pastable example if possible
import pandas as pd
i = pd.MultiIndex.from_tuples([('a', 'b'), ('d', 'e')])
df = pd.DataFrame([[0, 7], [3, 4]], index=i, columns=['x', 'y'])
print(df)
#      x  y
# a b  0  7
# d e  3  4
i2 = pd.MultiIndex.from_tuples([('a', 'b'), ('d', 'e'), ('h', 'i')])
# Same behavior with a three-level target index:
# i2 = pd.MultiIndex.from_tuples([('a', 'b', 'c'), ('d', 'e', 'f'), ('h', 'i', 'j')])
print(df.reindex(i2, axis=0, method='ffill'))
#        x    y
# a b  3.0  4.0
# d e  NaN  NaN
# h i  0.0  7.0
Problem description
The reindexing operation above adds a row to the MultiIndex. When no fill method is specified, the new row is added and filled with NaN, as expected.
When method='ffill' is specified, I cannot explain the behavior. The index is updated as expected, but a NaN row appears in the middle of the existing data, and the original values show up under different labels.
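For comparison, here is a minimal sketch of a possible workaround: reindex without a fill method, then forward-fill in a separate step. It reuses the frame and index from the code sample above; the variable names plain and filled are only illustrative. Note that DataFrame.ffill fills by row position rather than by index order, so this is an assumption about what a fix-up could look like, not a statement of how reindex(method='ffill') is meant to work.

```python
import pandas as pd

# Same data as in the code sample above.
i = pd.MultiIndex.from_tuples([('a', 'b'), ('d', 'e')])
df = pd.DataFrame([[0, 7], [3, 4]], index=i, columns=['x', 'y'])
i2 = pd.MultiIndex.from_tuples([('a', 'b'), ('d', 'e'), ('h', 'i')])

# Step 1: reindex without a fill method.
# Existing rows keep their values; the new ('h', 'i') row is all NaN.
plain = df.reindex(i2)

# Step 2: forward-fill by row position, so ('h', 'i') takes the values
# of the row directly above it, ('d', 'e').
filled = plain.ffill()
print(filled)
```

Because the fill is positional, the result depends on the row order of the target index, so this only matches an index-based forward fill when the target index is sorted.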
Expected Output
I am not sure whether ffill for a MultiIndex is designed to behave like this, but I was hoping for:
#        x    y
# a b  3.0  4.0
# d e  0.0  7.0
# h i  0.0  7.0
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
pandas: 0.23.4
pytest: 3.7.1
pip: 18.1
setuptools: 39.0.1
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: 0.11.1
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: 1.6.1
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None