xref #7895
When reindexing on a MultiIndex with method='ffill' or method='bfill', it would be very useful to be able to restrict the fill to certain groups/levels of the index.
For example, consider the following:
In [1]: dates = pd.date_range(start=pd.Timestamp('20080102'),
                              periods=3, freq='7D')
In [2]: names = ['jane', 'john']
In [3]: index = pd.MultiIndex.from_product([names, dates],
                                           names=['name', 'date'])
In [4]: df = pd.DataFrame(index=index,
                          data={'best_score': [1, 2, 3, None, 5, 6]})
In [5]: df
Out[5]:
                 best_score
name date
jane 2008-01-02           1
     2008-01-09           2
     2008-01-16           3
john 2008-01-02         NaN
     2008-01-09           5
     2008-01-16           6
In [7]: new_dates = [pd.Timestamp('20080101'), pd.Timestamp('20080117')]
In [8]: new_index = pd.MultiIndex.from_tuples([('jane', new_dates[1]),
                                               ('john', new_dates[0]),
                                               ('john', new_dates[1])],
                                              names=['name', 'date'])
In [9]: df.reindex(new_index, method='ffill')
Out[9]:
                 best_score
name date
jane 2008-01-17           3
john 2008-01-01           3
     2008-01-17           6
Clearly john's score on 2008-01-01 is not 3; it's NaN. The forward fill carried jane's last value across the boundary between the two names. What would be great (ignoring the awful argument name) is something like:
df.reindex(new_index, method='ffill', fill_group_level=['name'])
                 best_score
name date
jane 2008-01-17           3
john 2008-01-01         NaN
     2008-01-17           6
This generalizes to indexes with more than two levels. In effect, it amounts to being able to specify a set of boundaries for the ffill/bfill based on changes in level values. I don't think this can be done in a straightforward way with a groupby(level='name') because the values of the index in the second level are not the same for every group.
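For reference, here is a rough sketch of the manual workaround I have in mind, to make the requested behaviour concrete. The helper name reindex_within_groups is made up for illustration and is not part of pandas (neither is fill_group_level). It assumes a two-level index, that every group key in the target index already exists in df, and that each group's inner index is sorted:

```python
import pandas as pd


def reindex_within_groups(df, new_index, level, method="ffill"):
    """Hypothetical helper: reindex each group of `level` on its own,
    so a fill can never cross from one group into the next."""
    target_keys = new_index.get_level_values(level)
    pieces = {}
    for key in target_keys.unique():
        # Target labels for this group only, with the grouping level dropped.
        target = new_index[target_keys == key].droplevel(level)
        # Source rows for this group, indexed by the remaining level.
        # Assumption: `key` is present in df, otherwise xs raises KeyError.
        source = df.xs(key, level=level)
        pieces[key] = source.reindex(target, method=method)
    # Put the grouping level back as the outermost index level.
    return pd.concat(pieces, names=[level])


# Same data as in the example above.
dates = pd.date_range(start=pd.Timestamp('20080102'), periods=3, freq='7D')
index = pd.MultiIndex.from_product([['jane', 'john'], dates],
                                   names=['name', 'date'])
df = pd.DataFrame(index=index, data={'best_score': [1, 2, 3, None, 5, 6]})

new_dates = [pd.Timestamp('20080101'), pd.Timestamp('20080117')]
new_index = pd.MultiIndex.from_tuples([('jane', new_dates[1]),
                                       ('john', new_dates[0]),
                                       ('john', new_dates[1])],
                                      names=['name', 'date'])

result = reindex_within_groups(df, new_index, level='name')
```

On the example data this gives 3 for jane on 2008-01-17, NaN for john on 2008-01-01, and 6 for john on 2008-01-17, which is the output I would expect from the proposed argument. The rows come back grouped by key, so a target index that interleaves groups would need reordering afterwards, and the loop does not extend cleanly to more than two levels, which is why built-in support would be nicer.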