limiting reindex with MultiIndex ffill/bfill within levels. #10347

Closed
@bwillers


xref #7895

When reindexing on a multiindex with method='ffill' or method='bfill', it would be very useful to be able to restrict the fill to certain groups/levels of the index.

For example, consider the following:

In [1]: dates = pd.date_range(start=pd.Timestamp('20080102'), 
                              periods=3, freq='7D')
In [2]: names = ['jane', 'john']
In [3]: index = pd.MultiIndex.from_product([names, dates], 
                                           names=['name', 'date'])
In [4]: df = pd.DataFrame(index=index, 
                          data={'best_score': [1, 2, 3, None, 5, 6]})
In [5]: df
Out[5]:
                 best_score
name date
jane 2008-01-02           1
     2008-01-09           2
     2008-01-16           3
john 2008-01-02         NaN
     2008-01-09           5
     2008-01-16           6

In [7]: new_dates = [pd.Timestamp('20080101'), pd.Timestamp('20080117')]
In [8]: new_index = pd.MultiIndex.from_tuples([('jane', new_dates[1]),
                                               ('john', new_dates[0]), 
                                               ('john', new_dates[1])], 
                                              names=['name', 'date'])

In [9]: df.reindex(new_index, method='ffill')
Out[9]:
                 best_score
name date
jane 2008-01-17           3
john 2008-01-01           3
     2008-01-17           6

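Clearly john's score on 2008-01-01 should not be 3; it should be NaN. The 3 is jane's last value, forward-filled across the boundary between the two names. What would be great (ignoring the awful argument name) is something like: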

df.reindex(new_index, method='ffill', fill_group_level=['name'])

                 best_score
name date
jane 2008-01-17           3
john 2008-01-01           NaN
     2008-01-17           6

This generalizes to indexes with more than two levels. In effect, it amounts to being able to specify a set of boundaries for the ffill/bfill based on changes in level values. I don't think this can be done in a straightforward way with a groupby(level='name') because the values of the index in the second level are not the same for every group.
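In the meantime, one possible workaround is to reindex each group separately and stitch the pieces back together. The following is only a rough sketch, not an existing pandas feature: the helper name `reindex_within_groups` is made up for illustration, and it assumes a two-level index whose outer level is the grouping level, with unique inner-level values within each group.

```python
import pandas as pd


def reindex_within_groups(df, new_index, group_level, method="ffill"):
    """Hypothetical helper: fill only within each value of `group_level`.

    Assumes a two-level MultiIndex with `group_level` as the outer level.
    """
    pieces = {}
    target_keys = new_index.get_level_values(group_level)
    source_keys = df.index.get_level_values(group_level)
    for key in target_keys.unique():
        if key not in source_keys:
            # No source rows for this group; the final reindex leaves NaN.
            continue
        # Inner-level labels requested for this group only.
        target = new_index[target_keys == key].droplevel(group_level)
        # Source rows for this group, sorted so method-based filling is valid.
        source = df.xs(key, level=group_level).sort_index()
        pieces[key] = source.reindex(target, method=method)
    if not pieces:
        # Nothing to fill from: return an all-NaN frame on the new index.
        return df.iloc[:0].reindex(new_index)
    out = pd.concat(pieces, names=[group_level])
    # Exact-match reindex restores the requested row order.
    return out.reindex(new_index)


dates = pd.date_range(start=pd.Timestamp("20080102"), periods=3, freq="7D")
index = pd.MultiIndex.from_product([["jane", "john"], dates],
                                   names=["name", "date"])
df = pd.DataFrame(index=index, data={"best_score": [1, 2, 3, None, 5, 6]})

new_dates = [pd.Timestamp("20080101"), pd.Timestamp("20080117")]
new_index = pd.MultiIndex.from_tuples([("jane", new_dates[1]),
                                       ("john", new_dates[0]),
                                       ("john", new_dates[1])],
                                      names=["name", "date"])

result = reindex_within_groups(df, new_index, "name")
print(result)
```

With the example data above, john's row for 2008-01-01 stays NaN because the fill never looks outside john's own rows. This loops over groups in Python, so it is likely to be slow for indexes with many groups, which is part of why built-in support would be preferable.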

Labels: Enhancement, Indexing, Missing-data, MultiIndex
