resample().interpolate() should not fill pre-existing NaNs #17868

@shoyer

Currently, interpolate() as part of resample() fills in pre-existing NaN values as well as the NaNs introduced by upsampling:

import pandas as pd

all_times = pd.date_range('2016-01-01', '2016-01-08')
times = all_times[1:3].append(all_times[4:-2])  # time coord with some missing days
s = pd.Series(range(len(times)), index=times)
>>> s
2016-01-02    0
2016-01-03    1
2016-01-05    2
2016-01-06    3
dtype: int64

>>> s.reindex(all_times)
2016-01-01    NaN
2016-01-02    0.0
2016-01-03    1.0
2016-01-04    NaN
2016-01-05    2.0
2016-01-06    3.0
2016-01-07    NaN
2016-01-08    NaN
Freq: D, dtype: float64

>>> s.reindex(all_times).resample('12H').interpolate()
2016-01-01 00:00:00     NaN
2016-01-01 12:00:00     NaN
2016-01-02 00:00:00    0.00
2016-01-02 12:00:00    0.50
2016-01-03 00:00:00    1.00
2016-01-03 12:00:00    1.25
2016-01-04 00:00:00    1.50
2016-01-04 12:00:00    1.75
2016-01-05 00:00:00    2.00
2016-01-05 12:00:00    2.50
2016-01-06 00:00:00    3.00
2016-01-06 12:00:00    3.00
2016-01-07 00:00:00    3.00
2016-01-07 12:00:00    3.00
2016-01-08 00:00:00    3.00
Freq: 12H, dtype: float64

This is inconsistent with the other fill methods, which only fill in NaNs introduced by upsampling:

>>> s.reindex(all_times).resample('12H').ffill() 
2016-01-01 00:00:00    NaN
2016-01-01 12:00:00    NaN
2016-01-02 00:00:00    0.0
2016-01-02 12:00:00    0.0
2016-01-03 00:00:00    1.0
2016-01-03 12:00:00    1.0
2016-01-04 00:00:00    NaN
2016-01-04 12:00:00    NaN
2016-01-05 00:00:00    2.0
2016-01-05 12:00:00    2.0
2016-01-06 00:00:00    3.0
2016-01-06 12:00:00    3.0
2016-01-07 00:00:00    NaN
2016-01-07 12:00:00    NaN
2016-01-08 00:00:00    NaN
Freq: 12H, dtype: float64

I'd like to see resample's interpolate() switch its behavior to no longer fill pre-existing NaNs. The current behavior violates the model that upsampling should depend only on which index values are present, not on the data values. If the pre-existing NaNs are not meaningful, it is straightforward to call .dropna() on the series before resampling.
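In the meantime, here is a rough sketch of one way to get the requested behavior with existing pandas calls. The helper name `interpolate_between_valid` is made up for illustration, and it assumes one particular reading of the proposal: an upsampled point is interpolated only when the original observations on both sides of it are non-NaN. It also assumes a sorted, unique DatetimeIndex.

```python
import numpy as np
import pandas as pd


def interpolate_between_valid(series, freq):
    """Upsample to `freq`, interpolating only between adjacent non-NaN observations.

    Hypothetical helper, not part of pandas.
    """
    # 1.0 where the original series has data, 0.0 where it holds a pre-existing NaN.
    has_data = series.notna().astype(float)

    # Upsample without filling, then interpolate everything (the current behavior).
    upsampled = series.resample(freq).asfreq()
    filled = upsampled.interpolate()

    # For each upsampled timestamp, look up the nearest original observation on each side.
    prev_ok = has_data.reindex(upsampled.index, method="ffill") == 1.0
    next_ok = has_data.reindex(upsampled.index, method="bfill") == 1.0

    # Keep a value only where both bracketing observations are real data.
    return filled.where(prev_ok & next_ok)


all_times = pd.date_range("2016-01-01", "2016-01-08")
times = all_times[1:3].append(all_times[4:-2])
s = pd.Series(range(len(times)), index=times)

# A Timedelta is used for the frequency to avoid differences in alias spelling
# ('12H' vs '12h') between pandas versions.
result = interpolate_between_valid(s.reindex(all_times), pd.Timedelta(hours=12))
print(result)
```

With the series from this report, the only interpolated values are 0.5 at 2016-01-02 12:00 and 2.5 at 2016-01-05 12:00. Every timestamp adjacent to a pre-existing NaN stays NaN.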

Variations of this have come up in several other issues.

Labels: Bug, Missing-data, Resample