Open
Labels: Bug, Missing-data, Resample
Description
Currently, interpolate() as part of resample() fills in all existing NaN values, not just those introduced by upsampling:
import pandas as pd
all_times = pd.date_range('2016-01-01', '2016-01-8')
times = all_times[1:3].append(all_times[4:-2]) # time coord with some missing days
s = pd.Series(range(len(times)), index=times)
>>> s
2016-01-02 0
2016-01-03 1
2016-01-05 2
2016-01-06 3
dtype: int64
>>> s.reindex(all_times)
2016-01-01 NaN
2016-01-02 0.0
2016-01-03 1.0
2016-01-04 NaN
2016-01-05 2.0
2016-01-06 3.0
2016-01-07 NaN
2016-01-08 NaN
Freq: D, dtype: float64
>>> s.reindex(all_times).resample('12H').interpolate()
2016-01-01 00:00:00 NaN
2016-01-01 12:00:00 NaN
2016-01-02 00:00:00 0.00
2016-01-02 12:00:00 0.50
2016-01-03 00:00:00 1.00
2016-01-03 12:00:00 1.25
2016-01-04 00:00:00 1.50
2016-01-04 12:00:00 1.75
2016-01-05 00:00:00 2.00
2016-01-05 12:00:00 2.50
2016-01-06 00:00:00 3.00
2016-01-06 12:00:00 3.00
2016-01-07 00:00:00 3.00
2016-01-07 12:00:00 3.00
2016-01-08 00:00:00 3.00
Freq: 12H, dtype: float64
This is inconsistent with the other fill methods, which only fill in NaNs introduced by upsampling:
>>> s.reindex(all_times).resample('12H').ffill()
2016-01-01 00:00:00 NaN
2016-01-01 12:00:00 NaN
2016-01-02 00:00:00 0.0
2016-01-02 12:00:00 0.0
2016-01-03 00:00:00 1.0
2016-01-03 12:00:00 1.0
2016-01-04 00:00:00 NaN
2016-01-04 12:00:00 NaN
2016-01-05 00:00:00 2.0
2016-01-05 12:00:00 2.0
2016-01-06 00:00:00 3.0
2016-01-06 12:00:00 3.0
2016-01-07 00:00:00 NaN
2016-01-07 12:00:00 NaN
2016-01-08 00:00:00 NaN
Freq: 12H, dtype: float64
I'd like to see resample's interpolate() switch its behavior to no longer fill pre-existing NaNs. If those NaNs are not meaningful, it is straightforward to call .dropna() to remove them first. The current behavior violates the model that upsampling should depend only on which index values are present, not on the data values.
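For illustration, here is a minimal sketch of the behavior I have in mind, written as a workaround on top of the current API. The helper name `interpolate_upsampled_only` is hypothetical (it is not part of pandas), and it assumes a sorted, unique DatetimeIndex. It interpolates a new timestamp only when the original samples on both sides of it hold real data, and it passes the rule as a `pd.Timedelta` to avoid depending on a particular frequency alias spelling.

```python
import pandas as pd


def interpolate_upsampled_only(s, rule):
    """Hypothetical helper, not a pandas API.

    Upsample `s` and interpolate only between adjacent original samples
    that are both non-NaN. Pre-existing NaNs are left untouched.
    """
    upsampled = s.resample(rule).asfreq()
    # Linear interpolation over the regular upsampled grid. On its own this
    # also fills the pre-existing NaNs, which is the behavior reported here.
    filled = upsampled.interpolate()
    valid = s.notna()
    # For each upsampled timestamp, check whether the closest original sample
    # at or before it, and at or after it, held real data.
    left_ok = valid.reindex(upsampled.index, method="ffill")
    right_ok = valid.reindex(upsampled.index, method="bfill")
    # Keep interpolated values only where both neighbors were valid.
    return filled.where(left_ok & right_ok)


all_times = pd.date_range("2016-01-01", "2016-01-08")
times = all_times[1:3].append(all_times[4:-2])  # time coord with some missing days
s = pd.Series(range(len(times)), index=times)

result = interpolate_upsampled_only(s.reindex(all_times), pd.Timedelta(hours=12))
```

With this sketch, the half-day points between 2016-01-02 and 2016-01-03 and between 2016-01-05 and 2016-01-06 are interpolated, while 2016-01-04 and everything adjacent to a pre-existing NaN stays NaN. Calling .dropna() before resample() would still give the fill-everything result for anyone who wants it.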
Some variation of this has come up in several other issues: