Hello,
Please consider the following code:
import pandas as pd
import numpy as ny
dates = pd.date_range("2015-01-01", periods=10, freq="D")
ts = pd.TimeSeries(data=range(10), index=dates, dtype=ny.float64)
ts_mean = pd.rolling_mean(ts, 5)
print(ts)
2015-01-01 0
2015-01-02 1
2015-01-03 2
2015-01-04 3
2015-01-05 4
2015-01-06 5
2015-01-07 6
2015-01-08 7
2015-01-09 8
2015-01-10 9
Freq: D, dtype: float64
print(ts_mean)
2015-01-01 NaN
2015-01-02 NaN
2015-01-03 NaN
2015-01-04 NaN
2015-01-05 2
2015-01-06 3
2015-01-07 4
2015-01-08 5
2015-01-09 6
2015-01-10 7
Freq: D, dtype: float64
For the last date (2015-01-10), the result is 7, the mean of [5, 6, 7, 8, 9], as expected.
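For reference, the expected values above can be reproduced by averaging each trailing 5-value window directly. This is a minimal pure-Python sketch: `rolling_mean_exact` is a hypothetical helper, not a pandas function, and it mirrors pandas only in returning NaN until the window is full.

```python
import math

def rolling_mean_exact(values, window):
    """Hypothetical helper: mean of each trailing window, recomputed from scratch."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)  # not enough observations yet (pandas shows NaN here)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

print(rolling_mean_exact([float(v) for v in range(10)], 5))
# [nan, nan, nan, nan, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

Because every window is summed independently, one unusual value can only affect the windows that actually contain it.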
Now replace the 2015-01-03 value with the extreme value -9e+33:
dates = pd.date_range("2015-01-01", periods=10, freq="D")
ts = pd.TimeSeries(data=range(10), index=dates, dtype=ny.float64)
ts[2] = -9e+33
print(ts)
2015-01-01 0.000000e+00
2015-01-02 1.000000e+00
2015-01-03 -9.000000e+33
2015-01-04 3.000000e+00
2015-01-05 4.000000e+00
2015-01-06 5.000000e+00
2015-01-07 6.000000e+00
2015-01-08 7.000000e+00
2015-01-09 8.000000e+00
2015-01-10 9.000000e+00
Freq: D, dtype: float64
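Why a value this large can cause trouble: at a magnitude of 9e+33, adjacent float64 values are 2**60 (about 1.15e+18) apart, so adding a small number such as 3 leaves -9e+33 unchanged. A minimal pure-Python check, assuming Python 3.9+ for `math.ulp` (Python floats are IEEE 754 doubles, the same format as float64):

```python
import math

big = -9e33
# Gap between adjacent float64 values at this magnitude.
print(math.ulp(big) == 2.0**60)   # True
# The 3.0 is absorbed: the sum rounds back to big.
print(big + 3.0 == big)           # True
# So removing big again returns 0.0, not 3.0.
print((big + 3.0) - big)          # 0.0
```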
Then compute rolling_mean again:
ts_mean = pd.rolling_mean(ts, 5)
print(ts_mean)
2015-01-01 NaN
2015-01-02 NaN
2015-01-03 NaN
2015-01-04 NaN
2015-01-05 -1.800000e+33
2015-01-06 -1.800000e+33
2015-01-07 -1.800000e+33
2015-01-08 0.000000e+00
2015-01-09 1.000000e+00
2015-01-10 2.000000e+00
Freq: D, dtype: float64
As you can see, from 2015-01-08 onward the results are wrong: [0, 1, 2] instead of [5, 6, 7], even though -9e+33 is no longer inside the 5-day window. The extreme value keeps perturbing the computation for every following date.
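The reported numbers are consistent with a rolling mean that keeps a single running total, adding each new value and subtracting the one that leaves the window. That is an assumption about the implementation, not confirmed here. The sketch below uses the hypothetical helpers `rolling_mean_running_sum` and `rolling_mean_per_window`. The add-then-subtract order is also assumed, chosen because it reproduces the output above. While -9e+33 is in the total, the values 3 to 7 are absorbed, so once it is subtracted the total is 0.0 instead of 25, and every later mean is low by exactly 5. Summing each window separately with `math.fsum` avoids carrying that error forward.

```python
import math

def rolling_mean_running_sum(values, window):
    """Hypothetical O(n) rolling mean that updates one running total.

    For each new observation it adds the incoming value first, then
    subtracts the value that just left the window (assumed order).
    """
    out = []
    total = 0.0
    for i, x in enumerate(values):
        total += x
        if i >= window:
            total -= values[i - window]
        out.append(total / window if i + 1 >= window else math.nan)
    return out

def rolling_mean_per_window(values, window):
    """Hypothetical alternative: sum every window independently with fsum."""
    return [math.fsum(values[i + 1 - window:i + 1]) / window if i + 1 >= window else math.nan
            for i in range(len(values))]

values = [float(v) for v in range(10)]
values[2] = -9e33  # the extreme value from the report

print(rolling_mean_running_sum(values, 5)[-3:])  # [0.0, 1.0, 2.0]
print(rolling_mean_per_window(values, 5)[-3:])   # [5.0, 6.0, 7.0]
```

The per-window version costs O(n * window) instead of O(n), which is the usual trade-off against the faster running-total approach.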
Best regards,