Rolling() gives values different from pd.rolling() #5877
Comments
Adding a few extra observations: ds_ex.rolling(time=3).mean().pr.values
df_ex.rolling(window=3).mean().values.T have a similar behaviour, in that once again But when I switch to other operations, like
array([ nan, nan, 0. , 0. , 0. , whereas
gives array([[ nan, nan, 0.00000000e+00, 0.00000000e+00,
Yup - just followed your suggestion and:
and now the array([ nan, nan, 0. , 0. , 0. , Could you elaborate more on the issue? Is this because of some bouncing between precisions across packages? Thanks tho!
AFAIK bottleneck uses a less precise algorithm for sums than numpy (pydata/bottleneck#379). However, I don't know why this yields 0 at the beginning but not at the end. A slightly more minimal example:

import bottleneck as bn
import numpy as np
import pandas as pd

# 35 float32 samples: zeros, a burst of nonzero values, then zeros again
data = np.array(
    [
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.31, 0.91999996, 8.3, 1.42, 0.03, 1.22, 0.09999999,
        0.14, 0.13, 0.0, 0.12, 0.03, 2.53, 0.0,
        0.19999999, 0.19999999, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
    ],
    dtype="float32",
)

# moving sum computed by bottleneck
bn.move_sum(data, window=3)
# rolling mean computed by pandas
pd.Series(data).rolling(3).mean()
# rolling mean computed with plain numpy
np.convolve(data, np.ones(3), 'valid') / 3
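One plausible reading of why the leading zeros stay exact while the trailing ones do not: if the moving sum is kept as a single running accumulator (add the incoming value, subtract the outgoing one), any rounding error picked up while the window holds nonzero values is never cleared, whereas before the first nonzero value there is no error to carry. The sketch below is an assumption for illustration, not bottleneck's actual implementation; `running_window_sum` and `direct_window_sum` are hypothetical helpers, and `sample` is a shortened excerpt of the data above.

```python
import numpy as np


def running_window_sum(values, window):
    """Moving sum via one float32 accumulator: add the new value, drop the old one."""
    values = np.asarray(values, dtype=np.float32)
    out = np.full(values.shape, np.nan, dtype=np.float32)
    acc = np.float32(0.0)
    for i, v in enumerate(values):
        acc = np.float32(acc + v)
        if i >= window:
            # Subtracting the outgoing value does not undo earlier rounding.
            acc = np.float32(acc - values[i - window])
        if i >= window - 1:
            out[i] = acc
    return out


def direct_window_sum(values, window):
    """Moving sum that re-adds every window from scratch."""
    values = np.asarray(values, dtype=np.float32)
    out = np.full(values.shape, np.nan, dtype=np.float32)
    for i in range(window - 1, len(values)):
        out[i] = values[i - window + 1 : i + 1].sum(dtype=np.float32)
    return out


# Shortened excerpt of the issue's data (an assumption made to keep this brief).
sample = np.array(
    [0.0, 0.0, 0.0, 0.31, 0.91999996, 8.3, 1.42, 0.03, 0.0, 0.0, 0.0, 0.0],
    dtype="float32",
)

running = running_window_sum(sample, 3)
direct = direct_window_sum(sample, 3)
print(running)  # trailing entries may carry a small nonzero residue
print(direct)   # windows made only of zeros sum to exactly 0.0
```

Under this assumption, the two results agree to within float32 rounding, but only the direct version is guaranteed to return exactly zero for an all-zero window.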
I am not sure this is a bug - but it clearly doesn't give the results the user would expect.
The rolling sum of zeros gives me values that are not zeros
it gives me this result:
array([ nan, nan, 0.0000000e+00, 0.0000000e+00,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 3.1000000e-01,
1.2300000e+00, 9.5300007e+00, 1.0640000e+01, 9.7500000e+00,
2.6700001e+00, 1.3500001e+00, 1.4600002e+00, 3.7000012e-01,
2.7000013e-01, 2.5000012e-01, 1.5000013e-01, 2.6800001e+00,
2.5600002e+00, 2.7300003e+00, 4.0000033e-01, 4.0000033e-01,
2.0000035e-01, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07,
3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07,
3.5762787e-07, 3.5762787e-07, 3.5762787e-07], dtype=float32)
Note the nonzero values: the residue changes depending on whether I use float64 or float32 for my data. So this seems to be a precision-related issue (although the first values are correctly set to zero); in fact, other sums are not exactly what they should be either.
The small difference at the 8th or 9th decimal place can be expected due to precision, but the fact that the zeros become nonzero is problematic imho, especially if it is not documented. In geoscience data, zero often has a very specific meaning (e.g. zero rainfall is characterized differently from nonzero rainfall).
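As a rough check that the residue is rounding noise rather than data, it can be compared with the machine epsilon of each dtype. The snippet below is a minimal sketch using only `np.finfo`; the variable names are illustrative, and the residue value is copied from the output above.

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable value.
eps32 = np.finfo(np.float32).eps  # 2**-23 for float32
eps64 = np.finfo(np.float64).eps  # 2**-52 for float64

# Residue reported for the trailing all-zero windows in the float32 run.
residue = np.float32(3.5762787e-07)

# Express the residue in units of float32 epsilon.
ratio = float(residue) / float(eps32)
print(eps32, eps64, ratio)
```

The residue comes out as a small integer multiple of the float32 epsilon, which is consistent with a few accumulated rounding steps and with the residue changing size when the data is float64.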
in pandas this instead works:
gives me
array([[ nan, nan, 0. , 0. , 0. ,
0. , 0. , 0.31 , 1.22999996, 9.53000015,
10.6400001 , 9.75000015, 2.66999999, 1.35000001, 1.46000002,
0.36999998, 0.27 , 0.24999999, 0.15 , 2.67999997,
2.55999997, 2.72999996, 0.39999998, 0.39999998, 0.19999999,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ]])
What you expected to happen:
The sum of zeros should be zero.
If this cannot be achieved/expected because of precision issues, it should be documented.
Anything else we need to know?:
I discovered this behavior in my old environments, but I created a new ad hoc environment with the latest versions, and it does the same thing.
Environment:
INSTALLED VERSIONS
commit: None
python: 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 0.19.0
pandas: 1.3.3
numpy: 1.21.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 58.0.4
pip: 21.2.4
conda: None
pytest: None
IPython: 7.28.0
sphinx: None