WIP: Experimental changes in rolling_var
related to #7900
#7916
Conversation
@@ -1161,6 +1161,7 @@ def roll_var(ndarray[double_t] input, int win, int minp, int ddof=1):
     Numerically stable implementation using Welford's method.
     """
     cdef double val, prev, mean_x = 0, ssqdm_x = 0, nobs = 0, delta
+    cdef double rep = NaN, nrep = 0
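For reference, the Welford-style state that `roll_var` maintains (`nobs`, `mean_x`, `ssqdm_x`, `delta`) and its add/remove updates can be sketched in plain Python. This is an illustrative helper, not the actual Cython implementation: NaN handling and the `minp` argument are omitted, and output starts as soon as `nobs > ddof`.

```python
def rolling_var_welford(values, win, ddof=1):
    """Rolling variance via Welford's add/remove updates (plain-Python sketch).

    Hypothetical helper mirroring the state in the Cython roll_var:
    nobs, mean_x (running mean), ssqdm_x (sum of squared deviations).
    NaN handling and min_periods are omitted for brevity.
    """
    n = len(values)
    out = [float("nan")] * n
    nobs = 0
    mean_x = 0.0
    ssqdm_x = 0.0
    for i in range(n):
        # add the incoming observation
        x = values[i]
        nobs += 1
        delta = x - mean_x
        mean_x += delta / nobs
        ssqdm_x += delta * (x - mean_x)
        # drop the observation leaving the window
        if i >= win:
            prev = values[i - win]
            nobs -= 1
            delta = prev - mean_x
            mean_x -= delta / nobs
            ssqdm_x -= delta * (prev - mean_x)
        if nobs > ddof:
            out[i] = ssqdm_x / (nobs - ddof)
    return out
```

Both the add and the remove update touch only O(1) state, which is what makes this usable inside a single pass over the array.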
Why are nobs and nrep doubles?
`nobs` was a double before my changes (see here). I seem to recall that I tried changing it to `Py_ssize_t` and there was some performance degradation, so I let it be. But I may very well be making this up; I don't remember the exact details...
Since the effect of these new modifications on performance has to be measured, I'll try things both ways and add some information to the commit message if it stays a double.
I have refactored the code quite a bit, see what you think of this, @seth-p. The good news is that the performance differences seem negligible, and may even favor the new implementation, which I find genuinely hard to explain. Following are timings over a random array of
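A measurement recipe along those lines can be sketched as below. This is only a harness for relative comparisons: it uses NumPy's `sliding_window_view` (NumPy >= 1.20) as a stand-in implementation, not the Cython loop under discussion, which is far faster.

```python
import timeit
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def time_rolling_var(n=100_000, win=20, repeat=3):
    # Time a rolling-variance pass over a random array of n doubles.
    # Stand-in implementation only; use the same harness to compare variants.
    a = np.random.default_rng(0).standard_normal(n)
    run = lambda: sliding_window_view(a, win).var(axis=1, ddof=1)
    best = min(timeit.repeat(run, number=1, repeat=repeat))
    return best, run().shape

best, shape = time_rolling_var()
print(f"best of 3: {best * 1000:.1f} ms, output shape {shape}")
```

Taking the minimum over several repeats, rather than the mean, is the usual way to reduce noise from other processes in micro-benchmarks.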
Can you add a test that validates this new behaviour (well, not new, but 'correct')?
It may take a couple of days until I get around to it, but if we think this is worth adding, I will definitely write some tests for it.
Sure, no problem. I think it's definitely worthwhile to avoid pathological cases (because of rounding / precision issues) if possible (as you have shown).
Agreed, I think it's worth handling degenerate cases accurately.
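The kind of rounding/precision pathology being discussed is easy to reproduce: on data with a large offset, the textbook sum-of-squares formula cancels catastrophically, while Welford's update does not. A small demonstration (function names are ours, not pandas API):

```python
def naive_var(xs, ddof=1):
    # One-pass sum-of-squares formula, i.e. (sum(x^2) - sum(x)^2/n) / (n - ddof).
    # Suffers catastrophic cancellation when the mean is large relative
    # to the spread of the data.
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - ddof)

def welford_var(xs, ddof=1):
    # Numerically stable single-pass variance via Welford's method.
    mean = 0.0
    m2 = 0.0
    for k, x in enumerate(xs, 1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - ddof)

xs = [1e9, 1e9 + 0.1, 1e9 + 0.2]   # true sample variance is 0.01
print(naive_var(xs), welford_var(xs))  # the naive result is badly wrong
```

The squares here are of order 1e18, where a double's spacing is over a hundred, so the 0.01 of true variance is completely lost to the naive formula; Welford's method only ever subtracts quantities of comparable magnitude.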
if nobs == 1:
    # pathological case
    val = 0
Perhaps `val = 0 if (nobs > ddof) else NaN`?
Added logic to `rolling_var` to detect windows where all non-NaN values are identical. Need to assess both correctness and performance impact.
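That idea can be sketched in plain Python as below. A `Counter` over the window stands in for the `rep`/`nrep` pair in the Cython code (which keeps only O(1) state); NaN handling is omitted, and the helper name is ours.

```python
import math
from collections import Counter, deque

def rolling_var_constant_aware(values, win, ddof=1):
    # Sketch only: detect windows whose values are all identical and report
    # exactly 0.0 for them, instead of trusting the accumulated sums.
    out = []
    window = deque()
    counts = Counter()
    for x in values:
        window.append(x)
        counts[x] += 1
        if len(window) > win:
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        nobs = len(window)
        if nobs <= ddof:
            out.append(math.nan)  # not enough observations for this ddof
        elif len(counts) == 1:
            out.append(0.0)       # all values identical: variance is exactly 0
        else:
            mean = sum(window) / nobs
            out.append(sum((v - mean) ** 2 for v in window) / (nobs - ddof))
    return out
```

Short-circuiting the constant-window case sidesteps the rounding issues discussed above entirely, at the cost of a little extra bookkeeping per element.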
How's this coming?
@jaimefrio status?
I was having some rebase problems last night, but it is mostly ready. Will get back to it tonight and push a hopefully final version.
Couldn't solve my rebase issues, so I created a new PR, #8271. Closing this.