-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Use Welford's method in stats.moments.rolling_var #6817
Conversation
Some proof of the numerical stability claims:
Before this PR:
After this PR:
|
And some timings:
So it is about 25% slower, but I guess correctness has its price... |
since this would replace roll_var just delete the original (and call this roll_var) pls add the examples above as a test (which should fail with the current one ) but pass with the new one you will prob need to compare with np.allclose to avoid testing for exactness |
otoh does it make sense to provide this as an option (maybe method=...) so can change in a new version? it seems to me that is a direct replacement |
I don't think it should be possible for the user to revert back to the old method: although the PR is labelled ENH, it is in a way closer to a bug fix, you really shouldn't be computing the variance the way it was being done. So I think the 25% slowdown is a reasonable price to pay. Will add the tests and remove the old code. |
that sounds ok |
ping when ready for merge |
@jaimefrio hows this coming? |
@jaimefrio ping when you have updated...like to get in 0.14 |
@jreback Pushed a new commit with a simple, hard coded test. I am not very familiar with your testing setup, so I kind of hacked the test in. Do let me kow if you'd rather have it done some other way. On a separate note, the numerical stability of the
|
that test looks fine pls open a new issue for the skew/Kurt stability issues |
Updated the release notes, and opened issue #6929 |
pls rebase this |
rebased |
hmm....travis is not running this....odd..... can you make sure that the travis hook is set? |
This PR implements a modified version of Welford's method to compute the rolling variance. Instead of keeping track of the sum and sum of the squares of the items in the window, it tracks the mean and the sum of squared differences from the mean. This turns out to be (much) more numerically stable. The formulas to update these two variables when adding or removing an item from the sequence are well known, see e.g. http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance The formulas used when both adding one and removing one item I have not seen explicitly worked out anywhere, but are not too hard to come up with if you put pen to (a lot of) paper.
Travis is running it four times now, two in |
that's right it runs for each commit both on your private branchand pandas branch - the green check comes when it passes on master so ping when that's green |
All green and ready to go. |
ENH: Use Welford's method in stats.moments.rolling_var
thanks! great contribution! |
This PR implements a modified version of Welford's method to compute
the rolling variance. Instead of keeping track of the sum and sum of
the squares of the items in the window, it tracks the mean and the sum
of squared differences from the mean. This turns out to be (much) more
numerically stable.
The formulas to update these two variables when adding or removing an
item from the sequence are well known, see e.g.
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
The formulas used when both adding one and removing one item I have
not seen explicitly worked out anywhere, but are not too hard to come
up with if you put pen to (a lot of) paper.