WIP: Experimental changes in `rolling_var` related to #7900 #7916

jaimefrio · 2014-08-04T05:47:44Z

Added logic to rolling_var to detect windows where all non-NaN
values are identical.

Need to assess both correctness and performance impact.

seth-p · 2014-08-04T06:18:20Z

pandas/algos.pyx

@@ -1161,6 +1161,7 @@ def roll_var(ndarray[double_t] input, int win, int minp, int ddof=1):
    Numerically stable implementation using Welford's method.
    """
    cdef double val, prev, mean_x = 0, ssqdm_x = 0, nobs = 0, delta
+    cdef double rep = NaN, nrep = 0


Why are nobs and nrep doubles?

nobs was a double before my changes (see here). I seem to recall that I tried changing it to Py_ssize_t and there was some performance degradation, so I let it be. But I may very well be making this up, don't remember the exact details...

Since the effect of this new modifications on performance have to be measured, I'll try things both ways and add some information on the commit message if it stays a double.

jaimefrio · 2014-08-05T00:33:45Z

I have refactored the code quite a bit, see what you think of this, @seth-p. The good news is that the performance differences seems negligible, and may even favor the new implementation, which I really find hard to understand. Following are timings over a random array of size double values, using a window of window items. Timed values are the minimum of enough repeats to take 0.1 sec. total time:

                                       master         new
size:     100, window:    10, time: 4.06e-05 sec. 4.06e-05 sec. 
size:     100, window:    30, time: 4.23e-05 sec. 4.19e-05 sec. 
size:     300, window:    10, time: 4.68e-05 sec. 4.56e-05 sec. 
size:     300, window:    30, time: 4.39e-05 sec. 3.98e-05 sec. 
size:     300, window:   100, time: 5.34e-05 sec. 4.23e-05 sec. 
size:    1000, window:    10, time: 9.48e-05 sec. 8.66e-05 sec. 
size:    1000, window:    30, time: 8.01e-05 sec. 9.36e-05 sec. 
size:    1000, window:   100, time: 7.76e-05 sec. 1.22e-04 sec. 
size:    1000, window:   300, time: 7.72e-05 sec. 1.08e-04 sec. 
size:    3000, window:    10, time: 1.34e-04 sec. 1.63e-04 sec. 
size:    3000, window:    30, time: 1.77e-04 sec. 1.38e-04 sec. 
size:    3000, window:   100, time: 1.67e-04 sec. 1.34e-04 sec. 
size:    3000, window:   300, time: 1.63e-04 sec. 1.59e-04 sec. 
size:    3000, window:  1000, time: 1.29e-04 sec. 1.47e-04 sec. 
size:   10000, window:    10, time: 4.34e-04 sec. 4.12e-04 sec. 
size:   10000, window:    30, time: 4.39e-04 sec. 3.18e-04 sec. 
size:   10000, window:   100, time: 3.55e-04 sec. 2.82e-04 sec. 
size:   10000, window:   300, time: 3.59e-04 sec. 2.89e-04 sec. 
size:   10000, window:  1000, time: 4.66e-04 sec. 3.09e-04 sec. 
size:   10000, window:  3000, time: 4.58e-04 sec. 3.64e-04 sec. 
size:   30000, window:    10, time: 1.25e-03 sec. 1.02e-03 sec. 
size:   30000, window:    30, time: 8.98e-04 sec. 7.80e-04 sec. 
size:   30000, window:   100, time: 8.39e-04 sec. 1.09e-03 sec. 
size:   30000, window:   300, time: 8.08e-04 sec. 8.19e-04 sec. 
size:   30000, window:  1000, time: 1.06e-03 sec. 6.99e-04 sec. 
size:   30000, window:  3000, time: 7.99e-04 sec. 6.68e-04 sec. 
size:   30000, window: 10000, time: 6.61e-04 sec. 6.29e-04 sec. 
size:  100000, window:    10, time: 2.46e-03 sec. 2.25e-03 sec. 
size:  100000, window:    30, time: 2.83e-03 sec. 2.29e-03 sec. 
size:  100000, window:   100, time: 2.63e-03 sec. 2.44e-03 sec. 
size:  100000, window:   300, time: 2.50e-03 sec. 2.54e-03 sec. 
size:  100000, window:  1000, time: 2.41e-03 sec. 2.61e-03 sec. 
size:  100000, window:  3000, time: 2.63e-03 sec. 2.34e-03 sec. 
size:  100000, window: 10000, time: 2.38e-03 sec. 2.24e-03 sec. 
size:  300000, window:    10, time: 8.60e-03 sec. 8.85e-03 sec. 
size:  300000, window:    30, time: 8.44e-03 sec. 7.96e-03 sec. 
size:  300000, window:   100, time: 8.74e-03 sec. 8.11e-03 sec. 
size:  300000, window:   300, time: 8.72e-03 sec. 8.40e-03 sec. 
size:  300000, window:  1000, time: 8.45e-03 sec. 8.09e-03 sec. 
size:  300000, window:  3000, time: 8.76e-03 sec. 8.42e-03 sec. 
size:  300000, window: 10000, time: 9.08e-03 sec. 8.51e-03 sec. 
size: 1000000, window:    10, time: 3.17e-02 sec. 3.06e-02 sec. 
size: 1000000, window:    30, time: 3.21e-02 sec. 2.97e-02 sec. 
size: 1000000, window:   100, time: 3.40e-02 sec. 2.82e-02 sec. 
size: 1000000, window:   300, time: 3.17e-02 sec. 3.05e-02 sec. 
size: 1000000, window:  1000, time: 2.93e-02 sec. 3.12e-02 sec. 
size: 1000000, window:  3000, time: 2.94e-02 sec. 2.75e-02 sec. 
size: 1000000, window: 10000, time: 3.34e-02 sec. 2.81e-02 sec.

jreback · 2014-08-05T17:05:10Z

can you add a test that validates this new behaviour (well not new, but 'correct')?

jaimefrio · 2014-08-05T17:25:26Z

It may take a couple of days until I get around to it, but if we think this is worth adding I will definitely write some tests for it.

jreback · 2014-08-05T17:26:48Z

sure no problem

I think definitely worthwhile avoiding pathological cases (because of rounding / precision issues) if possible (as u have shown)

seth-p · 2014-08-05T17:30:50Z

Agreed, I think it's worth handling degenerate cases accurately.

seth-p · 2014-08-06T14:47:25Z

pandas/algos.pyx

            if nobs == 1:
+                # pathological case
                val = 0


Perhaps val = 0 if (nobs > ddof) else NaN?

Added logic to `rolling_var` to detect windows where all non-NaN values are identical. Need to assess both correctness and performance impact.

jreback · 2014-09-10T00:16:44Z

how's this coming?

jreback · 2014-09-14T14:14:56Z

@jaimefrio status?

jaimefrio · 2014-09-14T15:37:06Z

I was having some rebase problems last night, but it is mostly ready, Will get back to it tonight and push a hopefully final version.

jaimefrio · 2014-09-15T06:24:00Z

Couldn't solve my rebase issues, so created a new PR #8271. Closing this

jaimefrio mentioned this pull request Aug 4, 2014

BUG: Inconsistencies in calculating second moments of a single value #7900

Closed

seth-p reviewed Aug 4, 2014
View reviewed changes

jreback added Algos labels Aug 5, 2014

jreback added this to the 0.15.0 milestone Aug 5, 2014

seth-p reviewed Aug 6, 2014
View reviewed changes

pandas/algos.pyx

if nobs == 1:

# pathological case

val = 0

Copy link

Contributor

seth-p Aug 6, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps val = 0 if (nobs > ddof) else NaN?

WIP: Experimental changes in rolling_var related to pandas-dev#7900

a0100e6

Added logic to `rolling_var` to detect windows where all non-NaN values are identical. Need to assess both correctness and performance impact.

seth-p mentioned this pull request Sep 10, 2014

API/BUG/ENH: ewmvar/cov debiasing factors; add 'adjust' to ewmvar/std/vol/cov/corr; ewm*() min_periods #7926

Merged

jaimefrio closed this Sep 15, 2014

jaimefrio deleted the zero_var_detection branch September 16, 2014 17:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

WIP: Experimental changes in `rolling_var` related to #7900 #7916

WIP: Experimental changes in `rolling_var` related to #7900 #7916

Uh oh!

jaimefrio commented Aug 4, 2014

Uh oh!

seth-p Aug 4, 2014

Uh oh!

jaimefrio Aug 4, 2014

Uh oh!

jaimefrio commented Aug 5, 2014

Uh oh!

jreback commented Aug 5, 2014

Uh oh!

jaimefrio commented Aug 5, 2014

Uh oh!

jreback commented Aug 5, 2014

Uh oh!

seth-p commented Aug 5, 2014

Uh oh!

seth-p Aug 6, 2014

Uh oh!

jreback commented Sep 10, 2014

Uh oh!

jreback commented Sep 14, 2014

Uh oh!

jaimefrio commented Sep 14, 2014

Uh oh!

jaimefrio commented Sep 15, 2014

Uh oh!

Uh oh!

Uh oh!

WIP: Experimental changes in rolling_var related to #7900 #7916

WIP: Experimental changes in rolling_var related to #7900 #7916

Uh oh!

Conversation

jaimefrio commented Aug 4, 2014

Uh oh!

seth-p Aug 4, 2014

Choose a reason for hiding this comment

Uh oh!

jaimefrio Aug 4, 2014

Choose a reason for hiding this comment

Uh oh!

jaimefrio commented Aug 5, 2014

Uh oh!

jreback commented Aug 5, 2014

Uh oh!

jaimefrio commented Aug 5, 2014

Uh oh!

jreback commented Aug 5, 2014

Uh oh!

seth-p commented Aug 5, 2014

Uh oh!

seth-p Aug 6, 2014

Choose a reason for hiding this comment

Uh oh!

jreback commented Sep 10, 2014

Uh oh!

jreback commented Sep 14, 2014

Uh oh!

jaimefrio commented Sep 14, 2014

Uh oh!

jaimefrio commented Sep 15, 2014

Uh oh!

Uh oh!

WIP: Experimental changes in `rolling_var` related to #7900 #7916

WIP: Experimental changes in `rolling_var` related to #7900 #7916