big speedup for "Last N" statistics #41
Use O(1) algorithms to compute the "Last N" min/max/mean/stddev statistics.
I often use `--last` with a high N, e.g. `prettyping -i 0.1 --last 6000 ...` to watch for microbursts of loss or latency during the last 10 minutes. Currently, those "Last N" statistics (min/max/mean/mean-absolute-dev) are computed by passing through the whole "Last N" arrays each time, which becomes very slow and CPU-intensive for high values of `--last`.

Luckily, there are relatively simple constant-time algorithms for these, which this PR switches to.
(It also needs to change from Mean Absolute Deviation to Standard Deviation, because I'm not aware of an online, constant-time algorithm for Mean Absolute Deviation. But Standard Deviation will be very close anyway, and that's how most other `ping` tools do it.)
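For the windowed mean and standard deviation, a ring buffer plus a running sum and sum of squares gives O(1) updates. Again, this is just a sketch of the general idea with made-up names, not the code in this PR; a Welford-style update is another option if cancellation error in the sum of squares is ever a concern.

```awk
# Illustrative sketch: O(1)-per-sample "last N" mean and standard deviation,
# using a ring buffer of the last N samples plus a running sum and sum of
# squares. Names are hypothetical.
BEGIN {
    N = 3                       # window size (--last N)
    count = 0; pos = 0
    sum = 0; sumsq = 0

    split("10 20 30 40 50", s, " ")
    for (i = 1; i <= 5; i++) {
        v = s[i]
        if (count < N) {
            count++
        } else {
            old = ring[pos]     # sample falling out of the window
            sum -= old
            sumsq -= old * old
        }
        ring[pos] = v
        pos = (pos + 1) % N
        sum += v
        sumsq += v * v

        mean = sum / count
        var = sumsq / count - mean * mean
        if (var < 0) var = 0    # guard against tiny negative rounding error
        printf "n=%d  mean=%g  stddev=%g\n", count, mean, sqrt(var)
    }
}
```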
Tested fairly extensively for correctness with nawk (macOS), gawk, mawk, and BusyBox awk.

Benchmarking shows dramatic speedups as `--last` grows.