Work on performance issues in summary statistics due to using ArrayBase::sum #35
Description
The summary statistics methods use ArrayBase::sum
(directly or indirectly) in anticipation of pairwise summation (rust-ndarray/ndarray#577), which provides improved accuracy over naive summation using fold
. However, to do this, some of the methods have unnecessary allocations or other performance issues.
For example, harmonic_mean
is implemented like this:
self.map(|x| x.recip()).mean().map(|x| x.recip())
It's implemented this way to take advantage of .mean()
(which is implemented in terms of .sum()
), but this approach requires a temporary allocation for the result of self.map
.
summary_statistics::means::moments
has a similar issue:
for k in 2..=order {
moments.push(a.map(|x| x.powi(k)).sum() / n_elements)
}
It's implemented this way to take advantage of .sum()
. However, this implementation requires a temporary allocation for the result of a.map
. Additionally, it would probably be faster to make the loop over k
be the innermost loop to improve the locality of reference.
We should be able to resolve these issues with a lazy version of map
combined with a pairwise summation method on that lazy map
. Something like jturner314/nditer would work once it's stable.
[Edit: This issue also appears in the entropy methods.]