Skip to content

Work on performance issues in summary statistics due to using ArrayBase::sum #35

Open
@jturner314

Description

The summary statistics methods use ArrayBase::sum (directly or indirectly) in anticipation of pairwise summation (rust-ndarray/ndarray#577), which provides improved accuracy over naive summation using fold. However, to do this, some of the methods have unnecessary allocations or other performance issues.

For example, harmonic_mean is implemented like this:

self.map(|x| x.recip()).mean().map(|x| x.recip())

It's implemented this way to take advantage of .mean() (which is implemented in terms of .sum()), but this approach requires a temporary allocation for the result of self.map.

summary_statistics::means::moments has a similar issue:

for k in 2..=order {
    moments.push(a.map(|x| x.powi(k)).sum() / n_elements)
}

It's implemented this way to take advantage of .sum(). However, this implementation requires a temporary allocation for the result of a.map. Additionally, it would probably be faster to make the loop over k be the innermost loop to improve the locality of reference.

We should be able to resolve these issues with a lazy version of map combined with a pairwise summation method on that lazy map. Something like jturner314/nditer would work once it's stable.

[Edit: This issue also appears in the entropy methods.]

Metadata

Assignees

No one assigned

    Labels

    EnhancementNew feature or requestOn HoldIssues we can't act upon due to external constraints (e.g. missing Rust feature, other libraries..)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions