Skip to content

Iterator::fold is a little slow compared to bare loop #76725

Closed
@mlodato517

Description

@mlodato517

Background

I was writing a small project to compare the memory impact of iterators. I decided to also investigate the runtime impact and see how close Rust came to providing various zero cost abstractions to iteration. I compared three types of iteration:

fn filter_map_filter(nums: &[u64]) -> Vec<u64> {
  nums.iter().filter(...).map(...).filter(...).collect()
}

fn fold(nums: &[u64]) -> Vec<u64> {
  nums.iter().fold(Vec::new(), ...)
}

fn raw(nums: &[u64]) -> Vec<u64> {
  let mut result = Vec::new();
  for n in nums { ... }
  result
}

and saw this Criterion output plot:
criterion-plot

The outliers to the right are the timings of the fold functions while the two pairs on the left are the filter_map_filter and raw versions. Rust did a great job ensuring .filter.map.filter was the same speed as a raw loop but .fold seemed to be lacking.

Quick Investigation

Looking at the source for .fold the accum is reassigned with the result of each invocation of f. I quickly tested if this could be improved with a &mut instead in this PR. The result was surprising (to me):

criterion-plot-mut-ref

The "custom fold" method was faster than all the other options (which doesn't make a ton of sense to me but that's what y'all are here for!).

Path Forward

I initially was going to suggest adding some sort of fold_mut or some better named method to allow for this faster fold iterator. This could be a performance improvement in some areas and could also improve the syntax when the closure couldn't "easily" return the new accumulator:

iter.fold(Vec::new(), |v, x| {
  v.push(x);
  v // this line is a little weird
})

I made a branch for this if we want to head in that direction (the tests are slim, the benchmarks are probably overkill, the stability is missing, and the docs are probably slim and improperly formatted but it's a start!) and saw some improvements in the benchmarks I added:

benchmarks

Now I'm not sure if this "fold_mut" path is the right way to go - I'm not sure if it's awkward or dangerous. It seems similar to Ruby's each_with_object so there's maybe something there. It could also be that with some compiler witchcraft we can just make fold a "true" zero cost abstraction.

In any case, thought I'd post here instead of making a PR so we could decided if there should be any PR and I'm happy to help with whatever path forward we choose!

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-iteratorsArea: IteratorsI-slowIssue: Problems and improvements with respect to performance of generated code.T-libsRelevant to the library team, which will review and decide on the PR/issue.T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions