Description
Background
I was writing a small project to compare the memory impact of iterators. I decided to also investigate the runtime impact and see how close Rust came to providing various zero cost abstractions to iteration. I compared three types of iteration:
fn filter_map_filter(nums: &[u64]) -> Vec<u64> {
nums.iter().filter(...).map(...).filter(...).collect()
}
fn fold(nums: &[u64]) -> Vec<u64> {
nums.iter().fold(Vec::new(), ...)
}
fn raw(nums: &[u64]) -> Vec<u64> {
let mut result = Vec::new();
for n in nums { ... }
result
}
and saw this Criterion output plot:
The outliers to the right are the timings of the fold
functions while the two pairs on the left are the filter_map_filter
and raw
versions. Rust did a great job ensuring .filter.map.filter
was the same speed as a raw loop but .fold
seemed to be lacking.
Quick Investigation
Looking at the source for .fold
the accum
is reassigned with the result of each invocation of f
. I quickly tested if this could be improved with a &mut
instead in this PR. The result was surprising (to me):
The "custom fold" method was faster than all the other options (which doesn't make a ton of sense to me but that's what y'all are here for!).
Path Forward
I initially was going to suggest adding some sort of fold_mut
or some better named method to allow for this faster fold
iterator. This could be a performance improvement in some areas and could also improve the syntax when the closure couldn't "easily" return the new accumulator:
iter.fold(Vec::new(), |v, x| {
v.push(x);
v // this line is a little weird
})
I made a branch for this if we want to head in that direction (the tests are slim, the benchmarks are probably overkill, the stability is missing, and the docs are probably slim and improperly formatted but it's a start!) and saw some improvements in the benchmarks I added:
Now I'm not sure if this "fold_mut
" path is the right way to go - I'm not sure if it's awkward or dangerous. It seems similar to Ruby's each_with_object
so there's maybe something there. It could also be that with some compiler witchcraft we can just make fold
a "true" zero cost abstraction.
In any case, thought I'd post here instead of making a PR so we could decided if there should be any PR and I'm happy to help with whatever path forward we choose!