Description
Problem: slow due to repetitive heap allocation
The five iterators below generate `Vec<...>` items:
```rust
multi_cartesian_product(self)
combinations(self, k: usize)
combinations_with_replacement(self, k: usize)
permutations(self, k: usize)
powerset(self)
```
This can be slow because of the repetitive allocations, even when the user does not need to permanently store each item. In most cases, I think, the user uses each vector to produce something else and then discards it. Because those iterators generate up to very many items (the product of the lengths, factorial-sized counts, powers of two), this creates many temporary vectors that can slow down code execution.
Proposed solution: map slices
An alternative iterator could own one single (internal/private) vector and only hand out a reference to it to a function implementing `FnMut(&[...]) -> R`. Those iterators would then generate `R` items instead of vectors.
```rust
// current methods unchanged, new methods:
multi_cartesian_product_map<R, F>(self, func: F)
combinations_map<R, F>(self, k: usize, func: F)
combinations_with_replacement_map<R, F>(self, k: usize, func: F)
permutations_map<R, F>(self, k: usize, func: F)
powerset_map<R, F>(self, func: F)
// all with F: FnMut(&[...]) -> R
```
Usage examples
`it.combinations(k)` and `it.combinations_map(k, ToOwned::to_owned)` would produce the same elements, but the first should be used!
`it.combinations(k).map(closure)` and `it.combinations_map(k, closure)` could produce the same elements if the closure does not require an owned vector. In my tests, the latter is then roughly 4 to 5 times faster at going through all combinations, unless the iterator has very few elements (in which case `.map` should be used).
In `it.permutations(5).filter_map(|perm: Vec<_>| some_option)`, if `some_option` does not need `perm` to be an owned vector, then it could become `it.permutations_map(5, |perm: &[_]| some_option).flatten()`, where there is no heap allocation for each permutation.
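To make the pattern concrete, here is a hypothetical sketch of what the proposed `combinations_map` would enable: one reused buffer, a `FnMut(&[T]) -> R` callback, and no per-combination allocation. `for_each_combination` is my own standalone helper for illustration, not an itertools API, and it collects the `R` results into a vector only so the output is easy to inspect.

```rust
// Sketch (assumption): a hand-rolled, buffer-reusing combinations walk,
// mimicking the proposed `combinations_map`. Not the itertools implementation.
fn for_each_combination<T: Copy, R, F: FnMut(&[T]) -> R>(
    items: &[T],
    k: usize,
    mut func: F,
) -> Vec<R> {
    fn rec<T: Copy, R, F: FnMut(&[T]) -> R>(
        items: &[T],
        k: usize,
        start: usize,
        buf: &mut Vec<T>,
        out: &mut Vec<R>,
        func: &mut F,
    ) {
        if buf.len() == k {
            // hand the caller a slice into the single reused buffer
            out.push(func(buf.as_slice()));
            return;
        }
        for i in start..items.len() {
            buf.push(items[i]); // same buffer reused for every combination
            rec(items, k, i + 1, buf, out, func);
            buf.pop();
        }
    }
    let mut buf = Vec::with_capacity(k); // the single internal vector
    let mut out = Vec::new();
    rec(items, k, 0, &mut buf, &mut out, &mut func);
    out
}

fn main() {
    // Sum each 2-combination of [1, 2, 3]; no Vec is allocated per combination.
    let sums = for_each_combination(&[1, 2, 3], 2, |c: &[i32]| c.iter().sum::<i32>());
    assert_eq!(sums, vec![3, 4, 5]);
    println!("{:?}", sums);
}
```

With the proposed API the same computation would read `it.combinations_map(2, |c| c.iter().sum::<i32>())`, with the buffer reuse hidden inside the adaptor.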
Implementation (commits here)
New module `vec_items` (behind the `use_alloc` feature) with:
- `pub trait VecItems<T> { type Output; fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Self::Output; }`
- `pub struct CollectToVec;` implementing `VecItems` with `Vec<T>` output and just `iter.collect()` (so a new vector each time, as is currently done).
- `pub struct MapSlice<F, T> { func: F, vec: Vec<T> }` implementing `VecItems` with `R` output where `F: FnMut(&[T]) -> R`; the internal vector is cleared but not dropped after use (and its length will not change often, if ever).
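The trait and both implementors above can be sketched as follows. The trait signature is taken from the issue text; the bodies are my own guess at the intended behavior, not the actual commits.

```rust
// Sketch (assumption): plausible implementations of the proposed `vec_items`
// module, following the trait signature given in the issue.
pub trait VecItems<T> {
    type Output;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Self::Output;
}

/// Collects into a fresh `Vec` each time (the current behavior).
pub struct CollectToVec;

impl<T> VecItems<T> for CollectToVec {
    type Output = Vec<T>;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Vec<T> {
        iter.collect()
    }
}

/// Reuses one internal buffer and hands a slice of it to `func`.
pub struct MapSlice<F, T> {
    func: F,
    vec: Vec<T>,
}

impl<T, R, F: FnMut(&[T]) -> R> VecItems<T> for MapSlice<F, T> {
    type Output = R;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> R {
        self.vec.clear();      // keep the capacity, drop old contents
        self.vec.extend(iter); // refill the reused buffer
        (self.func)(self.vec.as_slice())
    }
}

fn main() {
    let mut collect = CollectToVec;
    assert_eq!(collect.new_item(vec![1, 2, 3].into_iter()), vec![1, 2, 3]);

    let mut sum_slice = MapSlice {
        func: |s: &[i32]| s.iter().sum::<i32>(),
        vec: Vec::new(),
    };
    assert_eq!(sum_slice.new_item(vec![1, 2, 3].into_iter()), 6);
    // The buffer is reused across calls: no new allocation per item.
    assert_eq!(sum_slice.new_item(vec![4, 5].into_iter()), 9);
    println!("ok");
}
```

A `CombinationsBase<I, M>` parameterized over `M: VecItems<I::Item>` can then produce either owned vectors or mapped `R` values from the same iteration logic.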
E.g. for `combinations_map`:
- In the `Itertools` trait: `#[cfg(feature = "use_alloc")] fn combinations_map<R, F: FnMut(&[Self::Item]) -> R>(self, k: usize, f: F) -> CombinationsMap<Self, F> where ...`
- In `src/combinations.rs`:
  - Convert `Combinations<I>` to `CombinationsBase<I, F>` with the new field `manager: F`
  - New (fused) iterator constraint `F: VecItems<I::Item>` and item type `F::Output`
  - New `pub type Combinations<I> = CombinationsBase<I, CollectToVec>;`
  - New `pub type CombinationsMap<I, F> = CombinationsBase<I, MapSlice<F, <I as Iterator>::Item>>;`
  - New `fn combinations_map<I, F>(iter: I, k: usize, f: F) -> CombinationsMap<I, F>`
- Test to compare with `combinations`.
Very similar for the four others. `powerset[_map]` is a bit simpler because it just uses `combinations[_map]` internally; and `multi_cartesian_product` turned out to be simpler than anticipated.
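The remark that powerset reduces to combinations can be illustrated directly: the subsets of an n-element slice are exactly its k-combinations for k = 0..=n. `combinations_of` and `powerset_of` below are my own minimal helpers for illustration, not the itertools internals.

```rust
// Sketch (assumption): powerset expressed through combinations, mirroring how
// `powerset[_map]` can delegate to `combinations[_map]`.
fn combinations_of<T: Copy>(items: &[T], k: usize) -> Vec<Vec<T>> {
    if k == 0 {
        return vec![Vec::new()]; // one empty combination
    }
    if items.len() < k {
        return Vec::new(); // not enough elements
    }
    let mut out = Vec::new();
    // combinations that include the first element
    for mut rest in combinations_of(&items[1..], k - 1) {
        rest.insert(0, items[0]);
        out.push(rest);
    }
    // combinations that skip the first element
    out.extend(combinations_of(&items[1..], k));
    out
}

fn powerset_of<T: Copy>(items: &[T]) -> Vec<Vec<T>> {
    (0..=items.len())
        .flat_map(|k| combinations_of(items, k))
        .collect()
}

fn main() {
    let ps = powerset_of(&[1, 2]);
    assert_eq!(ps, vec![vec![], vec![1], vec![2], vec![1, 2]]);
    println!("{:?}", ps);
}
```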
Closing thoughts
Avoid cloning in the `MapSlice` case? I think it might be possible to hold only references: `MapSlice { func: F, vec: Vec<&T> }`, with `F: FnMut(&[&T]) -> R` and no cloning. But I struggle with the lifetimes (or, more probably, some hard barrier). Is it worth it? Probably not if the elements are just integers, but otherwise it could be.
The documentation I have not written yet would suggest using the non-`_map` variants when the user needs owned vectors rather than slices (or when very few elements are expected).
PS: I got the idea while replacing `(0..nb).permutations(nb)` (after profiling with `flamegraph` and `heaptrack`) with a `_map` version of my own of `permutohedron::Heap`, to avoid any (internal) heap allocation while keeping a nice iterator structure. Over `8! == 40320` permutations, that was also 5 times faster.