Skip to content

Iterators generating (many) vectors #722

Closed
@Philippe-Cholet

Description

@Philippe-Cholet

Problem: slow because of repetitive heap-allocation

The 5 iterators below generate Vec<...> items.

multi_cartesian_product(self)
combinations(self, k: usize)
combinations_with_replacement(self, k: usize)
permutations(self, k: usize)
powerset(self)

It can be slow, because of the repetitive allocations, even if the user does not need to permanently store each permutation.

I think that in most cases, the user uses each vector to produce something else and then each vector is discarded. In that case, because those iterators generate (up to) many many items (product of the lengths, factorial-related, powers of two), it creates many temporary vectors which can slow down code execution.

Proposed solution: map slices

An alternative iterator could own one single (internal/private) vector and only give a reference to it to a function implementing FnMut(&[...]) -> R. Those iterators would then generate R items instead of vectors.

// current methods unchanged, new methods:
multi_cartesian_product_map<R, F>(self, func: F)
combinations_map<R, F>(self, k: usize, func: F)
combinations_with_replacement_map<R, F>(self, k: usize, func: F)
permutations_map<R, F>(self, k: usize, func: F)
powerset_map<R, F>(self, func: F)
// all with  F: FnMut(&[...]) -> R

Usage examples

it.combinations(k) and it.combinations_map(k, ToOwned::to_owned) would produce the same elements, but the first should be used!

it.combinations(k).map(closure) and it.combinations_map(k, closure) could produce the same elements if it does not require a vector. For what I tested, it's then roughly 4 to 5 times faster to go through all combinations, unless the iterator has very few elements (then .map should be used).

In it.permutations(5).filter_map(|perm: Vec<_>| some_option), if some_option does not need perm to be an owned vector then it could become it.permutations_map(5, |perm: &[_]| some_option).flatten() where there is no heap-allocation for each permutation.

Implementation (commits here)

New module vec_items (behind use_alloc feature) with:

  • pub trait VecItems<T> { type Output; fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Self::Output; }
  • pub struct CollectToVec; implementing VecItems with Vec<T> output and just iter.collect() (so a new vector each time, as it's currently done).
  • pub struct MapSlice<F, T> { func: F, vec: Vec<T> } implementing VecItems with R output where F: FnMut(&[T]) -> R and the internal vector is cleared but not dropped after use (and its length will not change often if ever).

E.g. for combinations_map:

  • In Itertools trait: #[cfg(feature = "use_alloc")] fn combinations_map<R, F: FnMut(&[Self::Item]) -> R>(self, k: usize, f: F) -> CombinationsMap<Self, F> where ...
  • in src/combinations.rs:
    • Convert Combinations<I> to CombinationsBase<I, F> with the new field manager: F
    • New (fused)iterator constraint F: VecItems<I::Item> and item type F::Output
    • New pub type Combinations<I> = CombinationsBase<I, CollectToVec>;
    • New pub type CombinationsMap<I, F> = CombinationsBase<I, MapSlice<F, <I as Iterator>::Item>>;
    • New fn combinations_map<I, F>(iter: I, k: usize, f: F) -> CombinationsMap<I, F>
  • Test to compare with combinations.

Very similar for the 4 others. powerset[_map] is a bit simpler because it just uses combinations[_map] internally ; and it was simpler for multi_cartesian_product than anticipated.

Closing thoughts

Avoid cloning in MapSlice case? I think(?) it might be possible to only have references: MapSlice { func: F, vec: Vec<&T> }. Then F: FnMut(&[&T]) -> R and no cloning. But I struggle with the lifetimes (or more probably some hard barrier). But is it worth it? Probably not if it's just integers, but otherwise?

The documentation I did not wrote yet would suggest to use the non _map variants if the user needs owned vectors and not slices (or if there is very little elements expected).

PS: I got the idea while replacing (0..nb).permutations(nb) (after using flamegraph and heaptrack) by a _map version of mine of permutohedron::Heap to avoid any (internal) heap-allocation while keeping a nice iterator structure. And for 8! == 40320 times, that was also 5 times faster.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions