Skip to content

Iterators generating (many) vectors (v2) #922

Open
@Philippe-Cholet

Description

@Philippe-Cholet

My #722 was rejected because it was not truly user-convenient nor truly performant but only in the middle.

Observation

But it can be more performant in some cases without sacrificing user convenience as I realized (while reviewing #914) that some iterator methods do not really need to give away all generated vectors:

  • We can do it with ONE vector item max: count, last, nth, find, (if double-ended: nth_back, rfind).
    And cmp, partial_cmp on which lt, le, gt, ge all rely, eq on which ne rely but I never saw the point of those methods and never used them.
  • We can do it with TWO vector items max (one for the result, one for the iteration): max_by on which max rely, max_by_key, min_by on which min rely, min_by_key.

Nightly (just to have an idea of what might come): one item (advance[_back]_by, try_find), two items (is_sorted, is_sorted_by).

(Except nth[_back],) all of those usually rely on fold/try_fold which requires to own every item.

Implementation idea

But we could implement a private lightweight try_fold that only take references to an internal item that we would carefully update step-by-step.
Then we can easily enough specialize multiple iterator methods, like find that probably has the wider usage (e.g. used by .filter(..)).

After some experiments...

/*private*/ trait LightweightIterator: Iterator + Sized {
    // I'm not sure we need `B`: a lightweight `try_for_each` might be enough. But pass an argument `()` is free!
    fn lw_try_fold<B, F, E>(&mut self, init: B, f: F) -> LoopExit<Self::Item, B, E>
    // ^^^               ^                               ^^^^^^^^ ^^^^^^^^^^^^
    where F: FnMut(B, &Self::Item) -> Result<B, E>;
    //                ^               ^^^^^^^^^^^^

    // Then it would provide "lw_" versions of:
    // count last nth find (cmp partial_cmp eq?) max_by max_by_key min_by min_by_key!
}

// `try_fold` basically returns `Result<B, E>`. With an additional internal item `T`:
enum LoopExit<T, B, E> {
    /// Iteration has ended.
    End {
        last_item: Option<T>,
        accum: B,
    },
    /// The function failed.
    Early {
        failed_item: T,
        err: E,
    },
}

Our iterators returning vectors are Combinations, Powerset, CombinationsWithReplacement, Permutations and MultiProduct.

  • count is already implemented for them.
  • They are not double-ended so we hopefully don't need a DoubleEndedLightweightIterator for nth_back and rfind.

Usage example

impl<I> LightweightIterator for Combinations<I> where ... {
    fn lw_try_fold<B, F, E>(&mut self, init: B, f: F) -> LoopExit<Self::Item, B, E>
    where F: FnMut(B, &Self::Item) -> Result<B, E> { ... }
}

impl<I> Iterator for Combinations<I> where ... {
    fn find<P>(&mut self, predicate: P) -> Option<Self::Item>
    where
        P: FnMut(&Self::Item) -> bool,
    {
        self.lw_find(predicate) // easy
    }
    // then at least... (max|min)_by[_key]
}

Plan

  • Specialize Combinations::{find, (max|min)_by[_key]}
    • Update specialization tests to consider more methods.
    • Update specialization benchmarks to consider more methods.
    • Define a private trait LightweightIterator.
    • Update CONTRIBUTING.md: Implement LightweightIterator for iterators generating vectors.
    • Implement LightweightIterator for Combinations and specialize some iterator methods.
    • Enjoy benchmarks
  • Implement LightweightIterator for Powerset and specialize some iterator methods.
  • Implement LightweightIterator for CombinationsWithReplacement and specialize some iterator methods.
  • Implement LightweightIterator for Permutations and specialize some iterator methods.
  • Implement LightweightIterator for MultiProduct and specialize some iterator methods.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions