Add an `Iterator::batch` transformer and explicit size hints for `FromIterator` and `Extend` #14271

erickt · 2014-05-18T02:48:36Z

This pull request is focused on adding a Iterator::batch transformer. batch allows you to batch up multiple values from a stream into one value. For example, this can be used for transforming a stream of a monotonically increasing infinite iterator like count(0, 1) into a stream of vec!(0, 1, 2), vec!(3, 4, 5), and etc. It can also be seen as a low level transformer primitive that could be used to implement a variety of iterator transformers, as you could use it to implement map, filter, reduce, and more.

Along the way this also adds support for passing an explicit size hints to .collect(), .from_iter() and .extend() in the form of .collect_with_capacity(), .from_iter_with_capacity() and .extend_with_capacity(). Iterator size hints and .collect() calls are generally conservative in the size of collections they preallocate. For example, a filter call like let xs: Vec<int> = range(1, 5).filter(|x| x != 3).collect() would initially allocate a zero-sized Vec<T> because the .filter() gives a lower bound of 0, and collect always uses the lower bounds. There is no way for the user to inform .collect() to preallocate space for 4 items. Similarly, there isn't a way to overallocate a collection with .collect(). These new methods allow the user to have better control over the memory allocation while still using generic methods.

I have to admit though, I'm not sure if it's better to create methods like .collect_with_capacity(), or if we should rename .size_hint() to .size_bounds(), provide a new Iterator.size_hint() that may be outside the bounds of the values, and a .sized_by() transformer that lets a user to provide a different hint than the lower bounds.

Finally, it also adds some .reserve_additional methods on some collections, and cleans up some dead code.

This transformer allows you to batch up multiple items in a stream of values into one value.

huonw · 2014-05-18T02:55:17Z

src/libcore/iter.rs

+            Some(it.by_ref().take(3).collect::<Vec<uint>>())
+        }).take(3);
+
+        assert!(it.next() == Some(vec!(0, 1, 2)));


assert_eq?

I couldn't get assert_eq!() to work in libcore because I can't get it to find the Show impl for Vec. I feel like I should be able to import Show, but for some reason that doesn't work,

I think it's because Vec comes from realstd, which means it's implementing a different Show than the one compiled in this libcore.

lilyball · 2014-05-18T03:05:29Z

cc me

lilyball · 2014-05-18T04:26:57Z

from_iter_with_capacity() is a bit unnecessary. We can just change from_iter() to take capacity and change collect() to pass the lower bound. Nobody calls FromIterator directly anyway, so there's no need to provide a convenience method from within that trait.

pub trait FromIterator<A> {
    /// Build a container with elements from an external iterator.
    ///
    /// `cap_hint` is a hint as to what the capacity of the collection should be.
    /// It is usually, but not always, the lower size bound of the iterator.
    fn from_iter<T: Iterator<A>>(iterator: T, cap_hint: uint) -> Self;
}

pub trait Iterator<A> {
    // ...
    fn collect<B: FromIterator<A>>(&mut self) -> B {
        let (lower, _) = self.size_hint();
        FromIterator::from_iter(self.by_ref(), lower)
    }
}

Similarly we don't need collect_with_capacity(). Actual uses of this functionality are likely to be rare enough that anyone who does need it can just use FromIterator::from_iter() directly.

lilyball · 2014-05-18T04:30:40Z

Aside from my previous comment, I'm generally in favor of this patch. I'm not thrilled about making the core method of Extendable into extend_with_capacity(), but there's no obvious alternative solution (beyond leaving this capability out of Extendable at all).

As for the suggested alternative to this patch, which is to change size_hint() into size_bound() and add a separate size_hint(), I don't think that will work. Notably, the wrapper Iterator that is returned by .sized_by() will not know how to appropriately adjust size_hint() as the iterator is consumed.

erickt · 2014-05-18T05:17:44Z

@kballard: I'm fine merging from_iter and from_iter_with_capacity. Regarding .collect_with_capacity(), I guess we could leave it for a possible future PR, but it certainly is nicer to be able to just call let x: Vec<int> = iter.collect_with_capacity(10) then to do:

use std::FromIterator;
...
let x: Vec<int> = FromIterator::from_iter(iter, 10);

It does have some nice symmetry with a .extend_with_capacity() method too.

Either way, I found maybe 10-20 spots in the stdlib we could use this _with_capacity feature, so you're right it's not common, but we could use it to simplify down some code.

lilyball · 2014-05-18T05:19:31Z

@erickt I'm largely just concerned about API bloat. Iterator already has 30 default methods.

alexcrichton · 2014-05-19T07:59:31Z

I would be more in favor of adding an iterator adaptor which modifies the return value of size_hint than extra methods for collect and friends. Something like:

[1, 2, 3, 4].iter().filter(|&i| i %2 == 0).size_hint()  // => (0, Some(4))
[1, 2, 3, 4].iter().filter(|&i| i %2 == 0).optimistic().size_hint()  // => (4, Some(4))

lilyball · 2014-05-19T08:07:29Z

@alexcrichton size_hint() is defined as a lower and upper bound. Every single Iterator in the standard libraries (that provides a size_hint()) conforms to this, and guarantees that it will vend at least as many elements as the lower bound of its size_hint(), and no more elements than its upper bound (if it has one). What you're proposing is a violation of this guarantee.

The only way to make this work is to redefine size_hint() so it's no longer bounds, or to provide an alternate value separate from the bounds that can be overridden.

A problem with doing this alternative hint is defining how it changes as the iterator is used up. The current size_hint() bounds provide their guarantee of correctness throughout the entire lifetime of the iterator. But if I create an iterator and override its alternative hint to be some specific number, how does the iterator know how that number should change as it's processed?

This is why I'm in favor of providing a way to override the capacity in from_iter(), as opposed to trying to define this estimate as a property of the iterator itself. I can calculate a good estimate when I first create my iterator, based on the knowledge I have about it (e.g. I know my filter() will only drop roughly 50% of the elements). But the iterator doesn't know how this estimate should change as it's consumed, nor do I know how to do that myself if I were given control over it.

alexcrichton · 2014-06-12T17:33:50Z

Closing due to inactivity, but feel free to reopen with a rebase!

Don't trigger unresolved method/field diagnostics on types containing errors

erickt added 4 commits May 17, 2014 10:15

serialize: Remove old commented out code

128e8c7

core: add a Iterator.batch method

b8b9a36

This transformer allows you to batch up multiple items in a stream of values into one value.

collections: add .reserve_additional methods to some collections

f4df666

core: add explicit capacity hint to collect, from_iter, and extend

017bdca

huonw reviewed May 18, 2014
View reviewed changes

erickt added 2 commits May 17, 2014 21:49

squash with with_capacity

787d756

core: add {option,result}::collect_with_capacity

6f7a953

erickt mentioned this pull request Jun 3, 2014

Add new iterator adapator .filter_scan() #14425

Closed

alexcrichton closed this Jun 12, 2014

lnicola pushed a commit to lnicola/rust that referenced this pull request Mar 13, 2023

Auto merge of rust-lang#14271 - Veykril:ty-diag-err, r=Veykril

ecc32c2

Don't trigger unresolved method/field diagnostics on types containing errors

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add an `Iterator::batch` transformer and explicit size hints for `FromIterator` and `Extend` #14271

Add an `Iterator::batch` transformer and explicit size hints for `FromIterator` and `Extend` #14271

Uh oh!

erickt commented May 18, 2014

Uh oh!

huonw May 18, 2014

Uh oh!

erickt May 18, 2014

Uh oh!

lilyball May 18, 2014

Uh oh!

lilyball commented May 18, 2014

Uh oh!

lilyball commented May 18, 2014

Uh oh!

lilyball commented May 18, 2014

Uh oh!

erickt commented May 18, 2014

Uh oh!

lilyball commented May 18, 2014

Uh oh!

alexcrichton commented May 19, 2014

Uh oh!

lilyball commented May 19, 2014

Uh oh!

alexcrichton commented Jun 12, 2014

Uh oh!

Uh oh!

Add an Iterator::batch transformer and explicit size hints for FromIterator and Extend #14271

Add an Iterator::batch transformer and explicit size hints for FromIterator and Extend #14271

Uh oh!

Conversation

erickt commented May 18, 2014

Uh oh!

huonw May 18, 2014

Choose a reason for hiding this comment

Uh oh!

erickt May 18, 2014

Choose a reason for hiding this comment

Uh oh!

lilyball May 18, 2014

Choose a reason for hiding this comment

Uh oh!

lilyball commented May 18, 2014

Uh oh!

lilyball commented May 18, 2014

Uh oh!

lilyball commented May 18, 2014

Uh oh!

erickt commented May 18, 2014

Uh oh!

lilyball commented May 18, 2014

Uh oh!

alexcrichton commented May 19, 2014

Uh oh!

lilyball commented May 19, 2014

Uh oh!

alexcrichton commented Jun 12, 2014

Uh oh!

Uh oh!

Add an `Iterator::batch` transformer and explicit size hints for `FromIterator` and `Extend` #14271

Add an `Iterator::batch` transformer and explicit size hints for `FromIterator` and `Extend` #14271