discussion: standard iterator interface #54245

ianlancetaylor · 2022-08-04T00:20:33Z

This is a discussion that is intended to lead to a proposal.

This was written with lots of input from @jba and @rsc.

Background

Most languages provide a standardized way to iterate over values stored in containers using an iterator interface (see the appendix below for a discussion of other languages). Go provides for range for use with maps, slices, strings, arrays, and channels, but it does not provide any general mechanism for user-written containers, and it does not provide an iterator interface.

Go does have examples of non-generic iterators:

runtime.CallersFrames returns a runtime.Frames that iterates over stack frames; Frames has a Next method that returns a Frame and a bool that reports whether there are more frames.
bufio.Scanner is an iterator through an io.Reader, where the Scan method advances to the next value. The value is returned by a Bytes method. Errors are collected and returned by an Err method.
database/sql.Rows iterates through the results of a query, where the Next method advances to the next value and the value is returned by a Scan method. The Scan method can return an error.

Even this short list reveals that there are no common patterns in use today. This is in part because before generics were introduced, there was no way to write an interface that described an iterator. And of course there may be no simple pattern that will cover all of these use cases.
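To make the contrast concrete, here is a small sketch that drives two of the existing iterators listed above, using only standard library calls. The helper names scanLines and callerNames are made up for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"runtime"
	"strings"
)

// scanLines uses the bufio.Scanner style: Scan advances, Text fetches the
// current value, and Err reports any error once the loop is over.
func scanLines(s string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(s))
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

// callerNames uses the runtime.Frames style: Next returns the value
// together with a bool reporting whether more frames follow.
func callerNames() []string {
	pcs := make([]uintptr, 16)
	n := runtime.Callers(1, pcs) // skip the frame for runtime.Callers itself
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		f, more := frames.Next()
		names = append(names, f.Function)
		if !more {
			break
		}
	}
	return names
}

func main() {
	lines, err := scanLines("a\nb\nc")
	fmt.Println(lines, err)
	fmt.Println(callerNames())
}
```

The two loops have different shapes: one checks a bool before fetching the value, the other gets the value and the bool together. A generic algorithm cannot be written against both without an adapter.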

Today we can write an interface Iter[E] for an iterator over a container that has elements of type E. The existence of iterators in other languages shows that this is a powerful facility. This proposal is about how to write such an interface in Go.

What we want from Go iterators

Go is of course explicit about errors. Iterators over containers can't fail. For the most common uses it doesn't make any more sense to have iterators return an error than it does for a for range statement to return an error. Algorithms that use iterators should often behave differently when using iterators that can fail. Therefore, rather than try to combine non-failing and failing iterators into the same interface, we should instead return explicit errors from iterators that can fail. These errors can be part of the values returned by the iterator, or perhaps they can be returned as additional values.

Iterators have two fundamental operations: retrieve the current value, and advance to the next value. For Go we can combine these operations into a single method, as runtime.Frames does.

In the general case we may want to implement iterators with some additional state that is not trivially garbage collected, such as an open file or a separate goroutine. In C++, for example, this state would be cleaned up by a destructor, but of course Go does not have destructors. Therefore, we should have some explicit way to indicate that we no longer need an iterator. This should be optional, as many iterators do not require any special cleanup. We should encourage iterators to use finalizers if necessary to clean up resources, and also to clean up after themselves when reaching the end of an iteration.

In Go the builtin type map permits values to be inserted and removed while iterating over the map, with well-defined behavior. In general for Go we should be flexible, though of course the program should never simply crash. We should let each container type define how it behaves if the container is modified while iterators are active. For example, container modification may cause arbitrary elements to be skipped or returned two or more times during the iteration. In some cases, hopefully rare, container modification may cause uses of existing iterators to panic, or to return values that have been removed from the container.

Proposal

We define a new package iter that defines a set of interfaces. The expectation is that containers and other types will provide functions and methods that return values that implement these interfaces. Code that wants to work with arbitrary containers will use the interfaces defined in this package. That will permit people to write functions that work with containers but are agnostic to the actual container type being used, much as interfaces like io.Reader permit code to be agnostic as to the source of the data stream.

iter.Iter

The core interface in the iter package is iter.Iter[E].

// Iter supports iterating over a sequence of values of type `E`.
type Iter[E any] interface {
	// Next returns the next value in the iteration if there is one,
	// and reports whether the returned value is valid.
	// Once Next returns ok==false, the iteration is over,
	// and all subsequent calls will return ok==false.
	Next() (elem E, ok bool)
}

We also define a related interface for containers, such as maps, for which elements inherently have two values.

// Iter2 is like Iter but each iteration returns a pair of values.
type Iter2[E1, E2 any] interface {
	Next() (E1, E2, bool)
}

An iterator that can fail will either return a single value that includes an error indication, or it will implement Iter2[E, error]. It's not yet clear which of those options is better.
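As a rough illustration of the second option, here is a sketch of an iterator that implements Iter2[E, error]. The Iter2 interface is copied locally from the text above; intParser and sum are hypothetical names invented for this example, and the loop is written out by hand because the range extension does not exist.

```go
package main

import (
	"fmt"
	"strconv"
)

// Iter2 mirrors the interface proposed above (declared locally; this is
// not an existing standard library type).
type Iter2[E1, E2 any] interface {
	Next() (E1, E2, bool)
}

// intParser is a hypothetical iterator that parses each field as an int.
// A field that fails to parse is reported through the error value.
type intParser struct {
	fields []string
	pos    int
}

func (p *intParser) Next() (int, error, bool) {
	if p.pos >= len(p.fields) {
		return 0, nil, false
	}
	v, err := strconv.Atoi(p.fields[p.pos])
	p.pos++
	return v, err, true
}

// sum adds the values and stops at the first error, returning the partial
// total along with that error.
func sum(it Iter2[int, error]) (int, error) {
	total := 0
	for v, err, ok := it.Next(); ok; v, err, ok = it.Next() {
		if err != nil {
			return total, err
		}
		total += v
	}
	return total, nil
}

func main() {
	fmt.Println(sum(&intParser{fields: []string{"1", "2", "3"}}))
	fmt.Println(sum(&intParser{fields: []string{"1", "x", "3"}}))
}
```

Whether the consumer stops at the first error or keeps going is a decision for the consumer, which is one reason it is not obvious which of the two options is better.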

As mentioned above, some iterators may have additional state that may be discarded when no more values are expected from an iterator (for example, a goroutine that sends values on a channel). Telling the iterator that no more values are expected is done using an optional interface that an iterator may implement.

// StopIter is an optional interface for Iter.
type StopIter[E any] interface {
	Iter[E]

	// Stop indicates that the iterator will no longer be used.
	// After a call to Stop, future calls to Next may panic.
	// Stop may be called multiple times;
	// all calls after the first will have no effect.
	Stop()
}

// StopIter2 is like StopIter, but for Iter2.
type StopIter2[E1, E2 any] interface {
	Iter2[E1, E2]
	Stop()
}

The Stop method should always be considered to be an optimization. The program should work correctly even if Stop is never called. If an iterator is read to the end (until Next returns false) calling Stop should be a no-op. If necessary, iterator implementations should use finalizers to clean up cases where Stop is not called.

As a matter of programming style, the code that calls a function to obtain a StopIter is responsible for calling the Stop method. A function that accepts an Iter should not use a type assertion to detect and call the Stop method. This is similar to the way that a function that accepts an io.Reader should not use a type assertion to detect and call the Close method.
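The convention can be sketched as follows. The interfaces are copied locally from the text above; counter, newCounter, and take are hypothetical names, and the goroutine-backed design is only one possible reason an iterator might need Stop.

```go
package main

import "fmt"

// Local copies of the proposed interfaces.
type Iter[E any] interface {
	Next() (E, bool)
}

type StopIter[E any] interface {
	Iter[E]
	Stop()
}

// counter is a hypothetical iterator fed by a goroutine. Stop tells the
// goroutine to exit. It is not safe for concurrent use.
type counter struct {
	vals    chan int
	done    chan struct{}
	stopped bool
}

func newCounter() StopIter[int] {
	c := &counter{vals: make(chan int), done: make(chan struct{})}
	go func() {
		defer close(c.vals)
		for i := 0; ; i++ {
			select {
			case c.vals <- i:
			case <-c.done:
				return
			}
		}
	}()
	return c
}

func (c *counter) Next() (int, bool) {
	v, ok := <-c.vals
	return v, ok
}

// Stop is idempotent: only the first call closes the done channel.
func (c *counter) Stop() {
	if !c.stopped {
		c.stopped = true
		close(c.done)
	}
}

// take reads up to n values. It accepts a plain Iter and, following the
// convention above, does not probe for a Stop method.
func take[E any](it Iter[E], n int) []E {
	var out []E
	for i := 0; i < n; i++ {
		v, ok := it.Next()
		if !ok {
			break
		}
		out = append(out, v)
	}
	return out
}

func main() {
	it := newCounter() // the code that obtains the StopIter...
	defer it.Stop()    // ...is the code that calls Stop
	fmt.Println(take[int](it, 3))
}
```

The type argument on take is written explicitly so the sketch does not depend on how a given Go release infers E when a StopIter[int] is passed where an Iter[E] is expected.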

iter.New functions

iter.Iter provides a convenient way for the users of a container to iterate over its contents. We also want to consider the other side of that operation, and provide convenient ways for containers to define iterators.

// NewGen creates a new iterator from a generator function gen.
// The gen function is called once.  It is expected to call
// yield(v) for every value v to be returned by the iterator.
// If yield(v) returns false, gen must stop calling yield and return.
func NewGen[E any](gen func(yield func(E) bool)) StopIter[E]

// NewGen2 is like NewGen for Iter2.
func NewGen2[E1, E2 any](gen func(yield func(E1, E2) bool)) StopIter2[E1, E2]

An appendix below discusses how these functions can be implemented efficiently.

Simpler containers may be able to easily capture all required state in a function.

// NewNext creates a new iterator from a next function.
// The next function is called for each call of the iterator's Next method.
func NewNext[E any](next func() (E, bool)) Iter[E]

// NewNext2 is like NewNext for Iter2.
func NewNext2[E1, E2 any](next func() (E1, E2, bool)) Iter2[E1, E2]
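For instance, a container whose whole iteration state is a single integer could be written like this. Iter and NewNext are declared locally to match the text (funcIter follows the same idea as the Examples section); Countdown is a hypothetical function for this sketch.

```go
package main

import "fmt"

// Iter is a local copy of the proposed interface.
type Iter[E any] interface {
	Next() (E, bool)
}

// funcIter adapts a function to Iter.
type funcIter[E any] func() (E, bool)

func (f funcIter[E]) Next() (E, bool) { return f() }

// NewNext wraps a next function as an iterator.
func NewNext[E any](next func() (E, bool)) Iter[E] {
	return funcIter[E](next)
}

// Countdown is a hypothetical example: all of its state is the captured
// variable n, so a closure is enough and no named iterator type is needed.
func Countdown(n int) Iter[int] {
	return NewNext(func() (int, bool) {
		if n <= 0 {
			return 0, false
		}
		n--
		return n + 1, true
	})
}

func main() {
	it := Countdown(3)
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		fmt.Println(v) // prints 3, then 2, then 1
	}
}
```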

iterators for standard containers

The iter package will define iterators for the builtin container types.

// FromChan returns an iterator over a channel.
func FromChan[E any](<-chan E) Iter[E]

// FromMap returns an iterator over a map.
func FromMap[K comparable, V any](map[K]V) Iter2[K, V]

// FromSlice returns an iterator over a slice.
func FromSlice[E any]([]E) Iter[E]
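The text does not say how FromMap would be implemented. One possible sketch, without any runtime support, is to snapshot the keys up front and look each one up on demand. That design is an assumption of this example, not something the proposal specifies, and mapIter is a hypothetical type.

```go
package main

import "fmt"

// Iter2 is a local copy of the proposed interface.
type Iter2[E1, E2 any] interface {
	Next() (E1, E2, bool)
}

// mapIter is a hypothetical snapshot-based map iterator.
type mapIter[K comparable, V any] struct {
	m    map[K]V
	keys []K
}

// FromMap copies the keys when the iterator is created. Keys inserted
// later are not visited.
func FromMap[K comparable, V any](m map[K]V) Iter2[K, V] {
	keys := make([]K, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	return &mapIter[K, V]{m: m, keys: keys}
}

func (it *mapIter[K, V]) Next() (K, V, bool) {
	for len(it.keys) > 0 {
		k := it.keys[0]
		it.keys = it.keys[1:]
		// Skip keys that can no longer be looked up, such as keys
		// deleted since the snapshot (or NaN keys).
		if v, ok := it.m[k]; ok {
			return k, v, true
		}
	}
	var zk K
	var zv V
	return zk, zv, false
}

func main() {
	it := FromMap(map[string]int{"a": 1, "b": 2})
	for k, v, ok := it.Next(); ok; k, v, ok = it.Next() {
		fmt.Println(k, v) // order is unspecified, as with range over a map
	}
}
```

A real implementation would presumably hook into the runtime's map iteration rather than pay for a copy of the keys.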

Functions that accept iterators

The iter package could define functions that operate on iterators. We should be conservative here to start. It's not yet clear which of these functions will be useful.

// Map returns a new iterator whose elements are f applied to
// the elements of it.
func Map[E1, E2 any](f func(E1) E2, it Iter[E1]) Iter[E2]

// Filter returns a new iterator whose elements are those
// elements of it for which f returns true.
func Filter[E any](f func(E) bool, it Iter[E]) Iter[E]

// Reduce uses a function to reduce the elements of an
// iterator to a single value.  The init parameter is
// passed to the first call of f.  If the input iterator
// is empty, the result is init.
func Reduce[E1, E2 any](f func(E2, E1) E2, it Iter[E1], init E2) E2

// ToSlice collects the elements of the iterator into a slice.
// [ Perhaps this should be slices.FromIter. ]
func ToSlice[E any](it Iter[E]) []E

// ToMap collects the elements of the iterator into a map.
// [ Perhaps this should be maps.FromIter. ]
func ToMap[K comparable, V any](it Iter2[K, V]) map[K]V

// Concat returns the concatenation of two iterators.
// The resulting iterator returns all the elements of the
// first iterator followed by all the elements of the second.
func Concat[E any](it1, it2 Iter[E]) Iter[E]
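Concat and ToMap are not implemented in the Examples section below. A minimal sketch of Concat might look like the following; concatIter and the small slice iterator are hypothetical helpers written for this illustration.

```go
package main

import "fmt"

// Iter is a local copy of the proposed interface.
type Iter[E any] interface {
	Next() (E, bool)
}

// sliceIter is a minimal slice iterator used to feed the example.
type sliceIter[E any] struct {
	s []E
	i int
}

func (it *sliceIter[E]) Next() (E, bool) {
	if it.i >= len(it.s) {
		var zero E
		return zero, false
	}
	v := it.s[it.i]
	it.i++
	return v, true
}

func FromSlice[E any](s []E) Iter[E] { return &sliceIter[E]{s: s} }

// concatIter drains first, then second.
type concatIter[E any] struct {
	first, second Iter[E]
	firstDone     bool
}

func (c *concatIter[E]) Next() (E, bool) {
	if !c.firstDone {
		if v, ok := c.first.Next(); ok {
			return v, true
		}
		// Remember that first is exhausted so it is not called again.
		c.firstDone = true
	}
	return c.second.Next()
}

// Concat returns the concatenation of two iterators.
func Concat[E any](it1, it2 Iter[E]) Iter[E] {
	return &concatIter[E]{first: it1, second: it2}
}

func main() {
	it := Concat(FromSlice([]int{1, 2}), FromSlice([]int{3}))
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		fmt.Println(v) // prints 1, then 2, then 3
	}
}
```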

Range loops

The for range syntax will be expanded to support iterators. Note that this is the only language change in this proposal. Everything else is library code and programming conventions.

If the argument to range implements Iter[E] or Iter2[E1, E2], then the loop will iterate through the elements of the iterator. For example, this code:

for e := range it {
	// statements
}

will be equivalent to this code:

for e, _ok := it.Next(); _ok; e, _ok = it.Next() {
	// statements
}

Here _ok is a hidden variable that is not seen by user code.

Note that breaking out of the loop will leave the iterator at that position, such that Next will return the next elements that the loop would have seen.
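The effect of breaking out can be seen with the hand-expanded form of the loop, which compiles today. Iter is copied locally from the text; the slice iterator and afterBreak are hypothetical helpers for this sketch.

```go
package main

import "fmt"

// Iter is a local copy of the proposed interface.
type Iter[E any] interface {
	Next() (E, bool)
}

type sliceIter[E any] struct {
	s []E
	i int
}

func (it *sliceIter[E]) Next() (E, bool) {
	if it.i >= len(it.s) {
		var zero E
		return zero, false
	}
	v := it.s[it.i]
	it.i++
	return v, true
}

func FromSlice[E any](s []E) Iter[E] { return &sliceIter[E]{s: s} }

// afterBreak runs the expanded form of "for v := range it", breaks when it
// sees stopAt, and then collects whatever the iterator still has to offer.
func afterBreak(it Iter[string], stopAt string) []string {
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		if v == stopAt {
			break // the post statement does not run, so nothing is skipped
		}
	}
	var rest []string
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		rest = append(rest, v)
	}
	return rest
}

func main() {
	rest := afterBreak(FromSlice([]string{"a", "b", "c", "d"}), "b")
	fmt.Println(rest) // prints [c d]
}
```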

Using range with an Iter2[E1, E2] will permit using two variables in the for statement, as with range over a map.

Compatibility note: if the type of it is a slice, array, pointer-to-array, string, map, or channel type, then the Next method will be ignored and the for range will operate in the usual way. This is required for backward compatibility with existing code.

Because it's inconvenient to write for v := range c.Range(), we propose a further extension: we permit range c if c has a method Range that returns a value that implements Iter[E] or Iter2[E1, E2]. If the type returned by the Range method implements StopIter[E] or StopIter2[E1, E2] then the range loop will ensure that Stop is called when exiting the loop. (Here whether the result implements StopIter is a static type check, not a dynamic type assertion: if the Range method returns the type Iter[E], Stop will not be called even if the actual type has a Stop method.)

For example:

for v := range c {
	// statements
}

where c.Range returns a value that implements StopIter[E], is roughly equivalent to:

_it := c.Range()
defer _it.Stop()
for v, _ok := _it.Next(); _ok; v, _ok = _it.Next() {
	// statements
	// Any goto L or continue L statement where L is outside the loop
	// is replaced by
	//   _it.Stop(); goto L (or continue L)
}
_it.Stop()

The compiler will arrange for _it.Stop to be called if the loop statements panic, even if some outer defer in the function recovers the panic. That is, the defer in the roughly equivalent code is run when leaving the loop, not just when leaving the function.
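In today's Go, one way to approximate a defer that runs when leaving the loop is to put the loop in its own function. The sketch below does that and checks that Stop has already run by the time an outer defer recovers the panic. The interfaces are copied locally; sliceStopIter, rangeWithStop, and stoppedBeforeRecover are hypothetical names, and this is not how the compiler would implement the feature.

```go
package main

import "fmt"

// Local copies of the proposed interfaces.
type Iter[E any] interface {
	Next() (E, bool)
}

type StopIter[E any] interface {
	Iter[E]
	Stop()
}

// sliceStopIter is a hypothetical iterator that records whether Stop ran.
type sliceStopIter struct {
	vals    []int
	pos     int
	stopped bool
}

func (it *sliceStopIter) Next() (int, bool) {
	if it.pos >= len(it.vals) {
		return 0, false
	}
	v := it.vals[it.pos]
	it.pos++
	return v, true
}

func (it *sliceStopIter) Stop() { it.stopped = true }

// rangeWithStop stands in for "for v := range c". Because the loop has its
// own function, the deferred Stop runs when the loop is left, whether by
// finishing normally or by a panic in the body.
func rangeWithStop[E any](it StopIter[E], body func(E)) {
	defer it.Stop()
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		body(v)
	}
}

// stoppedBeforeRecover reports whether Stop had already been called when
// the outer defer recovered a panic raised inside the loop body.
func stoppedBeforeRecover() (stopped bool) {
	it := &sliceStopIter{vals: []int{1, 2, 3}}
	defer func() {
		if r := recover(); r != nil {
			stopped = it.stopped
		}
	}()
	rangeWithStop[int](it, func(v int) {
		if v == 2 {
			panic("boom")
		}
	})
	return false
}

func main() {
	fmt.Println(stoppedBeforeRecover()) // prints true
}
```

A callback body cannot express break, goto, or continue to an outer label, which is part of why the proposal describes the real transformation in terms of the compiler.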

Note that if we adopt this change it will be the first case in which a language construct invokes a user-defined method.

That is all

That completes the proposal.

Optional future extensions

We can use optional interfaces to extend the capabilities of iterators.

For example, some iterators permit deleting an element from a container.

// DeleteIter is an Iter that implements a Delete method.
type DeleteIter[E any] interface {
	Iter[E]

	// Delete deletes the current iterator element;
	// that is, the one returned by the last call to Next.
	// Delete should panic if called before Next or after
	// Next returns false.
	Delete()
}

We could then implement

// Delete removes all elements from it for which f returns true.
func Delete[E any](it DeleteIter[E], f func(E) bool)
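A sketch of how Delete could be written against DeleteIter follows. The interfaces are copied locally; sliceDeleteIter is a hypothetical slice-backed iterator, built on a pointer to the slice so that Delete can shrink it. None of this is part of the proposal.

```go
package main

import "fmt"

// Local copies of the interfaces sketched above.
type Iter[E any] interface {
	Next() (E, bool)
}

type DeleteIter[E any] interface {
	Iter[E]
	Delete()
}

// sliceDeleteIter is a hypothetical DeleteIter over *[]E.
type sliceDeleteIter[E any] struct {
	s       *[]E
	i       int  // index of the current element, -1 before the first Next
	current bool // whether there is a current element to delete
}

func newSliceDeleteIter[E any](s *[]E) DeleteIter[E] {
	return &sliceDeleteIter[E]{s: s, i: -1}
}

func (it *sliceDeleteIter[E]) Next() (E, bool) {
	if it.i+1 >= len(*it.s) {
		it.current = false
		var zero E
		return zero, false
	}
	it.i++
	it.current = true
	return (*it.s)[it.i], true
}

// Delete removes the element returned by the last call to Next and steps
// back so the element that slides into its place is visited next.
func (it *sliceDeleteIter[E]) Delete() {
	if !it.current {
		panic("Delete called without a current element")
	}
	s := *it.s
	*it.s = append(s[:it.i], s[it.i+1:]...)
	it.i--
	it.current = false
}

// Delete removes all elements from it for which f returns true.
func Delete[E any](it DeleteIter[E], f func(E) bool) {
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		if f(v) {
			it.Delete()
		}
	}
}

func main() {
	s := []int{1, 2, 3, 4, 5, 6}
	Delete(newSliceDeleteIter(&s), func(v int) bool { return v%2 == 0 })
	fmt.Println(s) // prints [1 3 5]
}
```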

Similarly some iterators permit setting a value.

// SetIter is an Iter that implements a Set method.
type SetIter[E any] interface {
	Iter[E]

	// Set replaces the current iterator element with v.
	// Set should panic if called before Next or after
	// Next returns false.
	Set(v E)
}

We could then implement

// Replace replaces all elements e with f(e).
func Replace[E any](it SetIter[E], f func(E) E)
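Replace could then be a short loop. In this sketch the interfaces are copied locally, and sliceSetIter is a hypothetical slice-backed SetIter written for the example.

```go
package main

import "fmt"

// Local copies of the interfaces sketched above.
type Iter[E any] interface {
	Next() (E, bool)
}

type SetIter[E any] interface {
	Iter[E]
	Set(v E)
}

// sliceSetIter is a hypothetical SetIter over a slice.
type sliceSetIter[E any] struct {
	s []E
	i int // index of the current element, -1 before the first Next
}

func newSliceSetIter[E any](s []E) SetIter[E] {
	return &sliceSetIter[E]{s: s, i: -1}
}

func (it *sliceSetIter[E]) Next() (E, bool) {
	if it.i+1 >= len(it.s) {
		it.i = len(it.s) // mark the iteration as finished
		var zero E
		return zero, false
	}
	it.i++
	return it.s[it.i], true
}

// Set panics if called before Next or after Next returns false.
func (it *sliceSetIter[E]) Set(v E) {
	if it.i < 0 || it.i >= len(it.s) {
		panic("Set called without a current element")
	}
	it.s[it.i] = v
}

// Replace replaces all elements e with f(e).
func Replace[E any](it SetIter[E], f func(E) E) {
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		it.Set(f(v))
	}
}

func main() {
	s := []int{1, 2, 3}
	Replace(newSliceSetIter(s), func(v int) int { return v * 10 })
	fmt.Println(s) // prints [10 20 30]
}
```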

Bi-directional iterators can implement a Prev method.

// PrevIter is an iterator with a Prev method.
type PrevIter[E any] interface {
	Iter[E]

	// Prev moves the iterator to the previous position.
	// After calling Prev, Next will return the value at
	// that position in the container. For example, after
	//   it.Next() returning (v, true)
	//   it.Prev()
	// another call to it.Next will again return (v, true).
	// Calling Prev before calling Next may panic.
	// Calling Prev after Next returns false will move
	// to the last element, or, if there are no elements,
	// to the iterator's initial state.
	Prev()
}
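One use for Prev is a peek operation: read the next value, then step back so that the following Next returns it again. The sketch below copies the interfaces locally; slicePrevIter and Peek are hypothetical, and Peek only calls Prev after a successful Next.

```go
package main

import "fmt"

// Local copies of the interfaces sketched above.
type Iter[E any] interface {
	Next() (E, bool)
}

type PrevIter[E any] interface {
	Iter[E]
	Prev()
}

// slicePrevIter is a hypothetical PrevIter over a slice.
type slicePrevIter[E any] struct {
	s []E
	i int // index of the last element returned, -1 before the first Next
}

func newSlicePrevIter[E any](s []E) PrevIter[E] {
	return &slicePrevIter[E]{s: s, i: -1}
}

func (it *slicePrevIter[E]) Next() (E, bool) {
	if it.i+1 >= len(it.s) {
		var zero E
		return zero, false
	}
	it.i++
	return it.s[it.i], true
}

// Prev steps back one position. It panics if Next has not been called.
func (it *slicePrevIter[E]) Prev() {
	if it.i < 0 {
		panic("Prev called before Next")
	}
	it.i--
}

// Peek returns the value the next call to Next will return, without
// consuming it.
func Peek[E any](it PrevIter[E]) (E, bool) {
	v, ok := it.Next()
	if ok {
		it.Prev()
	}
	return v, ok
}

func main() {
	it := newSlicePrevIter([]string{"x", "y"})
	p, _ := Peek(it)
	n, _ := it.Next()
	fmt.Println(p, n) // prints x x
}
```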

This is just a sketch of possible future directions. These ideas are not part of the current proposal. However, we want to deliberately leave open the possibility of defining additional optional interfaces for iterators.

Examples

This is some example code showing how to use and create iterators. If a function in this section is not mentioned above, then it is purely an example, and is not part of the proposal.

// ToSlice returns a slice containing all the elements in an iterator.
// [ This might be in the slices package, as slices.FromIter. ]
func ToSlice[E any](it iter.Iter[E]) []E {
	var r []E
	for v := range it {
		r = append(r, v)
	}
	return r
}

// ToSliceErr returns a slice containing all the elements
// in an iterator, for an iterator that can fail.
// The iteration stops on the first error.
// This is just an example, this may not be the best approach.
func ToSliceErr[E any](it iter.Iter2[E, error]) ([]E, error) {
	var r []E
	for v, err := range it {
		if err != nil {
			return nil, err
		}
		r = append(r, v)
	}
	return r, nil
}

// Map returns a new iterator that applies f to each element of it.
func Map[E1, E2 any](f func(E1) E2, it Iter[E1]) Iter[E2] {
	return iter.NewNext(func() (E2, bool) {
		e, ok := it.Next()
		var r E2
		if ok {
			r = f(e)
		}
		return r, ok
	})
}

// Filter returns a new iterator that only contains the elements of it
// for which f returns true.
func Filter[E any](f func(E) bool, it Iter[E]) Iter[E] {
	return iter.NewNext(func() (E, bool) {
		for {
			e, ok := it.Next()
			if !ok || f(e) {
				return e, ok
			}
		}
	})
}

// Reduce reduces an iterator to a value using a function.
func Reduce[E1, E2 any](f func(E2, E1) E2, it Iter[E1], init E2) E2 {
	r := init
	for v := range it {
		r = f(r, v)
	}
	return r
}

// iter.FromSlice returns an iterator over a slice.
// For example purposes only, this iterator implements
// some of the optional interfaces mentioned earlier.
func FromSlice[E any](s []E) Iter[E] {
	return &sliceIter[E]{
		s: s,
		i: -1,
	}
}

type sliceIter[E any] struct {
	s []E
	i int
}

func (it *sliceIter[E]) Next() (E, bool) {
	it.i++
	ok := it.i >= 0 && it.i < len(it.s)
	var v E
	if ok {
		v = it.s[it.i]
	}
	return v, ok
}

// Prev implements PrevIter.
func (it *sliceIter[E]) Prev() {
	it.i--
}

// Set implements SetIter.
func (it *sliceIter[E]) Set(v E) {
	it.s[it.i] = v
}

// FromChan returns an iterator for a channel.
func FromChan[E any](c <-chan E) Iter[E] {
	return iter.NewNext(func() (E, bool) {
		v, ok := <-c
		return v, ok
	})
}

// NewNext takes a function that returns (v, bool) and returns
// an iterator that calls the function until the second result is false.
func NewNext[E any](f func() (E, bool)) Iter[E] {
	return funcIter[E](f)
}

// funcIter is used by NewNext to implement Iter.
type funcIter[E any] func() (E, bool)

// Next implements Iter.
func (f funcIter[E]) Next() (E, bool) {
	return f()
}

// Equal reports whether two iterators have the same values
// in the same order.
func Equal[E comparable](it1, it2 Iter[E]) bool {
	for {
		v1, ok1 := it1.Next()
		v2, ok2 := it2.Next()
		if v1 != v2 || ok1 != ok2 {
			return false
		}
		if !ok1 {
			return true
		}
	}
}

// MergeIter takes two iterators that are expected to be in sorted order,
// and returns a new iterator that merges the two into a single
// iterator in sorted order.
func MergeIter[E constraints.Ordered](it1, it2 iter.Iter[E]) iter.Iter[E] {
	val1, ok1 := it1.Next()
	val2, ok2 := it2.Next()
	return &mergeIter[E]{
		it1:  it1,
		it2:  it2,
		val1: val1,
		ok1:  ok1,
		val2: val2,
		ok2:  ok2,
	}
}

type mergeIter[E constraints.Ordered] struct {
	it1, it2   iter.Iter[E]
	val1, val2 E
	ok1, ok2   bool
}

func (m *mergeIter[E]) Next() (E, bool) {
	var r E
	if m.ok1 && m.ok2 {
		if m.val1 < m.val2 {
			r = m.val1
			m.val1, m.ok1 = m.it1.Next()
		} else {
			r = m.val2
			m.val2, m.ok2 = m.it2.Next()
		}
		return r, true
	}
	if m.ok1 {
		r = m.val1
		m.val1, m.ok1 = m.it1.Next()
		return r, true
	}
	if m.ok2 {
		r = m.val2
		m.val2, m.ok2 = m.it2.Next()
		return r, true
	}
	return r, false
}

// Tree is a binary tree.
type Tree[E any] struct {
	val         E
	left, right *Tree[E]
}

// Range returns an in-order iterator over the tree.
// This shows how to use iter.NewGen to iterate over a
// complex data structure.
func (t *Tree[E]) Range() iter.StopIter[E] {
	return iter.NewGen(t.gen)
}

// gen is used by Range.  This is here just because we want
// to return bool from t.iterate but iter.NewGen takes a function
// with no results.
func (t *Tree[E]) gen(yield func(E) bool) {
	t.iterate(yield)
}

// iterate is used by Range.
func (t *Tree[E]) iterate(yield func(E) bool) bool {
	if t == nil {
		return true
	}
	// Keep providing values until yield returns false
	// or we have finished the tree.
	return t.left.iterate(yield) &&
		yield(t.val) &&
		t.right.iterate(yield)
}

// SQLRowsRange returns an iterator over a sql.Rows.
// This shows one way to adapt an existing iteration
// mechanism to the new interface.
// We wouldn't design things this way from scratch.
func SQLRowsRange(r *sql.Rows) iter.StopIter[SQLRowVal] {
	it := &SQLRowsIter{r}
	runtime.SetFinalizer(it, (*SQLRowsIter).Stop)
	return it
}

// SQLRowsIter implements iter.Iter[SQLRowVal].
type SQLRowsIter struct {
	r *sql.Rows
}

// Next implements iter.Iter.
func (it *SQLRowsIter) Next() (SQLRowVal, bool) {
	ok := it.r.Next()
	var rit SQLRowVal
	if ok {
		rit = SQLRowVal{it.r}
	} else {
		it.r.Close()
		runtime.SetFinalizer(it, nil)
	}
	return rit, ok
}

// Stop implements iter.StopIter.
func (it *SQLRowsIter) Stop() {
	// We don't care about the error result here.
	// It's never a new error from the close itself,
	// just a saved error from earlier.
	// If the caller cares, they should check during the loop.
	it.r.Close()
	runtime.SetFinalizer(it, nil)
}

// SQLRowVal is an iteration value.
type SQLRowVal struct {
	r *sql.Rows
}

// Err returns any error for the current row.
func (i1 SQLRowVal) Err() error {
	return i1.r.Err()
}

// Scan fetches values from the current row.
func (i1 SQLRowVal) Scan(dest ...any) error {
	return i1.r.Scan(dest...)
}

// Total is an example of how SQLRowsRange might be used.
// Note how the function uses the proposed Map and Reduce functions.
func Total(r *sql.Rows) (int, error) {
	var rowsErr error
	toInt := func(i1 SQLRowVal) int {
		if err := i1.Err(); err != nil {
			rowsErr = err
			return 0
		}
		var v int
		if err := i1.Scan(&v); err != nil {
			rowsErr = err
		}
		return v
	}
	it := SQLRowsRange(r)
	defer it.Stop()
	ints := iter.Map(toInt, it)
	total := iter.Reduce(func(v1, v2 int) int { return v1 + v2 }, ints, 0)
	// Capture an error that was not returned by any iteration, if any.
	if rowsErr == nil {
		rowsErr = r.Err()
	}
	return total, rowsErr
}

Appendix: Iterators in other languages

C++

The C++ Standard Template Library defines a variety of iterator APIs. These are consistently implemented by C++ containers and are also used by other types such as files and streams. This makes it possible to write standard algorithms that work with all C++ containers.

C++ containers provide begin and end methods that return iterators. The begin method returns an iterator that refers to the beginning of the container. The end method returns an iterator that refers to the position just past the end of the container. Iterators to the same container may be compared for equality using the == and != operators. Any valid iterator (not the iterator returned by end) refers to a value in the container. That value is accessible using the unary * operator (which is the pointer dereference operator, thus iterators act like pointers into the container, and ordinary pointers act like iterators). The unary ++ operator advances the iterator to refer to the next element in the container. For any C++ container one can loop over all elements in the container by writing

  for (containerType::iterator p = c.begin(); p != c.end(); ++p)
    doSomething(*p);

As of C++11 this pattern is built into the language via the range-based for loop.

  for (auto&& var : container)
    doSomething(var);

This calls the begin and end methods of the container and loops as shown above.

Some C++ iterators have optional additional capabilities. Iterators can be grouped into five types.

Input iterators support the operations described above. They can be used to do a single sequential pass over a container. Example: an iterator that reads values from a file.
Output iterators permit setting a value through the iterator (*p = v), but do not permit retrieving it. Example: an iterator that writes values to a file.
Forward iterators support both input and output operations. Example: an iterator over a singly linked list.
Bidirectional iterators additionally support the unary -- operator to move to the preceding element. Example: an iterator over a doubly linked list.
Random access iterators additionally support adding or subtracting an integer, and getting the difference between two iterators to get the number of values between them, and comparing two iterators using < and friends, and indexing off an iterator to refer to a value. Example: an iterator over a slice (which C++ calls a vector).

C++ algorithms can use function overloading to implement the same algorithm in different ways depending on the characteristics of the iterator. For example, std::reverse, which reverses the elements in a container, can be implemented with a bidirectional iterator, but uses a more efficient algorithm when called with a random access iterator.

C++ iterators do not provide any form of error handling. Iterators over containers typically can't fail. An iterator associated with a file handles I/O errors by setting the file object into an error state, which can optionally cause an exception to be thrown.

Each C++ container type defines rules for when a modification to the container invalidates existing iterators. For example, inserting an element in a linked list does not invalidate iterators pointing to other elements, but inserting an element in a vector may invalidate them. Using an invalid iterator is undefined behavior, which can cause the program to crash or arbitrarily misbehave.

Java

Java also supports iterators to step through containers. Java iterators are much simpler than C++ iterators. Java defines an interface Iterator<E> that has three main methods: hasNext, next, and remove. Calling next in a situation where hasNext would return false will throw an exception, and in general Java iterators throw an exception for any error. (By the way, in C++ removing an iterator from a container is generally implemented as an erase method on the container type that takes an iterator as an argument.)

A Java container will have an iterator method that returns an Iterator that walks over the elements of the container. This too is described as a Java interface: Iterable<E>.

Java has an iterator-based loop syntax like that of C++11 (C++ copied the syntax from Java):

  for (elementType var : container)
    doSomething(var);

This calls the iterator method on the container and then calls hasNext and next in a loop.

As far as I know Java does not have a standard implementation of output iterators or random access iterators. Specific containers will implement iterators with an additional set method that permits changing the value to which the iterator refers.

If a Java iterator is used after a container is modified in some way that the iterator can't support, the iterator methods will throw an exception.

Python

A container will implement an __iter__ method that returns an iterator object. An iterator will implement a __next__ method that returns the next element in the container, and raises a StopIteration exception when done. Code will normally call these methods via the builtin iter and next functions.

The Python for loop supports iterators.

  for var in container:
    doSomething(var)

This calls iter and next, and handles the StopIteration exception, as one would expect.

Python iterators generally don't permit modifying the container while an iterator is being used, but it's not clear to me precisely how they behave when it happens.

Discussion

For C++ and Python, iterators are a matter of convention: any type that implements the appropriate methods can return an iterator, and an iterator itself must simply implement the appropriate methods and (for C++) operator overloads. For Java, this is less true, as iterators explicitly implement the Iterator<E> interface. The for loop in each language just calls the appropriate methods.

These conventions are powerful because they permit separating the details of an algorithm from the details of a container. As long as the container implements the iterator interface, an algorithm written in terms of iterators will work.

Iterators do not handle errors in any of these languages. This is in part because errors can be handled by throwing exceptions. But it is also because iterating over a container doesn't fail. Iteration failure is only possible when a non-container, such as a file, is accessed via the iterator interface.

It's worth noting that the C++ use of paired begin and end iterators permits a kind of sub-slicing, at least for containers that support bidirectional or random access iterators.

Appendix: Efficient implementation of `iter.NewGen`

The natural way to implement iter.NewGen is to use a separate goroutine and a channel. However, we know from experience that that will be inefficient due to scheduling delays. A more efficient way to implement NewGen will be to use coroutines: let the generator function produce a new value and then do a coroutine switch to the code using the iterator. When that code is ready for the next value, do a coroutine switch back to the generator function. A coroutine switch can be fast: simply change the stack pointer and reload the registers. No need to go through the scheduler.

Of course Go doesn't have coroutines, but we can use compiler optimizations to achieve the same effect without any language changes. This approach, and much of the text below, is entirely due to @rsc.

First, we identify programming idioms that provide concurrency without any opportunity for parallelism, such as a send immediately followed by a receive. Second, we adjust the compiler and runtime to recognize the non-parallel idioms and optimize them to simple coroutine switches instead of using the thread-aware goroutine scheduler.

Coroutine idioms

A coroutine switch must start another goroutine and then immediately stop the current goroutine, so that there is no opportunity for parallelism.

There are three common ways to start another goroutine: a go statement creating a new goroutine, a send on a channel where a goroutine is blocked, and a close on a channel where a goroutine is blocked.

There are three common ways to immediately stop the current goroutine: a receive of one value from a channel with no available data, a receive of two values (the comma-ok form) from such a channel, and a return from the top of a goroutine stack, exiting the goroutine.

Optimizations

The three common goroutine starts and three common goroutine stops combine for nine possible start-stop pairs. The compiler can recognize each pair and translate each to a call to a fused runtime operation that does both together. For example a send compiles to chansend1(c, &v) and a receive compiles to chanrecv1(c, &v). A send followed by a receive can compile to chansend1recv1(c1, &v1, c2, &v2).

The compiler fusing the operations creates the opportunity for the runtime to implement them as coroutine switches. Without the fusing, the runtime cannot tell whether the current goroutine is going to keep running on its own (in which case parallelism is warranted) or is going to stop very soon (in which case parallelism is not warranted). Fusing the operations lets the runtime correctly predict the next thing the goroutine will do.

The runtime implements each fused operation by first checking to see if the operation pair would start a new goroutine and stop the current one. If not, it falls back to running the two different operations sequentially, providing exactly the same semantics as the unfused operations. But if the operation pair does start a new goroutine and stop the current one, then the runtime can implement that as a direct switch to the new goroutine, bypassing the scheduler and any possible confusion about waking new threads (Ms) or trying to run the two goroutines in different threads for a split second.

Note that recognizing these coroutine idioms would have potential uses beyond iterators.

NewGen

Here is an implementation of iter.NewGen that takes advantage of this technique.

// NewGen creates a new iterator from a generator function gen.
// The gen function is called once.  It is expected to call
// yield(v) for every value v to be returned by the iterator.
// If yield(v) returns false, gen must stop calling yield and return.
func NewGen[E any](gen func(yield func(E) bool)) StopIter[E] {
	cmore := make(chan bool)
	cnext := make(chan E)

	generator := func() {
		// coroutine switch back to client until Next is called (1)
		var zero E
		cnext <- zero
		if !<-cmore {
			close(cnext)
			return
		}
		gen(func(v E) bool {
			// coroutine switch back to client to deliver v (2)
			cnext <- v
			return <-cmore
		})

		// coroutine switch back to client marking end (3)
		close(cnext)
	}

	// coroutine switch to start generator (4)
	go generator()
	<-cnext

	r := &genIter[E]{cnext: cnext, cmore: cmore}
	runtime.SetFinalizer(r, (*genIter[E]).Stop)
	return r
}

// genIter implements Iter[E] for NewGen.
type genIter[E any] struct {
	cnext  chan E
	cmore  chan bool
	closed atomic.Bool
}

// Next implements Iter[E]
func (it *genIter[E]) Next() (E, bool) {
	// coroutine switch to generator for more (5)
	// (This panics if Stop has been called.)
	it.cmore <- true
	v, ok := <-it.cnext
	return v, ok
}

// Stop implements StopIter[E]
func (it *genIter[E]) Stop() {
	// Use the closed field to make Stop idempotent.
	if !it.closed.CompareAndSwap(false, true) {
		return
	}
	runtime.SetFinalizer(it, nil)
	// coroutine switch to generator to stop (6)
	close(it.cmore)
	<-it.cnext
}

The compiler would need to fuse the commented operation pairs for potential optimization by the runtime: send followed by receive (1, 2), close followed by return (3), go followed by receive (4), send followed by comma-ok receive (5), and close followed by receive (6).
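To see the handshake in action without any compiler support, the sketch below exercises a condensed copy of the implementation above using ordinary goroutines and channels. To keep it short, the finalizer is omitted and a plain bool replaces atomic.Bool, so this copy is not safe for concurrent calls to Stop. firstTwo is a hypothetical caller that stops early.

```go
package main

import "fmt"

// Local copies of the proposed interfaces.
type Iter[E any] interface {
	Next() (E, bool)
}

type StopIter[E any] interface {
	Iter[E]
	Stop()
}

// NewGen is a condensed copy of the implementation above.
func NewGen[E any](gen func(yield func(E) bool)) StopIter[E] {
	cmore := make(chan bool)
	cnext := make(chan E)
	go func() {
		var zero E
		cnext <- zero // hand control back until Next or Stop is called
		if !<-cmore {
			close(cnext)
			return
		}
		gen(func(v E) bool {
			cnext <- v
			return <-cmore // false once Stop has closed cmore
		})
		close(cnext)
	}()
	<-cnext
	return &genIter[E]{cnext: cnext, cmore: cmore}
}

type genIter[E any] struct {
	cnext  chan E
	cmore  chan bool
	closed bool
}

func (it *genIter[E]) Next() (E, bool) {
	it.cmore <- true
	v, ok := <-it.cnext
	return v, ok
}

func (it *genIter[E]) Stop() {
	if it.closed {
		return
	}
	it.closed = true
	close(it.cmore)
	<-it.cnext // wait for the generator to close cnext and exit
}

// firstTwo reads two values from a generator of 1..5, then stops. It
// reports the values and whether the generator saw yield return false.
func firstTwo() ([]int, bool) {
	sawStop := false
	it := NewGen(func(yield func(int) bool) {
		for i := 1; i <= 5; i++ {
			if !yield(i) {
				sawStop = true
				return
			}
		}
	})
	var got []int
	for len(got) < 2 {
		v, ok := it.Next()
		if !ok {
			break
		}
		got = append(got, v)
	}
	it.Stop()
	return got, sawStop
}

func main() {
	fmt.Println(firstTwo()) // prints [1 2] true
}
```

Stop does not return until the generator has closed cnext, so the write to sawStop is visible to the caller by then.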

rothskeller · 2022-08-04T01:04:36Z

rothskeller
Aug 4, 2022

Why is the special method for use in for range loops called Range? It is a container method that returns an iterator on the container; wouldn't Iter be a more natural name for it? That would be naming the method for what it does, rather than Range, which names the method based on what the calling code is thought likely to be doing with it.

8 replies

rothskeller Aug 4, 2022

If the only use of the method was to support the range keyword, I would agree with Range being the appropriate name. I believe, however, that there will be many other occasions when people want to ask a container for its iterator, aside from use in a range statement. And in those contexts, calling a method named Range is counter-intuitive. If you're asking for an iterator, a method named Iter makes more sense. Of course, the container could implement two identical methods with different names, but that is also awkward.

AndrewHarrisSPU Aug 4, 2022

At a glance I wonder if there would be some light confusion between a variable with an Iter method and a variable of type iter.Iter, and maybe especially so for anything with multiple Iter methods like IterFwd, IterRev, IterRand etc. It feels more like stuttering than substitution to me, and reminds me of container/heap's usage of package functions Push / Pop as well as an interface Push / Pop; I find it lightly confusing every time I come back to it.

In contrast Range suggests a point-blank (even, maybe, "fixed-point-blank"?) substitution. This is just how it strikes me.

Vaguely ... Python IMHO does really well with __dunder__ conventions to promote shared intuition, and they document really well. Range feels like a __dunder__ to me, and suggests a category of exceptional user-defined methods.

benhoyt Aug 4, 2022

I agree that .Iter() seems like a better name, and with @jimmyfrasche that the explicitness of for k, v := range x.Iter() is a good thing. It means we can have a map-like type that has m.Keys(), m.Values(), and m.Items() (kind of like Python's dict does). Couple of questions:

If you try to for k := range m.Items() where m.Items() returns an Iter2, what happens? Does Go discard the second value, or is that a compile-time error?
If you're making a slice-like type (a container indexable by integer), would you make s.Iter() return an iterator over the values? That would be most useful, but then for e := range s.Iter() -- or perhaps more confusing, for e := range s -- would loop over elements, whereas with the built-in slice that would iterate over indexes, and you'd only get elements if you did for i, e := range s. Seems like there's a fair bit of potential confusion here.

Merovius Aug 5, 2022

A downside of having to write range x.Iter() is that you don't get the automatic calling of Stop.

beoran Aug 6, 2022

It is a bike shed, but if we follow the Go convention on interfaces, we should call it the interface Nexter since it contains the Next method.

kortschak · 2022-08-04T01:15:59Z

kortschak
Aug 4, 2022

An iterator that can fail will either return a single value that includes an error indication, or it will implement Iter2[E, error]. It's not yet clear which of those options is better.

Would it be possible to have an auxiliary method, Err() error, that works the same way as bufio.Scanner's Err method?
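
A minimal sketch of what such an auxiliary Err method could look like, assuming an iterator whose Next returns (value, ok). The lineIter type and the readLines helper are hypothetical names; only the bufio.Scanner calls are real library API.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// lineIter is a hypothetical iterator over the lines of an io.Reader.
type lineIter struct {
	sc *bufio.Scanner
}

func newLineIter(r io.Reader) *lineIter {
	return &lineIter{sc: bufio.NewScanner(r)}
}

// Next returns the next line; ok is false at end of input or on error.
func (it *lineIter) Next() (string, bool) {
	if !it.sc.Scan() {
		return "", false
	}
	return it.sc.Text(), true
}

// Err reports the first non-EOF error hit by Next, mirroring bufio.Scanner.Err.
func (it *lineIter) Err() error { return it.sc.Err() }

// readLines drains the iterator and checks Err once iteration has ended.
func readLines(s string) ([]string, error) {
	it := newLineIter(strings.NewReader(s))
	var lines []string
	for line, ok := it.Next(); ok; line, ok = it.Next() {
		lines = append(lines, line)
	}
	return lines, it.Err()
}

func main() {
	lines, err := readLines("a\nb\n")
	fmt.Println(lines, err)
}
```

The caller still has to remember the Err check after the loop; nothing in the iterator interface forces it.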

Compatibility note: if the type of it is a slice, array, pointer-to-array, string, map, or channel type, then the Next method will be ignored and the for range will operate in the usual way. This is required for backward compatibility with existing code.

If the user wants to use an iterator method in place of the 'natural' range iteration, would this be possible to signal to the for loop? One way that I could see this working would be to wrap it as an embedded field in a struct.

Because it's inconvenient to write for v := c.Range(),

Should this be for v := range c.Range()?

38 replies

Merovius Aug 6, 2022

I don't think IterErr[E] is a good idea, unless and until we get generic type aliases. Currently, IterErr[E] would have to be its own distinct type, so it couldn't be used with functions composing Iter2. I don't believe that matrix should be further exploded. If we could write type IterErr[E] = Iter2[E, error], the tradeoff would change.

fluhus Aug 6, 2022

What about an Iter2[E, error] whose Next() returns zeroValue, err, true upon encountering an error? Then the loop would look something like:

var outErr error
for v, err := range someIterator {
  if err != nil {
    outErr = err
    break
    // or just return err
  }
  use(v)
}

szabba Aug 7, 2022

Thanks @Merovius for pointing me to this thread. :)

I hope this is not derailing things: some iterators with Stop() presumably will do IO in Stop(). That can encounter an error, but Stop() has no means to report it. So, IIUC, an error that occurs in Stop() is swallowed up and silently ignored. This is an issue both when working with iterators directly and when relying on the proposed syntax sugar to perform cleanup.

Merovius Aug 8, 2022

This is an issue both when working with iterators directly and when relying on the proposed syntax sugar to perform cleanup.

I think this is a good reason for collections that do I/O to get iterated not to have a Range method (but call it something else).

szabba Aug 8, 2022

I think this is a good reason for collections that do I/O to get iterated not to have a Range method (but call it something else).

Maybe this is an edge case but: Not all iterators that do I/O will need cleanup. An Iter2[SomeType, error] that lists things from a paginated HTTP API probably has no resources held between calls to Next.

(I'm assuming an *http.Client is used internally, and a successful response is fully consumed in Next.)

I'm coming to the conclusion that iterators that need I/O for resource cleanup should probably implement io.Closer instead of StopIter/StopIter2. (Stop still makes sense for generators that don't do I/O, or don't hold onto I/O resources between Next calls.)

Another point: should it be possible to interrupt/cancel I/O occurring in Next? Right now the proposal does not seem to address that.

rothskeller · 2022-08-04T01:17:37Z

rothskeller
Aug 4, 2022

I'm excited by the coroutine optimizations. I've lost track of how many times a goroutine-generator pattern would have been the cleanest way to implement something, but couldn't be used because of the inefficiencies. I support this whole proposal, which I think is necessary and valuable — but the coroutine optimizations would be hugely valuable even if the rest of this proposal didn't happen.

3 replies

blizzy78 Aug 7, 2022

Perhaps the coroutine optimizations should be extracted into a separate proposal.

betamos Aug 7, 2022

Can you help me understand? The whole section on generators seems so strange:

    // If yield(v) returns false, gen must stop calling yield and return.
    func NewGen[E any](gen func(yield func(E) bool)) StopIter[E]

So yield is just a callback, and in order to provide generator semantics (a) the compiler needs to ensure that the user respects the return value of yield (what should happen if it can't?) and (b) an extra goroutine, a regular unbuffered channel, a back-channel, an atomic and a finalizer are needed. To me, this seems like a heroic set of complexity based on non-existing optimizations, to deliver a moderately useful feature. Have I misunderstood or misrepresented something?

I'm not against either generator functions or compile-time channel optimizations, but it seems like it would be way easier to implement generator functions the old-fashioned way, which I assume is a new language construct in a separate proposal that applies the 1:1 coroutine transfer under the hood at the yield-point, without seemingly unrelated (?) multithreaded constructs like goroutines and atomics.

ianlancetaylor Aug 8, 2022
Collaborator Author

@blizzy78 There is no need to put the coroutine optimizations into a separate proposal, because they are simply optimizations. They can be done at any time, independent of this proposal. They are described in this proposal to show how NewGen can be implemented efficiently.

@betamos We have opposite ideas of "easier." To me it's much easier to not require a new language construct. A new language construct would have to fit in well with the rest of the language and work orthogonally with the go statement and channels and the select statement. It doesn't seem easy at all.

kalexmills · 2022-08-04T01:27:47Z

kalexmills
Aug 4, 2022

At the risk of inviting bike-shedding, let me raise a minor concern on a naming choice.

Iter2 would be a widely-used stdlib interface. Unlike other widely used interfaces from the stdlib (io.Reader, io.Writer, etc), it includes a digit in its name, which could impact cognitive chunking while reading code. The name IterPair could convey the same information but would not have this feature. That said, it is longer.

6 replies

jannotti Aug 4, 2022

Just because a noun ends in er doesn't mean it's forbidden in an interface because there's no corresponding verb. If my interface should have a method called Beer() or something like that, so be it.

lpar Aug 5, 2022

Maybe since the primary purpose is enabling for...range, it should be called a Ranger rather than an Iter.

kalexmills Aug 6, 2022

<bikeshedding>
A Ranger should contain a Range method, though. This contains Next, so Nexter would be more appropriate, but I kind of hate that.
</bikeshedding>

candlerb Aug 8, 2022

Another reason against this change: The proposal uses Iter2[E, error] for iterators which can fail. Doing this violates the idea of "Iter returns a key and a value" anyways

I don't like that either. My first preference for an "iterator which can fail" is that it should return a compound value like struct{Value V, Err error} or similar - whatever the iterator defines - and force the caller to decompose it and check for errors.

My second choice would be an Err() method that you can call once the iteration has stopped, to check whether it was stopped successfully, or stopped due to an error. But then it's possible that you're going to forget to check this, especially in a chain of iterators where you're consuming the top of the chain, but you may need to check Err() on the bottom of the chain (and possibly elsewhere too).

(Alternatively, all filter-type iterators need to proxy Err(), but that might be awkward to get right in general. For example, consider MergeIter, or even just Concat. When it1 terminates, if it does so because of an error, we don't want to continue onto it2. Therefore, Concat must call Err() at this point and store it in a local variable; and if this value is non-nil then iteration must stop. It's all doable but adds complexity)
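
The Concat case described above can be sketched as follows. This is only an illustration under assumptions: ErrIter, fakeIter, and errDisk are hypothetical names, and the Err-after-iteration convention is the one from this comment, not something the proposal defines.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrIter is a hypothetical iterator whose Err method reports why iteration stopped.
type ErrIter[E any] interface {
	Next() (E, bool)
	Err() error
}

// errDisk is a made-up error used by the example input.
var errDisk = errors.New("disk error")

// fakeIter yields vals; Err returns err, which stands for the reason the
// input ended (nil for a clean end). It is meant to be called after Next
// has returned ok == false.
type fakeIter[E any] struct {
	vals []E
	err  error
}

func (f *fakeIter[E]) Next() (E, bool) {
	var zero E
	if len(f.vals) == 0 {
		return zero, false
	}
	v := f.vals[0]
	f.vals = f.vals[1:]
	return v, true
}

func (f *fakeIter[E]) Err() error { return f.err }

// concat yields everything from its inputs in order and stops at the first error.
type concat[E any] struct {
	its []ErrIter[E]
	err error
}

func Concat[E any](its ...ErrIter[E]) ErrIter[E] { return &concat[E]{its: its} }

func (c *concat[E]) Next() (E, bool) {
	var zero E
	for c.err == nil && len(c.its) > 0 {
		if v, ok := c.its[0].Next(); ok {
			return v, true
		}
		// The current input is done: record its error before moving on,
		// so a failed input never falls through to the next one.
		c.err = c.its[0].Err()
		c.its = c.its[1:]
	}
	return zero, false
}

// Err proxies the error of whichever input failed.
func (c *concat[E]) Err() error { return c.err }

func main() {
	a := &fakeIter[int]{vals: []int{1, 2}, err: errDisk}
	b := &fakeIter[int]{vals: []int{3}}
	it := Concat[int](a, b)
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		fmt.Println(v)
	}
	fmt.Println("err:", it.Err())
}
```

Here the value 3 is never delivered, because the first input ended with an error.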

I suppose a good example to consider is a JSON decoder acting as an iterator. How might you want this to work?

for v := range someDecoder {
    if v.Err != nil {
        panic("Bad data")
    }
    fmt.Println(v.Value)
}

or:

for v := range someDecoder {
    fmt.Println(v)
}
if err := someDecoder.Err(); err != nil {
    panic("Bad data")
}

In either case it's easy to forget the error check, and you can forget it without ever having to explicitly assign the error to _.
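
For the first variant, a minimal sketch of a JSON stream iterator that returns a compound value might look like this. Result, jsonIter, and decodeInts are hypothetical names; the choice to end iteration after the first decode error is an assumption made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Result is a hypothetical compound element: a value, or the error that replaced it.
type Result[V any] struct {
	Value V
	Err   error
}

// jsonIter yields each top-level JSON value in a stream as a Result.
type jsonIter[V any] struct {
	dec  *json.Decoder
	done bool
}

func newJSONIter[V any](r io.Reader) *jsonIter[V] {
	return &jsonIter[V]{dec: json.NewDecoder(r)}
}

// Next returns ok == false at end of input. A decode error is delivered
// once as a Result with Err set, after which iteration ends.
func (it *jsonIter[V]) Next() (Result[V], bool) {
	if it.done {
		return Result[V]{}, false
	}
	var v V
	err := it.dec.Decode(&v)
	if err == io.EOF {
		it.done = true
		return Result[V]{}, false
	}
	if err != nil {
		it.done = true
		return Result[V]{Err: err}, true
	}
	return Result[V]{Value: v}, true
}

// decodeInts collects values until the stream ends or an element carries an error.
func decodeInts(s string) ([]int, error) {
	it := newJSONIter[int](strings.NewReader(s))
	var out []int
	for r, ok := it.Next(); ok; r, ok = it.Next() {
		if r.Err != nil {
			return out, r.Err
		}
		out = append(out, r.Value)
	}
	return out, nil
}

func main() {
	vals, err := decodeInts(`1 2 "oops" 4`)
	fmt.Println(vals, err)
}
```

Because the error travels inside the element, the caller cannot read Value without at least seeing the Err field next to it, although nothing stops the caller from ignoring it.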

candlerb Aug 8, 2022

Another option is to let the iterator itself set an error value via a pointer:

var err error
someDecoder := makeDecoder(stream, &err)
for v := range someDecoder {
    ...
}
if err != nil {
    panic("Bad data")
}

That's not much better than calling Err(), except that multiple iterators in a chain can share a pointer to the same err value, and therefore any of them can set an error.

A random thought which occurs to me is that you could change ok bool to err error, and return io.EOF on normal termination, making an iterator analogous to an io.Reader. But the problem is then how you'd access that error value, in a for ... range expression.

bronze1man · 2022-08-04T01:27:53Z

bronze1man
Aug 4, 2022

I use the following pattern to solve this problem in my project. I think it is simpler than the interface from the proposal and the C++ implementation.

func (ci *Db) MustRangeCallback(req RangeCallback_Req, cb func(row RowData)bool){
   ...
}
num:=0
db.MustRangeCallback(req, func(row RowData)bool{
   num+=row.GetCount()
   return num<=100
})
fmt.Println(num)

Returning true from cb means "continue, give me the next one in the loop"; returning false from cb means "break out of this loop".

Maybe we can make this pattern easier to call with Go's for range syntax, or else not add any syntax to Go at all.
The performance is good right now. I have confirmed that cb in this pattern will be inlined if cb is not too large, so this pattern allocates no more memory than writing the loop directly in place of the callback.

7 replies

Merovius Aug 4, 2022

return true in cb means continue give me next one in the loop, return false in cb means break this loop.

Notably, there is no way to return out of the loop.

ianlancetaylor Aug 4, 2022
Collaborator Author

There's nothing horribly wrong about a MustRangeCallback method. But it's not an iterator, and this proposal is about iterators. It is awkward to implement functions like the proposed Map, Filter, and Reduce if all you have is MustRangeCallback. Experience in other languages shows that functions that operate on iterators can be a powerful technique.
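
To make the composition point concrete, here is a minimal sketch of lazy Filter, Map, and Reduce over an assumed Iter[E] with a single Next method. The signatures are illustrative guesses, not the ones from the proposal, and sliceIter and sumEvenSquares are hypothetical helpers.

```go
package main

import "fmt"

// Iter is an assumed minimal shape for the proposal's Iter[E].
type Iter[E any] interface {
	Next() (E, bool)
}

// sliceIter is a hypothetical iterator over a slice.
type sliceIter[E any] struct{ s []E }

func (it *sliceIter[E]) Next() (E, bool) {
	var zero E
	if len(it.s) == 0 {
		return zero, false
	}
	v := it.s[0]
	it.s = it.s[1:]
	return v, true
}

// filterIter yields only the elements of src for which keep returns true.
type filterIter[E any] struct {
	src  Iter[E]
	keep func(E) bool
}

func (f *filterIter[E]) Next() (E, bool) {
	for {
		v, ok := f.src.Next()
		if !ok || f.keep(v) {
			return v, ok
		}
	}
}

func Filter[E any](src Iter[E], keep func(E) bool) Iter[E] {
	return &filterIter[E]{src: src, keep: keep}
}

// mapIter applies f to each element of src on demand.
type mapIter[E, R any] struct {
	src Iter[E]
	f   func(E) R
}

func (m *mapIter[E, R]) Next() (R, bool) {
	v, ok := m.src.Next()
	if !ok {
		var zero R
		return zero, false
	}
	return m.f(v), true
}

func Map[E, R any](src Iter[E], f func(E) R) Iter[R] {
	return &mapIter[E, R]{src: src, f: f}
}

// Reduce folds every element of src into acc.
func Reduce[E, A any](src Iter[E], acc A, f func(A, E) A) A {
	for v, ok := src.Next(); ok; v, ok = src.Next() {
		acc = f(acc, v)
	}
	return acc
}

// sumEvenSquares chains the three helpers without building intermediate slices.
func sumEvenSquares(s []int) int {
	var src Iter[int] = &sliceIter[int]{s: s}
	even := Filter(src, func(v int) bool { return v%2 == 0 })
	squares := Map(even, func(v int) int { return v * v })
	return Reduce(squares, 0, func(a, v int) int { return a + v })
}

func main() {
	fmt.Println(sumEvenSquares([]int{1, 2, 3, 4, 5, 6})) // 4 + 16 + 36
}
```

Each stage pulls from the previous one only when asked, which is the part that is hard to express when the only primitive is a push-style callback.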

bronze1man Aug 5, 2022

Notably, there is no way to return out of the loop.

Return false to break the loop and use a bool variable to return from the outer function...

bronze1man Aug 5, 2022

There's nothing horribly wrong about a MustRangeCallback method. But it's not an iterator, and this proposal is about iterators.

So I want an iterator (for range) on the caller side, and a MustRangeCallback method on the implementer side. It will make both sides simpler.

ianlancetaylor Aug 5, 2022
Collaborator Author

Iterators as outlined here are more than just for range. For example, you can iterate through two different collections at the same time. See the Equal function in the examples section.
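
A minimal sketch of that kind of simultaneous iteration, assuming an Iter[E] with a single Next method. This is written for illustration and is not the Equal function from the proposal's examples section; sliceIter and fromSlice are hypothetical helpers.

```go
package main

import "fmt"

// Iter is an assumed minimal shape for the proposal's Iter[E].
type Iter[E any] interface {
	Next() (E, bool)
}

// sliceIter is a hypothetical iterator over a slice.
type sliceIter[E any] struct{ s []E }

func (it *sliceIter[E]) Next() (E, bool) {
	var zero E
	if len(it.s) == 0 {
		return zero, false
	}
	v := it.s[0]
	it.s = it.s[1:]
	return v, true
}

func fromSlice[E any](s []E) Iter[E] { return &sliceIter[E]{s: s} }

// Equal reports whether a and b yield the same elements in the same order.
// It advances both iterators in lockstep, which a single callback-driven
// loop over one container cannot do directly.
func Equal[E comparable](a, b Iter[E]) bool {
	for {
		v1, ok1 := a.Next()
		v2, ok2 := b.Next()
		if !ok1 || !ok2 {
			return ok1 == ok2 // equal only if both ended together
		}
		if v1 != v2 {
			return false
		}
	}
}

func main() {
	fmt.Println(Equal(fromSlice([]int{1, 2, 3}), fromSlice([]int{1, 2, 3})))
	fmt.Println(Equal(fromSlice([]int{1, 2}), fromSlice([]int{1, 2, 3})))
}
```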

seancfoley · 2022-08-04T02:06:23Z

seancfoley
Aug 4, 2022

You should be able to query if an iterator is empty without pulling an element out of it. The Java model is better for that reason. Example. The iterator interface needs another method.

18 replies

Merovius Aug 4, 2022

@seancfoley I think we've learned from other interfaces, that it is best to return a concrete type instead of an interface, where possible. So, if you have a BTree[E] container, the Range method should return a *BTreeIter[E], not iter.Iter[E]. *BTreeIter[E] can then have any methods it wants and the user can call them without type-assertions. That's why the proposal says that Range() should return a value implementing Iter[E].

That's what @kortschak means when he says "the iterator can have this method without a standard interface".

I think it's important to note that for many iterators, implementing an Empty() bool method would significantly complicate the implementation. For example, a channel-based iterator has to modify its internal state to figure out if there are any more values to come and it has to block.

So, I don't think requiring or even standardizing this method is a good idea.

ianlancetaylor Aug 4, 2022
Collaborator Author

There are natural iterator sources for which computing Empty is non-trivial: channels are an obvious example; trees are another. I think it would be a mistake to require all iterators to define an Empty method. Of course it's no problem for an individual iterator type to define an Empty method, and if that proves useful we could consider defining a standard optional EmptyIter interface along the lines of SetIter.

Turning things around, though, an Empty method is only useful if there are algorithms that can be simplified by calling Empty. There's no reason to add Empty if it's not useful. So: when is it useful?

And for cases where it is useful, we can write something like this (untested). This is not a perfect solution, but perhaps it suffices for most use cases.

// EmptyIter is an iterator with an Empty method.
type EmptyIter[E any] interface {
	iter.Iter[E]

	// Empty reports whether there are no remaining elements;
	// if Empty returns true, the next call to Next will return ok==false.
	Empty() bool
}

// EmptyIterator takes an iterator and returns a new iterator with an Empty method.
func EmptyIterator[E any](it iter.Iter[E]) EmptyIter[E] {
	return &emptyIter[E]{it: it}
}

// emptyIter implements EmptyIter for an arbitrary Iter.
type emptyIter[E any] struct {
	it     iter.Iter[E]
	peeked bool
	val    E
	ok     bool
}

// Next implements iter.Iter
func (it *emptyIter[E]) Next() (E, bool) {
	if it.peeked {
		it.peeked = false
		return it.val, it.ok
	}
	return it.it.Next()
}

// Empty implements EmptyIter
func (it *emptyIter[E]) Empty() bool {
	if !it.peeked {
		it.val, it.ok = it.it.Next()
		it.peeked = true
	}
	return !it.ok
}
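
A short usage sketch of the wrapper above, with the types redeclared locally so it compiles on its own (Iter stands in for iter.Iter, peekIter for emptyIter; sliceIter and drain are hypothetical helpers). It also shows the side effect of this approach: calling Empty advances the wrapped iterator.

```go
package main

import "fmt"

// Iter stands in for iter.Iter[E]; the shape is an assumption.
type Iter[E any] interface {
	Next() (E, bool)
}

// sliceIter is a hypothetical iterator over a slice.
type sliceIter[E any] struct{ s []E }

func (it *sliceIter[E]) Next() (E, bool) {
	var zero E
	if len(it.s) == 0 {
		return zero, false
	}
	v := it.s[0]
	it.s = it.s[1:]
	return v, true
}

// peekIter mirrors the emptyIter wrapper above: Empty reads one element
// ahead and caches it for the following Next call.
type peekIter[E any] struct {
	it     Iter[E]
	peeked bool
	val    E
	ok     bool
}

func (p *peekIter[E]) Next() (E, bool) {
	if p.peeked {
		p.peeked = false
		return p.val, p.ok
	}
	return p.it.Next()
}

func (p *peekIter[E]) Empty() bool {
	if !p.peeked {
		p.val, p.ok = p.it.Next()
		p.peeked = true
	}
	return !p.ok
}

// drain loops on Empty instead of on the ok result of Next.
func drain[E any](src Iter[E]) []E {
	p := &peekIter[E]{it: src}
	var out []E
	for !p.Empty() {
		v, _ := p.Next()
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(drain[int](&sliceIter[int]{s: []int{1, 2, 3}}))

	src := &sliceIter[int]{s: []int{1, 2, 3}}
	p := &peekIter[int]{it: src}
	fmt.Println(p.Empty()) // false, but element 1 is now cached in p
	v, _ := src.Next()     // the wrapped iterator has already moved on
	fmt.Println(v)         // 2
}
```

Anyone still holding the wrapped iterator will therefore miss the cached element, so the wrapper has to be the only consumer.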

seancfoley Aug 4, 2022

@ianlancetaylor
I would propose adding EmptyIter now for those cases where it is useful. And yes, as you have demonstrated in your example code, and as I mentioned previously, it is always possible to wrap an iterator to achieve having an Empty method. Thanks for providing the proof. However, providing a standardized EmptyIter remains beneficial, so it can be shared amongst all Go programmers, and we don't have all sorts of people writing their own in their own packages.

One pattern where Empty is useful: when code that is consuming elements from the iteration is separated from code that supplies the iterator. This example illustrates this concept:

func iteratorManager[E any](i EmptyIter[E]) {
    for !i.Empty() {
        iteratorConsumer1(i) 
        iteratorConsumer2(i)
        ...
    }
}

func iteratorConsumer1(i iter.Iter[E]) {
...
}

func iteratorConsumer2(i iter.Iter[E]) {
...
}

In the above code...

you may be dishing E instances out for different purposes, each iteratorConsumerX consumer being quite different.
or perhaps you wish to separate the code that is consuming the elements from the code that is supplying the iterator. iteratorManager does not need to decide who pulls what out of the iterator. It just needs to know when it's empty, and it needs to know what consumers need iterator elements. Meanwhile, each iteratorConsumerX doesn't need to know the iterator is shared, it just needs to pull out what it needs, for whatever it does.

Patterns like the above are not uncommon.

seancfoley Aug 4, 2022

@Merovius 2 things:

@ianlancetaylor showed with his code example how any iterator can provide an Empty method, simply by peeking and caching.
That being said, an Empty method could simply be added to a new EmptyIter much like the Stop method was added to StopIter, keeping Iter the same, allowing people to choose whether to include it or not. My hunch is that most would include it. Because most would include it, I think it's valuable to add it to the proposal.

Merovius Aug 5, 2022

@seancfoley

@ianlancetaylor showed with his code example how any iterator can provide an Empty method, simply by peeking and caching.

Sure. I was talking under the assumption that Empty should have the benefit of not modifying internal state.

That being said, an Empty method could simply be added to a new EmptyIter much like the Stop method was added to StopIter, keeping Iter the same, allowing people to choose whether to include it or not.

Note that people can choose to do so even without anyone declaring EmptyIter. StopIter exists so range c can call it. Nothing we've talked about so far for the iter package or language would call Empty(). Generally, the convention is to define interfaces where they are used - if we don't have a use for EmptyIter, it seems against convention to define it.

My hunch is that most would include it.

I disagree with this hunch. In particular, I don't think any channel-based iterator should, as it provides a false promise (that there is a cheap and immediate way to check for emptiness). map based iterators should not do it. Any implementation for which Empty would have to do the "peek ahead and cache" thing would be one which should not provide it at all. I think that's a pretty significant section of all iterators.

This is one of those "only time will tell" things. If we disagree about the most likely future and have no reason to sway us one way or another, it seems the most reasonable path to be conservative, not declare that interface and see if, indeed, most iterators end up having a method such as this and if, indeed, there are many cases where it would be useful.

jhenstridge · 2022-08-04T02:38:36Z

jhenstridge
Aug 4, 2022

With respect to integration with the range syntax, would it make sense to have two methods so the container can specifically handle one- vs. two-value iteration? I'm thinking of cases where the container can implement the iterator faster if it knows the second value won't be needed.

Taking an example from the language's builtins, consider slices: if I write for idx := range slice { ... }, the slice is just producing a sequence of integers. If I do for idx, val := range slice { ... }, it is also creating copies of each element of the slice.

If I wanted to implement my own container type that provided slice-like iteration, I'd need to make the Range method return an Iter2[int, E] and have its Next method always copy the elements. If the method is short enough, it might get inlined and the copy optimised out for a one-value range loop, but there's probably going to be cases where that doesn't happen. It'd be nice not to rely on the optimiser for this.

7 replies

kalexmills Aug 4, 2022

Oooh. Good catch. That would be a problem.

I think a simple fix then might be to rename Iter2's Next method to NextPair().

ianlancetaylor Aug 4, 2022
Collaborator Author

The Range method is a simplification for the common case. for range supports any iterator, so one can write, for example, a Keys method that returns an Iter over keys, a Value method that returns an Iter over values, and a Range method that returns an Iter2 over key/value pairs.

	for k := range m.Keys()

Admittedly you don't get to take advantage of for range calling the Stop method, so this is most useful when no Stop method is required.

gazerro Aug 6, 2022

If the argument to range implements StopIter[E] or StopIter2[E], range loop could ensure that Stop is called when exiting the loop, in the same way that is done now but only in the case of Range.

Merovius Aug 6, 2022

@gazerro I think it is entirely reasonable wanting to range over part of a StopIter using a loop, then break out of that loop and then consume the rest of the iterator. Just like it is for a normal Iter. The rule "whoever obtained a StopIter is who calls Stop" also seems easier to understand.

apparentlymart Aug 8, 2022

I had a similar thought reading the proposal: I could imagine writing the iterator differently if I knew it would only produce keys, only produce values, or both.

The idea of having for ... range call a different method to obtain an iterator from a non-iterator object depending on the declared symbols was also my first instinct, similar to @jhenstridge although with one more case:

for i := range foo would look for a RangeKeys method.
for _, v := range foo would look for a RangeValues method.
for i, v := range foo would look for a Range method.

Of course in the above I've made an implicit assumption that all iteration involves "keys" and "values", which is less general than the current framing of the proposal where in principle i and v in my above examples could represent literally anything. In particular the proposal discusses the idea of something like for v, err := range foo where what would normally be the "key" is actually the value, and what would normally be the "value" is an error (or possibly an ok bool, depending on the situation).

It also caused me to worry about the fact that the above hypothetical design can allow my i and v to take on entirely different meanings depending on how the for clause is written, whereas for all existing variants of for ... range there is no situation where omitting or not naming one of the symbols causes the other one to take on a different meaning.

Compared to the various available patterns for implementing "iterators" for custom types today, requiring copying both the key and value even when they aren't used doesn't seem any worse from a performance standpoint 🤔 but I suppose once arbitrary types with magic methods are allowed to appear in the range clause it would be a breaking change to start looking for any other method names to further optimize this later, because methods of the appropriate name and signature could already be present on the given type without the author intending them to be optimized variants of the Range method.

EDIT: A discussion further down reminded me that for ... range over channels already treats for i := range foo as iterating over the values of the channel, because channel "elements" don't have keys/indices. So replicating that interface under what I showed above would mean implementing a RangeKeys method, and that feels weird.

robaho · 2022-08-04T03:20:48Z

robaho
Aug 4, 2022

I don't like the complexity of adding generators. Their usefulness is suspect - as almost always if a generator makes sense it is because the size is unbounded which usually equates to a more complex process underlying the collection - the block/notify is almost always going to be needed at some level anyway in this case.

It may make sense for trivial series generation - but for those cases you can just as easily code it with a state machine.
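
For comparison, a minimal sketch of the state-machine style for a trivial series, with no goroutine or channel involved. fibIter, Fib, and firstFib are hypothetical names, and Next returning (value, ok) is the assumed iterator shape.

```go
package main

import "fmt"

// fibIter is a hypothetical iterator that keeps its whole state in three ints.
type fibIter struct {
	a, b int // the next two Fibonacci numbers
	left int // how many values remain
}

// Fib returns an iterator over the first n Fibonacci numbers.
func Fib(n int) *fibIter { return &fibIter{a: 0, b: 1, left: n} }

// Next advances the state by one step and returns the value just passed.
func (f *fibIter) Next() (int, bool) {
	if f.left <= 0 {
		return 0, false
	}
	v := f.a
	f.a, f.b = f.b, f.a+f.b
	f.left--
	return v, true
}

// firstFib collects the first n values by calling Next in a loop.
func firstFib(n int) []int {
	it := Fib(n)
	var out []int
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(firstFib(7)) // [0 1 1 2 3 5 8]
}
```

This works well when the state between two values is small and explicit; a recursive tree walk is the usual case where hand-writing the state machine gets harder.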

I also don't like the xxxx2 interfaces. I would rather see KeyValue and IndexValue interfaces created to be used as the return types - so Iter[KeyValue[K,V]] or Iter[IndexValue[I,V]] or Iter[Any[E]]. The compiler can use similar inspection code for the range operator.

Other than that it is a great step forward for Go.

34 replies

Merovius Aug 12, 2022

Find a use case for that.

Calculating the union, difference, intersection and symmetric difference of a set are all operations requiring iterating sets. So does Map. Or the copy you mention. Serializing it for storage or transfer over a network. None of these care about ordering.

There are use cases which benefit from a deterministic order. For example, it is sometimes useful to serialize sets (or maps) in a specific order, because it makes the encoding deterministic. But they are by no means the norm and they can still pay that cost where they are needed.

Saying that these operations are useless or that it's reasonable for them to incur an unnecessary O(N) memory and time overhead is honestly confusing to me.

robaho Aug 12, 2022

Those operations are not useless but you don’t need external iteration to perform them.

This discussion stemmed from the claim that generators are useful for iterating sets/maps.

You don’t need generators to do this - except for builtin maps.

This is easily accomplished by adding an iter() builtin that returns an Iterator for a map.

Java does not have generators and you can iterate them fine if you wish.

If Go adds Set and Map types the only use case left for generators is channels and honestly the proposed channel semantic changes are baffling.

Merovius Aug 12, 2022

This is easily accomplished by adding an iter() builtin that returns an Iterator for a map.

True. I don't think this is a good idea, but it is a possibility.

robaho Aug 12, 2022

I just can’t see after all of the pushback on exceptions by the Go designers due to obscuring the program flow how generators are seen as a good idea. The flow control in the latter is far more complex.

Merovius Aug 12, 2022

Personally, I don't find the control flow for generators to be particularly complex. Their implementation, sure. But once you have a FromGen function, I don't find it particularly hard to understand the control flow of a usage of that. And often, I think they make it significantly easier. And they certainly are useful.

Of course, the relative ease or difficulty of understanding their usage is a matter of personal preference. Just like the relative importance of their use cases. Such matters should be easy to agree to disagree on.

Merovius · 2022-08-04T07:14:08Z

Merovius
Aug 4, 2022

Because it's inconvenient to write for v := c.Range(), we propose a further extension: we permit range c if c has a method Range that returns a value that implements Iter[E] or Iter2[E]. If the Range method implements StopIter[E] or StopIter2[E] then the range loop will ensure that Stop is called when exiting the loop.

One thing that strikes me as unfortunate about this is that there is no way for a function to take advantage of the same mechanism to work on either a full collection or a bare iterator. So, say I have Contains[E comparable](x TODO, v E) bool, I have to ask myself what TODO should be. If it is Iter[E], then I have to manually handle the (optional) Stop method with type-assertions. If it is interface{ Range() Iter[E] }, the function does not work with a pure iterator. It would be great if Contains could also take advantage of the same thing and just do

func Contains[E comparable](x TODO, v E) bool {
    for e := range x {
        if e == v {
            return true
        }
    }
    return false
}

which would handle Stop transparently. Not sure how to do this, though.

This also brings up another thing: The Range method doesn't "implement" anything, it's what the method returns, presumably. And when the proposal says "if the return value of Range implements StopIter[E]", does that mean statically, or will the compiler emit code to do an interface type-assertion? This should be clarified.

It might be useful to do a type-assertion, as it would be possible for a wrapping iterator to implement StopIter[E] if and only if the wrapped iterator does - which means the wrapping function needs to return Iter[E] (i.e. it needs to return an interface that does not statically implement StopIter[E]).
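
A minimal sketch of the manual handling described above: Contains accepts a plain Iter[E] and uses a dynamic type assertion to call Stop when the concrete iterator has one. Iter and StopIter are assumed shapes, stoppable is a hypothetical iterator, and whether a function like this should call Stop at all is exactly the open question in this thread.

```go
package main

import "fmt"

// Iter and StopIter are assumed minimal shapes for the proposal's interfaces.
type Iter[E any] interface {
	Next() (E, bool)
}

type StopIter[E any] interface {
	Iter[E]
	Stop()
}

// Contains reports whether it yields v. It checks at run time whether the
// iterator also implements StopIter[E] and, if so, stops it on the way out.
func Contains[E comparable](it Iter[E], v E) bool {
	if s, ok := it.(StopIter[E]); ok {
		defer s.Stop()
	}
	for e, ok := it.Next(); ok; e, ok = it.Next() {
		if e == v {
			return true
		}
	}
	return false
}

// stoppable is a hypothetical slice iterator that records whether Stop ran.
type stoppable struct {
	s       []int
	stopped bool
}

func (it *stoppable) Next() (int, bool) {
	if len(it.s) == 0 {
		return 0, false
	}
	v := it.s[0]
	it.s = it.s[1:]
	return v, true
}

func (it *stoppable) Stop() { it.stopped = true }

func main() {
	it := &stoppable{s: []int{1, 2, 3}}
	fmt.Println(Contains[int](it, 2), it.stopped)
}
```

Because the check is dynamic, a wrapper that returns Iter[E] can still have its Stop method found, which is the case the static interpretation would miss.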

26 replies

smyrman Sep 27, 2022

@smyrman You seem to be missing an answer to the question of where a StopIter (or any other extended iterator) is coming from. Would you add RangeStop methods or something as well? Would every container type then gain 2^n different Range* implementations, for any combination of optional Iter interface it wants to support?

I did miss the expand thread button 🤦. Reading more of the answers now, it seems like StopIter probably would not come from a Range* method. To me, the presence of a Range method is useful only if you have a type that can be iterated several times. That is what that interface ~~says~~ suggests. Even if you have an I/O type that could be iterated several times (and got a Range method), would it not then be more natural to leave the close method on the parent File object?

Other cases, such as the DeleteIter case, are still relevant for in-memory data structures, and would, with this approach, require that the iterator be returned from a separate method. Even if a Rangable and Rangable2 interface might be useful, that doesn't mean that a RangableDelete{,2} interface would be useful though.

Merovius Sep 27, 2022

Those (that is, your AllOf/AnyOf) would work with a normal iter. It's not trivial, but you can effectively create N iterators, each blocking in their Next call until all N of them have been called and then returning the value from the Next call of the one wrapped iterator. If that makes sense. So, effectively, they walk in lockstep. You can then call the functions in separate goroutines. It's tricky enough that I can't just write it down just now, but it should be possible.
You can provide a helper to create a ResetIter from any container ([edit] or, if you prefer [/edit]). And notably, that works with any iterator type returned from Range. So it's still not an argument for why you have to pass around the container itself.

Even though I brought up the concern that the mechanism from the proposal doesn't allow passing around containers, I'm coming increasingly around to the fact that it's just not a big deal.

Merovius Sep 27, 2022

Hm, on second thought, that wrapper has a serious flaw, in that it doesn't do the StopIter thing itself. So it's kicking the can down the road. Optional interfaces are tricky… You can do this. Obviously, this only addresses StopIter and not other optional interfaces. But ISTM optional interfaces make any wrapping tricky.

smyrman Sep 28, 2022

It's not trivial, but you can effectively create N iterators, each blocking in their Next call until all N of them have been called and then returning the value from the Next call of the one wrapped iterator.

@Merovius that's pretty clever, but as you say, also very complex, and it seems to require concurrency. I suppose you could always fall back to doing an AsSlice for non-concurrent composite [edit] ~~types~~ functions [/edit] that accept Iter types, and maybe that's not a problem in practice.

That said, for this case in particular, accepting a Rangable / Rangable2 would probably still be cleaner ([edit] ~~given these~~ where the Range/Range2 methods [/edit] return interfaces, not concrete types). That is not to say that accepting Rangable is worth the overall trade-off.

Obviously, this only addresses StopIter and not other optional interfaces. But ISTM optional interfaces make any wrapping tricky.

Agreed. Do you know if it has been explored to do these [edit] ~~iterators~~ extension interfaces [/edit] one level up?

E.g. instead of doing:

func (s *Set[E]) Range() DeleteIter[E]

Then do:

func (s *Set[E]) Range() Iter[E]
func (s *Set[E]) Add(e E)
func (s *Set[E]) Remove(e E)

And instead of:

func (s *KeyValue[K, V]) Range() DeleteIter2[K, V]

Then do:

func (s *KeyValue[K, V]) Range() Iter[V]
func (s *KeyValue[K, V]) Range2() Iter2[K, V]
func (s *KeyValue[K, V]) Set(k K, v V)
func (s *KeyValue[K, V]) Remove(k K)

I see that there are some implications here. Maybe it makes it harder for an active range to respond correctly to a Set or Remove operation. Or maybe it just works, because both methods refer to the same internal data structure (E.g. a map).

Related discussion I could find:

Merovius Sep 28, 2022

I think if Delete is on the iterator, it can be more efficient. For example, if you have a tree-based map, deleting any given key is O(log(n)), but deleting a key that is encountered in iteration is O(1) (as you already have the actual node of that key in the iterator). That being said, standardizing extended iterators is a long-term future problem. For now, the important fact is that we shouldn't get locked into returning iter.Iter[E] exactly, as returning a concrete type is necessary to experiment anyways and will be needed if we ever do it.

That is, I want to be able to use a func (*Container[E]) Range() *FooIter[E] method, so that my *FooIter[E] can carry additional experimental methods and so that I can add a Delete (or whatever) method to it when we standardize on it, should it make sense for my container.
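
To make the trade-off concrete, here is a minimal sketch of a container whose Range method returns a concrete iterator type carrying an extra Delete method. It uses a singly linked list rather than a tree, and List, ListIter, Push, Delete and collect are hypothetical names chosen for illustration, not part of the proposal. The iterator already holds the link to the current node, so deleting during iteration needs no search, whereas a container-level Remove(v) on this list would have to walk it.

```go
package main

import "fmt"

// Iter is the basic iterator interface discussed in this thread.
type Iter[E any] interface {
	Next() (elem E, ok bool)
}

type node[E any] struct {
	val  E
	next *node[E]
}

// List is a hypothetical singly linked container.
type List[E any] struct {
	head *node[E]
}

// Push adds v to the front of the list.
func (l *List[E]) Push(v E) {
	l.head = &node[E]{val: v, next: l.head}
}

// ListIter is a concrete, exported iterator type. Because Range returns
// *ListIter[E] rather than Iter[E], callers can reach Delete.
type ListIter[E any] struct {
	next **node[E] // link holding the node that Next will return
	cur  **node[E] // link holding the node last returned; nil if none
}

// Range returns an iterator positioned before the first element.
func (l *List[E]) Range() *ListIter[E] {
	return &ListIter[E]{next: &l.head}
}

// Next returns the next element, if any.
func (it *ListIter[E]) Next() (E, bool) {
	n := *it.next
	if n == nil {
		it.cur = nil
		var zero E
		return zero, false
	}
	it.cur = it.next
	it.next = &n.next
	return n.val, true
}

// Delete unlinks the element most recently returned by Next.
// It does nothing if there is no such element.
func (it *ListIter[E]) Delete() {
	if it.cur == nil {
		return
	}
	*it.cur = (*it.cur).next // unlink without searching
	it.next = it.cur
	it.cur = nil
}

// *ListIter[E] still satisfies the plain interface.
var _ Iter[int] = (*ListIter[int])(nil)

// collect drains any Iter into a slice.
func collect[E any](it Iter[E]) []E {
	var out []E
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		out = append(out, v)
	}
	return out
}

func main() {
	l := &List[int]{}
	for _, v := range []int{4, 3, 2, 1} {
		l.Push(v) // list is now 1, 2, 3, 4
	}
	it := l.Range()
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		if v%2 == 0 {
			it.Delete()
		}
	}
	fmt.Println(collect[int](l.Range())) // prints [1 3]
}
```

Code that only needs iteration can still accept Iter[E]; only callers that want Delete depend on the concrete type.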

henryas · 2022-08-04T07:20:46Z

henryas
Aug 4, 2022

I like the general direction of this proposal. There are some minor disagreements though:

There is no need for two Iter interfaces. Only Iter[K,V any] is needed, and the for-range loop will discard values appropriately depending on whether one or two values are needed. Hence, for a simple array-like iterator implementation, the iterator should return the index and the value.
Stop interface can be merged with Iter interface. I think cleaning up is an essential part of the iterator.
Iterators should be for reading values only. Insertion, updating, and deletion should be manually done. I disagree with the 'optional future extension' part.
I am ambivalent about generators as they can be convenient.

Another important point is to keep the number of "must-know" interfaces small. So try not to add too many of those.

14 replies

Merovius Aug 4, 2022

Note that I'm not arguing in favor of using struct{} over int. I'm arguing in favor of using neither, and instead not making the type lie about what it is.

If we absolutely need a sentinel, though, I also gave pretty concrete reasons why -1 is bad. struct{} (or a type with that representation) makes for a better sentinel, in general, because it can only have one possible value, so it can't possibly be misinterpreted to mean anything else but being a sentinel. ISTM you simply ignored the reasons I gave.

jimmyfrasche Aug 4, 2022
Collaborator

If Go had tuples we could just use those for multivalue iterators. Since we don't we have to work around that.

I'd prefer the workaround where there's only Iter2 because it makes it easier to use iterators outside of a for loop. That makes it easier to write and use higher order iterators. It makes it easier to write your own iterators because there's less to learn.

I'm 👍 on a package iter; type Unused struct{}. Not the most elegant but good enough. That could be used for chans and for iterators that only return a key as an optimization.

ianlancetaylor Aug 4, 2022
Collaborator Author

Returning an initial value of -1 means that for v := range c does the wrong thing if c returns a single value. You would have to write for _, v := range c. That can't be right. It doesn't match how channels behave today.

Also Iter2 inherently takes two type arguments, and they can't always be inferred. Do we always have to write Iter[E, int] or Iter[E, any]? That can't be right either.

Merovius Aug 4, 2022

@ianlancetaylor if we had generic type aliases, we could have type Iter[E any] = Iter2[E, iter.Unused]. That would still allow using functions defined using Iter2, while only requiring to write Iter and not needing to infer the extra type argument.

So I think the type argument objection can be addressed with a relatively intuitive language change. Of course, I don't want to say we should delay generic iterators until we were willing to make that change.

jimmyfrasche Aug 4, 2022
Collaborator

Will there be many places where you have to write it outside of higher order iterators and iterator definitions?

Merovius · 2022-08-04T09:35:54Z

Merovius
Aug 4, 2022

I would like us to encourage, from the start, to return exported, concrete types as iterators, not the iter.Iter{,2} interfaces. Concrete types make it possible to add extra methods if desired (for example for one of the possible future extensions). They might also, in some cases, make it easier for the compiler to do inlining and escape analysis, thus eliminating some copies and allocations.

In particular, I would like all the top-level functions of the iter-package to do that. In some cases, this means exporting extra types. In others (e.g. funcIter) it can mean eliminating a top-level function altogether.

A possible exception to this would be functions which exist to compose iterators, like Map/Filter/Reduce. Returning interfaces can reflect that composition aspect quite well. Though even for those, I think it might be sensible to return concrete types. If we ever added "refined method constraints" to the language to solve "optional interfaces"¹, it would enable such functions to implement StopIter if the underlying type implements StopIter, for example (and similar for other possible future extensions).

[1] I'm not sure if we've standardized on these terms, but I'm referring to the solution the FGG paper suggests for the expression problem

15 replies

robaho Aug 4, 2022

Returning concrete types is bad for testing and refactoring.

jhenstridge Aug 5, 2022

@mateusz834: the iterator will likely need to be a pointer variable anyway, so should fit inside an interface variable without extra allocations.

The Next method mutates the iterator's state, so will generally need to take a pointer receiver. That means it won't be part of the method set of non-pointer variables of the type.
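
A minimal sketch of the method-set point, assuming a hypothetical slice-backed iterator named sliceIter (not from the proposal). Next advances a position field, so it is declared on the pointer type, and only *sliceIter[E] satisfies Iter[E].

```go
package main

import "fmt"

// Iter is the basic iterator interface discussed in this thread.
type Iter[E any] interface {
	Next() (elem E, ok bool)
}

// sliceIter is a hypothetical iterator over a slice.
type sliceIter[E any] struct {
	s []E
	i int
}

// Next advances the position, so it needs a pointer receiver.
func (it *sliceIter[E]) Next() (E, bool) {
	if it.i >= len(it.s) {
		var zero E
		return zero, false
	}
	v := it.s[it.i]
	it.i++
	return v, true
}

// Only the pointer type has Next in its method set.
var _ Iter[int] = &sliceIter[int]{}

// The following line would not compile, because Next has a pointer receiver:
// var _ Iter[int] = sliceIter[int]{}

// sum consumes an iterator through the interface.
func sum(it Iter[int]) int {
	total := 0
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sum(&sliceIter[int]{s: []int{1, 2, 3}})) // prints 6
}
```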

mateusz834 Aug 5, 2022
Collaborator

type Iter[E any] interface{ Next() (elem E, ok bool) }

type C struct{ a int }

func (c *C) Range() funcIter[int] {
	return NewNext(func() (int, bool) { return c.a, false })
}

func NewNext[E any](f func() (E, bool)) funcIter[E] {
	return funcIter[E](f)
}

type funcIter[E any] func() (E, bool)

func (f funcIter[E]) Next() (E, bool) {
	return f()
}

func BenchmarkIter(b *testing.B) {
	for i := 0; i < b.N; i++ {
		c := C{}
		r := c.Range()
		r.Next()
		r.Next()
		r.Next()
	}
}

BenchmarkIter-4 1000000000 0.3395 ns/op 0 B/op 0 allocs/op

funcIter replaced with Iter:

func (c *C) Range() Iter[int] {
	return NewNext(func() (int, bool) { return c.a, false })
}

func NewNext[E any](f func() (E, bool)) Iter[E] {
	return funcIter[E](f)
}

BenchmarkIter-4 20527386 81.74 ns/op 24 B/op 2 allocs/op

mateusz834 Aug 5, 2022
Collaborator

Someone might argue that returning funcIter directly from Range() might not be a great idea (what if we want to replace funcIter with a custom iterator in the future without breaking backwards compatibility?). If we choose to return Iter from Range(), we still allocate.

func (c *C) Range() Iter[int] {
	return NewNext(func() (int, bool) { return c.a, false })
}

func NewNext[E any](f func() (E, bool)) funcIter[E] {
	return funcIter[E](f)
}

BenchmarkIter-4 21444618 70.35 ns/op 24 B/op 2 allocs/op

But with something like that, we don't allocate, and we can easily change the iterator used in future (even to an Iter interface if required, but it will obviously allocate)

type CIter struct{ f funcIter[int] }

func (c *CIter) Next() (int, bool) {
	return c.f.Next()
}

func (c *C) Range() CIter {
	return CIter{NewNext(func() (int, bool) { return c.a, false })}
}

func NewNext[E any](f func() (E, bool)) funcIter[E] {
	return funcIter[E](f)
}

BenchmarkIter-4 129115926 9.222 ns/op 0 B/op 0 allocs/op

Edit:
We should also consider the Next() performance overhead.

func BenchmarkIterNext(b *testing.B) {
	c := C{}
	r := c.Range()
	for i := 0; i < b.N; i++ {
		r.Next()
	}
}

func (c *C) Range() Iter[int]
func NewNext[E any](f func() (E, bool)) Iter[E]

BenchmarkIterNext-4 324370472 3.691 ns/op 0 B/op 0 allocs/op

func (c *C) Range() CIter
func NewNext[E any](f func() (E, bool)) funcIter[E]

BenchmarkIterNext-4 357346220 3.299 ns/op 0 B/op 0 allocs/op

func (c *C) Range() funcIter[int]
func NewNext[E any](f func() (E, bool)) funcIter[E]

BenchmarkIterNext-4 1000000000 0.3389 ns/op 0 B/op 0 allocs/op

Edit: faster "CIter" implementation.

type CIter struct{ cIter }
type cIter struct{ funcIter[int] } // required not to expose funcIter (funcIter should be capitalized here)
func (c *C) Range() CIter {
	return CIter{cIter{NewNext(func() (int, bool) { return c.a, false })}}
}

BenchmarkIterNext-4     596732254                2.001 ns/op           0 B/op          0 allocs/op
BenchmarkIter-4         224482528                5.317 ns/op           0 B/op          0 allocs/op

The "CIter" is not something that I propose for the iter package. It is just an example of a future-proof alternative to interfaces.

gazerro Aug 7, 2022

If the Range method were for the exclusive use of the for range statement, it could return a "next" function and possibly a "stop" function instead of an iterator

// Range returns a function to iterate over c. It allows writing "for v := range c".
func (c C) Range() func() (E, bool)

// Range returns a function to iterate over c and a function to stop the iteration.
// It allows writing "for v := range c".
func (c C) Range() (next func() (E, bool), stop func())

The for range statement could also be extended to accept func() (E, bool) and func() (E1, E2, bool) functions as range expressions, plus an additional optional func() function that is called to stop the iteration.

We could write

for v := range c {
    // statement
}

// or explicitly:

for v := range c.Range() {
    // statement
}

// or more explicitly:

next := c.Range()
for v := range next {
    // statement
}

next, stop := c.Range()
for v := range next, stop {
    // statement
}
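
The `range next, stop` form above is suggested syntax, so it cannot be run as written. As a rough sketch of what such a loop might expand to, the following writes the loop by hand. The container C, its fields and firstOver are hypothetical, and the open counter exists only to show that stop runs on early exit.

```go
package main

import "fmt"

// C is a hypothetical container. open counts iterations that have been
// started but not yet stopped.
type C struct {
	elems []int
	open  int
}

// Range returns a function to iterate over c and a function to stop the
// iteration, following the second signature suggested above.
func (c *C) Range() (next func() (int, bool), stop func()) {
	i := 0
	stopped := false
	c.open++
	next = func() (int, bool) {
		if stopped || i >= len(c.elems) {
			return 0, false
		}
		v := c.elems[i]
		i++
		return v, true
	}
	stop = func() {
		if !stopped {
			stopped = true
			c.open--
		}
	}
	return next, stop
}

// firstOver returns the first element greater than limit. The deferred
// stop call is what a "for v := range next, stop" loop would have to do
// implicitly when the loop body returns early.
func firstOver(c *C, limit int) (int, bool) {
	next, stop := c.Range()
	defer stop()
	for v, ok := next(); ok; v, ok = next() {
		if v > limit {
			return v, true
		}
	}
	return 0, false
}

func main() {
	c := &C{elems: []int{1, 5, 9}}
	v, ok := firstOver(c, 4)
	fmt.Println(v, ok, c.open) // prints 5 true 0
}
```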

Merovius · 2022-08-04T09:43:42Z

Merovius
Aug 4, 2022

I would like to suggest adding two functions for composability and potentially reducing redundancy in APIs:

// Left returns an iterator over the first element type of an Iter2, dropping the second.
func Left[E1, E2 any](it Iter2[E1, E2]) Iter[E1]

// Right returns an iterator over the second element type of an Iter2, dropping the first.
func Right[E1, E2 any](it Iter2[E1, E2]) Iter[E2]

(bikeshed colors up for discussion. These names are taken from Haskell's Either, because "First" and "Last" are misleading in the context of iterators).

Far less useful, but suggesting it just for completeness:

type Nop struct{}

// ToLeft returns an Iter2 that uses it for its first element type
func ToLeft[E any](it Iter[E]) Iter2[E, Nop]

// ToRight returns an Iter2 that uses it for its second element type
func ToRight[E any](it Iter[E]) Iter2[Nop, E]
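
A minimal sketch of how these adapters might be implemented, assuming Iter2 has a Next method returning (E1, E2, bool). The wrapper types leftIter, rightIter and toLeftIter and the demo type pairs are hypothetical.

```go
package main

import "fmt"

// Iter and Iter2 as assumed for this sketch.
type Iter[E any] interface{ Next() (E, bool) }
type Iter2[E1, E2 any] interface{ Next() (E1, E2, bool) }

type leftIter[E1, E2 any] struct{ it Iter2[E1, E2] }

func (l leftIter[E1, E2]) Next() (E1, bool) {
	e1, _, ok := l.it.Next()
	return e1, ok
}

// Left returns an iterator over the first element type of an Iter2.
func Left[E1, E2 any](it Iter2[E1, E2]) Iter[E1] { return leftIter[E1, E2]{it} }

type rightIter[E1, E2 any] struct{ it Iter2[E1, E2] }

func (r rightIter[E1, E2]) Next() (E2, bool) {
	_, e2, ok := r.it.Next()
	return e2, ok
}

// Right returns an iterator over the second element type of an Iter2.
func Right[E1, E2 any](it Iter2[E1, E2]) Iter[E2] { return rightIter[E1, E2]{it} }

// Nop is the placeholder for the unused position.
type Nop struct{}

type toLeftIter[E any] struct{ it Iter[E] }

func (t toLeftIter[E]) Next() (E, Nop, bool) {
	e, ok := t.it.Next()
	return e, Nop{}, ok
}

// ToLeft returns an Iter2 that uses it for its first element type.
func ToLeft[E any](it Iter[E]) Iter2[E, Nop] { return toLeftIter[E]{it} }

// pairs is a hypothetical slice-backed Iter2 used only for the demo.
type pairs struct {
	keys []string
	vals []int
	i    int
}

func (p *pairs) Next() (string, int, bool) {
	if p.i >= len(p.keys) {
		return "", 0, false
	}
	k, v := p.keys[p.i], p.vals[p.i]
	p.i++
	return k, v, true
}

func main() {
	keys := Left[string, int](&pairs{keys: []string{"a", "b"}, vals: []int{1, 2}})
	for k, ok := keys.Next(); ok; k, ok = keys.Next() {
		fmt.Println(k)
	}
}
```

ToRight would mirror ToLeft with the positions swapped.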

3 replies

Merovius Aug 4, 2022

An example usecase:

// Count counts how often any element appears in it.
func Count[E comparable](it Iter[E]) map[E]int

// Duplicates returns all elements from it which appear more than once.
func Duplicates[E comparable](it Iter[E]) Iter[E] {
    return iter.Left(iter.Filter2(func(_ E, count int) bool {
        return count > 1
    }, iter.FromMap(Count(it))))
}

The key point here is that the Iter2 needs to be filtered while both key and value are available, so we can't rely on, say maps.Keys or maps.Values to give us an Iter - the projection to Iter needs to happen after we massaged the Iter2 into shape.

earthboundkid Aug 5, 2022

I think there's a typo and you left the 2 out of Iter2, e.g. func Left[E1, E2 any](it Iter2[E1, E2]) Iter[E1].

earthboundkid Aug 5, 2022

I'm not sure how useful ToLeft/Right would be. I think most cases of wanting an Iter2 could be handled by Enumerate (to use the name Python uses) or Count (to use a slightly less jargony name for the same thing) plus one for errors called NilError.

Merovius · 2022-08-04T10:06:41Z

Merovius
Aug 4, 2022

Clarification:

Using range with an Iter2[E1, E2] will permit using two variables in the for statement, as with range over a map.

Does this mean it only permits the two-variable form, or does it mean to permit either the one or the two-variable form (as with range over a map)?

3 replies

ianlancetaylor Aug 4, 2022
Collaborator Author

The intent is to permit either form.

jhenstridge Aug 5, 2022

@ianlancetaylor: that seems sub-optimal for the error return case. If a container has a Range() Iter2[E, error] method, then I can write for v := range container { ... } and it isn't obvious that I'm ignoring the error values.

Merovius Aug 5, 2022

I'm not sure I like that. The design suggests using Iter2[T, error] for iterators which can fail. If we allow the one-variable form for Iter2, it seems far too easy to accidentally ignore those errors. [edit] sorry, @jhenstridge's comment didn't appear for me until I submitted my duplicate comment [/edit]
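
To illustrate the concern, here is a small sketch with the two loop forms written out by hand. It assumes Iter2 has a Next method returning (E1, E2, bool); parseIter, sumChecked and sumUnchecked are hypothetical names. The second function corresponds to what a one-variable range over an Iter2[T, error] would do.

```go
package main

import (
	"fmt"
	"strconv"
)

// Iter2 as assumed for this sketch.
type Iter2[E1, E2 any] interface{ Next() (E1, E2, bool) }

// parseIter is a hypothetical iterator that yields each input string
// parsed as an int, together with a per-element error.
type parseIter struct {
	in []string
	i  int
}

func (p *parseIter) Next() (int, error, bool) {
	if p.i >= len(p.in) {
		return 0, nil, false
	}
	n, err := strconv.Atoi(p.in[p.i])
	p.i++
	return n, err, true
}

// sumChecked mirrors the two-variable form, "for v, err := range it".
func sumChecked(it Iter2[int, error]) (int, error) {
	total := 0
	for v, err, ok := it.Next(); ok; v, err, ok = it.Next() {
		if err != nil {
			return total, err
		}
		total += v
	}
	return total, nil
}

// sumUnchecked mirrors the one-variable form, "for v := range it".
// Every error is dropped without any visible sign at the call site.
func sumUnchecked(it Iter2[int, error]) int {
	total := 0
	for v, _, ok := it.Next(); ok; v, _, ok = it.Next() {
		total += v
	}
	return total
}

func main() {
	in := []string{"1", "x", "2"}
	total, err := sumChecked(&parseIter{in: in})
	fmt.Println(total, err != nil)             // prints 1 true
	fmt.Println(sumUnchecked(&parseIter{in: in})) // prints 3
}
```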

Merovius · 2022-08-13T13:27:58Z

Merovius
Aug 13, 2022

func FromMap[K comparable, V any](map[K]V) Iter2[K, V]

How is this going to get implemented? If it uses NewGen, it would have to return StopIter. Otherwise it would have to use reflection or //go:linkname to call into the runtime, correct?
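
For reference, the reflection route could look roughly like the sketch below, built on reflect.Value.MapRange. It assumes Iter2 has a Next method returning (K, V, bool); mapIter is a hypothetical name, and this is not a claim about how the iter package would do it. It pays for reflection on every element but needs neither a generator nor a Stop method.

```go
package main

import (
	"fmt"
	"reflect"
)

// Iter2 as assumed for this sketch.
type Iter2[E1, E2 any] interface{ Next() (E1, E2, bool) }

// mapIter wraps a reflect.MapIter.
type mapIter[K comparable, V any] struct {
	it   *reflect.MapIter
	done bool
}

func (m *mapIter[K, V]) Next() (K, V, bool) {
	// reflect.MapIter.Next panics if called again after it has returned
	// false, so remember that the iterator is exhausted.
	if m.done || !m.it.Next() {
		m.done = true
		var k K
		var v V
		return k, v, false
	}
	// The comma-ok form keeps a nil interface value from panicking when
	// K or V is itself an interface type; the zero value is then correct.
	k, _ := m.it.Key().Interface().(K)
	v, _ := m.it.Value().Interface().(V)
	return k, v, true
}

// FromMap returns an iterator over the entries of m, in unspecified order.
func FromMap[K comparable, V any](m map[K]V) Iter2[K, V] {
	return &mapIter[K, V]{it: reflect.ValueOf(m).MapRange()}
}

func main() {
	it := FromMap(map[string]int{"a": 1, "b": 2})
	sum := 0
	for _, v, ok := it.Next(); ok; _, v, ok = it.Next() {
		sum += v
	}
	fmt.Println(sum) // prints 3
}
```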

3 replies

szabba Aug 13, 2022

Which part of the spec would require that? Isn't this an acceptable implementation?

magical Aug 13, 2022

I'm guessing Merovius meant FromMap, not Map.

Merovius Aug 14, 2022

@magical You guessed correctly. Embarrassing typo/copy-paste fail. Edited the comment. Thanks for calling this out @szabba

Merovius · 2022-08-13T14:04:56Z

Merovius
Aug 13, 2022

It occurs to me that we might want to specify if Next() is allowed to return re-used values. This is relevant if the element type contains pointers. For example, an iterator producing []bytes could re-use a buffer for every returned value, saving allocations. But it can only do so, if it can rely on the caller not persisting the returned slice across the next call to Next.

On the other hand, this is similar in nature to the infamous "closure over loop variable" problem, in that every loop iteration re-uses the same variable/storage. So, if we do specify that, we might open ourselves up to similar bugs?

It might also be worth considering how this interplays with optimizations inlining Next calls in a range loop. Depending on the semantics, they might or might not be able to take advantage of a re-used loop variable.
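
A small sketch of the hazard, assuming a hypothetical iterator wordIter that re-uses one buffer for every value it returns. A caller that keeps the returned slices sees every retained element change on each call to Next, while a caller that copies is unaffected.

```go
package main

import "fmt"

// Iter is the basic iterator interface discussed in this thread.
type Iter[E any] interface{ Next() (E, bool) }

// wordIter is a hypothetical iterator that copies each word into a
// single shared buffer and returns that buffer.
type wordIter struct {
	words []string
	i     int
	buf   []byte
}

func (w *wordIter) Next() ([]byte, bool) {
	if w.i >= len(w.words) {
		return nil, false
	}
	w.buf = append(w.buf[:0], w.words[w.i]...) // re-use the storage
	w.i++
	return w.buf, true
}

// retainAll keeps the returned slices across calls to Next, which is
// only safe if the iterator promises not to re-use values.
func retainAll(it Iter[[]byte]) [][]byte {
	var out [][]byte
	for b, ok := it.Next(); ok; b, ok = it.Next() {
		out = append(out, b)
	}
	return out
}

// copyAll copies each value before the next call to Next.
func copyAll(it Iter[[]byte]) [][]byte {
	var out [][]byte
	for b, ok := it.Next(); ok; b, ok = it.Next() {
		out = append(out, append([]byte(nil), b...))
	}
	return out
}

func main() {
	words := []string{"aa", "bb", "cc"}
	kept := retainAll(&wordIter{words: words})
	safe := copyAll(&wordIter{words: words})
	fmt.Println(string(kept[0]), string(safe[0])) // prints cc aa
}
```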

0 replies

willfaught · 2022-08-14T00:01:58Z

willfaught
Aug 14, 2022

Some thoughts:

I kinda don't like the iter/Iter names. It reads like "ite-er", but my brain hears "it-er". I can also imagine "iter" being a common variable name, which we don't want package names to conflict with if possible. How about iterators.Iterator (only 2 letters longer than Reader)? Or iterators.I (like testing.T)?
Iter and Iter2 can't be implemented by the same type because the method names are shared and the method signatures conflict. This means I can't make my own Map type that imitates the map type, where it can be iterated by keys only, or by keys and elements.
Iter doesn't have HasNext() bool, so there's no way to peek if there are more items and then pass the iterator to another function for processing. Prev() would provide a way to call Next and then rewind the iterator one place to simulate HasNext, but Prev isn't included out of the box (and might not ever; it's in the optional future extensions section).
We also define a related interface for containers, such as maps, for which elements inherently have two values.

The map type doesn't have two values per element. It has "keys" and associated "elements." See reflect.Type.Elem/Key, reflect.MapOf, and https://go.dev/ref/spec#Map_types. This is just a special case of iterating the items of a "container," doing another operation with each item to get a second value, and then packaging both values together for a specialized purpose.

Where does it stop? What about 3 values per item? 4? 5? We shouldn't be adding special methods and interfaces to handle special cases. We have parameterized types for a reason: reusing code, regardless of types. We can handle 2-value, 3-value, 4-value, etc. cases with a single, parameterized type returned by the iteration that contains those values. If constructing and handling 2-value, 3-value, etc. types aren't facilitated by the current language features, then that's a sign that something needs to be added to the language to accommodate that, like tuples.

Worst case, we can always implement temporary 2/3/4/etc.-value types ourselves, e.g.
```
type MyIterVals struct { A string; B int; C map[rune]any; D error }
```
Stop or any other cleanup code shouldn't be part of the standard iterator interfaces. It should be handled by the thing that owns the resource in the first place, e.g.
```
var file, err = os.Open(...)
// ...
var t = NewThing(file)
defer file.Close() // or defer t.Cleanup()
var i = t.Iter()
for e, ok := i.Next(); ok; e, ok = i.Next() { ... }
return thing // file cleaned up without iterator
```
The iter, stop := x.Iter() idea proposed in another comment effectively standardizes another interface, which I don't think is a good idea (assuming you want it to work with for/range).
Use Iter() instead of Range(). Iterators have uses outside of range loops.
Overall, I suggest sticking to the Python or Java iterator interfaces, or the Go equivalent at least, which is basically the Go channel "interface:" one single value at a time without a way to express failure in the iteration itself. If we need to model errors per value, then we can stick them in the iteration values alongside their corresponding possible success values. Look at how Haskell does it: [Maybe Int], with the ability to convert to Maybe [Int] with trivial iteration code.

31 replies

jba Aug 24, 2022
Maintainer

@robaho, thanks for the example of Java's ConcurrentHashMap. I may be missing something—it's a complicated data structure—but it seems that it is a good example of how generators make some iterators easier to write.

It appears that the basic design of the hashmap is a sequence of buckets, each of which is a list of nodes. But some nodes are ForwardingNodes, which point to another bucket, so you could have a tree. The advance method handles that by using a stack:

for (;;) {
    ...
    if (e instanceof ForwardingNode) {
        tab = ((ForwardingNode<K,V>)e).nextTable;
        e = null;
        pushState(t, i, n);
        continue;
    }
    ...
}

A generator function would just loop over the buckets and recurse when necessary, as sketched here in Go:

func (h *ConcurrentHashMap[K, V]) gen(yield func(*Node[K, V]) bool) {
    var visit func([]*Node[K, V])
    visit = func(tab []*Node[K, V]) {
        for _, n := range tab {
            if fn, ok := n.(*ForwardingNode); ok {
                visit(fn.nextTable)
            } else {
                if !yield(n) { return }
            }
        }
    }
}

There are other things going on in that data structure that I haven't tried to capture, but it seems clear that the generator version will be shorter and easier to understand.
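
A self-contained version of the same idea, reduced to a toy structure: a hypothetical slot type holds either a value or a forward to a nested table. It is not a model of ConcurrentHashMap. The recursion replaces the explicit stack, and the boolean result lets an early stop requested by yield propagate through every level of the recursion.

```go
package main

import "fmt"

// slot is a hypothetical table entry: either a value, or, when forward
// is non-nil, a pointer to a nested table.
type slot struct {
	val     int
	forward []slot
}

// gen calls yield for every value reachable from tab, recursing into
// forwarded tables. It reports false if yield asked to stop.
func gen(tab []slot, yield func(int) bool) bool {
	for _, s := range tab {
		if s.forward != nil {
			if !gen(s.forward, yield) {
				return false
			}
		} else if !yield(s.val) {
			return false
		}
	}
	return true
}

func main() {
	tab := []slot{
		{val: 1},
		{forward: []slot{{val: 2}, {val: 3}}},
		{val: 4},
	}

	var all []int
	gen(tab, func(v int) bool {
		all = append(all, v)
		return true
	})
	fmt.Println(all) // prints [1 2 3 4]

	var some []int
	gen(tab, func(v int) bool {
		some = append(some, v)
		return v < 2 // stop once 2 has been seen
	})
	fmt.Println(some) // prints [1 2]
}
```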

robaho Aug 24, 2022

Not quite I don’t think. You might be able to make advance() into the generator but it is more nuanced as the value doesn’t need to be read. The current proposed semantics in the Go iterator avoids this because it doesn’t have a hasNext() method. If this were changed I am not sure how a generator would work.

robaho Aug 24, 2022

To summarize, I think iterators as implemented in languages without generators are easier to read, and their lifecycles are easier to reason about. But in all honesty, I find the arguments for generators, juxtaposed against the arguments against exceptions, difficult to reconcile. I personally find generators far harder to validate. I also think the lack of exceptions is currently the biggest omission in the Go language (now that generics have been added). I think it is only a matter of time before the same realization occurs with exceptions.

Merovius Aug 24, 2022

@willfaught

That's what the design says and always said, yes. It specifically says to call Stop in the single case where no one else can. And only in that case.

I don't think that's true. From the design:

If the argument to range implements Iter[E] or Iter2[E1, E2], then the loop will iterate through the elements of the iterator.

The current design has for/range working with both c.Range() and c.Next(). I was saying calling Stop could work with c.Range() only (but that it shouldn't).

The section you quote does not mention Stop. It is about passing an iter.Iter to range - and have that work to iterate over it - not the container itself. The section about Stop is

Because it's inconvenient to write for v := range c.Range(), we propose a further extension: we permit range c if c has a method Range that returns a value that implements Iter[E] or Iter2[E]. If the Range method implements StopIter[E] or StopIter2[E] then the range loop will ensure that Stop is called when exiting the loop

This is clear, that Stop is only called on iterators returned from an implicit call to Range.

willfaught Aug 26, 2022

@AndrewHarrisSPU:

I think in every case of for v := range c that is employing a c.Range symbol, c is a context manager (it may not manage anything in many cases, but sometimes it does). For something like the database/sql example, sql.Rows would be a context manager.

I see what you mean, but I wonder if that can't better be handled by a GC finalizer. Perhaps there should be a Dispose() method that the GC looks for and calls automatically, such that if you create an anonymous iterator in a for/range expression, it will be cleaned up automatically when the function returns and no references remain. Then we could keep context managers and for/range separate. I guess for/range could even call Dispose() itself if it called Range(). At least Dispose wouldn't be tied to Iter. (I also like that the name Dispose has a finality to it, whereas things that can be Stopped can sometimes be Started again.)

@Merovius:

What I do dislike about iter.Iter is the stuttering, though.

I thought of another example: context.Context. So, I guess there's precedent, but I agree that iter.Iter isn't great for that reason.

The section you quote does not mention Stop. It is about passing an iter.Iter to range - and have that work to iterate over it - not the container itself. The section about Stop is [...] This is clear, that Stop is only called on iterators returned from an implicit call to Range.

Good point. I misread that part. Thanks for pointing that out. Sorry for the confusion. That seems fine. :)

adamluzsi · 2022-08-14T16:55:53Z

adamluzsi
Aug 14, 2022

I love the suggestion to have an Iterator interface in the stdlib!

I work heavily with iterator patterns because it allows me to apply information hiding in my business use case.
My business logic doesn't have to know about the implementation details of the data provider, just use and consume it.
This is highly beneficial TDD-wise as I can transparently use both in-memory variants and concrete database implementations.

I would like to share my experience working with iterators in Go.

The value of supporting error use-cases in the iterator interface

In other languages, the idiom to raise/throw an exception solves the integration of Error use-cases.
In Go, the idiomatic way to do that is to either return an error value on an action
or provide access to the error value through a method. (e.g.: context.Context.Err())

I use iterators primarily to decouple the implementation details of the data provider from the place they consume it.
For this, the ability to communicate errors during iteration is a must for my use-cases.

Most iterators in the stdlib already lean towards an OOP direction.
This way, the iterator manages the Next and the Err simultaneously,
while leaving space for extending it further for resource management.
(e.g.: sql.Rows, or bufio.Scanner)

type Iter[E any] interface {
	Next() (elem E, more bool)

	// Err returns the error cause.
	// If an error occurs during Next, the "more" value will be false, and the error will be accessible from Err.
	Err() error
}

With a wrapper iterator, it's possible to do lambda expressions func(elem E, more bool, err error).

The importance of closing an iterator

When we use an iterator to hide the origin of the data provider,
we still need to consider that there could be an attached resource,
that should be closed safely when the iteration is aborted.
(e.g.: sql.Rows, gRPC Stream, HTTP request body)

If the iterator embeds the io.Closer
then a simple defer can ensure
that we always close the attached resource as a consumer of the iterator.

Always closing an iterator by this convention ensures
that even if the implementation of the data provider changes,
the code that depends on the iterator will remain the same.

type Iter[E any] interface {
	Next() (elem E, more bool)

	// Closer is required to make it able to cancel iterators where resources are being used behind the scene
	// for all other cases where the underlying resource is handled on a higher level, it should simply return nil
	io.Closer
}

Example of a currently used implementation

I share the iterator pattern I utilise and my use-cases where I apply them.
It was inspired by the iterators found in the stdlib.
Feel free to take it or leave it. 👍

my current Iterator interface

type Iterator[V any] interface {
	// Closer is required to make it able to cancel iterators where resources are being used behind the scene
	// for all other cases where the underlying resource is handled on a higher level, it should simply return nil
	io.Closer
	// Err returns the error cause.
	// If an error occurs during Next, the "more" value will be false, and the error will be accessible from Err.
	Err() error
	// Next will ensure that Value returns the next item when executed.
	// If the next value is not retrievable, Next should return false and ensure Err() will return the error cause.
	Next() bool
	// Value returns the current value in the iterator.
	// The action should be repeatable without side effects.
	Value() V
}

Limitations

At the moment, Next() bool is not interruptible, which makes it difficult to use for batching.
It could be worked around with HasNext() bool or passing a Context as an argument.

usage:

// get an iterator from somewhere
iter, err := storage.Users.FindAll(ctx)
if err != nil {
  return err
}
defer iter.Close()

// consume iterator
for iter.Next() {
  // do something with the value
  _ = iter.Value()
}
return iter.Err()

My common use-cases for iterating:

*sql.Rows
*bufio.Scanner
json.Decoder
gRPC stream
slice
channels with Pipe in/out iterator
- PipeIn to send values or error
- PipeOut to iterate on the values
mapping from one value to another
reduce

tuples through struct types

like Iter2, but it is user-defined through a struct type to represent related values as a single iteration

type KeyValue[K, V any] struct {
	Key   K
	Value V
}
// for example Iterator[KeyValue[string, int]] after a map[string]int is converted

etc

Example package I use for iterating:
https://pkg.go.dev/github.com/adamluzsi/frameless@v0.68.0/iterators

20 replies

Merovius Sep 22, 2022

please don't focus on the goal of the example.

FWIW I understood your example as trying to demonstrate the problem you are seeing in a certain scenario. And I think it was effective in that. It took me a bit of time to go through it, understand how it works and what it is trying to do. And how that jibes with what I have been saying and what I see as the core issue¹. And how I would do it instead, to support my claim that you don't need error handling in the iterator interface. So, I would say your example was effective. It helped me understand why you believe the iterator interface needs to include error handling.

I would like to ask that you extend the same courtesy to my example. Read it, understand what it does and why. And where I'm coming from. Because it should answer questions like yours about slices - I'm not returning slices, I return proper iterators. It works exactly the same as yours, from an efficiency consideration, but it's less code and easier to maintain (IMO). And maybe my example can help you understand why I don't believe you need error handling in the iterator interface, just as your example helped me.

[1] If you are curious: I say that the owner/creator of an iterator is responsible for closing/stopping it and checking the error. In your example, the owner is MyUseCaseLikeGetDates, so it should be responsible. But it also returns the iterator, so the iterator has to outlive its owner. So it needs to delegate these responsibilities to its caller, which requires wrapping the iterator or otherwise pass on the responsibility (returning separate io.Closer etc. would also be a way to pass on that responsibility). So the core issue - as I see it - is a lifetime conflict. A resource is outliving its owner.

The solution I suggest is thus to invert the ownership relation. Have MyUseCaseLikeGetDates (or what I call just GetDates) not be the owner of the base iterator. Instead, make the HTTP handler the owner of the base iterator, making that responsible to close/check errors instead. That's where you are trying to delegate the responsibility anyways, so that seems the right place. And it means you don't have to maintain all that code to delegate the error handling and your business logic becomes cleaner, as it doesn't have to concern itself with a responsibility it doesn't want or need.

robaho Sep 22, 2022

What is a “legacy synchronized iterator in Java”? There is no such thing. Java has supported async programming via background threads and futures since the beginning. They were standardized in the concurrent collections support in Java 5 (but people rolled their own futures long before that).

adamluzsi Sep 23, 2022

@robaho, I misinterpreted your insight and the Java Vector type came into my thoughts. 🤦

robaho Sep 23, 2022

If you notice that was there since 1.0 yet they were able to update it to support generics with no code changes - that is why they chose the type erasure route.

adamluzsi Sep 24, 2022

@Merovius, I reread your proposal with a fresh mind.
I think we both agree that the Err and Close should be on the source iterator, just like in your solution.
I believe that both solutions are right on their own terms.

So at this point, it's probably just team/personal conventions regarding Architecture Design decisions.
Our team keeps a separation between domain+entities, external interfaces and external suppliers. For example, the controller component to an external interface focuses on the mapping between DTOs and domain entities and interacts with the domain layer through domain interactors. It does not interact with backing services, such as a repository. This isolation allows using the same domain interactions between different external interfaces (JSON API, gRPC, GraphQL). The domain interactor is the only component with reason to change for a domain expert's request, encapsulating the interactions with the backing services, regardless of the number of backing services it needs to interact with during the use-case interaction. It achieves that through role interfaces. Due to the role interface's nature, we don't use specific implementation details in the domain layer; thus, the returned iterator values contain the io.Closer interface. Using the iterators as part of the role interfaces plays well with LSP and SRP, especially if we define interface-testing-suites to test the behaviour solely. Furthermore, it allows all kinds of fancy things, such as adding a cache implementation for the Repository role interface, without the need to change anything but the application initialisation.

Overall, I believe your solution is excellent, and now I can see that it's just a difference in the design from the one my team practices.
It's an apple-to-orange scenario. Cheers for your time and patience in teaching me a new way of thinking; I will take some time and do a personal project where I will practice the pattern you proposed to me now.

willfaught · 2022-08-26T01:54:09Z

willfaught
Aug 26, 2022

Perhaps we should think about and design for all iteration concerns now, even if we don't roll them out all at once. What works well for the simplest case (Next) might not work so well for more complicated iteration needs.

Java mapped out this space pretty well, in my opinion:

Next
HasNext
NextIndex
Previous
HasPrevious
PreviousIndex
Add
Remove
Set

Perhaps batching is also something worth considering (see Reader).

We can always roll out the types and methods in x/exp without a compatibility guarantee, and save the for/range changes for when the entire design has solidified.

0 replies

katsuya94 · 2022-09-01T23:54:07Z

katsuya94
Sep 1, 2022

Reading through the proposal and the comments I have a few thoughts.

iter should be in builtin

I feel like an implicit expectation I have of Go is that all "special" declarations are limited to the builtin package.

generic functions like append, len (before generics were available)
functions that take types as "arguments" like make, new
panic, which has non-local return behavior
comparable (new in Go 1.18), which has richer behavior than other interfaces

There might be a few exceptions, notably special pointer-convertible types in the unsafe package, but their use cases seem to be somewhat niche compared to iterators.

Creating a new language construct, range on iter.Iter, seems to me to be a violation of this expectation with a common use case. That said, it seems like too much to extend builtin with the entirety of the proposal. I would recommend only extending builtin with an iter (and possibly a stopiter) interface, and keeping the remaining functions in an iterators package.

consolidating Iter and Iter2

Iter2 seems oddly specific though no doubt it would be useful. Some quick ideas:

Defer the inclusion of Iter2 for a future language change.
Investigate introducing a pair type in builtin that supports destructuring assignment as a language construct, possibly applicable outside of range as well.
Make Iter accept two type parameters, but define a special none type (or use _) to indicate that there is no "key" in which case you can either write _, el := range it or i, el := range it to get the integer index.

go vet finds ambiguous usages of range

If we have to use the old behavior of range for "range-able" types that also implement Iter for backwards compatibility, then go vet should be extended to point this out.

0 replies

ConradIrwin · 2022-09-21T12:53:08Z

ConradIrwin
Sep 21, 2022

I like that this proposal is being discussed, but I do want to flag a few issues with this direction that make it seem suboptimal to me.

As a few people have noted, handling errors correctly complicates things; and the main other feature in Go that makes this worse is panics. Any time we're asking people to use SetFinalizer or run code on a new goroutine, they need to be clear what happens if that code panics. For web-services (which is most of the go code I write) it's not acceptable for an unhandled panic to terminate the program (it should terminate the request that caused it, and other requests should run to completion): it's vital to me that panics ladder up the call stack correctly.

Two features of this proposal make that hard:

making Stop() a best-effort call requires every stop iterator to use something like SetFinalizer to ensure it is called, and a panic in a finalizer will terminate the program. If failing to call Stop() was an error then this could be avoided.
(also super-minor; but if a method like Stop() exists, can it be called Close() to be consistent with other cleanup methods in Go?)
The NewGen() proposal requires running generators on a separate go-routine. And similar to x/sync/errgroup: propagate panics and Goexits through Wait #53757 it'll be fiddly to ensure the panic ends up in the right place with the right stack-trace (fixable with runtime support of course).

There is an iterator proposal at #47707 which solves both of these problems by leaning into an API where the author of the iterator controls the iteration instead of the consumer. (This proposal adds iter.NewGen/iter.NewNext that get to the same result, so #47707 gets the same benefits as this proposal but much more directly). To summarize the interface proposed, it looks like this:

type Ranger[T any] interface {
  Range(func (T) bool)
}

// this syntax
for x := r {
  if b(x) { 
     break
  }
}

// becomes equivalent to this:
r.Range(func (x T) bool {
  if b(x) {
     return true // stop iterating
  }
  return false
})

This immediately fixes the problems with Stop (Range can defer() any cleanup it wants to do) and NewGen (the iteration code is running on the current go-routine). The main downside that I see is that the implementor of Range() must correctly handle the boolean return value to stop iteration (and I proposed a possible improvement to that here)

It also has one other ergonomic benefit for authors of collections that can be iterated over: the implementation of Range() can use for-loops internally (with an iteration proposal using Next(), you cannot use a for loop because each item must be produced in a different stack frame; and so you must manually track your position, unless you use NewGen, but that comes with runtime overhead).
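To make the two points above concrete, here is a small sketch of a `Range` implementation that uses ordinary nested for-loops and a deferred cleanup. `Buckets`, its `released` counter, and `firstN` are hypothetical names made up for illustration; the callback follows the convention in the snippet above, where returning true stops the iteration.

```go
package main

import "fmt"

// Ranger has the shape quoted above. Following the comment's convention,
// the callback returns true to stop iterating.
type Ranger[T any] interface {
	Range(func(T) bool)
}

// Buckets is a hypothetical container: a slice of slices.
type Buckets[T any] struct {
	buckets  [][]T
	released int // counts cleanup runs, for illustration only
}

// Range walks the container with plain nested loops; no manual position
// tracking is needed because the loop state lives on this stack frame.
func (b *Buckets[T]) Range(f func(T) bool) {
	// The deferred cleanup runs however the loop ends: exhaustion,
	// early stop, or a panic in f.
	defer func() { b.released++ }()
	for _, bucket := range b.buckets {
		for _, v := range bucket {
			if f(v) {
				return
			}
		}
	}
}

// firstN collects at most n values from r.
func firstN[T any](r Ranger[T], n int) []T {
	if n <= 0 {
		return nil
	}
	var out []T
	r.Range(func(v T) bool {
		out = append(out, v)
		return len(out) >= n // true means stop
	})
	return out
}

func main() {
	b := &Buckets[int]{buckets: [][]int{{1, 2}, {}, {3, 4, 5}}}
	fmt.Println(firstN[int](b, 3), b.released)
}
```

The cost of this shape is the one already noted: every implementation has to honor the callback's return value correctly.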

On a different tack, I don't think we should push functions like iter.Filter or Reduce into the standard library. Although they let you build lazy collections easily, the downside of lazy collections is that they give you another way to do the same thing, one that requires more allocations and is harder to reason about. Although lazy collections are somewhat interesting, I don't think they're something that we should encourage (though I'm sure someone will build a library to do it regardless :D).

To be more concrete, the problem I saw doing a lot of Ruby programming is that instead of:

for x := range y {
  if z(x) {
  }
}

You end up with:

for x := range iter.Filter(y, z) {

}

There are definitely cases where this kind of thing is useful, but it can be trickier to comprehend the control flow (it works backwards down the call chain when you stack these things) and it's a lot more overhead (an extra function call per iteration - on top of any function calls required by the iteration proposal itself - and a new object allocation per clause).
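For a rough picture of where that overhead comes from, here is a sketch of how a lazy `Filter` could be written over a `Next`-based iterator. This is an assumption about the shape, not the proposal's implementation; `sliceIter`, `filterIter`, `FromSlice`, and `ToSlice` are written out here only to keep the example self-contained.

```go
package main

import "fmt"

// Iter is the Next-based iterator shape from the proposal.
type Iter[E any] interface {
	Next() (E, bool)
}

type sliceIter[E any] struct {
	s []E
	i int
}

func (it *sliceIter[E]) Next() (E, bool) {
	if it.i >= len(it.s) {
		var zero E
		return zero, false
	}
	v := it.s[it.i]
	it.i++
	return v, true
}

// FromSlice allocates one iterator object per iteration.
func FromSlice[E any](s []E) Iter[E] { return &sliceIter[E]{s: s} }

type filterIter[E any] struct {
	src  Iter[E]
	keep func(E) bool
}

// Next costs a call to src.Next plus a call to keep for every source
// element, on top of the call to this method itself.
func (f *filterIter[E]) Next() (E, bool) {
	for {
		v, ok := f.src.Next()
		if !ok || f.keep(v) {
			return v, ok
		}
	}
}

// Filter allocates a second object wrapping the first.
func Filter[E any](src Iter[E], keep func(E) bool) Iter[E] {
	return &filterIter[E]{src: src, keep: keep}
}

// ToSlice drains an iterator into a slice.
func ToSlice[E any](it Iter[E]) []E {
	var out []E
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		out = append(out, v)
	}
	return out
}

func main() {
	y := []int{1, 2, 3, 4, 5, 6}
	even := func(n int) bool { return n%2 == 0 }

	// Plain loop: the condition sits where the work happens.
	var direct []int
	for _, x := range y {
		if even(x) {
			direct = append(direct, x)
		}
	}

	// Lazy version: same result, with two extra objects and indirect calls.
	lazy := ToSlice(Filter(FromSlice(y), even))
	fmt.Println(direct, lazy)
}
```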

I would probably keep the ToSlice/ToMap methods though I'm not sure that the Iter2[E, error] is a great way of error handling if we expect ToMap to be useful; so there may be more to think about there...

8 replies

ConradIrwin Oct 4, 2022

It's probably just a case of priorities :). As I said encouraging running code on the wrong goroutine without a solution to panic() laddering correctly with the right stack trace is worrying; as it would make the kind of code I write (mostly web servers) less well suited to the problem domain – I care a lot about that, because currently go is very good for web servers.

It's very alarming to me that we'd encourage something like iter.NewGen unless we could fix the panic() on the wrong goroutine/with the wrong stacktrace thing. How would you imagine that is solved?

The obvious solution to people storing the function passed to Range is to disallow that too. A simpler way to say it might be "you must call the callback on the stack that Range was called on" – it's an error to call the callback after the call to Range() has returned, or on a different goroutine.

As with the implementation of net.Conn it seems essential that people creating custom iterators build them in a way that interacts well with the rest of the language. Implementors would be free to use go-routines internally, but would be required to collect results and execute the callback in the right place.

That seems like an easier restriction to code-review for (and maybe even provide something like nettest for) than "generators must not panic". Most importantly any failure to follow the protocol is easy to debug, as it always panics if you do it wrong (whereas panics in generators must be assumed to be as rare as panics in any arbitrary go code).
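As an illustration of "free to use goroutines internally, but execute the callback in the right place", here is one way such a `Range`-style function might look. `RangeSquares` is a made-up example, not part of either proposal: workers compute values concurrently, but the callback is only ever invoked on the caller's goroutine, one call at a time, and never after the function returns. The callback returns true to stop, matching the earlier snippet.

```go
package main

import (
	"fmt"
	"sync"
)

// RangeSquares computes n*n for each input on its own goroutine, but
// calls f only from the goroutine that called RangeSquares.
// Results arrive in no particular order.
func RangeSquares(inputs []int, f func(int) bool) {
	results := make(chan int)
	done := make(chan struct{})
	var wg sync.WaitGroup

	for _, n := range inputs {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			select {
			case results <- n * n:
			case <-done: // consumer stopped early; drop the result
			}
		}(n)
	}
	go func() {
		wg.Wait()
		close(results)
	}()

	// Deferred calls run last-in first-out: first release any blocked
	// workers, then wait for them, so no worker outlives this call.
	// This also holds when f panics, and the panic then unwinds the
	// caller's stack as usual.
	defer wg.Wait()
	defer close(done)

	for v := range results {
		if f(v) {
			return
		}
	}
}

func main() {
	sum := 0
	RangeSquares([]int{1, 2, 3}, func(v int) bool {
		sum += v // safe without locking: f runs on one goroutine only
		return false
	})
	fmt.Println(sum)
}
```

A panic inside the worker goroutines themselves is still the unsolved case the thread is arguing about; this sketch only keeps the callback, and any panic it raises, on the caller's stack.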

Merovius Oct 4, 2022

I do not understand how you can propose panicking as an appropriate solution to the problem of panics. I'm also not sure that there even is a set of rules we could specify and enforce which would prevent these problems.

As I said, to me, it feels like you are suggesting to essentially create a different language that this Range method must conform to. I'd consider that a non-starter. YMMV of course.

ConradIrwin Oct 4, 2022

There's probably a missing nuance on the panics: I want bad code to reliably break in development, or test environments - because that's a single user context, so panic()ing consistently when someone calls the callback incorrectly is great. It will be fixed before it gets to production.

On the other hand, go code can (and does) panic due to runtime conditions; I don't want those panics to tear-down the entire server in production if those conditions were not found in development/test.

I'd hardly call it "a different language", functions like strings.TrimFunc are already bound by similar restrictions around when they can call their callback (even if it's not documented, I can't imagine that parallel calls to the callback, or calls after TrimFunc has returned could be allowed without breaking user code).

To push the thought experiment a bit further; it's actually not per-se a problem that you can't call the callback on a different goroutine; it's that each subsequent call to the callback must not happen before the previous call to the callback has returned, and the return of Range must not happen before the last callback has returned. This seems harder to describe and harder to debug (hence the suggestion of "same goroutine before Range returns"), but maybe helps avoid your feeling of "a different language?"

Merovius Oct 4, 2022

I'd hardly call it "a different language", functions like strings.TrimFunc are already bound by similar restrictions around when they can call their callback (even if it's not documented, I can't imagine that parallel calls to the callback, or calls after TrimFunc have returned could be allowed without breaking user code).

I don't think that's the same at all. The language allows races and some library functions might impose restrictions on that basis. But for this case, the language itself has to impose those restrictions. We have to formalize the rules and put them into the spec. Those two scenarios are very different.

Merovius Oct 4, 2022

FWIW I don't think there is anything you can say to convince me. I'm just trying to explain why: It seems like a very invasive change for a relatively small benefit (allowing using range with user-defined containers). And all to avoid something I feel pretty strongly is not a huge problem (crashes on panic, which IMO are the correct behavior - but this is an old argument not worth fighting about).

However, I'm not part of the Go team and not a decider. So not convincing me isn't actually that much of a problem. So, I'm just going to leave this be.

robaho · 2022-09-22T00:48:58Z

robaho
Sep 22, 2022

C# had concurrency from the beginning, it also had iterators (enumerators) and a standard “collections” library from the beginning. Well, .Net 1.1 anyway.

…

On Sep 21, 2022, at 7:18 PM, AndrewHarrisSPU wrote:

So if we don't want to exhaust the iterator, and error handling and closing are not part of the iterator interface, then you are forced to return IterCloser, and IterErrorer values along the iterator in the function signatures.

Issues like this have been a stumbling block for many other languages' implementations of iterators, and the evolution of Python and C# to me seem like a useful contrast. Neither did concurrency before iterators, while in Go, we really can't take concurrency out of the language. IIRC both Python and C# have ended up with something like async iterators, with some subtly distinguished behaviors when employing concepts like using, with, try, catch etc. Additionally, both struggled to get to that point, revising decisions along the way, because it meant implementing more language-level stuff around concurrency, which was already complicated retrofitting. But eventually errors in iterators or errors in resources that provide iterators are inevitable, and good ways of managing these errors are needed.

With this in mind, my opinion is something like:

The basic notions of using and try/catch are reasonable. If it looks like exception handling, good, fine! Go allows us to implement our own exception handling constructs if we really want.
In library code, approximations of using and try/catch - just for iterators - could make sense. We don't have to invent new fundamental language here because Go has always had concurrency in mind. Some of this I would argue should be in an iter library, but I'm not sure if any/all of it has to be.
Also, because Go has always had concurrency in mind, Go devs won't have as much of a problem wrapping our heads around the edgier cases of concurrent iterators. We would, when using iterators, have to think about more elaborate layers of error handling than we often might.

Loosely:

type Resource[T any] struct {
	results Iter2[T, error]
}

type Results[T any] struct {
	iter Iter2[T, error]
}

func Using[T any](src Resource[T], catch func(err error)) Results[T] { ... }

func Try[T any](results Results[T], catch func(err error)) Iter[T] { ... }

To obtain an infallible iterator from Resource[T]:

with the Using function, unwrap Resource[T] -> Results[T] by providing a callback for how to handle an error associated with a resource (e.g., the DB connection gets broken)
with the Try function, unwrap Results[T] -> Iter[T] by providing a callback for how to handle an error associated with a result (e.g., some element was not found or was invalid)
with enough nuance, the catch functions can serialize with iteration, while being concurrent with respect to the resources they are monitoring.

In other words, the unwrapped iterator can't continue iterating until a catch function completes execution; that execution can do arbitrary things, like cause any further iteration to Stop().

It's elaborate machinery to implement and use, but I think is flexible enough to be ignored when not needed, and clear when it is needed.
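A rough sketch of what the Try half of the quoted idea could look like. It is simplified on purpose and rests on assumptions that are not in the quoted message: it takes an `Iter2[T, error]` directly instead of a `Results[T]` wrapper, and `catch` returns a bool saying whether to keep going (the quoted signature returns nothing and leaves the stopping mechanism open). `parseIter` is a made-up fallible source.

```go
package main

import (
	"fmt"
	"strconv"
)

// Iter and Iter2 follow the shapes from the proposal.
type Iter[E any] interface {
	Next() (E, bool)
}

type Iter2[E1, E2 any] interface {
	Next() (E1, E2, bool)
}

// parseIter is a hypothetical fallible source: it parses strings as ints.
type parseIter struct {
	in []string
	i  int
}

func (p *parseIter) Next() (int, error, bool) {
	if p.i >= len(p.in) {
		return 0, nil, false
	}
	s := p.in[p.i]
	p.i++
	n, err := strconv.Atoi(s)
	return n, err, true
}

type tryIter[T any] struct {
	src     Iter2[T, error]
	catch   func(error) bool
	stopped bool
}

// Next never hands an error to the consumer. A failed element is passed
// to catch, which runs to completion before iteration continues; if it
// returns false, the iteration ends.
func (t *tryIter[T]) Next() (T, bool) {
	var zero T
	for !t.stopped {
		v, err, ok := t.src.Next()
		if !ok {
			break
		}
		if err == nil {
			return v, true
		}
		if !t.catch(err) {
			break
		}
	}
	t.stopped = true
	return zero, false
}

// Try unwraps a fallible iterator into an infallible one.
func Try[T any](src Iter2[T, error], catch func(error) bool) Iter[T] {
	return &tryIter[T]{src: src, catch: catch}
}

func main() {
	it := Try[int](&parseIter{in: []string{"1", "x", "3"}}, func(err error) bool {
		fmt.Println("skipping:", err)
		return true // keep iterating past bad elements
	})
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		fmt.Println(v)
	}
}
```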

2 replies

AndrewHarrisSPU Sep 22, 2022

I think this was meant to be a reply here - It's a fair point that there have always been things like task-level parallelism or multithreading in C#. I mean "concurrency" in the sense of what The history of C# calls "baking asynchrony into the language as a first-class participant" (async in C# 5.0). I do not mean "concurrency" in the sense of multi-threading with associated inter-thread communication strategies. I know this is skipping a lot of details, I just mean to underscore that the combination of language-level async and iterators is something that's been well explored (or - well, explored).

robaho Sep 22, 2022

Yea - replied to the email... Not sure why github can't figure that out - or allow repliers to add a marker in the content to start a new thread.

QuantumGhost · 2022-09-22T02:37:27Z

QuantumGhost
Sep 22, 2022

I think we should adopt the following iterator interface first:

// Iter supports iterating over a sequence of values of type `E`.
type Iter[E any] interface {
	// Next returns the next value in the iteration if there is one,
	// and reports whether the returned value is valid.
	// Once Next returns ok==false, the iteration is over,
	// and all subsequent calls will return ok==false.
	Next() (elem E, ok bool)
}

The StopIter interface can be implemented as a wrapper of the above Iter interface, and should NOT be included in the standard library.

As for map and map-like types, I think it's better to have a type Pair[K, V] and let the implementors of map-like types implement Iter[Pair[K, V]] to support iterating with for ... range syntax.
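A sketch of how these pieces could fit together outside the standard library, assuming the `Iter` interface above. The `StopIter` name comes from the proposal, but `WithStop`, `stopper`, `FromSlice`, and the field names of `Pair` are assumptions made for this example.

```go
package main

import "fmt"

// Iter is the interface from the comment above.
type Iter[E any] interface {
	Next() (elem E, ok bool)
}

// Pair lets a map-like type expose Iter[Pair[K, V]].
type Pair[K, V any] struct {
	Key   K
	Value V
}

type sliceIter[E any] struct {
	s []E
	i int
}

func (it *sliceIter[E]) Next() (E, bool) {
	if it.i >= len(it.s) {
		var zero E
		return zero, false
	}
	v := it.s[it.i]
	it.i++
	return v, true
}

// FromSlice returns an iterator over s.
func FromSlice[E any](s []E) Iter[E] { return &sliceIter[E]{s: s} }

// StopIter is Iter plus a Stop method, defined here as a wrapper concern.
type StopIter[E any] interface {
	Iter[E]
	Stop()
}

type stopper[E any] struct {
	Iter[E]
	cleanup func()
	stopped bool
}

// Next stops yielding after Stop and runs cleanup on exhaustion.
func (s *stopper[E]) Next() (E, bool) {
	if s.stopped {
		var zero E
		return zero, false
	}
	v, ok := s.Iter.Next()
	if !ok {
		s.Stop()
	}
	return v, ok
}

// Stop runs cleanup at most once.
func (s *stopper[E]) Stop() {
	if !s.stopped {
		s.stopped = true
		s.cleanup()
	}
}

// WithStop wraps any Iter with a cleanup function.
func WithStop[E any](it Iter[E], cleanup func()) StopIter[E] {
	return &stopper[E]{Iter: it, cleanup: cleanup}
}

func main() {
	entries := []Pair[string, int]{{"a", 1}, {"b", 2}}
	it := WithStop[Pair[string, int]](FromSlice(entries), func() { fmt.Println("released") })
	for p, ok := it.Next(); ok; p, ok = it.Next() {
		fmt.Println(p.Key, p.Value)
	}
}
```

What a wrapper like this cannot provide is the guarantee discussed elsewhere in the thread: nothing forces a consumer to call Stop when it abandons the loop early.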

6 replies

seancfoley Sep 26, 2022

@oakad That does not make any sense, because you need something to keep track of what is "next", and attempting to do this in the original object for any and all the possible iterations would not work at all.

Merovius Sep 27, 2022

@seancfoley In defense of what @oakad said, nothing says that the RangeNext() method would have to be on the container. You can still have a Range() interface{ RangeNext() (E, bool) } method on the container itself, returning an iterator.

Both @oakad and @QuantumGhost are just saying "let's only implement part of the proposal" (for different parts respectively). Neither is really contradicting it. However, I'm missing any arguments as to why. Clearly, if we did do only those parts, the first thing that would happen is that ~everybody would create an iter library with all the other parts. But they would be worse than what we can do as a 1st party implementation - their map iterators will either have to use reflect, or be inefficient, channel-based iterators and without a clear way to do and use StopIter, the latter would lead to resource leaks…

I don't really see a reason to put this burden on 3rd party implementations. IMO the difficult part is how the standard iterator interface should look - once we've decided on that, all the rest seems to be implied anyways.

oakad Sep 27, 2022

The proposed interface ITT is essentially a version of runtime.chanrecv2, generalized (https://cs.opensource.google/go/go/+/master:src/runtime/chan.go;drc=1e91ffc897efb1ed298753c08f086fbc8f725025;l=446). Right now, the compiler knows how to rewrite a range statement in terms of chanrecv; it's not too big of a deal to extend that particular feature.

The rest of the fun "iterator" methods don't appear to add anything to the core language as compared to the current state of affairs.

jhenstridge Sep 27, 2022

@oakad: while that might be sufficient for types similar to channels, it breaks down for types that are more container-like and want separate range loops to each iterate over the same set of values independently (like maps and slices do).

I'd also note that with the current proposal, a type that doesn't support independent iteration could act as its own iterator type:

func (c *MyChannel) Range() *MyChannel { return c }
func (c *MyChannel) Next() (elem E, ok bool) { ... }

QuantumGhost Sep 28, 2022

@Merovius

Neither is really contradicting it. However, I'm missing any arguments as to why. Clearly, if we did do only those parts, the first thing would happen is that ~everybody would create an iter library with all the other parts. But they would be worse than what we can do as a 1st party implementation - their map iterators will either have to use reflect, or be inefficient, channel-based iterators and without a clear way to do and use StopIter, the latter would lead to resource leaks…

I'm not against including iterator interface for map or map-like types. I just prefer map iterator with interface below:

type Pair[K, V any] struct {
    Key K
    Value V
}

type MapIterator[K, V any] interface {
    Next() (Pair[K, V], bool)
}

As for resource management, I prefer the following:

Stop or any other cleanup code shouldn't be part of the standard iterator interfaces. It should be handled by the thing that owns the resource in the first place, e.g.

because I believe we may need more usage of the standard iterator interfaces and find the best way to implement resource management for iterator in practice.

robaho · 2022-09-27T22:21:37Z

robaho
Sep 27, 2022

Saying you don’t need to pass around containers is like saying you don’t need to pass around slices - only iterators on slices. That doesn’t make sense to me. Sorry - too many messages in github so it will no longer move to the proper one to comment… ugh.

…

On Sep 27, 2022, at 4:55 PM, Axel Wagner wrote:

Those would work with a normal iter. It's not trivial, but you can effectively create N iterators, each blocking in their Next call until all N of them have been called and then returning the value from the Next call of the one wrapped iterator. If that makes sense. So, effectively, they walk in lockstep. You can then call the functions in separate goroutines. It's tricky enough that I can't just write it down just now, but it should be possible.

You can provide a helper to create a ResetIter from any container <https://go.dev/play/p/1GswJFkjy1i>. And notably, that works with any iterator type returned from Range. So it's still not an argument for why you have to pass around the container itself.

Even though I brought up the concern that the mechanism from the proposal doesn't allow passing around containers, I'm coming increasingly around to the fact that it's just not a big deal.

2 replies

Merovius Sep 28, 2022

Correct, with this proposal, most reasons to pass around slices will go away.

robaho Sep 28, 2022

I think you need to rethink that. An iterator does not offer O(1) access semantics - nor the other subslicing features.

firelizzard18 · 2022-10-05T00:44:42Z

firelizzard18
Oct 5, 2022
Collaborator

IMO Iter2 is rather awkward. It leads to lots of duplication (e.g. NewGen and NewGen2) and it only handles iterators that return pairs. IMO it would be far better to have only Iter and a general solution for returning multiple values. If Go had a built-in tuple type it would be a no-brainer - use Iter[Tuple[E1, E2]].

Assuming adding tuples is out of the question, I would prefer a special Pair or Tuple2 type:

// Pair is a 2-tuple. A for-loop ranging over an `Iter[Pair[E1, E2]]` will automagically separate pairs into two loop variables
type Pair[E1, E2 any] struct{ E1 E1; E2 E2 }


// FromMap returns an iterator over a map.
func FromMap[K comparable, V any](map[K]V) Iter[Pair[K, V]]
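For reference, here is a sketch of how FromMap could be written today in ordinary Go, without compiler or runtime support. The `mapIter` type is an assumption for illustration. Note that it has to snapshot the keys up front, which is the kind of inefficiency mentioned elsewhere in this thread for implementations that live outside the standard library.

```go
package main

import "fmt"

// Iter is the single-value iterator interface.
type Iter[E any] interface {
	Next() (E, bool)
}

// Pair is a 2-tuple, as declared above.
type Pair[E1, E2 any] struct {
	E1 E1
	E2 E2
}

// mapIter walks a snapshot of the keys taken when the iterator was made.
type mapIter[K comparable, V any] struct {
	m    map[K]V
	keys []K
	i    int
}

func (it *mapIter[K, V]) Next() (Pair[K, V], bool) {
	for it.i < len(it.keys) {
		k := it.keys[it.i]
		it.i++
		// Skip keys that were deleted after the snapshot was taken.
		if v, ok := it.m[k]; ok {
			return Pair[K, V]{E1: k, E2: v}, true
		}
	}
	return Pair[K, V]{}, false
}

// FromMap returns an iterator over a map. Order is unspecified, as with
// a regular range over a map. The key snapshot costs O(len(m)) extra space.
func FromMap[K comparable, V any](m map[K]V) Iter[Pair[K, V]] {
	keys := make([]K, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	return &mapIter[K, V]{m: m, keys: keys}
}

func main() {
	it := FromMap(map[string]int{"a": 1, "b": 2})
	count, sum := 0, 0
	// Without language support, the pair is unpacked by hand.
	for p, ok := it.Next(); ok; p, ok = it.Next() {
		count++
		sum += p.E2
	}
	fmt.Println(count, sum)
}
```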

0 replies

AndrewHarrisSPU · 2022-10-18T00:17:59Z

AndrewHarrisSPU
Oct 18, 2022

There's a bit of intuition from session types I thought might be interesting regarding fallible iterators, where for each unit of progress, the currently active participant shares their perspective on the state of the session. Not on the kind of type theoretic basis that yields formal proofs, and definitely not to suggest there's a need for semantics that yield formal proofs, but just to borrow that bit of intuition - to sketch out another idea for fallible iterators / revisit some other ideas in a particular way:

For fallible iterators only, make Next require a State to advance. This sort of suggests that the way to implement a fallible iterator is definitely -not- an Iter2[T,error]:

type IterErr[T any] interface {
	Next(State) (T, State)
}

A State is mostly just error. The zero value is Ok:

type State struct {
	Err error
}

var Ok = State{nil}

When consuming an iterator:

	// without compile-time magic, for some IterErr 'it':
	for t, state := it.Next(iter.Ok); state.Err == nil; t, state = it.Next(state) { ... }

	// with compile-time magic, with c.Range returning an IterErr:
	for t, state := range c { ... }

The Stop() method here, for compiler magic, is still of significant importance; it can be understood as something like:

	state.Err = iter.ErrStop
	_, state = it.Next(state)

(Because state.Err is no longer nil, State is not Ok)

For a generator example, a counting generator, up to a limit:

type counter struct {
	n   int
	lim int
}

func (c *counter) Next(curr iter.State) (i int, next iter.State) {
	switch curr {
	case iter.Ok:
		c.n++
		if c.n > c.lim {
			next = iter.State{errOverLimit}
			return
		}
		return c.n, iter.Ok
	default:
		next = curr
		return
	}
}
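
Putting the fragments above into one runnable file, as a sketch: everything lives in a single package here, so `iter.Ok` and `iter.State` appear as `Ok` and `State`. `ErrStop`, `errOverLimit`, and the `sumUntil` consumer are assumptions added to show both directions of communication, generator to consumer and consumer to generator.

```go
package main

import (
	"errors"
	"fmt"
)

// State is mostly just an error; the zero value is Ok.
type State struct {
	Err error
}

var Ok = State{}

// ErrStop is what a consumer sets to end iteration early (assumed name).
var ErrStop = errors.New("iteration stopped")

var errOverLimit = errors.New("counter over limit")

// IterErr requires a State to advance.
type IterErr[T any] interface {
	Next(State) (T, State)
}

type counter struct {
	n   int
	lim int
}

// Next advances only when handed an Ok state; any other state is echoed
// back unchanged, so the generator can observe the consumer's error.
func (c *counter) Next(curr State) (int, State) {
	if curr != Ok {
		return 0, curr
	}
	c.n++
	if c.n > c.lim {
		return 0, State{Err: errOverLimit}
	}
	return c.n, Ok
}

// sumUntil adds values until the generator reports an error or the value
// stopAt is seen, in which case it tells the generator to stop.
func sumUntil(it IterErr[int], stopAt int) (int, error) {
	sum := 0
	t, state := it.Next(Ok)
	for state.Err == nil {
		sum += t
		if t == stopAt {
			state.Err = ErrStop
			_, state = it.Next(state) // consumer-to-generator signal
			break
		}
		t, state = it.Next(state)
	}
	return sum, state.Err
}

func main() {
	fmt.Println(sumUntil(&counter{lim: 3}, -1))
	fmt.Println(sumUntil(&counter{lim: 10}, 2))
}
```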

--

This looked nicer to me in the details than some alternatives /shrug. Particularly, it allows setting and observing errors that are communicated from consumer to generator, or from generator to consumer -this is possible but messy otherwise.

Changing the signature of Next has been suggested earlier (at least, maybe in other places):
#54245 (comment)
#54245 (comment)

Sentinel errors have been mentioned earlier in many places. To draw a distinction, the idea here would be to have not exactly sentinel logic, but session logic only for fallible iterators. I think this could still coexist with infallible iterators that work like Iter[T], Iter2[T,U]. Considering the following scenarios, a '+' is already possible or not surprising:

--	fallible generator	infallible generator
fallible-aware loop	+	+*
"infallible" loop	?	+

*A fallible-aware loop just handles errors internally - it may set a State to something that isn't Ok, but I think the compiler can arrange for that to be equivalent to what Stop() does now.

The '?' mark scenario:

One behavior could be that an infallible loop just ignores 'state', and behind the scenes any generator error stops iteration, and the error is silent/lost.

Another behavior could be that it doesn't type check (because for ... range IterErr can't be mistaken for similar constructs using Iter or Iter2).

0 replies

rsc · 2022-10-25T17:29:49Z

rsc
Oct 25, 2022
Maintainer

We split the discussion of language changes out to #56413.

0 replies

ianlancetaylor · 2022-10-25T22:24:42Z

ianlancetaylor
Oct 25, 2022
Collaborator Author

Thanks very much for all the discussion here. It has been very helpful in pointing out a number of difficulties with this approach. We're going to rethink the language changes over at #56413, which notably also rethinks the approach to an iterator Stop method. If the discussion at #56413 goes well, we'll revisit an experimental iterator package based on the ideas that get adopted there. I'm going to close this discussion now. Thanks again.

0 replies

Replies: 95 comments · 539 replies
