Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: add range over int, range over func #61405

Closed
rsc opened this issue Jul 17, 2023 · 417 comments
Closed

spec: add range over int, range over func #61405

rsc opened this issue Jul 17, 2023 · 417 comments

Comments

@rsc
Copy link
Contributor

rsc commented Jul 17, 2023

Following discussion on #56413, I propose to add two new types that a for-range statement can range over: integers and functions.

In the spec, the table that begins the section would have a few more rows added:

Range expression                                   1st value          2nd value

array or slice      a  [n]E, *[n]E, or []E         index    i  int    a[i]       E
string              s  string type                 index    i  int    see below  rune
map                 m  map[K]V                     key      k  K      m[k]       V
channel             c  chan E, <-chan E            element  e  E
integer             n  integer type                index    i int
function, 0 values  f  func(func()bool) bool
function, 1 value   f  func(func(V)bool) bool      value    v  V
function, 2 values  f  func(func(K, V)bool) bool   key      k  K      v          V

Range over integer

If n is an integer type, then for x := range n { ... } would be completely equivalent to for x := T(0); x < n; x++ { ... }, where T is the type of n (assuming x is not modified in the loop body).

The additional spec text for range over integer is:

For an integer n, the index iteration values 0 through n-1 are produced, with the same type as n. If n <= 0, the loop does not run any iterations.

In the Go code I have examined from the main repo, approximately half of existing 3-clause for loops can be converted to range over integer. Loops that don't care about the index variable become even shorter. For example, the canonical benchmark iteration becomes:

for range b.N {
	do the thing being benchmarked
}

3-clause for loops certainly have their place in Go programs, but something as simple as counting to n should be easier. The combination of range over integer and range over functions should make range the for loop form of choice for almost all iteration. When you see a 3-clause loop, you'll know it is doing something unusual and know to examine it more carefully.

Range over function

If f is a function type of the form func(yield func(T1, T2)bool) bool, then for x, y := range f { ... } is similar to f(func(x T1, y T2) bool { ... }), where the loop body has been moved into the function literal, which is passed to f as yield. The boolean result from yield indicates to f whether to keep iterating. The boolean result from f itself is ignored in this usage but present to allow easier composition of iterators.

I say "similar to" and not "completely equivalent to" because all the control flow statements in the loop body continue to have their original meaning. In particular, break, continue, defer, goto, and return all do exactly what they would do in range over a non-function.

Both T1 and T2 are optional: func(yield func(T1)bool) bool and func(yield func()bool) bool can both be iterated over as well. Just like any other range loops, it is permitted to omit unwanted variables. For example if f has type func(yield func(T1, T2) bool) bool any of these are valid:

for x, y := range f { ... }
for x, _ := range f { ... }
for _, y := range f { ... }
for x := range f { ... }
for range f { ... }

The additional spec text for range over function is:

For a function f, the iteration proceeds by calling f with a synthesized yield function that invokes the body of the loop. The values produced correspond to the arguments in successive calls to yield. As with range over other types, it is permitted to declare fewer iteration variables
than there are iteration values. The return value from the yield function reports whether f should continue iterating. For example, if the loop body executes a break statement, the corresponding call to yield returns false. The return value from f is ignored by range but conventionally returns false when yield has returned false, to allow composing a sequence of such functions. The use of a synthesized yield function does not change the semantics of the loop body. In particular, break, continue, defer, goto, and return statements all behave in a loop body ranging over a function exactly as they do in a loop body ranging over other types.

Being able to range over a function enables writing range over user-specified iteration logic,
which should simplify many programs. It should also help make use of custom collection types nicer.

Not all iteration a program wants to do fits into a single for loop. To convert one of these functions to a "next-based" or "pull-based" iterator, we plan to add a function to the standard library that runs the yield-based ("push-based") iterator in a coroutine, along the lines of the coro.Pull function in my recent blog post. The specific name of those functions will be decided in a future proposal; this proposal is limited to language changes, not library changes.

One of the problems identified in the discussion of #56413 was having too many different possible functions to range over. For that reason, this proposal drops the possibility of funcs that do not return bool as well as funcs of the form func() (T, bool) ("pull-iterators"). Push iterators are the standard form with language support and that packages should provide, and something like coro.Pull will allow conversion to pull iterators in code that needs that form.

Implementation

I have implemented this proposal so that we can explore it and understand it better. To run that prototype implementation:

go install golang.org/dl/gotip@latest
gotip download 510541   # download and build CL 510541              
gotip version  # should say "(w/ rangefunc)"                             
gotip run foo.go       

and then use the Go toolchain you just built. If you already have Go checked out and use the codereview plugin, you can:

git change 510541
cd src
./make.bash

(Or git codereview change if you don't have the standard aliases set up.)

@rsc rsc added the Proposal label Jul 17, 2023
@rsc rsc added this to the Proposal milestone Jul 17, 2023
@rsc
Copy link
Contributor Author

rsc commented Jul 17, 2023

Discussion Summary / FAQ

This comment summarizes the discussion so far and answers frequently asked questions.

Last update: July 23, 2023

Can I try running my own programs?

Absolutely. The easiest way right now is to use gotip, like this:

go install golang.org/dl/gotip@latest
gotip download 510541   # download and build CL 510541
gotip version  # should say "(w/ rangefunc)"
gotip run foo.go

We will look into making range-over-int and range-over-func usable in a special mode on the Go playground as well.

Can you provide more motivation for range over functions?

The most recent motivation is the addition of generics, which we expect will lead to custom containers such as ordered maps, and it would be good for those custom containers to work well with range loops.

Another equally good motivation is to provide a better answer for the many functions in the standard library that collect a sequence of results and return the whole thing as a slice. If the results can be generated one at a time, then a representation that allows iterating over them scales better than returning an entire slice. We do not have a standard signature for functions that represent this iteration. Adding support for functions in range would both define a standard signature and provide a real benefit that would encourage its use.

For example, here are a few functions from the standard library that return slices but probably merit forms that return iterators instead:

  • strings.Split
  • strings.Fields
  • bytes variants of the above
  • regexp.Regexp.FindAll and friends

There are also functions we were reluctant to provide in slices form that probably deserve to be added in iterator form. For example, there should be a strings.Lines(text) that iterates over the lines in a text.

Similarly, iteration over lines in a bufio.Reader or bufio.Scanner is possible, but you have to know the pattern, and the pattern is different for those two and tends to be different for each type. Establishing a standard way to express iteration will help converge the many different approaches that exist today.

For additional motivation for iterators, see #54245. For additional motivation specifically for range over functions, see #56413.

Can you provide more motivation for range over integers?

3-clause for loops have a long and venerable history, and they are very flexible, but the majority of the time, they are used to count from 0 through N-1 with no complications. When you stop and think about that, a 3-clause for loop requires a lot of machinery and ceremony just to count from 0 through N-1.

As @seebs noted, there are many ways to complicate a 3-clause for loop, some of them not terribly obvious. Here are a few:

for i := 0; i < N; i++ { /* use i */ }
for i := 1; i < N; i++ { /* use i */ }
for i := 0; i <= N; i++ { /* use i */ }
for i := 1; i <= N; i++ { /* use i */ }
for i := 0; i < len(s); i++ { /* use i */ s := s[1:] }
for i := 0; i < fn(x); i++ { /* use i */ }
for i := 0; i < len(s); s = s[1:] { /* use i */ }

It helps readability of programs if the trivial counting form is conventionally written for i := range N instead of having to examine the loop header (and body!) carefully to make sure it really is the trivial counting form.

Later, @danderson made the point perfectly:

So far I've seen a bunch of mild opposition to range-over-int here, so I'll offer a voice in support instead: since reading this proposal, my fingers have wanted to type for i := range N in code I've been writing at least a half dozen times, and I've felt a passing sadness every time as I write out for i := 0; i < N; i++.

This is not in code cherrypicked for this kind of loop, they just came up in code I was writing (an ACL evaluator, and a database layer). This to me is an "emotional" confirmation of what the proposal's data suggests, that this loop form is extremely common. And while the cognitive load of the status quo is small, it's non-zero and accumulates as code grows: the expanded form is shift-heavy to type on qwerty, and it takes a beat for eyes to parse it and confirm that it's just a standard int loop that's not doing something weird with the bounds, increment terms, or messing with the counter in the loop body.

I would like to be able to use range N, somewhat for the easier typing but moreso because it's quicker to read: range N tells me very quickly that there are no surprising semantics in the loop I'm about to read.

I believe this also argues for keeping range-over-int separate from range-over-func, even if range N could be expressed as a stdlib function iterator. Keeping very common code concise is (IMO) more important than the conceptual cleanliness of cramming both into a single new syntax.

The non-trivial counting forms that are common enough can also be abstracted away using range-over-func. Combined, range-over-func and range-over-int should make the vast majority of loops in Go use the range form, which should lead to more regular, easier to read code.

Why not use a library function for range over ints?

That would be another approach, but counting from 0 through N-1 is incredibly common. If it required importing a separate package to get the range form, probably most code would keep using the 3-clause form. We want to do better. See also the quoted comment from @danderson in the previous answer.

Why not add a new syntax for more sophisticated integer ranges?

More sophisticated integer ranges are much less common than 0 through N-1. For those, the balance tips more easily toward custom functions that can be imported and called when necessary. For example, a very common 3-clause loop in some code bases is counting from N-1 down through 0, to iterate backward over a slice. We could provide that countdown as a function, or we could specialize to slices, with something like

package slices

func Backward(s []E) func(func(int, E) bool) bool {
	return func(yield func(int, E) bool) bool {
		for i := len(s)-1; i >= 0; i-- {
			if !yield(i, s[i]) {
				return false
			}
		}
		return true
	}
}

The ability to do this customization for these cases makes custom functions much more attractive. But lots of code just needs to count up, and that should be as easy as possible.

Why not use range over an interface instead of range over functions?

Today, range makes the decision of how to iterate based only on the underlying type of the range variable, not its methods. In fact, no syntax in Go ever makes use of specific methods. The closest exception is the predefined error interface, but even that never has its Error method called by any Go syntax. People often ask for operator methods, and we have avoided doing that for the same reason: we don't want to privilege specific methods. It is of course possible to create a language structured around such methods (Python is one example), but Go is not designed that way. Nothing in the Go language invokes specific method names.

A second design reason for the decision is that it's more difficult to construct implementations of an interface than it is to just write a function literal. A function like slices.Backward above would have to define a special backwardIterator object holding no state just to have a special method. Or some package would need to define a func-based implementation like http.HandlerFunc. It's much easier just to use functions directly and not worry about needing a different type for each kind of iterator.

There is also a technical incompatibility in using interfaces. If a custom map type (say) defined the special iteration method, then ranging over it in current Go would do a regular map iteration, while ranging over it in a future Go would invoke the special iteration method. That would be a breaking change. Perhaps a desired change, but still a breaking one. That concern is secondary to the basic design principles though.

There is more discussion about the downsides of range over interfaces in #54245. The switch to range over functions was meant to address that directly.

What is a simple example of how range over function runs?

Sure. Consider the Backwards function in the previous answer. It can be invoked as:

s := []string{"hello", "world"}
for i, x := range slices.Backward(s) {
	fmt.Println(i, x)
}

This program would translate inside the compiler to a program more like:

slices.Backward(s)(func(i int, x string) bool {
	fmt.Println(i, x)
	return true
})

The "return true" at the end of the body is the implicit "continue" at the end of the loop body. An explicit continue would translate to "return true" as well. A break statement would translate to "return false" instead. Other control structures are more complicated but still possible.

How are more complicated loops implemented?

Beyond simple break and continue, other control flow (labeled break, continue, goto out of the loop, return) requires setting a variable that the code outside the loop can consult when the loop breaks. For example a "return" might turn into something like "doReturn = true; return false" where the "return false" is the "break" implementation, and then when the loop finishes, other generated code would do "if(doReturn) return".

The full rewrite is explained at the top of cmd/compile/internal/rangefunc/rewrite.go in the implementation.

What if the iterator function ignores yield returning false?

We'll get there. Keep scrolling down.

Why are yield functions limited to at most two arguments?

There has to be a limit; otherwise people file bugs against the compiler when it rejects ridiculous programs. If we were designing in a vacuum, perhaps we would say it was unlimited but that implementations only had to allow up to 1000, or something like that.

However, we are not designing in a vacuum: go/ast and go/parser exist, and they can only represent and parse up to two range values. We clearly need to support two values to simulate existing range usages. If it were important to support three or more values, we could change those packages, but there does not appear to be a terribly strong reason to support three or more, so the simplest choice is to stop at two and leave those packages unchanged. If we find a strong reason for more in the future, we can revisit that limit.

Another reason to stop at two is to have a more limited number of function signatures for general code to define. Standard library changes are explicitly out of scope for this proposal, but having only three signatures means a package can easily define names for all three, like perhaps:

package iter

type Seq0 func(yield func() bool) bool
type Seq[V any] func(yield func(V) bool) bool
type Seq2[K, V any] func(yield func(K, V) bool) bool

Why are standard library changes out of scope for this proposal?

This proposal is limited to the language change. If we add standard library changes to the scope, it will be that much harder to converge. There will be other proposals for the standard library. Probably they will include some kind of useful iterator definitions, as well as iterator-returning forms of the functions mentioned earlier, like strings.Split and regexp.Regexp.FindAll. For now, though, please limit discussion to the language change.

We will comment on this issue with pointers to library change proposals when they are posted. We are still working through some details in the draft.

What will idiomatic APIs with range functions look like?

We don't know yet, and that's really part of the eventual standard library proposal. However, you could imagine a container like a binary tree implementing an All method that returns an iterator:

func (t *Tree[V]) All() iter.Seq[V]

We would like to establish a name like that, probably All, as the default "all items" iterator. Specific containers might provide others as well. Maybe a list would provide backward iteration too:

func (l *List[V]) All() iter.Seq[V]
func (l *List[V]) Backward() iter.Seq[V]

These examples are meant to show that the library can be written in a way that should make these kinds of functions readable and understandable. Please don't focus on the exact details. Again, that will be the focus of other proposals, which we will link here when they are ready.

Will Go programs using range over functions be readable?

We think they can be. For example using slices.Backward instead of the explicit count-down loop should be easier to understand, especially for developers who don't see count-down loops every day and have to think carefully through the boundary conditions to make sure they are correct.

It is true that the possibility of range over a function means that when you see range x, if you don't know what x is, you don't know exactly what code it will run or how efficient it will be. But slice and map iteration are already fairly different as far as the code that runs and how fast it is, not to mention channels. Ordinary function calls have this problem too - in general we have no idea what the called function will do - and yet we find ways to write readable, understandable code, and even to build an intuition for performance.

The same will certainly happen with range over functions. We will build up useful patterns over time, and people will recognize the most common iterators and know what they do.

What is the bool result from the iterator function for?

(TL;DR: It is a mistake that hasn't been deleted from the proposal yet.)

The bool result from the iterator function indicates whether the iterator finished. If yield returns false, the iterator should stop and return false. Otherwise it should return true. The intent was that the bool result would make iterator composition easier, but it looks like the only operation it helps is concatenating iterators, as in:

func Concat[V any](seqs ...iter.Seq[V]) iter.Seq[V] {
	return func(yield func(V) bool) bool {
		for _, seq := range seqs {
			if !seq(yield) {
				return false
			}
		}
		return true
	}
}

But this function could just as easily be written:

func Concat[V any](seqs ...iter.Seq[V]) iter.Seq[V] {
	return func(yield func(V) bool) {
		for _, seq := range seqs {
			for _, v := range seq {
				if !yield(v) {
					return
				}
			}
		}
	}
}

So the bool result does not seen entirely justified.

A second example of composition is iterating a binary tree:

func (t *Tree[V]) All(yield func(V) bool) bool {
	return t == nil ||
		t.left.All(yield) && yield(t.value) && t.right.All(yield)
}

Having the boolean result avoids writing a second helper function, because All is the iterator (as opposed to returning the iterator). However, writing programs using these functions, it quickly becomes confusing that sometimes parens are needed to obtain the iterator (range slices.Backward(x)) and sometimes they are not (range t.All). It seems less confusing if every method used in a range statement returns iterators instead of being iterators. It's also more readable, since these methods can explicitly return iter.Seq[V] in their public API instead of spelling out the iterator function signature. That means t.All should return an iterator, not be an iterator, which requires the helper function anyway, and the helper can use a bool even if the standard signature does not:

func (t *Tree[V]) All() iter.Seq[V] {
	return func(yield func(V) bool) { t.all(yield) }
}

func (t *Tree[V]) all(yield func(V) bool) bool {
	return t == nil ||
		t.left.all(yield) && yield(t.value) && t.right.all(yield)
}

When All was itself an iterator (as opposed to returning an iterator), there was a useful pun on the name All in that it could equivalently be defined as:

// All reports whether f returns true for all values in v.
func (t *Tree[V]) All(f func(V) bool) bool

This is like Python's All. However, if All returns an iterator, the opportunity for the useful pun goes away, and so does whatever small nudge it provided for a bool result.

Since all three reasons for a bool result from the iterator don't look terribly compelling, we should probably remove it. The proposal is not yet updated to reflect that.

What if the iterator function ignores yield returning false?

The current prototype does not notice, because the added check was causing problems with inlining, and we wanted to demonstrate that the performance of these range-over-func loops could be competitive with hand-written loops. For the real implementation, we will make sure the check happens, with a panic on misuse, without noticeably hurting performance.

What if the iterator function saves yield and calls it after returning?

The current prototype does not notice. Again, the real implementation will.

What if the iterator function calls yield on a different goroutine?

This is an open question that is not yet decided.

We have gone back and forth about whether this should be allowed. In general Go tries very hard to avoid exposing any notion of goroutine identity, and it would be strange to define that yield must be called on same goroutine that the function iterator was invoked on. It would be stranger still to notice and panic, since that would provide a way to create functions that only run on a single goroutine. Even so, maybe it is worth doing.

What if the iterator function calls yield on multiple goroutines simultaneously?

This is an open question that is not yet decided.

In general that would cause a race, and running the race detector would catch the problem. However, if the loop body does not write to any shared state and does not execute any special control flow other than continue (no break, no return, no goto), that code probably would not have any races. It is unclear whether we should disallow that kind of use (probably) and whether we can catch it efficiently (perhaps not). If we did decide that yield should only be called on its original goroutine, that would also eliminate the possibility of use on multiple goroutines simultaneously.

Can we write a vet check for misuse of a yield function?

We have not looked into this, but almost certainly yes. With the runtime checks, however, it may not be necessary. The initial version probably will not include this check. But maybe.

What do stack traces look like in the loop body?

The loop body is called from the iterator function, which is called from the function in which the loop body appears. The stack trace will show that reality. This will be important for debugging iterators, aligning with stack traces in debuggers, and so on.

Why are the semantics not exactly like if the iterator function ran in a coroutine or goroutine?

Running the iterator in a separate coroutine or goroutine is more expensive and harder to debug than having everything on one stack. Since we're going to have everything on one stack, that fact will change certain visible details. We just saw the first: stack traces show the calling function and the iterator function interleaved, as well as showing the explicit yield function that does not exist on the page in the program.

It can be helpful to think about running the iterator function in its own coroutine or gorotine as an analogy or mental model, but in some cases the mental model doesn't give the best answer, because it uses two stacks, and the real implementation is defined to use one.

What happens if the loop body defers a call? Or if the iterator function defers a call?

If a range-over-func loop body defers a call, it runs when the outer function containing the loop returns, just as it would for any other kind of range loop. That is, the semantics of defer do not depend on what kind of value is being ranged over. It would be quite confusing if they did. That kind of dependence seems unworkable from a design perspective. Some people have proposed disallowing defer in a range-over-func loop body, but this would be a semantic change based on the kind of value being ranged over and seems similarly unworkable.

The loop body's defer runs exactly when it looks like it would if you didn't know anything special was happening in range-over-func.

If an iterator function defers a call, the call runs when the iterator function returns. The iterator function returns when it runs out of values or is told to stop by the loop body (because the loop body hit a "break" statement which translated to "return false"). This is exactly what you want for most iterator functions. For example an iterator that returns lines from a file can open the file, defer closing the file, and then yield lines.

The iterator function's defer runs exactly when it looks like it would if you didn't know the function was being used in a range loop at all.

This pair of answers can mean the calls run in a different time order than the defer statements executed, and here the goroutine analogy is useful. Think of the main function running in one goroutine and the iterator running in another, sending values over a channel. In that case, the defers can run in a different order than they were created because the iterator returns before the outer function does, even if the outer function loop body defers a call after the iterator does.

What happens if the loop body panics? Or if the iterator function panics?

The deferred calls run in the same order for panic that they would in an ordinary return: first the calls deferred by the iterator, then the calls deferred by the loop body and attached to the outer function. It would be very surprising if ordinary returns and panics ran deferred calls in different orders.

Again there is an analogy to having the iterator in its own goroutine. If before the loop started the main function deferred a cleanup of the iterator, then a panic in the loop body would run the deferred cleanup call, which would switch over to the iterator, run its deferred calls, and then switch back to continue the panic on the main goroutine. That's the same order the deferred calls run in an ordinary iterator, even without the extra goroutine.

See this comment for a more detailed rationale for these defer and panic semantics.

What happens if the iterator function recovers a panic in the loop body?

This is an open question that is not yet decided.

If an iterator recovers a panic from the loop body, the current prototype allows it to invoke yield again and have the loop keep executing. This is a difference from the goroutine analogy. Perhaps it is a mistake, but if so it is a difficult one to correct efficiently. We continue to look into this.

Can range over a function perform as well as hand-written loops?

Yes. Consider the slices.Backward example again, which first translates to:

slices.Backward(s)(func(i int, x string) bool {
	fmt.Println(i, x)
	return true
})

The compiler can recognize that slices.Backward is trivial and inline it, producing:

func(yield func(int, E) bool) bool {
	for i := len(s)-1; i >= 0; i-- {
		if !yield(i, s[i]) {
			return false
		}
	}
	return true
}(func(i int, x string) bool {
	fmt.Println(i, x)
	return true
})

Then it can recognize a function literal being immediately called and inline that:

{
	yield := func(i int, x string) bool {
		fmt.Println(i, x)
		return true
	}
	for i := len(s)-1; i >= 0; i-- {
		if !yield(i, s[i]) {
			goto End
		}
	}
End:
}

Then it can devirtualize yield:

{
	for i := len(s)-1; i >= 0; i-- {
		if !(func(i int, x string) bool {
			fmt.Println(i, x)
			return true
		})(i, s[i]) {
			goto End
		}
	}
End:
}

Then it can inline that func literal:

{
	for i := len(s)-1; i >= 0; i-- {
		var ret bool
		{
			i := i
			x := s[i]
			fmt.Println(i, x)
			ret = true
		}
		if !ret {
			goto End
		}
	}
End:
}

From that point the SSA backend can see through all the unnecessary variables and treats that code the same as

for i := len(s)-1; i >= 0; i-- {
	fmt.Println(i, s[i])
}

This looks like a fair amount of work, but it only runs for simple bodies and simple iterators, below the inlining threshold, so the work involved is small. For more complex bodies or iterators, the overhead of the function calls is insignificant.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/510540 mentions this issue: runtime: add support for range-over-func

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/510541 mentions this issue: cmd/compile: implement range over func

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/510539 mentions this issue: cmd/compile, runtime: make room for rangefunc defers

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/510538 mentions this issue: cmd/compile: implement range over integer

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/510536 mentions this issue: cmd/compile: trim range typechecking

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/510537 mentions this issue: cmd/compile, go/types: typechecking of range over int, func

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/510535 mentions this issue: spec: changes for range over int, func

@seebs
Copy link
Contributor

seebs commented Jul 17, 2023

The func part seems like it'd be easier to reason about with a usage example showing the body of a function that would work here.

If I've understood this correctly:

func rangeTen(yield func(int) bool) {
    for i := range 10 {
        if !yield(i) { return }
    }
}

...

for i := range rangeTen {
    if i == 6 { break }
    fmt.Println(i)
}

would print 0, 1, 2, 3, 4, and 5, then the call to yield(6) would return false and the loop would terminate?

Is there a particular reason to cap functions at two arguments, rather than just taking arbitrary functions which return bool?

@rsc
Copy link
Contributor Author

rsc commented Jul 17, 2023

@seebs, yes, you're right about the rangeTen example.

If we cap the number of variables at two, then no changes are required in go/ast or go/parser, and libraries that want to support iteration functions know the 3 signatures they need to support.

@seebs
Copy link
Contributor

seebs commented Jul 17, 2023

Hmm. So, basically, break makes yield() return false, and continue makes yield() return true immediately. But now I'm thinking:

outer:
for a := range f1 {
    for b := range f2 {
        ...
        continue outer
    }
}

So... does the yield call in f2 return at all when we reach continue outer? Does anything enforce that the function then does the right thing? My concern is that a function could conceivably have cleanup code which needs to run after the loop, for instance. Or is that prohibited?

Basically, what happens if the user is holding it wrong?

Thinking about it more:

func rangeTen(yield func(int) bool) {
    for i := range 10 {
        if !yield(i) { break }
    }
    // this is a stand-in for "cleanup code we have to do after the loop"
    fmt.Println("got here")
}

...

for i := range rangeTen {
    if i == 6 { break }
    fmt.Println(i)
}

Does this print got here? If it doesn't, the control flow is a little surprising. If it does, this feels noticably unlike what I initially envisioned as "the break, continue, defer, goto, and return statements behave exactly as they do in a loop body ranging over other types". Because on the one hand, I expect the break to just instantly transfer control to just past the loop, but on the other hand, I expect the rangeTen function to actually finish its execution.

@danderson
Copy link
Contributor

The proposed spec change says As with range over other types, it is permitted to declare fewer iteration variables than there are iteration values.

I read that to mean: if I'm writing a library that provides a map-ish data structure, and I want to allow the consumer to do all of for range ..., for k := range ... and for k, v := range ..., I would only have to implement one iterator function, Iter(yield func(K, V) bool), and unnecessary yielded values will be discarded to fit the caller's desires.

Is that correct and the way you'd expect library authors to work with this language feature? Or are we going to end up having to provide IterFoo0, IterFoo1 and IterFoo2 variants for everything?

@cespare
Copy link
Contributor

cespare commented Jul 17, 2023

I, too, would find it easier to understand the proposal with more examples. Perhaps you can give some examples of the code you're referring to with:

Being able to range over a function enables writing range over user-specified iteration logic, which should simplify many programs.

and

It should also help make use of custom collection types nicer.


Do you think that the implementation of this feature will be able to be faster than 1 full function call per iteration?

@rittneje
Copy link

@seebs In your example:

func rangeTen(yield func(int) bool) {
    for i := range 10 {
        if !yield(i) { return }
    }
    fmt.Println("got here")
}

...

for i := range rangeTen {
    if i == 6 { break }
    fmt.Println(i)
}

Did you mean to use break in rangeTen instead of return?

@seebs
Copy link
Contributor

seebs commented Jul 17, 2023

... yes.

@apparentlymart
Copy link

apparentlymart commented Jul 17, 2023

I was just writing some custom collection types last week and had no good solutions for dealing with iteration over members, so this proposal is timely!

The ability to use a function with range does seem like it would fill the gap, but I'd like to see an example in the proposal of what you'd consider to be an idiomatic use of range-over-function with a hypothetical collection type. Specifically, I'm interested to see your viewpoint on:

  • Should the function used with range typically be a package-level function which takes the collection as an argument, or a method where the receiver argument is therefore implied?
  • What would you consider to be a good name for such a function?

Or, more concretely, what exactly would the for ... range loop for this collection type look like at the call site, including a realistic hypothetical name?


For the sake of example I'm thinking about a hypothetical "set of T" type; let's call it type Set[T any]. If I have a Set[string] value in a variable s, what might that look like at the callsite?

As a method func (s Set[T]) Range(func yield func(T) bool):

for name := range s.Range {
    // ...
}

...or as a function in package sets, func Range[T comparable](s Set[T])() func yield func(T) bool:

for name := range sets.Range(s) {
    // ...
}

Is Range a good name to use? I'm not sure!

Edit: when I first wrote this I had a thinko and for some reason wrote Rust-style for loops. I've edited it to now actually use Go for ... range loops. Sorry for the misfire!


Edit edit: A later rsc comment implicitly answers this question without directly answering it. 😀

From that comment I infer that:

  • Methods are the most likely way to offer "iterator functions" for a collection type, rather than package-level functions that take a collection and return another function. In practice that seems to mean using a method value as the range operand, so that it's automatically bound to the receiver.
  • The method should be named after what set of values the iterator will expose and, where relevant, what order it will do that in. For example, s.All for "all elements of the set".
for name := range s.All {
    // ...
}

That does seem pretty reasonable to me, though intentionally not including anything in the name representing "iterator" or "range" does mean that someone looking at the godoc for a package will presumably just need to be familiar with the pattern that any function which takes a yield function as its only argument is intended for use with for ... range. That doesn't seem like a show-stopper though, and presumably package authors will write doc comments explaining what these methods are intended to be used for.

@neild
Copy link
Contributor

neild commented Jul 17, 2023

... yes.

In that case, this prints 0, 1, 2, 3, 4, 5, "got here".

(You can test this with the draft implementation. Although that seems to require rangeTen to return bool, which is not in the proposal.)

The "got here" will execute if the range rangeTen exits due to finishing the range, return, break, or goto. It won't execute if it exits due to a panic, but a defer in rangeTen will still execute in that case.

@jimmyfrasche
Copy link
Member

It would be great if there were a playground for this. Building a custom tool chain isn't too hard but it's fairly involved if it's not something you've done before and even if you have it's a bit much if you want to just try 2 or 3 small things.

@seebs
Copy link
Contributor

seebs commented Jul 17, 2023

This seems like a possibly-surprising thing, because I think my current intuition is that if I'm doing a loop, and i execute break, i will just unconditionally and immediately be outside the loop. The category of "code which can be executed after a break but before execution resumes past the loop" is new, I think? I think it's necessary, but it's slightly surprising. I think less surprising than the alternative of not returning from yield would be.

But then I'm uncertain what I expect to happen in the continue outer case. Because that feels very much like it should just be causing the outer loop's yield to return true... So it's sort of like an implicit break from any loops between the current execution and the loop we're continuing, but nothing else between those happens. So we sort of go up the line causing yield to return false, and as soon as the range-iteration function that called it exits, we jump on to the next one and cause its yield to return false also?

@thepudds
Copy link
Contributor

thepudds commented Jul 18, 2023

It would be great if there were a playground for this. Building a custom tool chain isn't too hard but it's fairly involved if it's not something you've done before

Agreed a playground would be nice, but FWIW, it is pretty straightforward to try out a CL just by using the gotip wrapper utility, which will install a parallel Go toolchain in $HOME/sdk/gotip (without impacting your main Go toolchain).

To try out these changes with the CL Russ suggested:

$ go install golang.org/dl/gotip@latest
$ gotip download 510541            # download and build CL 510541              
$ gotip version                                       
$ gotip run foo.go                 # use 'gotip' in place of the 'go' command

@ianlancetaylor
Copy link
Contributor

func rangeTen(yield func(int) bool) {
    for i := range 10 {
        if !yield(i) { break }
    }
    // this is a stand-in for "cleanup code we have to do after the loop"
    fmt.Println("got here")
}

for i := range rangeTen {
    if i == 6 { break }
    fmt.Println(i)
}

Does this print got here?

Yes. The break statement in the range rangeTen loop will cause the yield function called by rangeTen to return false.

@ianlancetaylor
Copy link
Contributor

@danderson

I read that to mean: if I'm writing a library that provides a map-ish data structure, and I want to allow the consumer to do all of for range ..., for k := range ... and for k, v := range ..., I would only have to implement one iterator function, Iter(yield func(K, V) bool), and unnecessary yielded values will be discarded to fit the caller's desires.

Yes.

@ianlancetaylor
Copy link
Contributor

This seems like a possibly-surprising thing, because I think my current intuition is that if I'm doing a loop, and i execute break, i will just unconditionally and immediately be outside the loop. The category of "code which can be executed after a break but before execution resumes past the loop" is new, I think? I think it's necessary, but it's slightly surprising. I think less surprising than the alternative of not returning from yield would be.

There are two different loops here, and they should be considered separately.

rangeTen is just a function that calls yield up to ten times until it returns false. If yield returns false (and doesn't panic or call runtime.Goexit) then the loop in rangeTen just keeps going as usual. The fact that rangeTen happens to be invoked by another for range loop is irrelevant. rangeTen acts like ordinary Go code.

What is new here is that if the for range loop over rangeTen exits early, then yield in rangeTen returns false. But other than that detail, rangeTen is just ordinary Go code.

And, yes, if we break out of multiple for range loops over push functions at once, then the yield function for each of those push functions will return false, and when those functions have returned we will continue with the break in the outer loop. If those functions fail to return, that would be weird, but the program will just carry on until they eventually do return.

@ianlancetaylor
Copy link
Contributor

Here is a possibly illustrative (but untested) example.

// Tree is a binary tree.
type Tree[E any] struct {
	val         E
	left, right *Tree
}

// All may be used in a for/range loop to iterate
// through all the values of the tree.
// This implementation does an in-order traversal.
func (t *Tree[E]) All(yield func(E) bool) {
	t.doAll(yield)
}

// doAll is a helper for All, to make it easier
// to know when the iteration stopped in a subtree.
func (t *Tree[E]) doAll(yield func(E) bool) bool {
	if t == nil {
		return true
	}
	return t.left.doAll(yield) && yield(t.val) && t.right.doAll(yield)
}

Here is an example of using that method to sum all the values in a tree.

// SumTree returns the sum of all the elements in t.
func SumTree(t *Tree[int]) int {
	s := 0
	for v := range t.All {
		s += v
	}
	return s
}

@seebs
Copy link
Contributor

seebs commented Jul 18, 2023

Hmm. Okay, so, the actual implementation of break foo or continue foo might be more complex, but this is expressed as "if a loop is supposed to terminate, its yield function returns false". But you may then be in a state where, once the function returns, you immediately cause the next yield function in the sequence to return. But that is implementation-difficulty, it's invisible to the user; the break/continue just work as expected.

Perhaps it should be specified what, if anything, happens if you call yield again after it returned false; I think I'd expect a runtime panic?

@jba
Copy link
Contributor

jba commented Jul 18, 2023

The proposal should specify what happens if the yield function escapes and then is called.

@rittneje
Copy link

@rsc I built the toolchain from your branch as per your instructions (using make.bash) but it doesn't work.

$ ./go/bin/go version
go version devel go1.21-3bf145578f Mon Jul 17 17:18:06 2023 -0400 linux/amd64
$ ./go/bin/go run range.go 
# command-line-arguments
./range.go:8:17: cannot range over 5 (untyped int constant)

range.go

package main

import (
	"fmt"
)

func main() {
	for i := range 5 {
		fmt.Println(i)
 	}
}

What am I missing?

@qiulaidongfeng
Copy link
Member

I successfully compiled using gotip download 510541 and output the expected results.
Code copy for @rittneje

package main

import (
	"fmt"
)

func main() {
	for i := range 5 {
		fmt.Println(i)
 	}
}

My go version output go version devel go1.21-3bf145578f Mon Jul 17 17:18:06 2023 -0400 windows/amd64.

@gophun
Copy link

gophun commented May 16, 2024

@DmitriyMV It was tracked by this issue here, see the "Change https://go.dev/cl/XXX mentions this issue" comments, with the last one being #61405 (comment)

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/584596 mentions this issue: cmd/compile: for rangefunc, add checks and tests, fix panic interactions

@dmitshur dmitshur modified the milestones: Backlog, Go1.23 May 17, 2024
@DmitriyMV
Copy link
Contributor

DmitriyMV commented May 18, 2024

@dr2chase

Is this code https://go.dev/play/p/ikPyKMASpKv?v=gotip spec compliant? Is not using range-over-func for {} syntax in iterators and syncing access to the final for body yield call results in correct code? I tried running it with gotip download 584596 && gotip run -race main.go and it detected no errors. If I start using for v := range seq in iterators it (as expected) fails.

@dr2chase
Copy link
Contributor

@DmitriyMV Looking at the code, I think it should fail, but when I run it with all the checks enabled (in a CL that has not yet landed) it does not, nor does it tweak the race detector when I try that.

As a general rule, an iterator that runs loop bodies in parallel is likely to fail, because the misuse detector assumes serial execution. There are shared state and checking variables that both trigger races and probably have "wrong values" because of concurrent updates. The reason for this restrictive assumption is that supporting parallel execution would generate questions that we don't want to figure out answers to, and would almost add so much overhead that the normal, serial, case would be come uninteresting. Parallel execution is simply not the use case for this. (In a language with a much more exotic type system I can imagine this happening, but Go is not that language, and I'm not sure one like that even exists outside of research.)

gopherbot pushed a commit that referenced this issue May 21, 2024
Modify rangefunc #next protocol to make it more robust

Extra-terrible nests of rangefunc iterators caused the
prior implementation to misbehave non-locally (in outer loops).

Add more rangefunc exit flag tests, parallel and tricky

This tests the assertion that a rangefunc iterator running
in parallel can trigger the race detector if any of the
parallel goroutines attempts an early exit.  It also
verifies that if everything else is carefully written,
that it does NOT trigger the race detector if all the
parts run time completion.

Another test tries to rerun a yield function within a loop,
so that any per-line shared checking would be fooled.

Added all the use-of-body/yield-function checking.

These checks handle pathological cases that would cause
rangefunc for loops to behave in surprising ways (compared
to "regular" for loops).  For example, a rangefunc iterator
might defer-recover a panic thrown in the syntactic body
of a loop; this notices the fault and panics with an
explanation

Modified closure naming to ID rangefunc bodies

Add a "-range<N>" suffix to the name of any closure generated for
a rangefunc loop body, as provided in Alessandro Arzilli's CL
(which is merged into this one).

Fix return values for panicky range functions

This removes the delayed implementation of "return x" by
ensuring that return values (in rangefunc-return-containing
functions) always have names and translating the "return x"
into "#rv1 = x" where #rv1 is the synthesized name of the
first result.

Updates #61405.

Change-Id: I933299ecce04ceabcf1c0c2de8e610b2ecd1cfd8
Reviewed-on: https://go-review.googlesource.com/c/go/+/584596
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Tim King <taking@google.com>
@DmitriyMV
Copy link
Contributor

DmitriyMV commented May 22, 2024

@dr2chase the reason I'm asking this is that calls to trasformed for body using yield call happen only in the last part (inside closure defined in Sync function). All other functions use traditional Go syntax that you can use today: that is yield calls proper Go functions and not transformed Go for body. And since the yield call to for body happens in synchronized environment (yield call in Sync closure is mutex protected so it's sequential and exited boolean variable will not allow it to be called again if previous yield returned false) all this gave me the idea that this the correct approach.

I understand that this approach is also quite fragile because if any iterator in the chain between All and Sync decides to use range-over-func syntax in their own closures, it will fail and stop being correct. But otherwise in current form, what does this code does wrong and why?

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/589397 mentions this issue: spec: document for range loop over functions

gopherbot pushed a commit that referenced this issue Jun 4, 2024
For #61405.
Fixes #65237.

Change-Id: Ia7820c0ef089c828ea7ed3d2802c5185c945290e
Reviewed-on: https://go-review.googlesource.com/c/go/+/589397
TryBot-Bypass: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/590756 mentions this issue: doc: mark range-over-func as documented

gopherbot pushed a commit that referenced this issue Jun 5, 2024
The language change for the accepted range-over-func proposal #61405
was documented in CL 590616. Remove the corresponding 'TODO' entry.

Also improve formatting slightly, and switch to preferred relative
links. They'll work better in the long term and in more contexts.

While here, also simplify the suggested line to preview release notes
locally: setting the -content='' flag explicitly is no longer required
as of CL 589936.

For #65614.

Change-Id: I6cee951b9ede33900bca48c9f709e3b2c5e87337
Reviewed-on: https://go-review.googlesource.com/c/go/+/590756
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
@haraldrudell
Copy link

I have a late-night suggestion:

if the iterator function has an optional error pointer argument, then an error occurring during iteration can be conveniently collected by the enclosing function outside the for loop. That’s a headache with pull-iterators

— people like Harald that spend their days iterating over SQL and locked-up nfs thinks this would be great

An enclosing function returning the error would look like:

func checkEverything() (err error) {
  // select.Star is an iterator-function method
  for i, x := range select.Star(s, &err) {
    …
  }
  if err != nil {
    parl.Log("SQL sppelling error")
    return
  }
}

// iterator code would look like:
if err != nil && errp != nil {
  *errp = err
} else {
  panic(err)
}

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/592935 mentions this issue: doc/next: add motivation and details links for range-over-func iterators

@thediveo
Copy link

— people like Harald that spend their days iterating over SQL and locked-up nfs thinks this would be great

must be a different Harald, as I managed to avoid SQL alsmost completely so far. I'm the one doing ugly Linux kernel namespace acrobatics, locking go routines all the time 😀

gopherbot pushed a commit that referenced this issue Jun 17, 2024
The "Changes to the language" section at the top of the release notes
will likely ultimately include more explanation about iterators, or at
least, the Go project will likely publish additional introductory
material on iterators on the blog and so on.

As a perhaps temporary step given current interest, this CL updates the
release notes with two additional links for details and motivation.

The new package documentation for the iter package is up-to-date,
precise, and also more accessible than the language spec, while the 2022
pre-proposal GitHub discussion starts with perhaps the most compelling
motivation writeup so far. (We purposefully include "2022" in the text
to help illustrate this was not the result of an overly hasty process).

We also update the target of the existing language spec reference to be
closer to the new material.

For #61405.

Change-Id: I4bc0f99c40f31edfc5c0e635dca5f844b26b6eeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/592935
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@kodeyeen
Copy link

kodeyeen commented Jul 21, 2024

Will function iterators work with spread operator?
For example, if we have a container data structure with method Add(items ...Item) and we want to add items from another container we could write ctr1.Add(ctr2.All()...).
Or append them to a slice:
s = append(s, ctr.All()...)

@thediveo
Copy link

My understanding is that they work with the range operator, but the spread syntax doesn't support it. What would be the semantics for two-return value iterators, considering not only the index,v and k,v and v,err use cases?

@Merovius
Copy link
Contributor

Merovius commented Jul 21, 2024

Also, what about this?

func main() {
	var x = []int{1,2,3,4,5}
	F(slices.Values(x)...)
}

func F(x ...int) {
	x[0] = 42
}

Or what about this?

func main() {
	var x = func(yield func(int) bool) {
		for i := 0; ; i++ {
			if !yield(i) {
				return
			}
		}
	}
	F(x...)
}

func F(x ...int) {
	fmt.Println(len(x))
}

@fzipp
Copy link
Contributor

fzipp commented Jul 21, 2024

Or append them to a slice: s = append(s, ctr.All()...)

@kodeyeen
For this use case there is slices.AppendSeq: s = slices.AppendSeq(s, ctr.All())

@kodeyeen
Copy link

kodeyeen commented Jul 21, 2024

Ok, get it

Also, what about this?

func main() {
	var x = []int{1,2,3,4,5}
	F(slices.Values(x)...)
}

func F(x ...int) {
	x[0] = 42
}

Or what about this?

func main() {
	var x = func(yield func(int) bool) {
		for i := 0; ; i++ {
			if !yield(i) {
				return
			}
		}
	}
	F(x...)
}

func F(x ...int) {
	fmt.Println(len(x))
}

In the first one x would be a slice . So you're just changing first element of the slice.
The second one will block forever.

But I agree with thediveo it's unclear what to do with iter.Seq2 functions. Maybe allow only iter.Seq.

@ansiwen
Copy link
Contributor

ansiwen commented Aug 15, 2024

@rsc Sorry for being very much too late to the party, but could you or someone please point me to the motivation, why it has been decided that push iterator functions are the standard interface, and pull iterator functions are realized by a (rather complicated) stdlib implementation?

Given the fact that pull iterator functions are trivially wrapped into push iterator functions, I naively assumed the opposite (making pull iterator functions the standard) would be the more natural way to go. I'm sure there are good reasons, but there is quite some discussion spread everywhere and I couldn't find it in a reasonable time. Thanks!

@bitfield
Copy link

why it has been decided that push iterator functions are the standard interface

According to #61897:

"The vast majority of the time, push iterators are more convenient to implement and to use, because setup and teardown can be done around the yield calls rather than having to implement those as separate operations and then expose them to the caller. Direct use (including with a range loop) of the push iterator requires giving up storing any data in control flow, so individual clients may occasionally want a pull iterator instead. Any such code can trivially call Pull and defer stop."

In other words, if you're implementing a pull iterator, it has to have a 'next()' method and some place to keep its data and keep track of the iterator's progress in between calls to next(). That's a lot of extra machinery compared to just calling yield in the middle of some loop. Plus, most people really don't want to call next() manually; they'd rather just range over the results. So if it doesn't make any difference to users, but it's much easier to implement pull iterators than push iterators (and you already have coroutines), this choice makes sense.

(Interestingly, or not, Rust uses pull iterators, but is now in the process of adding coroutines and push iterators.)

@ianlancetaylor
Copy link
Contributor

@ansiwen See also all the complexities we got into with stopping an iterator in the earlier discussion #54245. Using push iterators lets for/range cleanly handle the stopping issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests