Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: add a new iterator syntax, package, interfaces #54047

Closed
earthboundkid opened this issue Jul 25, 2022 · 12 comments
Closed

proposal: Go 2: add a new iterator syntax, package, interfaces #54047

earthboundkid opened this issue Jul 25, 2022 · 12 comments
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal v2 An incompatible library change
Milestone

Comments

@earthboundkid
Copy link
Contributor

earthboundkid commented Jul 25, 2022

Author background

  • Would you consider yourself a novice, intermediate, or experienced Go programmer?

Experience Go programmer.

  • What other languages do you have experience with?

JavaScript, Python.

Related proposals

  • Has this idea, or one like it, been proposed before?
    • If so, how does this proposal differ?

#20733 and #24282 address improving for loops. #40605, #43557, and #50112 address iterators. #17747, #34544, and other have added vet methods to ensure erors are checked on close. This proposal differs by using the range keyword to make the change more clear than reusing for, differentiating values, key values, and results, and by having a mandatory close mechanism that forces error handling, making the vet check less necessary.

  • Does this affect error handling?

No.

  • Is this about generics?

It uses the implemented design and makes various future patterns possible.

Proposal

  • What is the proposed change?

The following types would be added to the standard library in an iterators package (🚲 iters?).

type Value[T any] interface {
    Next() bool
    Value() T
    Close()
}

type KeyValue[K, V any] interface {
    Next() bool
    Value() (K, V)
    Close()
}

type Result[T any] interface {
    Next() bool
    Value() T
    Close() error
}

type KeyValueResult[K, V any] interface {
    Next() bool
    Value() (K, V)
    Close() error
}

type ValueIterable[T any] interface {
    Iter() Value[T]
}

type KeyValueIterable[K, V any] interface {
    Iter() KeyValue[K, V]
}

type ResultIterable[T any] interface {
    Iter() Result[T]
}

type KeyValueResultIterable[K, V any] interface {
    Iter() KeyValueResult[K, V]
}

The language spec would be changed to allow for loop statements of the following type:

// Suppose bufio.Scanner has been updated to have an Iter() iterators.Result[[]byte] method

s := bufio.NewScanner(os.Stdin)
range s : b : err {
    fmt.Println("line:", string(b))
    if bytes.Contains(b, []byte("done")) {
        break // scanner.Close() is automatically called here
    }
} // scanner.Close() would also be automatically called here
// err is now a equal to s.Err()
if err != nil { /* ... */ }

// Suppose there is an iterators.Slice function
s := getsomeslice()
range iterators.Slice(s) : row {
    go use(row) // row is closure safe
}

// Suppose there is an iterators.Enumerate 
// that converts ValueIterable to KeyValue iterables
s := getsomeslice()
range iterators.Enumerate(iterators.Slice(s)) : i, row {
    go use(i, row) // i and row are both closure safe
}

// One can also imagine helpers to promote the other types 
// to a KeyValueResult iterator with the key being i and err being nil.

// interval could be in iters package or written by hand
range interval(0, n): i {
    go doSomething(i)
}

Because all four iterator types have Value and Close methods, they are mutually exclusive for a type. As per #20733, the range block reassigns Value() on each loop. The Result and KeyValueResult types take an extra argument separated by a colon which only comes into scope after the loop is exited. (Open question: does Close() run after a panic?)

  • Who does this proposal help, and why?

It makes a wide variety of iterator patterns easier to use. It is compatible with existing designs in the standard library. It solves the problem of users accidentally closing over for loop variable or forgetting to check the final error in a result iterator type. It allows for the creation of iterator libraries that iterate over chunks, subsets, permutations, etc.

  • Please describe as precisely as possible the change to the language.

See above.

  • What would change in the language spec?

See above. Subject to further discussion.

  • Please also describe the change informally, as in a class teaching Go.

Go has a way of writing your own types that can be iterated over. First add method Iter() iterators.Value[myStruct] to our struct that just returns itself. Now we can use myStruct like this:

s := NewMyStruct()
range s : row {
   use(row)
}
  • Is this change backward compatible?

I believe so. That's why it uses range instead of for. I'm not committed to this color of the bikeshed if someone can make it work with for but still be clearly readable.

  • Show example code before and after the change.
    • Before
    • After

N/A because this is a new feature. Open question: should existing iterable types be magically opted into working with range statements? I think probably not.

  • Orthogonality: how does this change interact or overlap with existing features?

It adds a new way to iterate, which is unforunate. On the other hand, it fixes two long standing sources of bugs: not realizing a loop reuses variables and not calling .Err() after scanning a row from bufio or sql.

  • Is the goal of this change a performance improvement?

No.

Costs

  • Would this change make Go easier or harder to learn, and why?

Harder because there is another form of iteration. But it makes it easier to write bug free code that uses an iterator.

  • What is the cost of this proposal? (Every language change has a cost).

Largely the cost is the introduction of a new concept.

  • How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?

They would need to know the new loop syntax.

  • What is the compile time cost?

Low. There would need to be a type analysis to ensure that the right type of iterator is used, e.g. you can't use a Result as a Value, etc.

  • What is the run time cost?

Low.

  • Can you describe a possible implementation?

TBD.

  • Do you have a prototype? (This is not required.)

No.

@gopherbot gopherbot added this to the Proposal milestone Jul 25, 2022
@seankhliao seankhliao changed the title proposal: add a new iterator syntax, package, interfaces proposal: Go 2: add a new iterator syntax, package, interfaces Jul 25, 2022
@seankhliao seankhliao added LanguageChange Suggested changes to the Go language v2 An incompatible library change labels Jul 25, 2022
@earthboundkid
Copy link
Contributor Author

Templates have the syntaxes {{ range $x }} {{ range $v := $x }} {{ range $k, $v := $x }}. I suppose this could be tweaked to look more similar:

range v := x {
range k, v := x {
range v := x; err {
range k, v := x; err {

If the name err is already in scope, err is assigned iter.Close(), otherwise it brings a new err variable into scope after the loop.

@earthboundkid
Copy link
Contributor Author

earthboundkid commented Jul 25, 2022

Adapting io/fs.WalkDir is a little tricky because you need to be able to skip directories, but you could do it by adding an fs.WalkIterator struct with a SkipDir method and Path, DirEntry, Error fields. It shouldn't be a Result iterator because the error is not necessarily final, for example, permission errors diving into a subdirectory.

range e := fs.Walk(fsys, ".") {
    if strings.HasPrefix(e.Path, ".") || (e.Entry.IsDir() && os.IsPermission(e.Error)) {
        e.SkipDir()
    }
    if e.Error != nil {
        break
    }
   doSomething(e.DirEntry)
}

In general, the wrappers to adapt existing standard library iterators to the new method are pretty straightforward to write.

@ianlancetaylor
Copy link
Contributor

The suggested range syntax is unlike anything else in Go, and I don't understand why. We already have for range. Why not use that same syntax?

@ianlancetaylor
Copy link
Contributor

Also, FYI, there are a couple of iterator proposals kicking around inside the Go team. I hope that they will be published soon.

@earthboundkid
Copy link
Contributor Author

We already have for range. Why not use that same syntax?

It is my understanding that the for variable scope issue can't be solved in a backwards compatible way, so it needs some new syntax to indicate "this is the loop where that works as expected." It's also good to signal to readers when the native (fast!) looping will happen versus the method based looping (hopefully not slow, but probably slower since it needs a method call). I'd be happy if there were a for based syntax that could make this distinction clear. Looking forward to seeing new proposals!

@beoran
Copy link

beoran commented Jul 26, 2022

I am not sure this syntactic sugar like this is needed. As an example, the bbolt library already has iterators, named cursors. With them you can iterate like this:

cursor := bucket.Cursor()
for k, v:= cursor.First(); k != nil ; k, v := cursor.Next() {...}

That seems easy enough without any extra syntax.

@oakad
Copy link

oakad commented Jul 26, 2022

@beoran I'd say First() is superfluous as well: the first position is implicitly set in the function, constructing the iterator. PebbleDB (a database, similar to Bolt/Bbolt) has both First() and Next() methods with semantics similar to the one below, and I have never, ever, found the First() method useful. :-)

I use iterator pattern a lot in my code, and it seems Next[T]() (value T, valid bool) works pretty well for vast majority of use cases. This is also similar to how channel value "iteration" works already.

iter := CustomSet.iterator()
// iter points at first value already here

for next, ok := iter.Next(); ok; next, ok = iter.Next() {
/* do stuff */ 
}

@beoran
Copy link

beoran commented Jul 26, 2022

@oakad , yes there are several styles of iterators already possible in Go. So that's why I am doubtful that syntactic sugar is really needed, or, that we have to officially bless one style of iteration on the language.

@oakad
Copy link

oakad commented Jul 26, 2022

Built in support for range iteration will be handy, though. It can be defined the same as range for loop was defined in C++: in terms of available method Next (or whatever).

@earthboundkid
Copy link
Contributor Author

earthboundkid commented Jul 26, 2022

You can always do without a blessed iterator syntax. Why do we have three values for statements and range at all? You can do everything with a while true loop (a zero value for loop in Go). The point is to make it more convenient and to prevent errors due to accidental misuse. In particular the scanner idiom with its final .Err() call is prone to misuse. Introducing a new syntax will make it harder for users to accidentally omit the error check.

That said, I'm less committed to the syntax part of this proposal than the interface part of this proposal. Having a standard interface for iteration means we can start building libraries that consume the standard interfaces, batch elements in slices, deduplicate repeated elements, etc., etc.

@beoran
Copy link

beoran commented Jul 29, 2022

@carlmjohnson well, there is a precedent for standard interfaces with the error type. There could probably also be a built in iterator type tho achieve this standardization, without adding any new other language features.

@earthboundkid
Copy link
Contributor Author

Closing in favor of continuing the discussion at #54245.

@golang golang locked and limited conversation to collaborators Aug 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal v2 An incompatible library change
Projects
None yet
Development

No branches or pull requests

6 participants