Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/gopls: check for failure to use result of a 'yield' call #65795

Closed
adonovan opened this issue Feb 19, 2024 · 20 comments
Closed

x/tools/gopls: check for failure to use result of a 'yield' call #65795

adonovan opened this issue Feb 19, 2024 · 20 comments
Assignees
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) gopls/analysis Issues related to running analysis in gopls gopls Issues related to the Go language server, gopls. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository.
Milestone

Comments

@adonovan
Copy link
Member

adonovan commented Feb 19, 2024

Go 1.23 introduces "push" iterators, also known as "range over func", in which the compiler transforms the loop body (consumer) into a stackless coroutine yield func(T) bool for some element type T, and then passes it to the push iterator (producer). The producer calls yield for each element, and must stop if any call to yield returns false, indicating that the desired continuation of the loop body is not continue (e.g. break, outer continue, return, panic, goto).

It is a programmer error to fail to honor the result of any call to yield. The runtime reports a dynamic error in this case, but it would be nice to catch it earlier with a static check.

We should add a new analyzer (or perhaps augment the existing unusedresult analyzer) to report when the result of a dynamic call to a function named yield (with type func(T) bool for some T) is ignored.

(Update: for a second yield mistake that vet could check for, see #65795 (comment).)

@adonovan adonovan added the Analysis Issues related to static analysis (vet, x/tools/go/analysis) label Feb 19, 2024
@robpike
Copy link
Contributor

robpike commented Feb 19, 2024

There are three conditions for vet checks, as listed in its README. We don't know yet whether this is a large enough problem in the wild to satisfy the frequency criterion.

@adonovan
Copy link
Member Author

adonovan commented Feb 19, 2024

There are three conditions for vet checks, as listed in its README. We don't know yet whether this is a large enough problem in the wild to satisfy the frequency criterion.

It is certainly not a large problem in the wild because no-one has merged code with a push iterator... yet; but I've already made this mistake several times in my experiments. Nonetheless your point is taken.

@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 20, 2024
@timothy-king
Copy link
Contributor

We should add a new analyzer (or perhaps augment the existing unusedresult analyzer) to report when the result of a dynamic call to a function named yield (with type func(T) bool for some T) is ignored.

I am not sure this will be sufficiently precise. For GoVersion<=1.21, one could already have written such a function and ignore the return value of the function named 'yield'. Thatcould never have been used in a "range over func" statement, and would have been fine.

I think we will be able to get sufficiently good precision by switching this to looking at instances of for range f where f is a function type, f(yield) is a static call, and f ignores a call to its yield function. A big disadvantage of this approach though is where to report the diagnostic. Within the body of f is more actionable, but this will not always be doable if we are looking at the package containing the for range f statement.

@adonovan
Copy link
Member Author

I am not sure this will be sufficiently precise. For GoVersion<=1.21, one could already have written such a function and ignore the return value of the function named 'yield'. Thatcould never have been used in a "range over func" statement, and would have been fine.

You're absolutely right, it's possible it is just a coincidence, but I suspect the combination of calling a function whose operand is a local variable named 'yield', with the appropriate type, is pretty rare today, and we could (with a new convention) practically carve out the name as an informally reserved word for the new loop semantics.

@timothy-king
Copy link
Contributor

I suspect the combination of calling a function whose operand is a local variable named 'yield', with the appropriate type, is pretty rare today

Fair enough.

we could (with a new convention) practically carve out the name as an informally reserved word for the new loop semantics.

It makes me a bit nervous to only check when this is called 'yield'. This is new and conventions are somewhat weak. What is special about 'yield'? Why not 'y' or 'f'? How about 'yield' and 'yield2' if there is more than one? I am personally very prone to typing 'yeild' incorrectly.

I am somewhat more comfortable about treating 'yield' specially if we have a backup plan for warning the other cases too. This falls under "a check that misses too many of the cases it's looking for will give a false sense of security" IMO. Perhaps report 'yield' instances at the missed check regardless of being called, report intra-package calls at the missed check, and report at the call site for inter-package instances (#65795 (comment)). This is a bit complicated, but still doable.

@seankhliao seankhliao added this to the Unplanned milestone Jul 13, 2024
@Deleplace
Copy link
Contributor

I 100% agree that ignoring the result of yield() is too easy to do accidentally, in good faith. I've been bitten myself.

Part of the problem is that the user never writes the body of yield, as yield is automagically provided as an argument. In many use cases the termination cause we have in mind is when the iter.Seq or iter.Seq2 decides to stop producing values, and we're not thinking about the other cause "the consumer doesn't want any more values". It was a mild surprise for me to discover that the keyword break in the consuming loop would cause yield() to return false (as admittedly stated in the spec).

Ideally the vet check would trigger only for iterator functions that are actually ranged over in at least 1 place in the codebase, but that may be too costly to implement.

@adonovan
Copy link
Member Author

Another yield-related mistake is to write:

for x := range seq {
   yield(x)
}

when seq(yield) is more efficient, since it avoids back-to-back dynamic calls and the complex range desugaring. Perhaps vet could check for this too.

@Deleplace
Copy link
Contributor

for x := range seq {
   yield(x)
}

The worst problem here is that it is incorrect, right? And indeed, not very efficient.

@adonovan
Copy link
Member Author

The worst problem here is that it is incorrect, right? And indeed, not very efficient.

This is mainly an efficiency problem. But if this code appears within an iterator of a recursive data structure (e.g. a tree) and this is the inductive case (seq is a subtree), then it could turn a linear-time iteration into an O(n log n) operation because every element would make a chain of calls proportional to its depth (log n).

@Deleplace
Copy link
Contributor

I meant that it would lead to a panic, because of ignoring the result of yield(x)

@adonovan
Copy link
Member Author

I meant that it would lead to a panic, because of ignoring the result of yield(x)

Sorry, I completely failed to notice that my for x := range seq { yield(x) } example committed two mistakes, the first being the title of this issue! Which perhaps shows how easy a mistake it is to make.

I meant to write:

for x := range seq {
  if !yield(x) { break }
}

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/609617 mentions this issue: gopls/internal/analysis/yield: analyzer for iterator (yield) mistakes

@adonovan
Copy link
Member Author

adonovan commented Oct 31, 2024

Unsurprisingly there are only a handful of matches in the corpus for the quick-and-dirty proof of concept in the previous CL. All but one (in github.com/emitter-io/go) are false positives because the yield is followed by a return statement. That can be fixed by being a little smarter about control flow. Despite the low occurrence rate in the corpus we have heard from our Google3 colleagues that this mistake is fairly commonly reported in Go readability code reviews, so it would be nice to have a checker for it.

https://go-mod-viewer.appspot.com/cuelabs.dev/go/oci/ociregistry@v0.0.0-20240906074133-82eb438dd565/ociclient/lister.go#L106: result of call to yield is ignored
https://go-mod-viewer.appspot.com/cuelabs.dev/go/oci/ociregistry@v0.0.0-20240906074133-82eb438dd565/ociclient/lister.go#L112: result of call to yield is ignored
https://go-mod-viewer.appspot.com/cuelabs.dev/go/oci/ociregistry@v0.0.0-20240906074133-82eb438dd565/ociclient/lister.go#L118: result of call to yield is ignored
https://go-mod-viewer.appspot.com/cuelabs.dev/go/oci/ociregistry@v0.0.0-20240906074133-82eb438dd565/ociclient/lister.go#L136: result of call to yield is ignored
https://go-mod-viewer.appspot.com/cuelabs.dev/go/oci/ociregistry@v0.0.0-20240906074133-82eb438dd565/ocidebug/debug.go#L267: result of call to yield is ignored
https://go-mod-viewer.appspot.com/cuelang.org/go@v0.10.1/internal/mod/modimports/modimports.go#L99: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/cilium/statedb@v0.3.2/any_table.go#L111: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/cilium/statedb@v0.3.2/script.go#L576: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/cilium/statedb@v0.3.2/table.go#L392: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/emitter-io/go/v2@v2.1.0/emitter.go#L458: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/firebase/genkit/go@v0.1.1/genkit/flow.go#L732: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/firebase/genkit/go@v0.1.1/genkit/flow.go#L734: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/go-kivik/kivik/v4@v4.3.2/changes.go#L214: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/go-kivik/kivik/v4@v4.3.2/resultset.go#L481: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/go-kivik/kivik/v4@v4.3.2/updates.go#L170: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/ydb-platform/ydb-go-sdk/v3@v3.89.2/topic/topicsugar/iterators.go#L110: result of call to yield is ignored
https://go-mod-viewer.appspot.com/github.com/ydb-platform/ydb-go-sdk/v3@v3.89.2/topic/topicsugar/iterators.go#L123: result of call to yield is ignored

@adonovan
Copy link
Member Author

adonovan commented Oct 31, 2024

To reduce false positives, I rewrote the analyzer using SSA for control flow analysis. (This may make it ineligible to run in vet.) I reran it on the corpus and got the following results: two true positives and one false positive (in that order):

https://go-mod-viewer.appspot.com/github.com/go-kivik/kivik/v4@v4.3.2/resultset.go#L475: yield may be called again (on L481) after returning false
https://go-mod-viewer.appspot.com/github.com/go-kivik/kivik/v4@v4.3.2/updates.go#L164: yield may be called again (on L170) after returning false
https://go-mod-viewer.appspot.com/github.com/emitter-io/go/v2@v2.1.0/emitter.go#L458: yield may be called again (on L471) after returning false

The first two really illustrate the power of SSA for control flow analysis: a naive syntactic analysis would say "!yield() followed by break, I guess that's ok", whereas SSA reveals the fact even despite (correctly) breaking out of the loop after !yield(), there is another call to yield:

   160  func (f *DBUpdates) Iterator() func(yield func(*DBUpdate, error) bool) {
   161  	return func(yield func(*DBUpdate, error) bool) {
   162  		for f.Next() {
   163  			update := f.curVal.(*driver.DBUpdate)
   164  			if !yield((*DBUpdate)(update), nil) { // yield may be called again (on L170) after returning false
   165  				_ = f.Close()
   166  				break
   167  			}
   168  		}
   169  		if err := f.Err(); err != nil {
   170  			yield(nil, err)
   171  		}
   172  	}
   173  }

The third one is arguably a false positive because if err != nil, resp is likely to be an empty result, so the return at L466 will be executed and yield won't be called again. But that's not guaranteed and the code really shouldn't be relying on it, so you could also argue (and I do) that the code is wrong even though the bug is latent.

   446   		for {
   447  				req := &historyRequest{
...
   454  				}
   455  
   456  				resp, err := c.request("history", req)
   457  				if err != nil {
   458  					yield(HistoryMessage{}, err) // yield may be called again (on L471) after returning false
   459  				}
   460  
   461  				// Cast the response.
   462  				result, _ := resp.(*historyResponse)
   463  
   464  				// If no messages left in the history then return.
   465  				if len(result.Messages) == 0 {
   466  					return
   467  				}
   468  
   469  				// Yield each message returned by the history request.
   470  				for i := 0; i <= len(result.Messages)-1; i++ {
   471  					if !yield(result.Messages[i], nil) { // (other yield call)
   472  						return
   473  					}
   474  				}

@adonovan adonovan self-assigned this Oct 31, 2024
@adonovan adonovan added the gopls/analysis Issues related to running analysis in gopls label Oct 31, 2024
@adonovan
Copy link
Member Author

@synenka ran the yield analyzer on Google's Go code base and turned up five results, all true positives. Here's an example:

			for i := 0; i < list.Len(); i++ {
				if !yield(list.Get(i)) {
					continue
				}
			}

We should proceed with the analyzer as implemented.

@adonovan adonovan changed the title cmd/vet: check for failure to use result of a 'yield' call x/tools/gopls: check for failure to use result of a 'yield' call Nov 13, 2024
@gopherbot gopherbot added Tools This label describes issues relating to any tools in the x/tools repository. gopls Issues related to the Go language server, gopls. labels Nov 13, 2024
@adonovan adonovan modified the milestones: Unplanned, gopls/v0.17.0 Nov 13, 2024
@timothy-king
Copy link
Contributor

Which is the "as implemented" analyzer that was run? With or without ssa?

@adonovan
Copy link
Member Author

With or without ssa?

As implemented using go/ssa (see PatchSet 5).

@Deleplace
Copy link
Contributor

What's the best policy when a file does not import "iter" at all, and triggers this check?

We might get false positives in funcs that have the same signature as iter.Seq, iter.Seq2 but not the same semantics, e.g. custom iterators implemented long before Go 1.23.

@timothy-king
Copy link
Contributor

@Deleplace If needed we can summarize the functions that match the control flow criteria, and report the bug at the site of usage in a range loop. This is likely a more confusing place to report the issue though.

However, given Alan's experiments #65795 (comment) and #65795 (comment), I am not particularly worried about false positives yet. Note that the current draft requires the name "yield" in order to apply.

@adonovan
Copy link
Member Author

What's the best policy when a file does not import "iter" at all, and triggers this check?

We might get false positives in funcs that have the same signature as iter.Seq, iter.Seq2 but not the same semantics, e.g. custom iterators implemented long before Go 1.23.

If those custom iterators have a boolean result, it is very likely that they have the same semantics (false => break), and if not, then they are an accident waiting to happen. Either way, the check is appropriate for them.

dennypenta pushed a commit to dennypenta/tools that referenced this issue Dec 3, 2024
This analyzer detects failure to check the result of a call to
yield (which can cause a range loop to run beyond the sequence,
leading to a panic).

It is not always a mistake to ignore the
result of a call to yield; it depends on whether control can
reach another call to yield without checking that the first
call returned true. Consequently, this analyzer uses SSA
for control flow analysis.

We plan to add this analyzer to gopls before we promote it to vet.

+ test, relnote

Fixes golang/go#65795

Change-Id: I75fa272e2f546be0c2acb10a1978c82bc19db5bd
Reviewed-on: https://go-review.googlesource.com/c/tools/+/609617
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
dennypenta pushed a commit to dennypenta/tools that referenced this issue Dec 3, 2024
This analyzer detects failure to check the result of a call to
yield (which can cause a range loop to run beyond the sequence,
leading to a panic).

It is not always a mistake to ignore the
result of a call to yield; it depends on whether control can
reach another call to yield without checking that the first
call returned true. Consequently, this analyzer uses SSA
for control flow analysis.

We plan to add this analyzer to gopls before we promote it to vet.

+ test, relnote

Fixes golang/go#65795

Change-Id: I75fa272e2f546be0c2acb10a1978c82bc19db5bd
Reviewed-on: https://go-review.googlesource.com/c/tools/+/609617
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
dennypenta pushed a commit to dennypenta/tools that referenced this issue Dec 3, 2024
This analyzer detects failure to check the result of a call to
yield (which can cause a range loop to run beyond the sequence,
leading to a panic).

It is not always a mistake to ignore the
result of a call to yield; it depends on whether control can
reach another call to yield without checking that the first
call returned true. Consequently, this analyzer uses SSA
for control flow analysis.

We plan to add this analyzer to gopls before we promote it to vet.

+ test, relnote

Fixes golang/go#65795

Change-Id: I75fa272e2f546be0c2acb10a1978c82bc19db5bd
Reviewed-on: https://go-review.googlesource.com/c/tools/+/609617
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
dennypenta pushed a commit to dennypenta/tools that referenced this issue Dec 3, 2024
This analyzer detects failure to check the result of a call to
yield (which can cause a range loop to run beyond the sequence,
leading to a panic).

It is not always a mistake to ignore the
result of a call to yield; it depends on whether control can
reach another call to yield without checking that the first
call returned true. Consequently, this analyzer uses SSA
for control flow analysis.

We plan to add this analyzer to gopls before we promote it to vet.

+ test, relnote

Fixes golang/go#65795

Change-Id: I75fa272e2f546be0c2acb10a1978c82bc19db5bd
Reviewed-on: https://go-review.googlesource.com/c/tools/+/609617
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
dennypenta pushed a commit to dennypenta/tools that referenced this issue Dec 3, 2024
This analyzer detects failure to check the result of a call to
yield (which can cause a range loop to run beyond the sequence,
leading to a panic).

It is not always a mistake to ignore the
result of a call to yield; it depends on whether control can
reach another call to yield without checking that the first
call returned true. Consequently, this analyzer uses SSA
for control flow analysis.

We plan to add this analyzer to gopls before we promote it to vet.

+ test, relnote

Fixes golang/go#65795

Change-Id: I75fa272e2f546be0c2acb10a1978c82bc19db5bd
Reviewed-on: https://go-review.googlesource.com/c/tools/+/609617
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
dennypenta pushed a commit to dennypenta/tools that referenced this issue Dec 3, 2024
This analyzer detects failure to check the result of a call to
yield (which can cause a range loop to run beyond the sequence,
leading to a panic).

It is not always a mistake to ignore the
result of a call to yield; it depends on whether control can
reach another call to yield without checking that the first
call returned true. Consequently, this analyzer uses SSA
for control flow analysis.

We plan to add this analyzer to gopls before we promote it to vet.

+ test, relnote

Fixes golang/go#65795

Change-Id: I75fa272e2f546be0c2acb10a1978c82bc19db5bd
Reviewed-on: https://go-review.googlesource.com/c/tools/+/609617
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
dennypenta pushed a commit to dennypenta/tools that referenced this issue Dec 3, 2024
This analyzer detects failure to check the result of a call to
yield (which can cause a range loop to run beyond the sequence,
leading to a panic).

It is not always a mistake to ignore the
result of a call to yield; it depends on whether control can
reach another call to yield without checking that the first
call returned true. Consequently, this analyzer uses SSA
for control flow analysis.

We plan to add this analyzer to gopls before we promote it to vet.

+ test, relnote

Fixes golang/go#65795

Change-Id: I75fa272e2f546be0c2acb10a1978c82bc19db5bd
Reviewed-on: https://go-review.googlesource.com/c/tools/+/609617
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) gopls/analysis Issues related to running analysis in gopls gopls Issues related to the Go language server, gopls. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository.
Projects
None yet
Development

No branches or pull requests

7 participants