cmd/vet: flag benchmarks that don’t use b #38677
I've actually seen this twice in the past few months. Something else I tend to see is slow benchmarks that do a lot of heavy initialization work, but don't take care of resetting the timer. I'm not sure how far vet can get here, though. Writing a decent benchmark is more than just using b.N.

Wouldn't there be a way for the testing package to catch this? For example, for any benchmark that can be run quickly with N=1 (which should be the vast majority of them), any run with N>100 should take at least 10x more time. If it does not, we are either benchmarking something else (like the initialization code, or nothing at all), or the benchmark doesn't use b.N. It would be hard to cover very expensive benchmarks by default there, such as those that take one second per iteration, since the default -benchtime is only one second.
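For concreteness, a rough self-contained sketch of that heuristic; this is only an illustration of the idea, not code from the testing package, and every name in it is invented:

```go
package main

import (
	"fmt"
	"time"
)

// timeRun measures one invocation of a benchmark body for a given iteration
// count n. In the real testing package this would be the internal machinery
// that runs a benchmark with a chosen b.N.
func timeRun(body func(n int), n int) time.Duration {
	start := time.Now()
	body(n)
	return time.Since(start)
}

// looksSuspicious applies the heuristic from the comment above: if a run with
// n=100 is not at least ~10x slower than a run with n=1, the body is probably
// not iterating n times (it ignores b.N, or is dominated by setup).
func looksSuspicious(body func(n int)) bool {
	t1 := timeRun(body, 1)
	t100 := timeRun(body, 100)
	return t100 < 10*t1
}

func main() {
	honest := func(n int) {
		for i := 0; i < n; i++ {
			time.Sleep(100 * time.Microsecond)
		}
	}
	broken := func(n int) {
		time.Sleep(100 * time.Microsecond) // does the work once, ignoring n
	}
	fmt.Println(looksSuspicious(honest)) // false
	fmt.Println(looksSuspicious(broken)) // true
}
```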
True. I'd hope that being made aware of the problem and referred (by the vet error message) to some docs would set most people on a better path.
There's been scattered discussion between @rsc, @bcmills, @aclements, myself, and maybe others, about having the benchmark framework detect non-linearity. But that's a heavy lift, and you'll get false positives, particularly since you have to base the decision on a single run. There's also been discussion of having the benchmark framework print all intermediate b.N runs and have benchstat detect non-linearity, bi-modal distributions, and other distribution problems. But then you have to know about benchstat, at which point, you probably aren't the target audience for this vet check. I definitely wish we could somehow build benchstat into package testing. But the "before"/"after" problem is a thorny one. (See my failed machinations in #10930.)
I'm not in favor of this. It seems a minor problem and opening vet up to analyzing people's tests leads us down a long and slippery path.
@josharian your thoughts are very interesting - you've clearly been thinking about this for a while :) I agree that, in general, writing good benchmarks in Go requires reading quite a lot of content and using multiple tools together carefully. Even though it's a difficult task, I still think that the only way to truly solve that problem is to make the tooling better, be it the testing package itself or the tools around it.

To me, fuzzing and benchmarking are kind of similar, in the sense that they are very helpful and necessary for many developers, but they are difficult to use correctly, simply because not enough hours have gone into properly including them as part of the standard tooling. Adding small checks to vet seems like a step in that direction.
A benchmark that never references b is broken by definition: it cannot be measuring anything meaningful, so flagging it is a clear-cut correctness check. In contrast, a more general analysis of how people write their tests would involve real judgment calls. So I don't think there is a slippery slope here: the argument for this check does not extend to open-ended vetting of test code.
Change https://golang.org/cl/230978 mentions this issue.
We do vet API usage, checking things like printf format strings. That said, I am not convinced this specific check needs to be in vet. As Josh noted, we've talked before about having a linearity check. But a highly accurate linearity check is not needed to detect a benchmark that never uses b.N at all.
Assuming we get CL 230978 cleaned up and submitted, anyone think we should still discuss changing cmd/vet?
If we had this to do over again, it would be nice to do something more like b.RunParallel. Instead of exposing b.N, have the benchmark call b.Next() until it returns false. You could expose a Count method that returns an int for benchmarks that need to pre-size auxiliary structures. Then detecting this situation would be trivial and immediate (b.Next never got called). And we could probably do away with b.ResetTimer, since we could wait to start the timer until the first b.Next call. But I guess that's neither here nor there.
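A rough illustration of that hypothetical shape; Next and Count are not methods on the real *testing.B, so the type below is an invented stand-in just to make the sketch self-contained:

```go
package demo

import "time"

// B is a stand-in for the hypothetical benchmark type described above.
type B struct {
	n, target int
	start     time.Time
	Elapsed   time.Duration
}

// Next reports whether another iteration should run. It starts the timer
// lazily on the first call, which is what would make ResetTimer unnecessary
// for setup done before the loop, and the framework could trivially detect
// a benchmark that never calls it.
func (b *B) Next() bool {
	if b.n == 0 {
		b.start = time.Now()
	}
	if b.n >= b.target {
		b.Elapsed = time.Since(b.start)
		return false
	}
	b.n++
	return true
}

// Count returns the planned iteration count, for pre-sizing auxiliary
// structures.
func (b *B) Count() int { return b.target }

// BenchmarkEncode shows how a benchmark would look against this API:
// expensive setup runs before the first Next call and is never timed.
func BenchmarkEncode(b *B) {
	data := make([]byte, 1<<20) // heavy setup, automatically excluded
	for b.Next() {
		_ = encode(data)
	}
}

func encode(p []byte) int { return len(p) } // stand-in for the code under test
```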
It seems to me that that is an empirical question, and the relevant data point--do we still need it?--is best answered by getting CL 230978 in and then waiting to find out.
Austin pointed out various problems with CL 230978 so I ended up writing a vet check after all. Results of running that vet check on the public Go ecosystem are at https://gist.github.com/rsc/4ba2eacb2d1a00339f4539e7b1bee576. There is one false positive I need to correct involving assigning b to a global that is then read by another function called via yet another function, but other than that, the check seems quite accurate and useful.
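Paraphrasing that false-positive shape as code (identifiers invented for illustration):

```go
package demo

import "testing"

// The Benchmark function's only mention of b is storing it in a global;
// the b.N loop lives in a function reached through another call, so a
// checker that only inspects the Benchmark body can misjudge it.
var cur *testing.B

func BenchmarkIndirect(b *testing.B) {
	cur = b
	runAll()
}

func runAll() { runLoop() }

func runLoop() {
	var sink int
	for i := 0; i < cur.N; i++ {
		sink += i // stand-in for the real work under measurement
	}
	_ = sink
}
```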
This proposal has been added to the active column of the proposals project.
Based on the discussion above, this proposal seems like a likely accept.
No change in consensus, so accepted. 🎉
I was just looking through the results you posted from your prototype vet check and I see what looks like more false positives than the one you described. Take entry (9) in the list: https://go-mod-viewer.appspot.com/gonum.org/v1/gonum@v0.13.0/internal/asm/f64/bench_other_test.go#L347

The flagged benchmark there is a one-line function whose only job is to pass b to a shared helper, and the helper is what runs the b.N loop. I see this pattern in other entries on your list as well, where the benchmark delegates all of its work, including the b.N loop, to a helper function.
@ChrisHines I agree that those look like false positives and should not be flagged. The final implementation should probably be conservative when b is passed to a callee.
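For illustration, a paraphrased sketch of that delegation shape (invented names, not the actual gonum source):

```go
package demo

import "testing"

// benchSum owns the b.N loop; the Benchmark functions below only forward b.
func benchSum(b *testing.B, n int) {
	x := make([]float64, n)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sum(x)
	}
}

func sum(x []float64) (s float64) {
	for _, v := range x {
		s += v
	}
	return s
}

// These are legitimate benchmarks even though their bodies contain no b.N
// loop; flagging them would be a false positive.
func BenchmarkSum100(b *testing.B)   { benchSum(b, 100) }
func BenchmarkSum10000(b *testing.B) { benchSum(b, 10000) }
```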
I propose that cmd/vet flag any benchmark that doesn’t have any reference to b in its body.
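To make the failure mode concrete, here is a hedged example (invented names) of the kind of benchmark such a check would flag, next to a corrected version:

```go
package demo

import (
	"strings"
	"testing"
)

// parse is a stand-in for whatever is being measured.
func parse(s string) int { return len(strings.Fields(s)) }

// BenchmarkParseBroken never references b, so it does a fixed amount of
// work regardless of b.N and its reported ns/op is meaningless. This is
// the shape the proposed check would report.
func BenchmarkParseBroken(b *testing.B) {
	parse("a b c d")
}

// BenchmarkParse is the corrected form: the work runs b.N times.
func BenchmarkParse(b *testing.B) {
	for i := 0; i < b.N; i++ {
		parse("a b c d")
	}
}
```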
The implementation should be fairly straightforward. Pick out benchmarks. Pick out the identifier (usually b). Look for any uses of that identifier. If none, complain.

cc @robpike @mvdan
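A minimal sketch of what such a check could look like as a go/analysis pass; the analyzer name and structure here are invented for illustration, not the actual cmd/vet implementation:

```go
package benchb

import (
	"go/ast"
	"strings"

	"golang.org/x/tools/go/analysis"
)

// Analyzer reports Benchmark functions whose *testing.B parameter is
// never referenced in the function body.
var Analyzer = &analysis.Analyzer{
	Name: "benchb",
	Doc:  "report benchmarks that never use their *testing.B parameter",
	Run:  run,
}

func run(pass *analysis.Pass) (interface{}, error) {
	for _, file := range pass.Files {
		for _, decl := range file.Decls {
			fn, ok := decl.(*ast.FuncDecl)
			if !ok || fn.Recv != nil || fn.Body == nil ||
				!strings.HasPrefix(fn.Name.Name, "Benchmark") {
				continue
			}
			params := fn.Type.Params.List
			if len(params) != 1 || len(params[0].Names) != 1 {
				continue // not the usual func BenchmarkXxx(b *testing.B) shape
			}
			param := params[0].Names[0] // the identifier, usually "b"
			obj := pass.TypesInfo.Defs[param]
			if obj == nil {
				continue // e.g. the parameter is named "_"
			}
			used := false
			ast.Inspect(fn.Body, func(n ast.Node) bool {
				if id, ok := n.(*ast.Ident); ok && pass.TypesInfo.Uses[id] == obj {
					used = true
				}
				return !used // stop walking once a use is found
			})
			if !used {
				pass.Reportf(fn.Pos(), "%s never uses its *testing.B parameter", fn.Name.Name)
			}
		}
	}
	return nil, nil
}
```

A real implementation would also want to confirm that the parameter's type is actually *testing.B rather than relying on the function name alone.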