proposal: testing: a less error-prone API for benchmark iteration #48768
Comments
I think the API would be simpler with just a plain func() argument. And you need to make a closure if you need to capture anything, including arguments.
@randall77, the problem with the closure API is that it doesn't address the compiler-optimization issues. That would make it too easy to write something like

```go
func BenchmarkFib10(b *testing.B) {
	b.Iterate(func() { Fib(10) })
}
```

which wouldn't really address #27400 at all.
I guess another way to put it is: one advantage of the explicit-argument API is that it encourages users to write their benchmark in terms of an exported function or method (perhaps expressed as a method value) instead of a wrapper around that function or method. Benchmarking a wrapper instead of the actual function is exactly what leads to the complex interactions with compiler optimizations, so avoiding the need for wrappers — at least in the typical case — is an important aspect of this proposal.
Won't the overhead of calling a closure (or, even worse, invoking the function via reflection) dominate the cost of the function being benchmarked?
To me,
Yes. The proposed API may require specialization in order to provide a good enough noise floor. (See the “Implementation” section of the original post.)
How about providing two sets of arguments for the function to be benched? And Iterate() could randomly pick one for each call. I'm guessing that most of the time you would want each set of arguments to be used equally, but in an unpredictable order.
@renthraysk, I considered also including something like

```go
// Iterate accepts an argument of any function type and a 'next' function that produces values
// assignable to f's arguments.
// Generate then invokes f(next()), sequentially, exactly b.N times,
// with each invocation using a set of values from a distinct call to next.
func (*B) Generate(f interface{}, next interface{})
```

But I'm not sure that the benefits of that style of method outweigh its complexity, given that the equivalent can be written as:

```go
func BenchmarkFib(b *testing.B) {
	args := [2]int{10, 20}
	r := rand.Uint32()
	i := uint64(r)<<32 | uint64(^r)
	b.Iterate(func() uint64 {
		i = bits.RotateLeft64(i, 1)
		return Fib(args[i&1])
	})
}
```
I guess that
@bcmills The problem with my example, and the latter one, is that for each argument set to be used equally, b.N has to be a multiple of 64. Otherwise the computed stats will be off.
@renthraysk, benchmarks are noisy anyway. If being off by one iteration makes a difference, your value of b.N is too small to give a meaningful result anyway.
Some useful, reliable benchmarks naturally take multiple seconds to run, and thus end up with N=1. |
I think we would still need to expose b.N. It is not uncommon for a benchmark to do expensive O(b.N) setup work to enable accurate benchmarking.
That's true, but those benchmarks do not depend on
I am not proposing to remove b.N. That said, we shouldn't deprecate it if it is actually still needed in some cases. Can you give some examples of the sorts of benchmarks that would still need it?
A better version, in which the args set that is used alternates, but the random rotation defeats the compiler's optimizations.
A quick grep for unusual uses of b.N turned up a number of examples. Also, it's not possible to use
As far as I know, the problems with micro-benchmarks all boil down to needing a way to force the computation of some value. If we add
@ianlancetaylor, as far as I can tell that is not sufficient. That is:

```go
b.Iterate(func() { b.Sink(Fib(10)) })
```

does not prevent the compiler from turning Fib(10) into a constant. Or, if it does, that would imply that the compiler applies special AST-rewriting magic to the expression passed to Sink.
@josharian, thanks for the references! Some analysis:
TL;DR: as far as I can tell, all of the tests in that sample would be improved by eliminating the dependency on b.N.
Neat! Per the documentation for
I can't even parse what that means — I would argue that it mostly points to the need for a more ergonomic
Hmm. I find that quite clear, although that's unsurprising since I was involved in its design. How can I help you help me find a clearer way to express that? In case it is useful, examples of non-per-iteration metrics are cache hit rate, percent packet loss, or STW latency percentiles.
That is true. I suppose we could add
We could, but I suppose it would have to be a standalone generic function, so that it could return a value? That might look like:

```go
b.Iterate(func() {
	b.Sink(Fib(testing.Source(10)))
})
```

But then you would end up needing to call
@josharian, the terms I've seen from systems like Prometheus are “counter” (or “cumulative”) and “gauge”. Counter metrics are often meaningful per-op, whereas gauge metrics generally are not. So I suppose that the advice in the comment is really to report “counter” metrics averaged over the number of operations. But this is a bit of a digression, because I don't think deprecating b.N affects this pattern:

```go
var n, userTime, systemTime int64
…
b.RunParallel(func(pb *testing.PB) {
	for pb.Next() {
		…
		atomic.AddInt64(&n, 1)
		atomic.AddInt64(&userTime, int64(cmd.ProcessState.UserTime()))
		atomic.AddInt64(&systemTime, int64(cmd.ProcessState.SystemTime()))
	}
})
b.ReportMetric(float64(userTime)/float64(n), "user-ns/op")
b.ReportMetric(float64(systemTime)/float64(n), "sys-ns/op")
```

So I think the only change strictly needed to
I agree. The difference is that I think I see how to implement
No matter what we do, a benchmark that runs for
The different implementation possibilities for
This proposal has been added to the active column of the proposals project |
It sounds like the most important case where something new is needed is for microbenchmarks that might be optimized away entirely, but those are also the cases where the overhead of the function call is likely to be problematic. (That overhead is why we chose the loop to begin with.) It doesn't seem like this discussion is converging to an answer.
Placed on hold. |
I'm not sure this is a good idea, but another approach is to establish a baseline for that overhead and subtract it out. For example, the
Another option is to wrap the called function itself:
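A hypothetical sketch of that idea (not the code from this thread): pass the function under test through a no-op, no-inline wrapper so that the compiler can no longer see, and therefore cannot fold, the call target.

```go
// opaque is a hypothetical helper, used here only for illustration.
// Because it is marked noinline, the compiler cannot see that it returns
// its argument unchanged, so the call through its result stays indirect.
//
//go:noinline
func opaque(f func(int) int) func(int) int { return f }

func BenchmarkFib10(b *testing.B) {
	for n := 0; n < b.N; n++ {
		// The indirect call through opaque(Fib) cannot be inlined or
		// constant-folded away.
		_ = opaque(Fib)(10)
	}
}
```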
I'm not sure what a good name for this would be (it's similar to Rust's black_box). This doesn't help with the b.N issue like Iterate does, but I think it's much better than wrapping all of the inputs in Source and the result in Sink. Most likely this function would need to be understood by the compiler. It can be implemented today at an overhead of two instructions, as I did above, but I believe that forces any pointer arguments to escape.
This proposal has been added to the active column of the proposals project |
Will mark this likely decline in favor of whatever we decide in #61179. |
Based on the discussion above, this proposal seems like a likely decline. |
No change in consensus, so declined. |
Background
It is currently difficult for Go users to write benchmarks that accurately measure what they appear to measure, for two reasons:
1. Certain compiler optimizations are relatively easy to trigger accidentally, even when those optimizations are not intended for the code being benchmarked (testing: document best practices for avoiding compiler optimizations in benchmarks #27400).
2. It is relatively easy for a benchmark to accidentally run the code to be benchmarked the wrong number of times, either by ignoring the passed-in b.N entirely (cmd/vet: flag benchmarks that don’t use b #38677), or by doing either one or N iterations on a data-structure or input of size N instead of N iterations on an input of constant cost (examples).

We could detect benchmarks that ignore b.N entirely using an approach like the one in CL 230978, but I don't see how we could do the same to detect benchmarks that misuse b.N: some real APIs that use caches or global data-structures really do have weird timing dependencies on the number of prior calls.

Proposal
I propose that we add the following methods to *testing.B, inspired by F.Fuzz (#44551) and T.RunParallel, and deprecate explicit use of the N field.
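A sketch of the shape those methods might take; the signatures here are an assumption inferred from the explicit-argument style discussed in the comment thread, not a quotation of the proposal:

```go
// Iterate accepts a function f and a set of arguments assignable to f's
// parameters, and invokes f(args...), sequentially, exactly b.N times.
// (Hypothetical signature.)
func (b *B) Iterate(f interface{}, args ...interface{})

// IterateParallel is like Iterate, but distributes the b.N calls across
// GOMAXPROCS goroutines, replacing b.RunParallel.
// (Hypothetical signature.)
func (b *B) IterateParallel(f interface{}, args ...interface{})
```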
Specifically, the Iterate method replaces a for-loop on b.N, and the IterateParallel method replaces a call to b.RunParallel, using a much simpler API.

Examples
The example cited in #27400 comes from @davecheney's 2013 blog post, How to write benchmarks in Go. The naive benchmark looks like this:
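A minimal sketch of that naive benchmark, assuming a Fib function in the package under test:

```go
func BenchmarkFib10(b *testing.B) {
	// Run the Fib function b.N times.
	for n := 0; n < b.N; n++ {
		Fib(10)
	}
}
```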
At the end of the post, the author suggests defeating optimizations by writing to a global variable:
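A sketch of that fix, again assuming the same Fib function: record each result, then store it to a package-level variable after the loop.

```go
var result int

func BenchmarkFib10(b *testing.B) {
	var r int
	for n := 0; n < b.N; n++ {
		// Record the result of Fib so the compiler cannot eliminate the call.
		r = Fib(10)
	}
	// Store the result to a package-level variable so the compiler cannot
	// eliminate the benchmark body itself.
	result = r
}
```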
However, there are at least two problems with that approach:

1. Because the argument to the Fib function is a constant and the function is pure, the compiler may still inline the function call and hoist the entire computation out of the loop.
2. The compiler or linker may observe that all writes to result are dead, and apply dead-code elimination to the b.N loop.

I regard the author of that blog post as a Go expert. If even he can't get it right, I don't expect that most Go novices can either. I am an active Go maintainer, and I also run into this problem on a semi-regular basis (#27400 (comment), #48684 (comment)).
In contrast, under this proposal, the simple way to write the benchmark would also be the correct way:
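A sketch of how that might look under the proposed explicit-argument API (the exact call form is an assumption):

```go
func BenchmarkFib10(b *testing.B) {
	// The function under test and its argument are passed to Iterate
	// directly, so there is no wrapper for the compiler to optimize through.
	b.Iterate(Fib, 10)
}
```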
Another example by the same author, cited in #27400 (comment) by @gbbr, explicitly uses the current iteration number as an input to the function under test:
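A sketch of that example, assuming a popcnt function of type func(uint64) uint64 and the write-only Result variable described below:

```go
var Result uint64

func BenchmarkPopcnt(b *testing.B) {
	var r uint64
	for i := 0; i < b.N; i++ {
		// The loop counter itself is the input to the function under test.
		r = popcnt(uint64(i))
	}
	// Write-only store intended to defeat dead-code elimination.
	Result = r
}
```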
This example mostly works around the compiler-optimization problem — assuming that the compiler and linker aren't clever enough to eliminate the dead store to the write-only Result variable — but may again mask nonlinear behavior that depends on the specific value of N. That is: it mostly avoids the first pitfall, but by falling headlong into the second one.

This function would be both simpler and more linear using Iterate:
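A sketch under the proposed API, assuming Iterate forwards the given argument to popcnt on every iteration (the input value here is illustrative):

```go
func BenchmarkPopcnt(b *testing.B) {
	b.Iterate(popcnt, uint64(0xdeadbeef))
}
```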
If the user actually intends for the test to benchmark popcnt with a variety of interleaved inputs, they can make that more explicit:
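A sketch of explicitly interleaved inputs, modeled on the rotation-based Fib example earlier in the thread; the specific input values are illustrative:

```go
func BenchmarkPopcntInterleaved(b *testing.B) {
	args := [2]uint64{0, ^uint64(0)}
	var i uint64
	b.Iterate(func() uint64 {
		// Alternate between the two inputs on each call.
		i++
		return popcnt(args[i&1])
	})
}
```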
Or, to restore the explicit dependency on i, for the specific cases where the author believes it not to be harmful:
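A sketch of retaining that dependency, with a counter captured by the closure (again, the exact form is an assumption):

```go
func BenchmarkPopcntCounted(b *testing.B) {
	var i uint64
	b.Iterate(func() uint64 {
		// The captured counter plays the role of the old loop variable.
		i++
		return popcnt(i)
	})
}
```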
Compatibility

This proposal is a strict addition to the testing.B API. As a strict addition, it would not change or invalidate the behavior of any existing benchmarks. (On the other hand, it would also do nothing to detect problematic optimizations or nonlinearities in existing benchmarks.)

Implementation
The proposed API could theoretically be implemented using the reflect package today. However, care would need to be taken to ensure that such an implementation does not add noise or bias to the benchmarks, taking into account the high current cost and rate of garbage of reflect.Call (#7818, #43732).

If it is not practical to modify reflect.Call to provide stable, fast performance, the compiler itself could special-case these functions and compile them down to more efficient equivalents, ensuring that the documented optimization properties continue to be respected (CC @randall77, @josharian).
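For illustration, a minimal sketch of what a reflect-based Iterate might look like inside the testing package, assuming the hypothetical explicit-argument signature sketched above and the package's existing b.N and b.ResetTimer; a real implementation would need to amortize or eliminate the reflect.Call overhead so it does not bias the measurement:

```go
import "reflect"

// Hypothetical sketch only; not the proposal's implementation.
func (b *B) Iterate(f interface{}, args ...interface{}) {
	fv := reflect.ValueOf(f)
	in := make([]reflect.Value, len(args))
	for i, a := range args {
		in[i] = reflect.ValueOf(a)
	}
	// Exclude the argument-conversion setup from the measurement.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Each iteration performs one reflective call of f with the
		// caller-supplied arguments.
		fv.Call(in)
	}
}
```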