Description
Update July 26, 2023: See this comment for the latest Loop API. The motivation and arguments for this proposal still otherwise apply in full, but the API has switched to the `for b.Loop() { ... }` form proposed in the Alternatives section.
Currently, Go benchmarks are required to repeat the body of the benchmark `(*testing.B).N` times. This approach minimizes measurement overhead, but it's error-prone and has many limitations:
- As we discovered in cmd/vet: flag benchmarks that don't use b #38677, it's surprisingly common for benchmarks to simply forget to use `b.N`.
- While a vet check can pretty reliably detect forgotten uses of `b.N`, there's some evidence that many benchmarks use `b.N` incorrectly, such as using it to size the input to an algorithm rather than as an iteration count.
- Because the benchmark framework doesn't know when the `b.N` loop starts, if a benchmark has any non-trivial setup, it's important for it to use `(*testing.B).ResetTimer`. It's generally not clear what counts as non-trivial setup, and very hard to detect when `ResetTimer` is necessary.
Proposal
I propose that we add the following method to `testing.B` and encourage its use over `b.N`:
```go
// Loop invokes op repeatedly and reports the time and (optionally) allocations per invocation
// as the results of benchmark b.
// Loop must be called only once per benchmark or sub-benchmark.
//
// A benchmark should either use Loop or contain an explicit loop from 0 to b.N, but not both.
// After b.Loop returns, b.N will contain the total number of calls to op, so the benchmark
// may use b.N to compute other average metrics.
func (b *B) Loop(op func())
```
This API has several advantages over `b.N` loops:

- It cannot be misused for something other than an iteration count. It's still possible for a benchmark to forget entirely to use `b.Loop`, but that can be detected reliably by vet.
- The benchmarking framework can record time and other metrics around only the benchmarked operation, so benchmarks no longer need to use `ResetTimer` or be careful about their setup.
- Iteration ramp-up can be done entirely within `b.Loop`, which means that benchmark setup before `b.Loop` will happen once and only once, rather than at each ramp-up step. For benchmarks with non-trivial setup, this saves a lot of time. Notably, benchmarks with expensive setup can run for far longer than the specified `-benchtime` because of the large number of ramp-up steps (setup time is not counted toward the `-benchtime` threshold). It's also less error-prone than using a global `sync.Once` to reduce setup cost, which can have side effects on GC timing and other benchmarks if the computed results are large.
- As suggested by @rsc, `b.Loop` could be a clear signal to the compiler not to perform certain optimizations in the loop body that often quietly invalidate benchmark results.
- In the long term, we could collect distributions rather than just averages for benchmark metrics, which would enable deeper insights into benchmark results and far more powerful statistical methods, such as stationarity tests. The way this would work is that `b.Loop` would perform iteration ramp-up only to the point where it can amortize its measurement overhead (ramping up to, say, 1ms), and then repeat this short measurement loop many times until the total time reaches the specified `-benchtime`. For short benchmarks, this could easily gather 1,000 samples, rather than just a mean.
Alternatives
This proposal is complementary to `testing.Keep` (#61179). It's an alternative to `testing.B.Iterate` (originally in #48768, with discussion now merged into #61179), which essentially combines Keep and Loop. I believe Iterate could have all of the same benefits as Loop, but it's much clearer how to make Loop low-overhead. If Loop implicitly inhibits compiler optimizations in the body of its callback, then it has similar deoptimization benefits as Iterate. I would argue that Loop has a shallower learning curve than Iterate, though probably once users get used to either they would have similar usability.
If #61405 (range-over-func) is accepted, it may be that we want the signature of `Loop` to be `Loop(op func() bool) bool`, which would allow benchmarks to be written as:
```go
func Benchmark(b *testing.B) {
	for range b.Loop {
		// … benchmark body …
	}
}
```
It's not clear to me what this form should do if the body attempts to `break` or `return`.
Another option is to mimic `testing.PB.Next`. Here, the signature of Loop would be `Loop() bool` and it would be used as:
```go
func Benchmark(b *testing.B) {
	for b.Loop() {
		// … benchmark body …
	}
}
```
This is slightly harder to implement, but perhaps more ergonomic to use. It is easier to misuse than the version of Loop that takes a callback (e.g., code could do something wrong with the result, or break out of the loop early). But unlike `b.N`, which is easy to misuse, this form seems much harder to use incorrectly than to use correctly.