testing: add testing.B.Loop for iteration

**Update July 26, 2023:** See [this comment](https://github.com/golang/go/issues/61515#issuecomment-1650419338) for the latest Loop API. The motivation and arguments for this proposal still otherwise apply in full, but the API has switched to the `for b.Loop() { ... }` form proposed in the Alternatives section.

Currently, Go benchmarks are required to repeat the body of the benchmark [`(*testing.B).N`](https://pkg.go.dev/testing#B) times. This approach minimizes measurement overhead, but it’s error-prone and has many limitations:

- As we discovered in #38677, it’s [surprisingly common](https://github.com/golang/go/issues/38677#issuecomment-1610477510) for benchmarks to simply forget to use `b.N`.

- While a vet check can pretty reliably detect forgotten uses of `b.N`, there’s some evidence that many benchmarks use `b.N` incorrectly, such as using it to size the input to an algorithm, rather than as an iteration count.

- Because the benchmark framework doesn’t know when the b.N loop starts, if a benchmark has any non-trivial setup, it’s important for it to use `(*testing.B).ResetTimer`. It’s generally not clear what counts as non-trivial setup, and very hard to detect when `ResetTimer` is necessary.

## Proposal

I propose that we add the following method to `testing.B` and encourage its use over `b.N`:

```go
// Loop invokes op repeatedly and reports the time and (optionally) allocations per invocation
// as the results of benchmark b.
// Loop must be called only once per benchmark or sub-benchmark.
//
// A benchmark should either use Loop or contain an explicit loop from 0 to b.N, but not both.
// After b.Loop returns, b.N will contain the total number of calls to op, so the benchmark
// may use b.N to compute other average metrics.
func (b *B) Loop(op func())
```

This API has several advantages over `b.N` loops:

- It cannot be misused for something other than an iteration count. It’s still possible for a benchmark to forget entirely to use `b.Loop`, but that can be detected reliably by vet.

- The benchmarking framework can record time and other metrics around only the benchmarked operation, so benchmarks no longer need to use `ResetTimer` or be careful about their setup.

- Iteration ramp-up can be done entirely within `b.Loop`, which means that benchmark setup before `b.Loop` will happen once and only once, rather than at each ramp-up step. For benchmarks with non-trivial setup, this saves a lot of time. Notably, benchmarks with expensive setup can run for far longer than the specified `-benchtime` because of the large number of ramp-up steps (setup time is not counted toward the `-benchtime` threshold). It’s also less error-prone than using a global `sync.Once` to reduce setup cost, which can have side effects on GC timing and other benchmarks if the computed results are large.

- [As suggested by @rsc](https://github.com/golang/go/issues/61179#issuecomment-1642486590), `b.Loop` could be a clear signal to the compiler not to perform certain optimizations in the loop body that often quietly invalidate benchmark results.

- In the long term, we could collect distributions rather than just averages for benchmark metrics, which would enable deeper insights into benchmark results and far more powerful statistical methods, such as stationarity tests. The way this would work is that `b.Loop` would perform iteration ramp-up only to the point where it can amortize its measurement overhead (ramping up to, say, 1ms), and then repeat this short measurement loop many times until the total time reaches the specified `-benchtime`. For short benchmarks, this could easily gather 1,000 samples, rather than just a mean.

## Alternatives

This proposal is complementary to `testing.Keep` (#61179). It’s an alternative to `testing.B.Iterate` (originally in #48768, with discussion now merged into #61179), which essentially combines Keep and Loop. I believe Iterate could have all of the same benefits as Loop, but it’s much clearer how to make Loop low-overhead. If Loop implicitly inhibits compiler optimizations in the body of its callback, then it has similar deoptimization benefits as Iterate. I would argue that Loop has a shallower learning curve than Iterate, though probably once users get used to either they would have similar usability.

If #61405 (range-over-func) is accepted, it may be that we want the signature of `Loop` to be `Loop(op func() bool) bool`, which would allow benchmarks to be written as:

```go
func Benchmark(b *testing.B) {
    for range b.Loop {
        // … benchmark body …
    }
}
```

It’s not clear to me what this form should do if the body attempts to `break` or `return`.

Another option is to mimic [`testing.PB.Next`](https://pkg.go.dev/testing#PB.Next). Here, the signature of Loop would be `Loop() bool` and it would be used as:

```go
func Benchmark(b *testing.B) {
    for b.Loop() {
        // … benchmark body …
    }
}
```

This is slightly harder to implement, but perhaps more ergonomic to use. It's more possible to misuse than the version of Loop that takes a callback (e.g., code could do something wrong with the result, or break out of the loop early). But unlike `b.N`, which is easy to misuse, this seems much harder to use incorrectly than to use correctly.

cc @bcmills @rsc


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

testing: add testing.B.Loop for iteration #61515

Proposal

Alternatives

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

testing: add testing.B.Loop for iteration #61515

Description

Proposal

Alternatives

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions