proposal: testing: a less error-prone API for benchmark iteration #48768
Comments
I think the API would be simpler with just a plain func() argument. And you need to make a closure if you need to capture anything, including arguments.
@randall77, the problem with the closure API is that it doesn't address the compiler-optimization issues. That would make it too easy to write something like

```go
func BenchmarkFib10(b *testing.B) {
	b.Iterate(func() { Fib(10) })
}
```

which wouldn't really address #27400 at all.
I guess another way to put it is: one advantage of the explicit-argument API is that it encourages users to write their benchmark in terms of an exported function or method (perhaps expressed as a method value) instead of a wrapper around that function or method. Benchmarking a wrapper instead of the actual function is exactly what leads to the complex interactions with compiler optimizations, so avoiding the need for wrappers — at least in the typical case — is an important aspect of this proposal.
Won't the overhead of calling a closure (or, even worse, invoking the function via reflection) dominate the cost of the function being benchmarked?
To me,
Yes. The proposed API may require specialization in order to provide a good enough noise floor. (See the “Implementation” section of the original post.)
How about providing two sets of arguments for the function to be benched? And Iterate() could randomly pick one for each call. I'm guessing that most of the time you would want each set of arguments to be used equally, but in an unpredictable order.
@renthraysk, I considered also including something like

```go
// Iterate accepts an argument of any function type and a 'next' function that produces values
// assignable to f's arguments.
// Generate then invokes f(next()), sequentially, exactly b.N times,
// with each invocation using a set of values from a distinct call to next.
func (*B) Generate(f interface{}, next interface{})
```

But I'm not sure that the benefits of that style of method outweigh its complexity, given that the equivalent can be written as:

```go
func BenchmarkFib(b *testing.B) {
	args := [2]int{10, 20}
	r := rand.Uint32()
	i := uint64(r)<<32 | uint64(^r)
	b.Iterate(func() uint64 {
		i = bits.RotateLeft64(i, 1)
		return Fib(args[i&1])
	})
}
```
I guess that
@bcmills The problem with my example, and the latter one, is that for each argument set to be used equally, b.N has to be a multiple of 64. Otherwise the computed stats will be off.
@renthraysk, benchmarks are noisy anyway. If being off by one iteration makes a difference, your value of b.N is too small to give a meaningful result anyway.
Some useful, reliable benchmarks naturally take multiple seconds to run, and thus end up with N=1. |
I think we would still need to expose b.N. It is not uncommon for a benchmark to do expensive O(b.N) setup work to enable accurate benchmarking.
That's true, but those benchmarks do not depend on
I am not proposing to remove b.N. That said, we shouldn't deprecate it if it is actually still needed in some cases. Can you give some examples of the sorts of benchmarks that would still need it?
A better version, in which the args set that is used alternates, but the random rotation defeats the compiler's optimizations.
A quick grep for unusual uses of b.N turned up a number of examples. Also, it's not possible to use
As far as I know, the problems with micro-benchmarks all boil down to needing a way to force the computation of some value. If we add
@ianlancetaylor, as far as I can tell that is not sufficient. That is:

```go
b.Iterate(func() { b.Sink(Fib(10)) })
```

does not prevent the compiler from turning Fib(10) into a constant. Or, if it does, that would imply that the compiler applies special AST-rewriting magic to the expression passed to Sink.
@josharian, thanks for the references! Some analysis:
TL;DR: as far as I can tell, all of the tests in that sample would be improved by eliminating the dependency on b.N.
Neat! Per the documentation for
I can't even parse what that means — I would argue that it mostly points to the need for a more ergonomic
Hmm. I find that quite clear, although that's unsurprising since I was involved in its design. How can I help you help me find a clearer way to express that? In case it is useful, examples of non-per-iteration metrics are cache hit rate, percent packet loss, or STW latency percentiles.
That is true. I suppose we could add
We could, but I suppose it would have to be a standalone generic function, so that it could return a value? That might look like:

```go
b.Iterate(func() {
	b.Sink(Fib(testing.Source(10)))
})
```

But then you would end up needing to call
@josharian, the terms I've seen from systems like Prometheus are “counter” (or “cumulative”) and “gauge”. Counter metrics are often meaningful per-op, whereas gauge metrics generally are not. So I suppose that the advice in the comment is really to report “counter” metrics averaged over the number of operations. But this is a bit of a digression, because I don't think deprecating b.N affects this pattern:

```go
var n, userTime, systemTime int64
…
b.RunParallel(func(pb *testing.PB) {
	for pb.Next() {
		…
		atomic.AddInt64(&n, 1)
		atomic.AddInt64(&userTime, int64(cmd.ProcessState.UserTime()))
		atomic.AddInt64(&systemTime, int64(cmd.ProcessState.SystemTime()))
	}
})
b.ReportMetric(float64(userTime)/float64(n), "user-ns/op")
b.ReportMetric(float64(systemTime)/float64(n), "sys-ns/op")
```

So I think the only change strictly needed to
I agree. The difference is that I think I see how to implement
No matter what we do, a benchmark that runs for
The different implementation possibilities for
This proposal has been added to the active column of the proposals project |
It sounds like the most important case where something new is needed is for microbenchmarks that might be optimized away entirely, but those are also the cases where the overhead of the function call is likely to be problematic. (That overhead is why we chose the loop to begin with.) It doesn't seem like this discussion is converging to an answer.
Placed on hold. |
I'm not sure this is a good idea, but another approach is to establish a baseline for that overhead and subtract it out. For example, the
Another option is to wrap the called function itself:
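A hypothetical sketch of that idea (not the code from this thread): pass the function under test through a no-op, no-inline wrapper so that the compiler can no longer see, and therefore cannot fold, the call target.

```go
// opaque is a hypothetical helper, used here only for illustration.
// Because it is marked noinline, the compiler cannot see that it returns
// its argument unchanged, so the call through its result stays indirect.
//
//go:noinline
func opaque(f func(int) int) func(int) int { return f }

func BenchmarkFib10(b *testing.B) {
	for n := 0; n < b.N; n++ {
		// The indirect call through opaque(Fib) cannot be inlined or
		// constant-folded away.
		_ = opaque(Fib)(10)
	}
}
```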
I'm not sure what a good name for this would be (it's similar to Rust's black_box). This doesn't help with the b.N issue like Iterate does, but I think it's much better than wrapping all of the inputs in Source and the result in Sink. Most likely this function would need to be understood by the compiler. It can be implemented today at an overhead of two instructions, as I did above, but I believe that forces any pointer arguments to escape.
This proposal has been added to the active column of the proposals project |
Will mark this likely decline in favor of whatever we decide in #61179. |
Based on the discussion above, this proposal seems like a likely decline. |
No change in consensus, so declined. |
Background
It is currently difficult for Go users to write benchmarks that accurately measure what they appear to measure, for two reasons:
1. Certain compiler optimizations are relatively easy to trigger accidentally, even when those optimizations are not intended for the code being benchmarked (testing: document best practices for avoiding compiler optimizations in benchmarks #27400).
2. It is relatively easy for a benchmark to accidentally run the code to be benchmarked the wrong number of times, either by ignoring the passed-in b.N entirely (cmd/vet: flag benchmarks that don’t use b #38677), or by doing either one or N iterations on a data-structure or input of size N instead of N iterations on an input of constant cost (examples).

We could detect benchmarks that ignore b.N entirely using an approach like the one in CL 230978, but I don't see how we could do the same to detect benchmarks that misuse b.N: some real APIs that use caches or global data-structures really do have weird timing dependencies on the number of prior calls.

Proposal
I propose that we add the following methods to *testing.B, inspired by F.Fuzz (#44551) and T.RunParallel, and deprecate explicit use of the N field.
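A sketch of the shape those methods might take; the signatures here are an assumption inferred from the explicit-argument style discussed in the comment thread, not a quotation of the proposal:

```go
// Iterate accepts a function f and a set of arguments assignable to f's
// parameters, and invokes f(args...), sequentially, exactly b.N times.
// (Hypothetical signature.)
func (b *B) Iterate(f interface{}, args ...interface{})

// IterateParallel is like Iterate, but distributes the b.N calls across
// GOMAXPROCS goroutines, replacing b.RunParallel.
// (Hypothetical signature.)
func (b *B) IterateParallel(f interface{}, args ...interface{})
```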
Specifically, the Iterate method replaces a for-loop on b.N, and the IterateParallel method replaces a call to b.RunParallel, using a much simpler API.

Examples
The example cited in #27400 comes from @davecheney's 2013 blog post, How to write benchmarks in Go. The naive benchmark looks like this:
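A minimal sketch of that naive benchmark, assuming a Fib function in the package under test:

```go
func BenchmarkFib10(b *testing.B) {
	// Run the Fib function b.N times.
	for n := 0; n < b.N; n++ {
		Fib(10)
	}
}
```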
At the end of the post, the author suggests defeating optimizations by writing to a global variable:
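A sketch of that fix, again assuming the same Fib function: record each result, then store it to a package-level variable after the loop.

```go
var result int

func BenchmarkFib10(b *testing.B) {
	var r int
	for n := 0; n < b.N; n++ {
		// Record the result of Fib so the compiler cannot eliminate the call.
		r = Fib(10)
	}
	// Store the result to a package-level variable so the compiler cannot
	// eliminate the benchmark body itself.
	result = r
}
```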
However, there are at least two problems with that approach:

1. Because the argument to the Fib function is a constant and the function is pure, the compiler may still inline the function call and hoist the entire computation out of the loop.
2. The compiler or linker may observe that all writes to result are dead, and apply dead-code elimination to the b.N loop.

I regard the author of that blog post as a Go expert. If even he can't get it right, I don't expect that most Go novices can either. I am an active Go maintainer, and I also run into this problem on a semi-regular basis (#27400 (comment), #48684 (comment)).
In contrast, under this proposal, the simple way to write the benchmark would also be the correct way:
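A sketch of how that might look under the proposed explicit-argument API (the exact call form is an assumption):

```go
func BenchmarkFib10(b *testing.B) {
	// The function under test and its argument are passed to Iterate
	// directly, so there is no wrapper for the compiler to optimize through.
	b.Iterate(Fib, 10)
}
```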
Another example by the same author, cited in #27400 (comment) by @gbbr, explicitly uses the current iteration number as an input to the function under test:
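A sketch of that example, assuming a popcnt function of type func(uint64) uint64 and the write-only Result variable described below:

```go
var Result uint64

func BenchmarkPopcnt(b *testing.B) {
	var r uint64
	for i := 0; i < b.N; i++ {
		// The loop counter itself is the input to the function under test.
		r = popcnt(uint64(i))
	}
	// Write-only store intended to defeat dead-code elimination.
	Result = r
}
```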
This example mostly works around the compiler-optimization problem — assuming that the compiler and linker aren't clever enough to eliminate the dead store to the write-only Result variable — but may again mask nonlinear behavior that depends on the specific value of N. That is: it mostly avoids the first pitfall, but by falling headlong into the second one.

This function would be both simpler and more linear using Iterate:
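A sketch under the proposed API, assuming Iterate forwards the given argument to popcnt on every iteration (the input value here is illustrative):

```go
func BenchmarkPopcnt(b *testing.B) {
	b.Iterate(popcnt, uint64(0xdeadbeef))
}
```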
If the user actually intends for the test to benchmark popcnt with a variety of interleaved inputs, they can make that more explicit:
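A sketch of explicitly interleaved inputs, modeled on the rotation-based Fib example earlier in the thread; the specific input values are illustrative:

```go
func BenchmarkPopcntInterleaved(b *testing.B) {
	args := [2]uint64{0, ^uint64(0)}
	var i uint64
	b.Iterate(func() uint64 {
		// Alternate between the two inputs on each call.
		i++
		return popcnt(args[i&1])
	})
}
```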
Or, to restore the explicit dependency on i, for the specific cases where the author believes it not to be harmful:
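A sketch of retaining that dependency, with a counter captured by the closure (again, the exact form is an assumption):

```go
func BenchmarkPopcntCounted(b *testing.B) {
	var i uint64
	b.Iterate(func() uint64 {
		// The captured counter plays the role of the old loop variable.
		i++
		return popcnt(i)
	})
}
```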
Compatibility

This proposal is a strict addition to the testing.B API. As a strict addition, it would not change or invalidate the behavior of any existing benchmarks. (On the other hand, it would also do nothing to detect problematic optimizations or nonlinearities in existing benchmarks.)

Implementation
The proposed API could theoretically be implemented using the reflect package today. However, care would need to be taken to ensure that such an implementation does not add noise or bias to the benchmarks, taking into account the high current cost and rate of garbage of reflect.Call (#7818, #43732).

If it is not practical to modify reflect.Call to provide stable, fast performance, the compiler itself could special-case these functions and compile them down to more efficient equivalents, ensuring that the documented optimization properties continue to be respected (CC @randall77, @josharian).
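For illustration, a minimal sketch of what a reflect-based Iterate might look like inside the testing package, assuming the hypothetical explicit-argument signature sketched above and the package's existing b.N and b.ResetTimer; a real implementation would need to amortize or eliminate the reflect.Call overhead so it does not bias the measurement:

```go
import "reflect"

// Hypothetical sketch only; not the proposal's implementation.
func (b *B) Iterate(f interface{}, args ...interface{}) {
	fv := reflect.ValueOf(f)
	in := make([]reflect.Value, len(args))
	for i, a := range args {
		in[i] = reflect.ValueOf(a)
	}
	// Exclude the argument-conversion setup from the measurement.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Each iteration performs one reflective call of f with the
		// caller-supplied arguments.
		fv.Call(in)
	}
}
```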