Skip to content

proposal: testing: a less error-prone API for benchmark iteration #48768

Closed
@bcmills

Description

@bcmills

Background

It is currently difficult for Go users to write benchmarks that accurately measure what they appear to measure, for two reasons:

  1. Certain compiler optimizations are relatively easy to trigger accidentally, even when those optimizations are not intended for the code being benchmarked (testing: document best practices for avoiding compiler optimizations in benchmarks #27400).

  2. It is relatively easy for a benchmark to accidentally run the code to be benchmarked the wrong number of times, either by ignoring the passed-in b.N entirely (cmd/vet: flag benchmarks that don’t use b #38677), or by doing either one or N iterations on a data-structure or input of size N instead of N iterations on an input of constant cost (examples).

    • We may be able to automatically flag tests that fail to use b.N using an approach like the one in CL 230978, but I don't see how we could do the same to detect benchmarks that misuse b.N: some real APIs that use caches or global data-structures really do have weird timing dependencies on the number of prior calls.

Proposal

I propose that we add the following methods to *testing.B, inspired by F.Fuzz (#44551) and T.RunParallel, and deprecate explicit use of the N field.

package testing

// Iterate accepts an argument of any function type and zero or more arguments to pass to that function.
// Iterate then invokes the function, sequentially, exactly b.N times with those arguments.
//
// The compiler will not optimize through the function's arguments or return values,
// so arguments passed to the function will not be subject to constant-propagation
// optimizations, and values returned by it will not be subject to dead-store elimination.
func (b *B) Iterate(f interface{}, args ...interface{})

// IterateParallel is like Iterate, but invokes the function on multiple goroutines in parallel.
//
// The number of goroutines defaults to GOMAXPROCS, and can be increased by calling
// SetParallelism before IterateParallel.
func (b *B) IterateParallel(f interface{}, args ...interface{})

Specifically, the Iterate method replaces a for-loop on b.N, and the IterateParallel replaces a call to b.RunParallel using a much simpler API.

Examples

The example cited in #27400 comes from @davecheney's 2013 blog post, How to write benchmarks in Go. The naive benchmark looks like this:

func BenchmarkFib10(b *testing.B) {
	// run the Fib function b.N times
	for n := 0; n < b.N; n++ {
		Fib(10)
	}
}

At the end of the post, the author suggests to defeat optimizations by writing to a global variable:

var result int

func BenchmarkFibComplete(b *testing.B) {
        var r int
        for n := 0; n < b.N; n++ {
                // always record the result of Fib to prevent
                // the compiler eliminating the function call.
                r = Fib(10)
        }
        // always store the result to a package level variable
        // so the compiler cannot eliminate the Benchmark itself.
        result = r
}

However, there are at least two problems with that approach:

  1. Because the input to the Fib function is a constant and the function is pure, the compiler may still inline the function call and hoist the entire computation out of the loop.
  2. A correct Go compiler may determine that all stores to result are dead, and apply dead-code elimination to the b.N loop.

I regard the author of that blog post as a Go expert. If even he can't get it right, I don't expect that most Go novices can either. I am an active Go maintainer, and I also run into this problem on a semi-regular basis (#27400 (comment), #48684 (comment)).

In contrast, under this proposal, the simple way to write the benchmark would also be the correct way:

func BenchmarkFib10(b *testing.B) {
	b.Iterate(Fib, 10)
}

Another example by the same author cited in #27400 (comment) (@gbbr), explicitly uses the current iteration number as an input to the function under test:

var Result uint64

func BenchmarkPopcnt(b *testing.B) {
	var r uint64
	for i := 0; i < b.N; i++ {
		r = popcnt(uint64(i))
	}
	Result = r
}

This example mostly works around the compiler-optimization problem — assuming that the compiler and linker aren't clever enough to eliminate the dead store to the write-only Result variable — but may again mask nonlinear behavior that depends on the specific value of N. That is: it mostly avoids the first pitfall, but by falling headlong into the second one.

This function would be both simpler and more linear using Iterate:

func BenchmarkPopcnt0(b *testing.B) {
	b.Iterate(popcnt, 0)
}

If the user actually intends for the test to benchmark popcnt with a variety of interleaved inputs, they can make that more explicit:

func BenchmarkPopcntInterleaved(b *testing.B) {
	inputs := make([]uint64, 512)
	for i := range inputs {
		inputs[i] = rand.Uint64()
	}
	b.Iterate(func(inputs []uint64) uint64 {
		var result uint64
		for _, x := range inputs {
			result += popcnt(x)
		}
		return result
	}, inputs)
}

Or, to restore the explicit dependency on i, for the specific cases where the author believes it not to be harmful:

func BenchmarkPopcntInterleaved(b *testing.B) {
	var i uint64
	b.Iterate(func() uint64 {
		result := popcnt(i)
		i++
		return result
	}, inputs)
}

Compatibility

This proposal is a strict addition to the testing.B API. As a strict addition, it would not change or invalidate the behavior of any existing benchmarks. (On the other hand, it would also do nothing to detect problematic optimizations or nonlinearities in existing benchmarks.)

Implementation

The proposed API could theoretically be implemented using the reflect package today. However, care would need to be taken to ensure that such an implementation does not add noise or bias to the benchmarks, taking into account the high current cost and rate of garbage of reflect.Call (#7818, #43732).

If it is not practical to modify reflect.Call to provide stable, fast performance, the compiler itself could special-case these functions and compile them down to more efficient equivalents, ensuring that the documented optimization properties continue to be respected (CC @randall77, @josharian).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions