14313-benchmark-format: add unit metadata
The predefined keys differ slightly from what I proposed in golang/go#43744. I
tried specifying {higher,lower}={better,worse} like I originally
proposed and it just got really messy. Turning it around to
better={higher,lower} means it's somewhat backwards from what you might
expect from English phrasing, but it lets us use a single key because
I don't think anyone is going to accidentally write
worse={higher,lower}, and this avoids any annoying questions about
what happens if a user specifies both "higher" and "lower".

For golang/go#43744.

Change-Id: I895914b179c291003e76f897cabbcbdb2381f163
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/357530
Reviewed-by: Michael Knyszek <mknyszek@google.com>
aclements committed Oct 29, 2024
1 parent 1dd567d commit 986bcc1
Showing 1 changed file with 46 additions and 3 deletions.
49 changes: 46 additions & 3 deletions design/14313-benchmark-format.md
@@ -89,7 +89,7 @@ the need to process custom output formats in future benchmarks.
## Proposal

A Go benchmark data file is a UTF-8 textual file consisting of a sequence of lines.
-Configuration lines and benchmark result lines, described below,
+Configuration lines, benchmark result lines, and unit metadata lines, described below,
have semantic meaning in the reporting of benchmark results.

All other lines in the data file, including but not limited to
@@ -150,7 +150,7 @@ In the example, the CPU cost is reported per-operation and the
throughput is reported per-second; neither is a total that
depends on the number of iterations.

-### Value Units
+#### Value Units

A value's unit string is expected to specify not only the measurement unit
but also, as needed, a description of what is being measured.
@@ -167,7 +167,7 @@ and rescale known measurement units.
For example, consistently large “ns/op” or “L1-miss-ns/op”
might be rescaled to “ms/op” or “L1-miss-ms/op” for display.

-### Benchmark Name Configuration
+#### Benchmark Name Configuration

In the current testing package, benchmark names correspond to Go identifiers:
each benchmark must be written as a different Go function.
@@ -184,6 +184,49 @@ that slash-prefixed key=value pairs in the benchmark name are
treated by benchmark data processors as per-benchmark
configuration values.

### Unit metadata

When a benchmark reports units outside the standard units implemented
by the testing package, it can be useful for tools to understand
additional metadata about those units.

A unit metadata line has the form

    Unit <unit> <key>=<value> <key>=<value> ...

The fields are separated by runs of space characters (as defined by
`unicode.IsSpace`), and space characters are not allowed within unit,
key, or value.
Keys must not contain `=`.
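
As a rough illustration of the line format just described, a tool might parse a unit metadata line along the following lines. This is only a sketch, not code from the proposal: the `UnitMetadata` type and `parseUnitLine` function are hypothetical names, and treating an empty key or leading whitespace the way it does is an assumption the text above does not settle.

```go
package main

import (
	"fmt"
	"strings"
)

// UnitMetadata is one key=value pair attached to a unit.
// The type is illustrative only; the proposal does not define it.
type UnitMetadata struct {
	Unit, Key, Value string
}

// parseUnitLine parses a line of the form
//
//	Unit <unit> <key>=<value> <key>=<value> ...
//
// ok reports whether the line looks like a unit metadata line at all.
func parseUnitLine(line string) (md []UnitMetadata, ok bool, err error) {
	// strings.Fields splits on runs of unicode.IsSpace characters.
	fields := strings.Fields(line)
	if len(fields) < 2 || fields[0] != "Unit" {
		return nil, false, nil
	}
	unit := fields[1]
	for _, f := range fields[2:] {
		// Keys must not contain '=', so the first '=' ends the key.
		key, value, found := strings.Cut(f, "=")
		if !found || key == "" { // rejecting an empty key is an assumption
			return nil, true, fmt.Errorf("malformed key=value pair %q", f)
		}
		md = append(md, UnitMetadata{Unit: unit, Key: key, Value: value})
	}
	return md, true, nil
}

func main() {
	md, ok, err := parseUnitLine("Unit widgets/op better=higher assume=exact")
	fmt.Println(md, ok, err)
}
```

Because the key ends at the first `=`, a value that itself contains `=` is kept intact in this sketch.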

It is an error to specify different values for any given unit and key,
even on different unit metadata lines.
That is, once unit metadata is specified, it can't be overridden.
Specifying the same value for a key multiple times is not an error.
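
One way a tool could enforce this rule is to remember the first value seen for each (unit, key) pair and reject any later value that differs. The following is a minimal sketch under that assumption; `MetadataSet` and its `Set` method are hypothetical names, not part of the proposal or of any existing package.

```go
package main

import "fmt"

// unitKey identifies one metadata entry: a (unit, key) pair.
type unitKey struct{ unit, key string }

// MetadataSet accumulates unit metadata across lines.
// It is a hypothetical helper used only for illustration.
type MetadataSet struct {
	values map[unitKey]string
}

// Set records value for (unit, key). Repeating the same value is allowed;
// a different value for an already-recorded pair is an error.
func (s *MetadataSet) Set(unit, key, value string) error {
	if s.values == nil {
		s.values = make(map[unitKey]string)
	}
	k := unitKey{unit, key}
	if old, ok := s.values[k]; ok && old != value {
		return fmt.Errorf("unit %s: conflicting values for key %s: %q and %q",
			unit, key, old, value)
	}
	s.values[k] = value
	return nil
}

func main() {
	var s MetadataSet
	fmt.Println(s.Set("widgets/op", "better", "higher")) // first value: accepted
	fmt.Println(s.Set("widgets/op", "better", "higher")) // same value again: accepted
	fmt.Println(s.Set("widgets/op", "better", "lower"))  // different value: error
}
```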

Unit metadata applies to all following benchmark result lines, though
it is unspecified whether it applies to earlier benchmark result
lines.
This allows for stream-oriented processing of benchmark results.

Keys are not constrained, but the following keys have predefined
meanings:

- `better={higher,lower}` indicates whether higher or lower values of
this unit are better (indicate an improvement).
By default, ns/op, B/op, and allocs/op are `better=lower`, and MB/s
is `better=higher`.
Other units do not assume a default.

- `assume={nothing,exact}` indicates what statistical assumption to
make when considering distributions of values.
`nothing` means to make no statistical assumptions (e.g., use
non-parametric methods) and `exact` means to assume measurements are
exact (repeated measurement does not increase confidence).
The default is `nothing`.
In the future we may also support `normal`, but that's almost never
the right assumption for benchmarks.
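
To make the `better` key concrete, here is a small sketch of how a comparison tool might combine the defaults listed above with explicit metadata to label a change. The names `defaultBetter`, `direction`, and `classify` are hypothetical, and letting explicit metadata take precedence over the defaults is an assumption; the sketch also compares two raw values and applies no statistics, so it says nothing about `assume`.

```go
package main

import "fmt"

// defaultBetter holds the defaults the text gives for the standard units.
var defaultBetter = map[string]string{
	"ns/op":     "lower",
	"B/op":      "lower",
	"allocs/op": "lower",
	"MB/s":      "higher",
}

// direction returns +1 if higher values are better, -1 if lower values are
// better, and 0 if nothing is known about the unit. explicit maps a unit to
// the value of its "better" key; consulting it first is an assumption.
func direction(unit string, explicit map[string]string) int {
	v, ok := explicit[unit]
	if !ok {
		v = defaultBetter[unit]
	}
	switch v {
	case "higher":
		return 1
	case "lower":
		return -1
	}
	return 0
}

// classify labels the change from before to after for the given unit.
func classify(unit string, before, after float64, explicit map[string]string) string {
	d := float64(direction(unit, explicit)) * (after - before)
	switch {
	case direction(unit, explicit) == 0:
		return "unknown"
	case d > 0:
		return "improvement"
	case d < 0:
		return "regression"
	}
	return "unchanged"
}

func main() {
	explicit := map[string]string{"widgets/op": "higher"}
	fmt.Println(classify("ns/op", 100, 90, nil))
	fmt.Println(classify("widgets/op", 1, 2, explicit))
}
```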

### Example

The benchmark output given in the background section above