Description
Edit: if there's already a way to do this and I'm just being ignorant: please say!
Often I'm trying to compare alternative implementations that can impact multiple scenarios - essentially baseline vs option 1 (vs option 2 etc), but for multiple separate tests A, B, C.
The Baseline = true feature is great, but it only really allows a single baseline. If multiple methods are marked as Baseline, then the test runner fails and complains at you.
However, [BenchmarkCategory(...)] exists (via #248). It is currently only used to filter which tests to run, but it could be much richer:
- the result grids could be split by category
- the relative comparisons against baseline could be computed by category
so instead of:
Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Gen 1 | Allocated |
------------------------- |-------------:|-----------:|-----------:|----------:|-------:|---------:|---------:|--------:|-----------:|
'single, Stream' | 61.32 us | 1.2102 us | 1.8842 us | 16,309.04 | 1.00 | 0.00 | 2.2583 | 0.0292 | 13.88 KB |
'single, ReadOnlyBuffer' | 52.98 us | 0.1676 us | 0.1568 us | 18,874.14 | 0.86 | 0.03 | 2.2583 | 0.0042 | 13.88 KB |
'multi, Stream' | 14,285.07 us | 54.3681 us | 50.8560 us | 70.00 | 233.18 | 6.95 | 564.1667 | 10.0000 | 3470.77 KB |
'multi, ReadOnlyBuffer' | 13,219.59 us | 34.1087 us | 31.9053 us | 75.65 | 215.79 | 6.41 | 564.1667 | 0.8333 | 3470.76 KB |
we could have:
Category: 'single'
Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Gen 1 | Allocated |
------------------------- |-------------:|-----------:|-----------:|----------:|-------:|---------:|---------:|--------:|-----------:|
'Stream' | 61.32 us | 1.2102 us | 1.8842 us | 16,309.04 | 1.00 | 0.00 | 2.2583 | 0.0292 | 13.88 KB |
'ReadOnlyBuffer' | 52.98 us | 0.1676 us | 0.1568 us | 18,874.14 | 0.86 | 0.03 | 2.2583 | 0.0042 | 13.88 KB |
Category: 'multi'
Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Gen 1 | Allocated |
------------------------- |-------------:|-----------:|-----------:|----------:|-------:|---------:|---------:|--------:|-----------:|
'Stream' | 14,285.07 us | 54.3681 us | 50.8560 us | 70.00 | 1.00 | *.** | 564.1667 | 10.0000 | 3470.77 KB |
'ReadOnlyBuffer' | 13,219.59 us | 34.1087 us | 31.9053 us | 75.65 | 0.93 | *.** | 564.1667 | 0.8333 | 3470.76 KB |
with Scaled / ScaledSD being relative to the Baseline (if there is one) in that same category.
(*.** is just where I haven't "done the math" by hand; to be clear: "single" and "multi" here are completely different tests - it isn't just more of the same - naming is hard)
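To make the per-category arithmetic concrete, here is a rough sketch in Python. It is purely illustrative and has nothing to do with how BenchmarkDotNet computes things internally. The means are copied from the first table, Stream is treated as the baseline in each category (as its Scaled of 1.00 suggests), and the function name scaled_by_category is made up for this example. It only covers Scaled, not ScaledSD.

```python
from collections import defaultdict

# Mean times in microseconds, copied from the first table above.
# Each row: (category, method, mean, is_baseline).
# Assumption: Stream is the baseline within each category.
rows = [
    ("single", "Stream", 61.32, True),
    ("single", "ReadOnlyBuffer", 52.98, False),
    ("multi", "Stream", 14285.07, True),
    ("multi", "ReadOnlyBuffer", 13219.59, False),
]


def scaled_by_category(rows):
    """Return {category: {method: mean / baseline mean of that category}}.

    Ratios are rounded to 2 decimals. A category without a baseline gets
    None for every method; more than one baseline in a category is an error.
    """
    groups = defaultdict(list)
    for category, method, mean, is_baseline in rows:
        groups[category].append((method, mean, is_baseline))

    result = {}
    for category, members in groups.items():
        baselines = [mean for _, mean, is_baseline in members if is_baseline]
        if len(baselines) > 1:
            raise ValueError(f"category {category!r} has more than one baseline")
        baseline = baselines[0] if baselines else None
        result[category] = {
            method: (round(mean / baseline, 2) if baseline is not None else None)
            for method, mean, _ in members
        }
    return result


for category, scaled in scaled_by_category(rows).items():
    print(category, scaled)
```

The point is that the one-baseline rule applies per category rather than per class, so each group gets its own 1.00 row.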
If necessary, this could be an opt-in SplitResultsByCategory feature on custom options. Or it could be implicit: "there are multiple baselines == split by category" (since this won't have worked previously, it can't change existing working behaviour).