Allow baseline per category #617

Closed
@mgravell

Edit: if there's already a way to do this and I'm just being ignorant: please say!

Often I'm trying to compare alternative implementations that can impact multiple scenarios - essentially baseline vs option 1 (vs option 2 etc), but for multiple separate tests A, B, C.

The Baseline = true feature is great, but only really allows a single baseline. If multiple methods are marked as Baseline, then the test runner fails and complains at you.

However, [BenchmarkCategory(...)] exists (via #248). It is currently only used to filter tests to run, but it could be much richer:

  • the result grids could be split by category
  • the relative comparisons against baseline could be computed by category

so instead of:

                   Method |         Mean |      Error |     StdDev |      Op/s | Scaled | ScaledSD |    Gen 0 |   Gen 1 |  Allocated |
------------------------- |-------------:|-----------:|-----------:|----------:|-------:|---------:|---------:|--------:|-----------:|
         'single, Stream' |     61.32 us |  1.2102 us |  1.8842 us | 16,309.04 |   1.00 |     0.00 |   2.2583 |  0.0292 |   13.88 KB |
 'single, ReadOnlyBuffer' |     52.98 us |  0.1676 us |  0.1568 us | 18,874.14 |   0.86 |     0.03 |   2.2583 |  0.0042 |   13.88 KB |
          'multi, Stream' | 14,285.07 us | 54.3681 us | 50.8560 us |     70.00 | 233.18 |     6.95 | 564.1667 | 10.0000 | 3470.77 KB |
  'multi, ReadOnlyBuffer' | 13,219.59 us | 34.1087 us | 31.9053 us |     75.65 | 215.79 |     6.41 | 564.1667 |  0.8333 | 3470.76 KB |

we could have:


Category: 'single'

                   Method |         Mean |      Error |     StdDev |      Op/s | Scaled | ScaledSD |    Gen 0 |   Gen 1 |  Allocated |
------------------------- |-------------:|-----------:|-----------:|----------:|-------:|---------:|---------:|--------:|-----------:|
                 'Stream' |     61.32 us |  1.2102 us |  1.8842 us | 16,309.04 |   1.00 |     0.00 |   2.2583 |  0.0292 |   13.88 KB |
         'ReadOnlyBuffer' |     52.98 us |  0.1676 us |  0.1568 us | 18,874.14 |   0.86 |     0.03 |   2.2583 |  0.0042 |   13.88 KB |

Category: 'multi'

                   Method |         Mean |      Error |     StdDev |      Op/s | Scaled | ScaledSD |    Gen 0 |   Gen 1 |  Allocated |
------------------------- |-------------:|-----------:|-----------:|----------:|-------:|---------:|---------:|--------:|-----------:|
                 'Stream' | 14,285.07 us | 54.3681 us | 50.8560 us |     70.00 |   1.00 |     *.** | 564.1667 | 10.0000 | 3470.77 KB |
         'ReadOnlyBuffer' | 13,219.59 us | 34.1087 us | 31.9053 us |     75.65 |   0.93 |     *.** | 564.1667 |  0.8333 | 3470.76 KB |

with Scaled / ScaledSD being relative to the Baseline (if one) in that same category.

(*.** is just where I haven't "done the math" by hand. To be clear: "single" and "multi" here are completely different tests, not just more of the same; naming is hard.)
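
To make the per-category Scaled column concrete, here is a rough sketch of the grouping and division involved. The `Row` record and the `PerCategoryScaling.Scale` helper are hypothetical names made up for illustration; neither is part of BenchmarkDotNet, and this is not how the library computes its summary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type for this sketch; not a BenchmarkDotNet type.
public sealed record Row(string Category, string Method, double MeanUs, bool IsBaseline);

public static class PerCategoryScaling
{
    // For each row, returns Mean / (baseline Mean of the same category).
    // Scaled is null when the category does not have exactly one baseline.
    public static List<(string Category, string Method, double? Scaled)> Scale(IEnumerable<Row> rows)
    {
        var result = new List<(string Category, string Method, double? Scaled)>();

        // GroupBy keeps categories in order of first appearance.
        foreach (var group in rows.GroupBy(r => r.Category))
        {
            var baselines = group.Where(r => r.IsBaseline).ToList();
            double? baselineMean = baselines.Count == 1 ? baselines[0].MeanUs : (double?)null;

            foreach (var row in group)
            {
                double? scaled = baselineMean.HasValue ? row.MeanUs / baselineMean.Value : (double?)null;
                result.Add((group.Key, row.Method, scaled));
            }
        }

        return result;
    }
}
```

Fed the four means from the tables above, with 'Stream' marked as the baseline in each category, this gives 'ReadOnlyBuffer' a ratio of about 0.86 in 'single' and about 0.93 in 'multi', rather than the 215.79 that results from scaling everything against one global baseline.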

If necessary, this could be an opt-in SplitResultsByCategory feature on custom options. Or it could be implicit: "there's multiple baselines == split by category" (since this won't have worked previously, this can't change existing working behaviour)
