Is your feature request related to a problem? Please describe.
I often have several implementations of one algorithm (e.g. a scalar and a vectorized version), and I'd like to know the speedup of each version relative to a baseline version. Both versions are measured in the same run, so the compare tool is not an option.
Describe the solution you'd like
A Teardown function for each benchmark run that is called once for each set of arguments and can read the measured time. The current implementation offers no way to get the measured time at that point.
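To illustrate what such a hook would enable, here is a minimal sketch of the bookkeeping I have in mind. It assumes the teardown hook is handed the measured time per iteration together with the benchmark name and argument. `Measurement` and `SpeedupReport` are hypothetical names made up for this example; they are not part of any existing API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical: the data a teardown hook would need to receive, one
// record per (benchmark name, argument) pair. Not an existing API.
struct Measurement {
    std::string name;         // e.g. "scalar" or "vectorized"
    long long arg;            // the argument value used for this run
    double seconds_per_iter;  // measured time per iteration
};

// Collects measurements taken in the same process run and reports the
// speedup of any benchmark relative to a named baseline.
class SpeedupReport {
public:
    explicit SpeedupReport(std::string baseline)
        : baseline_(std::move(baseline)) {}

    // Would be called from the requested per-argument teardown hook.
    void Record(const Measurement& m) {
        assert(m.seconds_per_iter > 0.0);  // a zero time cannot be compared
        times_[std::make_pair(m.name, m.arg)] = m.seconds_per_iter;
    }

    // Speedup = baseline time / candidate time for the same argument.
    // std::map::at throws std::out_of_range if either run is missing.
    double Speedup(const std::string& name, long long arg) const {
        const double base = times_.at(std::make_pair(baseline_, arg));
        const double cand = times_.at(std::make_pair(name, arg));
        return base / cand;
    }

private:
    std::string baseline_;
    std::map<std::pair<std::string, long long>, double> times_;
};
```

With this in place, recording 2.0 s per iteration for `"scalar"` and 0.5 s for `"vectorized"` at the same argument would make `Speedup("vectorized", arg)` return 4.0. The only missing piece is the hook that supplies the measured time.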