Description
Right now the benchmark CI workflow prints useful information about grind time and runtime to the CI output. This CI workflow always passes, so long as the code executes.
I would like the CI to fail if the master and PR runtimes or grind times differ by more than some tolerance. Without this, it is easy to introduce a performance regression and not notice it, because nobody checked the CI output carefully enough.
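
To make the request concrete, here is a minimal sketch of the kind of check I have in mind. Everything in it is an assumption for illustration: the metric names (`runtime_s`, `grind_time_ns`), the function names, and the 10% default tolerance are hypothetical, and it assumes the workflow has already collected the master and PR timings into dictionaries.

```python
def relative_change(reference: float, candidate: float) -> float:
    """Signed relative difference of candidate with respect to reference."""
    return (candidate - reference) / reference


def find_regressions(master: dict, pr: dict, tolerance: float = 0.10) -> list:
    """Return one message per metric whose PR value differs from master
    by more than `tolerance` (relative, in either direction).

    The metric names and the default tolerance are placeholders.
    """
    failures = []
    for metric, ref in master.items():
        new = pr[metric]  # assumes both runs report the same metrics
        change = relative_change(ref, new)
        if abs(change) > tolerance:
            failures.append(
                f"{metric}: master={ref:.4g}, PR={new:.4g} ({change:+.1%})"
            )
    return failures


def exit_code(master: dict, pr: dict, tolerance: float = 0.10) -> int:
    """Print any out-of-tolerance metrics and return a process exit code."""
    failures = find_regressions(master, pr, tolerance)
    for message in failures:
        print(f"Benchmark outside tolerance: {message}")
    return 1 if failures else 0
```

A CI step could then end with `sys.exit(exit_code(master, pr))`, so any metric outside the tolerance produces a non-zero exit status and fails the job. The check uses the absolute relative difference, so large speedups are flagged as well; comparing only `change > tolerance` would restrict it to slowdowns.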