Add configurable tags to runners #417
Conversation
@@ -112,7 +114,9 @@ jobs:
          python-version: "3.11"
      - name: Building Python and running pyperformance
        run: |
          python workflow_bootstrap.py ${{ inputs.fork }} ${{ inputs.ref }} ${{ inputs.machine }} ${{ inputs.benchmarks || 'all' }} ${{ env.flags }} ${{ inputs.force && '--force' || '' }} ${{ inputs.pgo && '--pgo' || '' }} ${{ inputs.perf && '--perf' || '' }} --run_id ${{ github.run_id }}
          python workflow_bootstrap.py ${{ inputs.fork }} ${{ inputs.ref }} \
            ${{ (inputs.machine == 'all' || inputs.machine == '__really_all') && inputs.machine || '"$BENCHMARK_RUNNER_NAME"' }} \
I think this can just be `$BENCHMARK_RUNNER_NAME`, and the special logic in `workflow.should_run` to handle `all` and `__really_all` could be removed. This is a bit of a vestige from when the individual jobs were more entwined.
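For illustration, a rough sketch of what that simplification might look like in the hunk above, assuming `BENCHMARK_RUNNER_NAME` is available in the runner's environment (as the quoted form in the current expression implies); the remaining arguments are just carried over from the existing command:

```yaml
      - name: Building Python and running pyperformance
        run: |
          python workflow_bootstrap.py ${{ inputs.fork }} ${{ inputs.ref }} \
            "$BENCHMARK_RUNNER_NAME" \
            ${{ inputs.benchmarks || 'all' }} ${{ env.flags }} \
            ${{ inputs.force && '--force' || '' }} \
            ${{ inputs.pgo && '--pgo' || '' }} \
            ${{ inputs.perf && '--perf' || '' }} \
            --run_id ${{ github.run_id }}
```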
@@ -80,7 +80,7 @@ jobs:
      benchmarks: ${{ inputs.benchmarks }}
      pgo: true
      perf: false
      force: true
      force: ${{ ! startsWith(inputs.machine, 'group ') }}
I don't understand the reasoning here. The idea is that the benchmark always recreates the HEAD, but will only recreate the BASE if necessary. (And there's an "advanced" mode of using `_benchmark` directly if you don't want to force HEAD.)
I hadn't considered using `_benchmark` directly, because I usually do want to run the base commit and generate graphs; I just don't want to replace all the "head" runs if they already exist.
My process has involved a lot of experimental configurations (different flags, compilers, etc.), all as separate runners. I've also been tweaking the baselines. I disable runners that don't work out (compiler segfaults, tools that need updating, etc.), and then re-enable them later when those issues are fixed or worked around. That creates a lot of gaps in the coverage of some of those runners, and running them individually is a bit awkward. The 'backfill' script doesn't handle flags correctly. With this change, I can just select 'host: centurion' as the group for a particular revision/flags combination, and the workflow does everything it needs to do to make sure all runners have results for that revision/flags, without rerunning and overwriting results that are already there.
However, I haven't encountered situations where I want to replace results. Maybe I just haven't run into those yet. I'm happy to revert this part of the change altogether if you think it's better to keep forcing 'head' runs. (I should go and fix the backfill script anyway; I should have a couple of new runner machines arriving this week, so it'll see a lot of use as I populate results on those.)
Got it. I think the least confusing way forward is probably to hardcode this to `false`, then. Needing to force re-run with the exact same commit is pretty rare, so we can relegate that case to `_benchmark`.
(`force` used to be a checkbox, but then was hardcoded to `true` when we hit GitHub's maximum of 10 parameters in the GUI.)
It occurs to me that this will be useful for the new …
Pull Request Overview
This pull request adds configurable groups for runners so that benchmark runs can be scheduled on multiple machines simultaneously, while ensuring that 'force' is disabled for group selections. Key changes include updating workflow template files to account for group selection logic, modifying the install and merge base scripts to integrate group handling, and updating the runner configuration and changelog.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
File | Description
---|---
bench_runner/templates/benchmark.src.yml | Quotes the machine input and updates the force logic to be group-based
bench_runner/templates/_benchmark.src.yml | Adopts a multi-line command pattern for the bootstrap command
bench_runner/scripts/install.py | Adds configuration support for groups and updates runner choices
bench_runner/scripts/get_merge_base.py | Implements group handling for merge base determination
bench_runner/runners.py | Updates Runner instantiation to include groups
CHANGELOG.md | Documents the configurable groups change
I'm rethinking this approach. I'm not sure it's really useful for a runner to be in more than one group. It makes some sense to group them by compiler or OS, but the way I've actually used the "run on all these runners" feature is by host.
The reason I'm rethinking it is that I also need better organisation of the results (52 runners and counting is getting a bit unwieldy), so I'd like to organise them in groups and have either separate pages or collapsible sections per group. (I'd also use the same groups to generate one longitudinal plot per group, so I don't have 52 runners in a single one.) So I'd like to add a single "group" to runners instead, and use that in a few places, probably including scheduling weekly runs.
The feature in this PR could be turned into "tags" instead, though, if you think it's useful. I don't know if having just one 'group' will be enough for your idea for tailcall runners. I think it might be useful, but I don't have an immediate use for it, so I can also just shelve it.
I've added tag expansion to the list of runners in the weekly config, and verified it behaves as expected. (Still need to write tests, but as I said, I'm not sure I need this feature myself now.)
I'm happy to defer this until there's something that fully meets your needs. I don't have an immediate need (and the tailcalling thing can be done almost as easily by manually listing the runners in the weekly config).
As I have a growing collection of runners (different configurations on a couple of hosts) that I'm constantly tweaking and pruning, I've had the desire to schedule runs on collections of them, and also not rerun commits/config combos that already have results. Adding 'force' as a configurable option is problematic because of the annoying GHA input limit, so this does both at the same time.
This change adds tags that you can attach to runners. Tags show up in the list of benchmark 'machines', and selecting one schedules runs on all the matching runners. It plays well with #416, which makes the selected tag show up in the workflow name.
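To illustrate (not taken from the PR itself, and the runner and tag names here are made up): with install.py generating the machine choices, a tagged group might appear alongside the individual runners roughly like this:

```yaml
on:
  workflow_dispatch:
    inputs:
      machine:
        description: Machine to run on
        type: choice
        options:
          - linux-x86_64-example1        # individual runner (hypothetical name)
          - linux-x86_64-example2
          - "group host: centurion"      # a tag; selecting it fans out to every runner carrying that tag
          - all
```

The "group " prefix on such entries is what the `startsWith(inputs.machine, 'group ')` check in the diff above keys off when deciding whether to disable 'force'.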
This also sets 'force' to false when scheduling runs from the benchmark workflow, as suggested in review. Force can still be set to true by running directly via the _benchmark workflow instead.
This could probably do with a test or two.