Add configurable tags to runners #417


Open

Yhg1s wants to merge 4 commits into main from configurable-groups
Conversation

Yhg1s (Collaborator) commented Apr 21, 2025

As I have a growing collection of runners (different configurations on a couple of hosts) that I'm constantly tweaking and pruning, I've wanted to schedule runs on collections of them, and also to avoid rerunning commit/config combinations that already have results. Exposing 'force' as a configurable option is problematic because of the annoying GHA input limit, so this change does both at the same time.

This change adds tags that you can attach to runners. Tags show up in the list of benchmark 'machines', and selecting one schedules runs on all the matching runners. It plays well with #416, which makes the selected tag show up in the workflow name.
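For illustration, a runner configuration with tags might look something like this (a sketch only: the file format and key names are assumptions for this example, not the PR's actual schema):

    # Hypothetical runner configuration; names and keys are illustrative.
    runners:
      centurion-gcc:
        tags: ["host: centurion", "gcc"]
      centurion-clang:
        tags: ["host: centurion", "clang"]

Selecting the "host: centurion" tag in the machine list would then schedule runs on both runners.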

This also sets 'force' to false when scheduling runs from the benchmark workflow, as suggested in review. Force can still be set to true by running the _benchmark workflow directly.

This could probably do with a test or two.

@@ -112,7 +114,9 @@ jobs:
           python-version: "3.11"
       - name: Building Python and running pyperformance
         run: |
-          python workflow_bootstrap.py ${{ inputs.fork }} ${{ inputs.ref }} ${{ inputs.machine }} ${{ inputs.benchmarks || 'all' }} ${{ env.flags }} ${{ inputs.force && '--force' || '' }} ${{ inputs.pgo && '--pgo' || '' }} ${{ inputs.perf && '--perf' || '' }} --run_id ${{ github.run_id }}
+          python workflow_bootstrap.py ${{ inputs.fork }} ${{ inputs.ref }} \
+            ${{ (inputs.machine == 'all' || inputs.machine == '__really_all') && inputs.machine || '"$BENCHMARK_RUNNER_NAME"' }} \
Contributor commented:

I think this can just be $BENCHMARK_RUNNER_NAME, and the special logic in workflow.should_run to handle all and __really_all could be removed. This is a bit of a vestige from when the individual jobs were more entwined.
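Roughly, the run step would then reduce to this (a sketch, assuming the special-casing in the expression above is dropped):

    python workflow_bootstrap.py ${{ inputs.fork }} ${{ inputs.ref }} \
      "$BENCHMARK_RUNNER_NAME" \
      ...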

@@ -80,7 +80,7 @@ jobs:
       benchmarks: ${{ inputs.benchmarks }}
       pgo: true
       perf: false
-      force: true
+      force: ${{ ! startsWith(inputs.machine, 'group ') }}
Contributor commented:

I don't understand the reasoning here. The idea is that the benchmark always recreates the HEAD, but will only recreate the BASE if necessary. (And there's an "advanced" mode of using _benchmark directly if you don't want to force HEAD.)

Yhg1s (author) replied:

I hadn't considered using _benchmark directly, because I usually do want to run the base commit and generate graphs; I just don't want to replace all the "head" runs if they already exist.

My process has involved a lot of experimental configurations (different flags, compilers, etc.), all as separate runners. I've also been tweaking the baselines. I disable runners that don't work out (compiler segfaults, tools that need updating, etc.) and re-enable them later once those issues are fixed or worked around. That creates a lot of gaps in the coverage of some of those runners. Running them individually is a bit awkward, and the 'backfill' script doesn't handle flags correctly. With this change, I can just select 'host: centurion' as the group for a particular revision/flags combination, and the workflow does everything it needs to make sure all runners have results for that revision and those flags, without rerunning and overwriting results that are already there.

However, I haven't encountered situations where I want to replace results; maybe I just haven't run into them yet. I'm happy to revert this part of the change altogether if you think it's better to keep forcing 'head' runs. (I should go and fix the backfill script anyway; I have a couple of new runner machines arriving this week, so it'll see a lot of use as I populate results on those.)

Contributor replied:

Got it. I think the least confusing way forward is probably to hardcode this to false, then. Needing to force a re-run of the exact same commit is pretty rare, so we can relegate that case to _benchmark.

(force used to be a checkbox, but it was hardcoded to true when we hit GitHub's maximum of 10 workflow parameters in the GUI.)
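In diff terms against the hunk above, that suggestion would be (sketch):

    -      force: ${{ ! startsWith(inputs.machine, 'group ') }}
    +      force: false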

mdboom (Contributor) commented Apr 22, 2025

It occurs to me that this will be useful for the new weekly configuration. For example, weekly builds of the tail calling interpreter could run on all machines in the group "tailcallable" (which would be all machines configured with a compiler that supports the tail calling interpreter).
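For instance, the weekly config could name the tag instead of listing machines one by one; a sketch, with the schema and names assumed rather than taken from this PR:

    # Hypothetical weekly entry: "tailcallable" stands in for every
    # runner carrying that tag.
    weekly:
      tailcall:
        machines: ["tailcallable"]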

mdboom requested a review from Copilot on April 23, 2025.
Copilot AI left a comment:

Pull Request Overview

This pull request adds configurable groups for runners so that benchmark runs can be scheduled on multiple machines simultaneously, while ensuring that 'force' is disabled for group selections. Key changes include updating workflow template files to account for group selection logic, modifying the install and merge base scripts to integrate group handling, and updating the runner configuration and changelog.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Summary per file:

- bench_runner/templates/benchmark.src.yml: Quotes the machine input and uses a group-based force logic update
- bench_runner/templates/_benchmark.src.yml: Adopts a multi-line command pattern for bootstrap commands
- bench_runner/scripts/install.py: Adds configuration support for groups and updates runner choices
- bench_runner/scripts/get_merge_base.py: Implements group handling for merge-base determination
- bench_runner/runners.py: Updates Runner instantiation to include groups
- CHANGELOG.md: Documents changes regarding configurable groups

Yhg1s (author) commented Apr 27, 2025

I'm rethinking this approach. I'm not sure it's really useful for a runner to be in more than one group. It makes some sense to group them by compiler or OS, but the way I've actually used the "run on all these runners" feature is by host.

The reason I'm rethinking it is that I also need better organisation of the results (52 runners and counting is getting a bit unwieldy), so I'd like to organise them in groups and have either separate pages or collapsible sections per group. (I'd also use the same groups to generate one longitudinal plot per group, so I don't have 52 runners in a single one.) So I'd like to add a single "group" to runners instead, and use that in a few places, probably including scheduling weekly runs.

The feature in this PR could be turned into "tags" instead, though, if you think it's useful. I don't know if having just one "group" will be enough for your tailcall-runner idea. I think it might be useful, but I don't have an immediate use for it, so I can also just shelve it.

Yhg1s force-pushed the configurable-groups branch from 2fc278b to 8396175 on April 27, 2025.
Yhg1s changed the title from "Add configurable groups of runners" to "Add configurable tags to runners" on April 27, 2025.
Yhg1s (author) commented Apr 27, 2025

> It occurs to me that this will be useful for the new weekly configuration. For example, weekly builds of the tail calling interpreter could run on all machines in the group "tailcallable" (which would be all machines configured with a compiler that supports the tail calling interpreter).

I've added tag expansion to the list of runners in the weekly config, and verified it behaves as expected. (I still need to write tests, but as I said, I'm not sure I need this feature myself now.)
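Concretely, tag expansion means a tag in the weekly machine list stands for every runner carrying it, along these lines (runner names made up):

    # In the weekly config (illustrative):
    machines: ["tailcallable"]
    # ...which expands at scheduling time to something like:
    machines: ["centurion-clang", "zen4-clang"]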

mdboom (Contributor) commented Apr 28, 2025

I'm happy to defer this until there's something that fully meets your needs. I don't have an immediate need (and the tailcalling thing can be done almost as easily by manually listing the runners in the weekly config).
