Commit e42b34e

Authored and committed by Caspar van Leeuwen
Update readme for new node_type_map functionality
1 parent 2f3c0ae commit e42b34e


1 file changed (+46, -28 lines)

README.md

@@ -777,37 +777,38 @@ for signing. The bot calls the script with the two arguments:
 The section `[architecturetargets]` defines for which targets (OS/SUBDIR), (for example `linux/x86_64/amd/zen2`) the EESSI bot should submit jobs, and which additional `sbatch` parameters will be used for requesting a compute node with the CPU microarchitecture needed to build the software stack.
 
 ```ini
-arch_target_map = {
-"linux/x86_64/generic": "--partition x86-64-generic-node",
-"linux/x86_64/amd/zen2": "--partition x86-64-amd-zen2-node" }
+node_type_map = {
+"cpu_zen2": {
+"os": "linux",
+"cpu_subdir": "x86_64/amd/zen2",
+"slurm_params": "-p rome --nodes 1 --ntasks-per-node 16 --cpus-per-task 1",
+"repo_targets": ["eessi.io-2023.06-compat","eessi.io-2023.06-software"]
+},
+"gpu_h100": {
+"os": "linux",
+"cpu_subdir": "x86_64/amd/zen4",
+"accel": "nvidia/cc90",
+"slurm_params": "-p gpu_h100 --nodes 1 --ntasks-per-node 16 --cpus-per-task 1 --gpus-per-node 1",
+"repo_targets": ["eessi.io-2023.06-compat","eessi.io-2023.06-software"]
+}}
 ```
 
-The map has one-to-many entries of the format `OS/SUBDIR:
-ADDITIONAL_SBATCH_PARAMETERS`. For your cluster, you will have to figure out
-which microarchitectures (`SUBDIR`) are available (as `OS` only `linux` is
-currently supported) and how to instruct Slurm to allocate nodes with that
-architecture to a job (`ADDITIONAL_SBATCH_PARAMETERS`).
+Each entry in the `node_type_map` dictionary describes a build node type. The key is a descriptive name for the node type, and its value is a dictionary containing the following properties as key-value pairs:
+- `os`: its operating system
+- `cpu_subdir`: its CPU architecture
+- `slurm_params`: the Slurm parameters that need to be passed to `sbatch` to submit jobs to it
+- `repo_targets`: the repository targets supported by this node type
+- `accel` (optional): which accelerator this node has, for example `nvidia/cc90`
+All values are strings, except `repo_targets`, which is a list of strings. The repository targets listed in `repo_targets` should correspond to the repository IDs defined in the `repos.cfg` file in the `repos_cfg_dir` (see below).
 
-Note, if you do not have to specify additional parameters to `sbatch` to request a compute node with a specific microarchitecture, you can just write something like:
+Note that the Slurm parameters should typically be chosen such that only a single type of node (with one specific type of CPU and one specific type of GPU) is allocated.
+
+To instruct the bot to build on the `cpu_zen2` node type above, use the command `bot:build on:arch=zen2 ...`. To instruct the bot to build on the `gpu_h100` node type, use the command `bot:build on:arch=zen4,accel=nvidia/cc90 ...`.
 
-```ini
-arch_target_map = { "linux/x86_64/generic": "" }
-```
 
 #### `[repo_targets]` section
 
 The `[repo_targets]` section defines for which repositories and architectures the bot can run a job.
-Repositories are referenced by IDs (or `repo_id`). Architectures are identified
-by `OS/SUBDIR` which correspond to settings in the `arch_target_map`.
-
-```ini
-repo_target_map = {
-"OS_SUBDIR_1": ["REPO_ID_1_1","REPO_ID_1_2"],
-"OS_SUBDIR_2": ["REPO_ID_2_1","REPO_ID_2_2"] }
-```
-
-For each `OS/SUBDIR` combination a list of available repository IDs can be
-provided.
 
 The repository IDs are defined in a separate file, say `repos.cfg` which is
 stored in the directory defined via `repos_cfg_dir`:
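As an illustration of how the new `node_type_map` could be read and matched against a `bot:build on:...` argument, here is a minimal Python sketch. It assumes that the value lives in the `[architecturetargets]` section of the bot's configuration file, that its continuation lines are indented (Python's `configparser` only treats indented lines as part of a multi-line value), and that the value is valid JSON. The helpers `validate`, `parse_on_spec` and `select_node_type`, and in particular the rule that `arch=zen2` matches a `cpu_subdir` ending in `/zen2`, are hypothetical and are not taken from the bot's source code.

```python
import configparser
import json

# Hypothetical configuration fragment modelled on the README example.
# Continuation lines are indented so that configparser reads the whole
# dictionary as one multi-line value of node_type_map.
APP_CFG = """
[architecturetargets]
node_type_map = {
    "cpu_zen2": {
        "os": "linux",
        "cpu_subdir": "x86_64/amd/zen2",
        "slurm_params": "-p rome --nodes 1 --ntasks-per-node 16 --cpus-per-task 1",
        "repo_targets": ["eessi.io-2023.06-compat", "eessi.io-2023.06-software"]
    },
    "gpu_h100": {
        "os": "linux",
        "cpu_subdir": "x86_64/amd/zen4",
        "accel": "nvidia/cc90",
        "slurm_params": "-p gpu_h100 --nodes 1 --ntasks-per-node 16 --cpus-per-task 1 --gpus-per-node 1",
        "repo_targets": ["eessi.io-2023.06-compat", "eessi.io-2023.06-software"]
    }}
"""

REQUIRED_KEYS = ("os", "cpu_subdir", "slurm_params", "repo_targets")


def validate(node_type_map):
    """Return a list of problems: repo_targets must be a list of strings, other values strings."""
    problems = []
    for name, props in node_type_map.items():
        for key in REQUIRED_KEYS:
            if key not in props:
                problems.append(f"{name}: missing '{key}'")
        for key, value in props.items():
            if key == "repo_targets":
                ok = isinstance(value, list) and all(isinstance(v, str) for v in value)
            else:
                ok = isinstance(value, str)
            if not ok:
                problems.append(f"{name}: unexpected type for '{key}'")
    return problems


def parse_on_spec(spec):
    """Hypothetical helper: turn 'arch=zen4,accel=nvidia/cc90' into a dict."""
    result = {}
    for item in spec.split(","):
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


def select_node_type(node_type_map, spec):
    """Hypothetical matching rule: 'arch' must equal cpu_subdir or its trailing
    component(s); 'accel' must equal the node's accel (a node without accel only
    matches when no accelerator is requested)."""
    wanted = parse_on_spec(spec)
    arch, accel = wanted.get("arch"), wanted.get("accel")
    matches = []
    for name, props in node_type_map.items():
        subdir = props["cpu_subdir"]
        if arch is not None and subdir != arch and not subdir.endswith("/" + arch):
            continue
        if props.get("accel") != accel:
            continue
        matches.append(name)
    return matches


config = configparser.ConfigParser()
config.read_string(APP_CFG)
node_type_map = json.loads(config["architecturetargets"]["node_type_map"])

print(validate(node_type_map))                                         # []
print(select_node_type(node_type_map, "arch=zen2"))                    # ['cpu_zen2']
print(select_node_type(node_type_map, "arch=zen4,accel=nvidia/cc90"))  # ['gpu_h100']
```

Under this hypothetical rule, `arch=zen4` without `accel=...` matches no node type, because the only `zen4` entry declares an accelerator; the README does not say how the bot itself handles that case.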
@@ -911,19 +912,36 @@ event handler will throw an exception when formatting the update of the PR
 comment corresponding to the job.
 
 ```ini
-initial_comment = New job on instance `{app_name}` for architecture `{arch_name}`{accelerator_spec} for repository `{repo_id}` in job dir `{symlink}`
+new_job_instance_repo = New job on instance `{app_name}` for repository `{repo_id}`
 ```
 
-`initial_comment` is used to create a comment to a PR when a new job has been
-created. Note, the part '{accelerator_spec}' is only filled-in by the bot if the
-argument 'accelerator' to the `bot: build` command has been used.
+`new_job_instance_repo` is used as the first line in a comment to a PR when a new job has been created.
+
+```ini
+build_on_arch = Building on: `{on_arch}`{on_accelerator}
+```
+
+`build_on_arch` is used as the second line in a comment to a PR when a new job has been created. Note that `{on_accelerator}` is only filled in by the bot if an `on:...,accel=...` argument has been passed to the bot.
+
+```ini
+build_for_arch = Building for: `{for_arch}`{for_accelerator}
+```
+
+`build_for_arch` is used as the third line in a comment to a PR when a new job has been created. Note that `{for_accelerator}` is only filled in by the bot if a `for:...,accel=...` argument has been passed to the bot.
+
+```ini
+jobdir = Job dir: `{symlink}`
+```
+
+`jobdir` is used as the fourth line in a comment to a PR when a new job has been created.
+
 
 ```ini
 with_accelerator =  and accelerator `{accelerator}`
 ```
 
 `with_accelerator` is used to provide information about the accelerator the job
-should build for if and only if the argument `accelerator:X/Y` has been provided.
+should build for if and only if the argument `on:...,accel=...` or `for:...,accel=...` has been provided.
 
 #### `[new_job_comments]` section
 
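The four templates in this hunk, together with `with_accelerator`, can be combined into the new-job comment roughly as in this minimal Python sketch. The template strings are copied from the README; the dictionary standing in for the configuration section, the helper names, the example values, and the assumption that `{on_accelerator}` and `{for_accelerator}` receive an expanded `with_accelerator` string (or an empty string) are hypothetical.

```python
# Templates copied from the README; a plain dict stands in for the bot's
# configuration section (an assumption made to keep the sketch self-contained).
TEMPLATES = {
    "new_job_instance_repo": "New job on instance `{app_name}` for repository `{repo_id}`",
    "build_on_arch": "Building on: `{on_arch}`{on_accelerator}",
    "build_for_arch": "Building for: `{for_arch}`{for_accelerator}",
    "jobdir": "Job dir: `{symlink}`",
    "with_accelerator": " and accelerator `{accelerator}`",
}


def accelerator_suffix(accel):
    """Hypothetical: expand with_accelerator only if an accelerator was requested."""
    if not accel:
        return ""
    return TEMPLATES["with_accelerator"].format(accelerator=accel)


def new_job_comment(app_name, repo_id, on_arch, for_arch, symlink,
                    on_accel=None, for_accel=None):
    """Assemble the four comment lines in the order the README describes."""
    lines = [
        TEMPLATES["new_job_instance_repo"].format(app_name=app_name, repo_id=repo_id),
        TEMPLATES["build_on_arch"].format(
            on_arch=on_arch, on_accelerator=accelerator_suffix(on_accel)),
        TEMPLATES["build_for_arch"].format(
            for_arch=for_arch, for_accelerator=accelerator_suffix(for_accel)),
        TEMPLATES["jobdir"].format(symlink=symlink),
    ]
    return "\n".join(lines)


# Example values are made up for illustration.
comment = new_job_comment(
    app_name="my-bot-instance",
    repo_id="eessi.io-2023.06-software",
    on_arch="zen4",
    for_arch="zen4",
    symlink="/path/to/job/dir",
    on_accel="nvidia/cc90",
    for_accel="nvidia/cc90",
)
print(comment)
```

Leaving `on_accel` unset makes the second line end directly after the architecture, which matches the README's note that `{on_accelerator}` is only filled in when `on:...,accel=...` is given.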