Commit 54247a5

Merge pull request #103 from stackhpc/fix/jobcomp
Fix jobcompletion logfile existence
2 parents 5822d34 + e012801 commit 54247a5

7 files changed: +117 -6 lines changed

.github/workflows/ci.yml

Lines changed: 6 additions & 1 deletion
@@ -32,6 +32,8 @@ jobs:
           - test8
           - test9
           - test10
+          - test11
+          - test12
 
         exclude:
           - image: 'centos:7'
@@ -46,7 +48,10 @@ jobs:
             scenario: test9
           - image: 'centos:7'
             scenario: test10
-
+          - image: 'centos:7'
+            scenario: test11
+          - image: 'centos:7'
+            scenario: test12
 
     steps:
       - name: Check out the codebase.

README.md

Lines changed: 5 additions & 5 deletions
@@ -65,7 +65,7 @@ package in the image.
 
 #### Accounting
 
-By default, no accounting storage is configured. OpenHPC v1.x and un-updated OpenHPC v2.0 clusters support file-based accounting storage which can be selected by setting the role variable `openhpc_slurm_accounting_storage_type` to `accounting_storage/filetxt`<sup id="accounting_storage">[1](#slurm_ver_footnote)</sup>. Accounting for OpenHPC v2.1 and updated OpenHPC v2.0 clusters requires the Slurm database daemon, `slurmdbd`. To enable this:
+By default, no accounting storage is configured. OpenHPC v1.x and un-updated OpenHPC v2.0 clusters support file-based accounting storage which can be selected by setting the role variable `openhpc_slurm_accounting_storage_type` to `accounting_storage/filetxt`<sup id="accounting_storage">[1](#slurm_ver_footnote)</sup>. Accounting for OpenHPC v2.1 and updated OpenHPC v2.0 clusters requires the Slurm database daemon, `slurmdbd` (although job completion may be a limited alternative, see [below](#Job-accounting)). To enable accounting:
 
 * Configure a mariadb or mysql server as described in the slurm accounting [documentation](https://slurm.schedmd.com/accounting.html) on one of the nodes in your inventory and set `openhpc_enable.database` to `true` for this node.
 * Set `openhpc_slurm_accounting_storage_type` to `accounting_storage/slurmdbd`.
@@ -86,16 +86,16 @@ For more advanced customisation or to configure another storage type, you might
 #### Job accounting
 
 This is largely redundant if you are using the accounting plugin above, but will give you basic
-accounting data such as start and end times.
+accounting data such as start and end times. By default no job accounting is configured.
+
+`openhpc_slurm_job_comp_type`: Logging mechanism for job accounting. Can be one of
+`jobcomp/filetxt`, `jobcomp/none`, `jobcomp/elasticsearch`.
 
 `openhpc_slurm_job_acct_gather_type`: Mechanism for collecting job accounting data. Can be one
 of `jobacct_gather/linux`, `jobacct_gather/cgroup` and `jobacct_gather/none`
 
 `openhpc_slurm_job_acct_gather_frequency`: Sampling period for job accounting (seconds)
 
-`openhpc_slurm_job_comp_type`: Logging mechanism for job accounting. Can be one of
-`jobcomp/filetxt`, `jobcomp/none`, `jobcomp/elasticsearch`.
-
 `openhpc_slurm_job_comp_loc`: Location to store the job accounting records. Depends on value of
 `openhpc_slurm_job_comp_type`, e.g. for `jobcomp/filetxt` represents a path on disk.
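As an illustration of the job accounting variables documented in the README hunk above, the following is a minimal sketch of role variables that select file-based job completion. It is not part of this commit. The file name `group_vars/all.yml`, the log path and the sampling period are assumptions chosen for the example; only the variable names and the plugin values come from the README text.

```yaml
# group_vars/all.yml (hypothetical file name, not part of this commit)

# Write one job completion record per finished job to a text file.
openhpc_slurm_job_comp_type: jobcomp/filetxt

# Assumed example path: for jobcomp/filetxt this is a path on disk.
# If unset, the role's own default applies (not shown in this diff).
openhpc_slurm_job_comp_loc: /var/log/slurm_jobcomp.log

# Optional: gather per-job accounting data via the Linux plugin.
openhpc_slurm_job_acct_gather_type: jobacct_gather/linux

# Illustrative sampling period in seconds.
openhpc_slurm_job_acct_gather_frequency: 30
```

With `jobcomp/filetxt` selected, the records can then be queried with `sacct --completion`, which is what the new `test12` scenario checks.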

molecule/README.md

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ test8 | 1 | N | 2x compute node, 2x login-only
 test9 | 1 | N | As test8 but uses `--limit=testohpc-control,testohpc-compute-0` and checks login nodes still end up in slurm.conf
 test10 | 1 | N | As for #5 but then tries to add an additional node
 test11 | 1 | N | As for #5 but then deletes a node (actually changes the partition due to molecule/ansible limitations)
+test12 | 1 | N | As for #5 but enabling job completion and testing `sacct -c`
 
 # Local Installation & Running

molecule/test12/converge.yml

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+---
+- name: Converge
+  hosts: all
+  tasks:
+    - name: "Include ansible-role-openhpc"
+      include_role:
+        name: "{{ lookup('env', 'MOLECULE_PROJECT_DIRECTORY') | basename }}"
+      vars:
+        openhpc_enable:
+          control: "{{ inventory_hostname in groups['testohpc_login'] }}"
+          batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
+          runtime: true
+        openhpc_slurm_control_host: "{{ groups['testohpc_login'] | first }}"
+        openhpc_slurm_partitions:
+          - name: "compute"
+        openhpc_cluster_name: testohpc
+        openhpc_slurm_configless: true
+        openhpc_slurm_job_comp_type: jobcomp/filetxt

molecule/test12/molecule.yml

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+---
+name: single partition, group is partition
+driver:
+  name: docker
+platforms:
+  - name: testohpc-login-0
+    image: ${MOLECULE_IMAGE}
+    pre_build_image: true
+    groups:
+      - testohpc_login
+    command: /sbin/init
+    tmpfs:
+      - /run
+      - /tmp
+    volumes:
+      - /sys/fs/cgroup:/sys/fs/cgroup:ro
+    networks:
+      - name: net1
+  - name: testohpc-compute-0
+    image: ${MOLECULE_IMAGE}
+    pre_build_image: true
+    groups:
+      - testohpc_compute
+    command: /sbin/init
+    tmpfs:
+      - /run
+      - /tmp
+    volumes:
+      - /sys/fs/cgroup:/sys/fs/cgroup:ro
+    networks:
+      - name: net1
+  - name: testohpc-compute-1
+    image: ${MOLECULE_IMAGE}
+    pre_build_image: true
+    groups:
+      - testohpc_compute
+    command: /sbin/init
+    tmpfs:
+      - /run
+      - /tmp
+    volumes:
+      - /sys/fs/cgroup:/sys/fs/cgroup:ro
+    networks:
+      - name: net1
+provisioner:
+  name: ansible
+verifier:
+  name: ansible

molecule/test12/verify.yml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+---
+
+- name: Check slurm hostlist
+  hosts: testohpc_login
+  tasks:
+    - name: Get slurm partition info
+      command: sinfo --noheader --format="%P,%a,%l,%D,%t,%N" # using --format ensures we control whitespace
+      register: sinfo
+      changed_when: false
+    - name: Assert slurm running ok
+      assert: # PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
+        that: "sinfo.stdout_lines == ['compute*,up,60-00:00:00,2,idle,testohpc-compute-[0-1]']"
+        fail_msg: "FAILED - actual value: {{ sinfo.stdout_lines }}"
+    - name: Run a slurm job
+      command:
+        cmd: "sbatch -N2 --wrap 'srun hostname'"
+      register: sbatch
+    - name: Set fact for slurm jobid
+      set_fact:
+        jobid: "{{ sbatch.stdout.split()[-1] }}"
+    - name: Get job completion info
+      command:
+        cmd: "sacct --completion --noheader --parsable2"
+      changed_when: false
+      register: sacct
+    - assert:
+        that: "(jobid + '|0|wrap|compute|2|testohpc-compute-[0-1]|COMPLETED') in sacct.stdout"
+        fail_msg: "Didn't find expected output for {{ jobid }} in sacct output: {{ sacct.stdout }}"
+
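The verify play above queries `sacct` immediately after `sbatch` returns, so the completion record may not have been written yet when the assertion runs. The sketch below shows one possible variant of the "Get job completion info" task that polls until the job ID appears. It is not part of this commit, and the retry count and delay are illustrative assumptions. It relies only on Ansible's standard `until`, `retries` and `delay` task keywords and on the `jobid` fact set earlier in the play.

```yaml
# Hypothetical variant of the task in verify.yml, not the committed version.
- name: Get job completion info (retry until the record is written)
  command:
    cmd: "sacct --completion --noheader --parsable2"
  register: sacct
  changed_when: false
  # A jobcomp record is only written when the job finishes, so the
  # presence of "<jobid>|" is used as a simple completion signal.
  until: (jobid + '|') in sacct.stdout
  retries: 10   # illustrative value
  delay: 3      # seconds between attempts, illustrative value
```

The check is still a substring match, like the original assertion, so it stays an approximation and not a strict field comparison.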

tasks/runtime.yml

Lines changed: 10 additions & 0 deletions
@@ -51,6 +51,16 @@
   notify:
     - Restart Munge service
 
+- name: Ensure JobComp logfile exists
+  file:
+    path: "{{ openhpc_slurm_job_comp_loc }}"
+    state: touch
+    owner: slurm
+    group: slurm
+    access_time: preserve
+    modification_time: preserve
+  when: openhpc_slurm_job_comp_type == 'jobcomp/filetxt'
+
 - name: Template slurmdbd.conf
   template:
     src: slurmdbd.conf.j2
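The new task touches the job completion logfile and sets its owner and group to `slurm`, while preserving timestamps so repeated runs do not modify an existing file. A check for that outcome could be sketched as below, using Ansible's `stat` and `assert` modules. This play is not part of the commit. The variable `jobcomp_logfile` and its path are hypothetical placeholders that would need to match the role's `openhpc_slurm_job_comp_loc` value.

```yaml
# Hypothetical verification play, not part of this commit.
- name: Check JobComp logfile
  hosts: testohpc_login
  vars:
    # Placeholder: must match openhpc_slurm_job_comp_loc for the role.
    jobcomp_logfile: /var/log/slurm_jobcomp.log
  tasks:
    - name: Stat the job completion logfile
      stat:
        path: "{{ jobcomp_logfile }}"
      register: jobcomp_stat

    - name: Assert the logfile exists and is owned by slurm
      assert:
        that:
          - jobcomp_stat.stat.exists
          - jobcomp_stat.stat.pw_name == 'slurm'
          - jobcomp_stat.stat.gr_name == 'slurm'
        fail_msg: "Unexpected state for {{ jobcomp_logfile }}: {{ jobcomp_stat.stat }}"
```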
