[ASCII-art banner: "nightly"]
This repository contains scripts to build and run picongpu on multiple systems in multiple test configurations.
Results are output in a custom work directory.
The top-level program for automated runs is nightly.py.
It checks recent code commits and only runs what is needed.
To adapt this to your own workflow, read this document,
run these scripts on a test repository to check
your understanding, then edit the scripts in the templates/
directory.
Dependencies:

- python packages:
  - pyyaml
  - jinja2
- shell utilities:
  - git

Scripts:

- nightly.py <config.yaml> <machine name>
  Change to the "repo" dir for the given machine and execute `git fetch`, then check for new
  commits and execute build and run steps as appropriate (see the sketch after this list).
- build.py <config.yaml> <build info>
  Carry out a build and log the results.
- run.py <config.yaml> <run info>
  Carry out a run and log the results.
- results.py <config.yaml>
  Gather all results from completed runs.
- sum.py <config.yaml>
  Create a human-readable summary of the build and run results.
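
nightly.py keeps its git bookkeeping simple: it fetches the machine's "repo" clone and only
proceeds when there is something new to test. A minimal sketch of that check, assuming the
clone tracks an origin branch; the helper name (new_commits) and branch name are illustrative,
not the script's actual API:

```python
import subprocess

def new_commits(repo_dir: str, branch: str = "dev") -> list[str]:
    """Sketch: fetch the remote and list commits that have not been tested yet."""
    subprocess.run(["git", "fetch"], cwd=repo_dir, check=True)
    out = subprocess.run(
        ["git", "rev-list", f"HEAD..origin/{branch}"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return out.stdout.split()
```

nightly.py then executes build and run steps as appropriate for what it finds.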
Configurations are a 2-level tree with buildID and runID parts.
The buildID is composed from all buildvars present in the config.yaml,
plus machine and commit as two mandatory fields.
In the example config.yaml shipped with this program,
the variables are as follows:
- buildID
  - machine = machine name
  - commit = picongpu commit hash
  - compiler = compiler module name
  - accel = accelerator module name (cuda or rocm)
  - mpi = mpi module name
  - problem = physics problem name
  - phash = physics problem commit hash (for FOM)
The runID is composed from all runvars present in the config.yaml,
plus the information from the buildID. In the example config.yaml,
these include grid sizes for problems:
- runID
  - buildID tuple
  - nx, ny, nz = processor grid size (x, y, z)
  - grid = global grid size "x y z"
  - periodic = 0/1 flags indicating periodicity "x y z"
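
To make the composition concrete, here is a rough sketch of loading such a configuration and
enumerating buildID / runID tuples. The YAML below is illustrative only; the shipped
config.yaml may use different key names, values, and nesting, and the helper functions are
not part of this repository's API:

```python
import itertools
import yaml

# Illustrative config; the real config.yaml may differ in layout and keys.
example = yaml.safe_load("""
machine: frontier
buildvars:
  compiler: [gcc/12.2.0]
  accel: [rocm/5.4.3]
  mpi: [cray-mpich/8.1.23]
  problem: [KelvinHelmholtz]
  phash: [abc1234]
runvars:
  nx: [2, 4]
  ny: [2]
  nz: [2]
""")

def build_ids(cfg, commit):
    """Yield one buildID tuple per combination of buildvars (machine + commit are mandatory)."""
    keys = list(cfg["buildvars"])
    for values in itertools.product(*(cfg["buildvars"][k] for k in keys)):
        yield (cfg["machine"], commit) + values

def run_ids(cfg, build_id):
    """Yield one runID tuple per combination of runvars, prefixed by the buildID tuple."""
    keys = list(cfg["runvars"])
    for values in itertools.product(*(cfg["runvars"][k] for k in keys)):
        yield build_id + values

for b in build_ids(example, "deadbeef"):
    for r in run_ids(example, b):
        print(r)
```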
NOTE: planned future config variables:
- particle type
- Maxwell solver type
The tuple of all information in a buildID is called the buildID tuple.
It is inlined into the first several elements of the runID.
When a string ID is needed, the buildID and runID tuples are hashed.
For details, see the derive function in helpers.py.
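
The actual hashing lives in helpers.py's derive function; the sketch below only illustrates
the idea of mapping a tuple to a short, stable string ID (the hash choice, separator, and
digest length are assumptions, not necessarily what derive does):

```python
import hashlib

def derive(id_tuple, length=12):
    """Sketch: turn a buildID/runID tuple into a short, stable string ID."""
    key = "|".join(str(x) for x in id_tuple)
    return hashlib.sha256(key.encode()).hexdigest()[:length]

# derive(("frontier", "deadbeef", "gcc/12.2.0")) returns a 12-character hex string
```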
The configuration options present in config.yaml
are used to define file locations and run information throughout
the build/run process. In particular, they are available during
rendering of the jinja2 templates in templates/*.j2.
More details are provided below.
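
Rendering itself is plain jinja2; a minimal sketch, assuming the buildID/runID tuple has
already been turned into a dict of variable names, and with an illustrative helper name
(the template file names follow the templates/ layout described below):

```python
from pathlib import Path
import jinja2

def render_template(template_path: str, variables: dict, out_path: str) -> None:
    """Sketch: render one templates/*.j2 file with the configuration variables."""
    env = jinja2.Environment(
        loader=jinja2.FileSystemLoader(Path(template_path).parent),
        undefined=jinja2.StrictUndefined,  # fail loudly if a variable is missing
    )
    text = env.get_template(Path(template_path).name).render(**variables)
    Path(out_path).write_text(text)

# e.g. render_template("templates/build.sh.j2", {"machine": "frontier", ...}, "build.sh")
```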
The build takes place in build_dir = $WORK/buildID.
All build scripts are launched from this directory.
Both build_dir and buildID are available to the templates
in case they need absolute paths. No files should be changed
outside this directory, however.
The build.py script carries out these steps, aborting the process on error:

- mkdir build_dir and run templates/clone.sh <git repo dir> <commit>
  - this script should clone picongpu at the specified commit into picongpu
  - its output is captured to clone.log
  - nonzero return aborts the build
- create shell scripts in build_dir:
  - templates/<machine>.env.sh.j2 % buildID tuple ~> env.sh
    - this is for you to re-use to set up the compile / run shell environment
  - templates/setup.sh.j2 % buildID tuple ~> setup.sh
  - templates/build.sh.j2 % buildID tuple ~> build.sh
  - This step has no separate logfile. Only status.txt records its completion.
- execute ./setup.sh
  - this script should copy in the physics problem at the required commit hash
  - its output is captured to setup.log
  - nonzero return aborts the build
- execute ./build.sh
  - this script should:
    - source env.sh
    - run pic-create
    - mkdir + cd to build
    - execute cmake
  - output is captured to build.log
  - nonzero return code indicates an error
- report success / failure to $WORK/builds.csv
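
Put together, the build step is "render scripts, run them in order, stop at the first
failure". A condensed sketch of that control flow, with illustrative parameter and helper
names (the real build.py differs in detail):

```python
import subprocess
from pathlib import Path

def build(build_dir: Path, clone_script: Path, repo_dir: str, commit: str, render) -> bool:
    """Sketch of the build sequence; aborts at the first nonzero return code.

    `render` stands in for whatever writes env.sh, setup.sh and build.sh into
    build_dir from the templates (that step records only to status.txt).
    """
    build_dir.mkdir(parents=True, exist_ok=True)

    def step(cmd, log_name):
        # capture stdout + stderr of each script to its own logfile in build_dir
        with open(build_dir / log_name, "w") as log:
            return subprocess.run(cmd, cwd=build_dir,
                                  stdout=log, stderr=subprocess.STDOUT).returncode

    if step(["bash", str(clone_script), repo_dir, commit], "clone.log") != 0:
        return False
    render(build_dir)  # env.sh, setup.sh, build.sh appear in build_dir
    if step(["bash", "./setup.sh"], "setup.log") != 0:
        return False
    if step(["bash", "./build.sh"], "build.log") != 0:
        return False
    return True  # the caller records success / failure in $WORK/builds.csv
```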
Runs take place in run_dir = $WORK/buildID/runID.
All run scripts are launched from this directory.
Both run_dir and runID are available to the templates
in case they need absolute paths.
No files should be changed outside run_dir, however.
The run.py script carries out these steps, aborting the process on error:
- mkdir run_dir and create its shell scripts:
  - templates/<machine>.run.sh.j2 % runID tuple ~> run.sh
    - this script should create a batch script and submit it to the queue
  - templates/result.sh.j2 % runID tuple ~> result.sh
    - This script is run later to check run results.
    - This script may report extra information in a file, "result.txt"
      - each line of the file becomes a record in results.csv
      - the results header (i.e. record labels) comes from config.yaml's resultvars
    - It should be idempotent, returning 99 if the run has not completed yet.
- execute ./run.sh
  - its output is captured to run.log
  - nonzero return aborts the run
- report success / failure to the $WORK/buildID/runs.csv file
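
The run step mirrors the build step; a condensed sketch with illustrative helper names
(the runs.csv columns follow the schema listed at the end of this document):

```python
import csv
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def launch_run(run_dir: Path, run_id: str, run_tuple: tuple, runs_csv: Path) -> bool:
    """Sketch: execute ./run.sh from run_dir, capture run.log, record the outcome."""
    run_dir.mkdir(parents=True, exist_ok=True)
    # run.sh and result.sh are assumed to have been rendered from templates/ already
    with open(run_dir / "run.log", "w") as log:
        rc = subprocess.run(["bash", "./run.sh"], cwd=run_dir,
                            stdout=log, stderr=subprocess.STDOUT).returncode
    with open(runs_csv, "a", newline="") as f:
        csv.writer(f).writerow(
            [run_id, datetime.now(timezone.utc).isoformat(), rc, *run_tuple])
    return rc == 0
```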
The results.py program simply works through all run directories
that are incomplete and executes their result.sh script
(from within the run_dir). As usual, script
output is logged to result.log.
It reports success / failure to the $WORK/buildID/results.csv file.
It also checks whether a result.txt file exists.
If so, it treats it as a newline-delimited list.
It tokenizes the list and appends it to the entry in results.csv.
Note that this only scans runs in the buildID/runs.csv file
that are not already present (or are present, but
marked incomplete).
To mark a run incomplete, result.sh should return 99.
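
A condensed sketch of that collection loop, again with illustrative helper names (return
code 99 marks an incomplete run, per the convention above):

```python
import csv
import subprocess
from datetime import datetime, timezone
from pathlib import Path

INCOMPLETE = 99  # result.sh convention: the run has not finished yet

def collect_result(run_dir: Path, run_id: str, results_csv: Path) -> int:
    """Sketch: execute ./result.sh in run_dir and append a record to results.csv."""
    with open(run_dir / "result.log", "w") as log:
        rc = subprocess.run(["bash", "./result.sh"], cwd=run_dir,
                            stdout=log, stderr=subprocess.STDOUT).returncode
    if rc == INCOMPLETE:
        return rc  # still marked incomplete; it will be scanned again next time

    extra = []
    result_txt = run_dir / "result.txt"
    if result_txt.exists():
        # tokenize the newline-delimited list and append it to the results.csv entry
        extra = result_txt.read_text().split()
    with open(results_csv, "a", newline="") as f:
        csv.writer(f).writerow(
            [run_id, datetime.now(timezone.utc).isoformat(), rc, *extra])
    return rc
```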
- build.py logs to $WORK/builds.csv
- run.py logs to $WORK/buildID/runs.csv
- results.py logs to $WORK/buildID/results.csv
The outputs from each run are stored in several places:

- build directory @ $WORK/buildID
  - contains a status.txt documenting the history of the compile state
- run directory @ $WORK/buildID/runID
  - contains a status.txt documenting the history of the run state
- build information summary @ $WORK/builds.csv
  - Schema: buildID, date, compile return code (0 if OK), buildID tuple elements
- run information summary @ $WORK/buildID/runs.csv
  - Schema: runID, date, run return code, runID tuple elements
- completed job information summary @ $WORK/buildID/results.csv
  - Schema: runID, date, results return code, resultvars (gathered from result.txt)
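
For reference, a small sketch of how these CSVs could be read back (e.g. by sum.py) to
produce a summary; the columns beyond the fixed prefix depend on the buildvars, runvars,
and resultvars in config.yaml, and the helper name is illustrative:

```python
import csv
import os
from pathlib import Path

def summarize_builds(builds_csv: Path) -> None:
    """Sketch: print one status line per recorded build from $WORK/builds.csv."""
    with open(builds_csv, newline="") as f:
        for row in csv.reader(f):
            build_id, date, rc, *build_tuple = row
            status = "OK" if rc == "0" else f"FAILED (rc={rc})"
            print(f"{date}  {build_id}  {status}  {' '.join(build_tuple)}")

# summarize_builds(Path(os.environ["WORK"]) / "builds.csv")
```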