Description
Hello,
I would like to report a problem with submitting jobs to a Slurm-scheduled system. Working with the most recent Swift/T release (1.3), I always get the "sbatch failed!" message whenever I invoke a hypothetical tst.swift program (or its compiled version).
Here is an example to demonstrate:
$ cat settings.sh
export QUEUE=normal
export PPN=4
###
$ swift-t -m slurm -l -n 3 -s settings.sh tst.swift
TURBINE-SLURM SCRIPT
NODES=3
PROCS=3
PPN=1
sourcing: settings.sh
done: settings.sh
TURBINE_OUTPUT=/home1/04525/tg838247/turbine-output/2017/08/05/19/42/16
TURBINE_HOME=/home1/04525/tg838247/software/swift-t/swift-t-1.3_no_mpe/turbine
wrote: /home1/04525/tg838247/turbine-output/2017/08/05/19/42/16/turbine-slurm.sh
sbatch failed!
Now, the code stored in ${TURBINE_OUTPUT}/submit.txt does actually work on its own: when run manually, it does the right thing, reserving the right memory, time, etc., and running the right Swift/T code.
Looking more closely at scripts/submit/slurm/turbine-slurm-run.zsh, it appears that jobs were not getting IDs because they were never submitted to the scheduler to begin with, as can be seen at line 75. Here is a quick attempt to fix this:
JOB_ID=$( ${SUBMIT_COMMAND} | grep "[1-9][0-9]*$" )
This solution only half works: it does submit the Swift/T job and reserves the right resources, but it doesn't properly extract the JOB_ID. That is tolerable, however, as I can still look the job up with squeue.
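One possible reason the JOB_ID comes out wrong: grep without -o prints the whole matching line, so the variable would hold the full sbatch message rather than just the number. The sketch below assumes sbatch prints a line of the form "Submitted batch job <id>"; fake_sbatch and the ID 12345 are made-up stand-ins so the snippet runs without Slurm, and this is not the actual code in turbine-slurm-run.zsh.

```shell
# Hypothetical stand-in for the real submit command, so this runs without Slurm.
# Assumption: sbatch reports success as "Submitted batch job <id>".
fake_sbatch() {
  echo "Submitted batch job 12345"
}
SUBMIT_COMMAND=fake_sbatch

# Without -o, grep prints the entire line that matched the pattern.
WHOLE_LINE=$( ${SUBMIT_COMMAND} | grep "[1-9][0-9]*$" )

# With -o, grep prints only the matched text, i.e. the trailing digits.
JOB_ID=$( ${SUBMIT_COMMAND} | grep -o "[1-9][0-9]*$" )

echo "without -o: ${WHOLE_LINE}"
echo "with -o:    ${JOB_ID}"
```

If the installed Slurm supports it, sbatch --parsable may be a simpler route, since it prints only the job ID (followed by a semicolon and the cluster name when one is present), with no need for grep.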
I know there must be a neater way to do this, so I'm eager to know it, but I hope this is nonetheless helpful.
Thank you,
Azza