Easily manage and submit robust jobs to Slurm using Python and Bash.
- Freezes source code by copying it to a separate $JOB_DIR.
- Auto-submits another job if the current job times out.
- Exposes hooks for custom Bash code: setup/setup_resume, on_run/on_run_resume, and teardown.
- Formats job names using parameters from config files.
- Supports interactive jobs for easy debugging.
pip install easy-slurm
Easy Slurm provides a CLI / YAML interface, as well as a Python interface.
To submit a job, simply fill in the various parameters shown in the example below.
import easy_slurm

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date}-{job_name}",
    src=["./src", "./assets"],
    setup="""
        virtualenv "$SLURM_TMPDIR/env"
        source "$SLURM_TMPDIR/env/bin/activate"
        pip install -r "$SLURM_TMPDIR/src/requirements.txt"
    """,
    setup_resume="""
        # Runs only on subsequent runs. Call setup and do anything else needed.
        setup
    """,
    on_run="cd src && python main.py",
    on_run_resume="cd src && python main.py --resume",
    teardown="""
        # Do any cleanup tasks here.
    """,
    sbatch_options={
        "job-name": "example-simple",
        "account": "your-username",
        "time": "3:00:00",
        "nodes": "1",
    },
    resubmit_limit=64,  # Automatic resubmission limit.
)
All job files will be kept in the job_dir directory. Provide directory paths in src; these will be archived and copied to the job_dir directory. Also provide Bash code in the hooks, which run in the following order:
| First run: | Subsequent runs: |
|---|---|
| setup | setup_resume |
| on_run | on_run_resume |
| teardown | teardown |
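The hook order in the table can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not Easy Slurm's actual implementation; the function name `hooks_for_run` is hypothetical.

```python
def hooks_for_run(run_index):
    """Return the hook names executed on a given run (0 = first run).

    Hypothetical helper: it only mirrors the ordering in the table above.
    """
    if run_index == 0:
        return ["setup", "on_run", "teardown"]
    # Every resubmitted run uses the *_resume variants, then teardown.
    return ["setup_resume", "on_run_resume", "teardown"]


for i in range(3):
    print(i, hooks_for_run(i))
```

Note that teardown appears in both columns, so cleanup code placed there runs on every run, including resumed ones.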
Full examples can be found in the examples directory, including a simple example that runs "training epochs" on a cluster.
Jobs can also be fully configured using YAML files. See examples/simple_yaml.
job.yaml:
job_dir: "$HOME/jobs/{date}-{job_name}"
src: ["./src", "./assets"]
setup: |
  virtualenv "$SLURM_TMPDIR/env"
  source "$SLURM_TMPDIR/env/bin/activate"
  pip install -r "$SLURM_TMPDIR/src/requirements.txt"
setup_resume: |
  # Runs only on subsequent runs. Call setup and do anything else needed.
  setup
on_run: "cd src && python main.py"
on_run_resume: "cd src && python main.py --resume"
teardown: |
  # Do any cleanup tasks here.
sbatch_options:
  job-name: "example-simple"
  account: "your-username"
  time: "3:00:00"
  nodes: 1
resubmit_limit: 64  # Automatic resubmission limit.
Then submit the job using:
easy-slurm --job="job.yaml"
One can override the parameters in the YAML file using command-line arguments. For example:
easy-slurm --job="job.yaml" --src='["./src", "./assets", "./extra"]'
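Conceptually, an override replaces the matching top-level key from the YAML file. The sketch below illustrates that idea under stated assumptions: it parses each override value as JSON and falls back to a plain string. How Easy Slurm actually parses these values is not covered here, and `apply_overrides` is a hypothetical helper, not part of the library.

```python
import json


def apply_overrides(job, overrides):
    """Return a copy of `job` with command-line style overrides applied.

    Assumption: override values use JSON-compatible syntax; anything that
    does not parse as JSON is kept as a plain string.
    """
    merged = dict(job)
    for key, raw in overrides.items():
        try:
            merged[key] = json.loads(raw)
        except json.JSONDecodeError:
            merged[key] = raw
    return merged


job = {"src": ["./src", "./assets"], "on_run": "cd src && python main.py"}
result = apply_overrides(job, {"src": '["./src", "./assets", "./extra"]'})
print(result["src"])
```

The original dictionary is left untouched, so the file-based configuration can be reused with different overrides.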
One useful feature is formatting paths using custom template strings:
easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d_%H-%M-%S_%3f}-{job_name}",
)
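The `{date:...}` field follows the same convention as Python's built-in string formatting, where the text after the colon is passed to the datetime object as a strftime pattern. The sketch below shows that mechanism with plain `str.format`; it is an illustration, not Easy Slurm's code. The `%3f` directive is not a standard Python strftime directive, so it is left out here, and the fixed timestamp is made up for reproducibility.

```python
from datetime import datetime

# Same template shape as above, minus the non-standard %3f directive.
template = "$HOME/jobs/{date:%Y-%m-%d_%H-%M-%S}-{job_name}"

# A fixed, made-up timestamp so the result is reproducible.
fixed = datetime(2024, 1, 31, 9, 5, 7)

path = template.format(date=fixed, job_name="example-simple")
print(path)  # $HOME/jobs/2024-01-31_09-05-07-example-simple
```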
The job names can be formatted using a config dictionary:
easy_slurm.submit_job(
    sbatch_options={
        "job-name": "bs={hp.batch_size:04},lr={hp.lr:.1e}",
        # Equivalent to:
        # "job-name": "bs=0032,lr=1.0e-02"
    },
    config={"hp": {"batch_size": 32, "lr": 1e-2}},
)
This helps in automatically creating descriptive, human-readable job names.
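To see why the template above yields `bs=0032,lr=1.0e-02`, the same result can be reproduced with standard Python formatting. The sketch assumes dotted access such as `hp.batch_size` maps onto nested dictionary keys; it converts the dictionary to namespaces to get that behavior from `str.format`. The helper `to_namespace` is hypothetical and not necessarily how Easy Slurm does it.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert nested dicts so keys become attributes."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    return value


config = {"hp": {"batch_size": 32, "lr": 1e-2}}
template = "bs={hp.batch_size:04},lr={hp.lr:.1e}"

# str.format resolves {hp.batch_size} via attribute access on `hp`.
namespaces = {key: to_namespace(value) for key, value in config.items()}
job_name = template.format(**namespaces)
print(job_name)  # bs=0032,lr=1.0e-02
```

Here `:04` zero-pads the integer to four digits and `:.1e` renders the float in scientific notation with one decimal place.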
For the CLI / YAML interface, the same can be achieved using the --config argument:
easy-slurm --job="job.yaml" --config="config.yaml"
See the documentation for more information and examples.