Collection of various machine learning benchmarks together with Slurm scripts for CSC's supercomputers.
The benchmarks themselves (Python code) can be found in the `benchmarks`
directory. The main run scripts are in the root directory as `*.sh` files. The Slurm
settings have been separated into their own scripts in the `slurm` directory.
Typical usage is to first select a benchmark (e.g., PyTorch synthetic) and then the appropriate Slurm settings (e.g., 4 GPUs on Mahti, single node, no MPI). The command would then be:

```bash
sbatch slurm/mahti-gpu4.sh pytorch-synthetic.sh
```

The Slurm run scripts in the `slurm` directory are named
`[puhti|mahti]-[cpu|gpu]N.sh`, where `N` is the number of CPUs or GPUs reserved.
All scripts are single-node, single MPI task unless the name ends with `-mpi.sh`.
Scripts with the `-mpi.sh` ending launch a separate MPI task for each GPU,
assuming 4 GPUs per node. For example, `mahti-gpu8-mpi.sh` reserves two nodes
with 4 GPUs (and thus 4 MPI tasks) per node, giving a total of 8 GPUs (and 8 MPI
tasks).
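The Slurm settings live in the scripts themselves; a minimal sketch of what a single-node, single-task script like `slurm/mahti-gpu4.sh` might contain is shown below. The partition, account, and resource values are assumptions, not the repository's actual settings:

```bash
#!/bin/bash
# Hypothetical sketch of slurm/mahti-gpu4.sh: one node, one MPI task,
# four GPUs. Partition, account, and resource values are assumptions.
#SBATCH --partition=gpumedium        # assumed partition name
#SBATCH --account=project_XXXXXXX    # placeholder; use your own project
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1          # single task: no -mpi.sh suffix
#SBATCH --cpus-per-task=32           # assumed CPU reservation
#SBATCH --gres=gpu:a100:4            # reserve 4 GPUs
#SBATCH --time=00:30:00

# Run the benchmark script given as the first argument, forwarding any
# remaining arguments to it (e.g., --batch-size=32).
srun bash "$@"
```

An `-mpi.sh` variant would instead set `--ntasks-per-node=4`, giving each GPU its own MPI task.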
| Benchmark | Script name | Data |
|---|---|---|
| PyTorch synthetic | `pytorch-synthetic.sh` | synthetic |
| PyTorch DDP | `pytorch-ddp.sh` | synthetic/ImageNet |
| PyTorch DDP Lightning | `pytorch-ddp-lightning.sh` | synthetic/ImageNet |
| PyTorch DeepSpeed | `pytorch-deepspeed.sh` | synthetic/ImageNet |
| run_clm | `pytorch-clm.sh` | WikiText-2 |
| TensorFlow CNN | `tensorflow-cnn.sh` | synthetic/ImageNet |
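Since every run follows the same `sbatch <slurm script> <benchmark script>` pattern, several benchmarks can also be submitted in one go with a small shell loop; this is just an illustration using scripts named in the table above:

```bash
# Submit three benchmarks with the same Slurm settings (4 GPUs on Mahti)
for bench in pytorch-synthetic.sh pytorch-ddp.sh tensorflow-cnn.sh; do
    sbatch slurm/mahti-gpu4.sh "$bench"
done
```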
The different benchmarks are described below in more detail.
### PyTorch synthetic

Originally based on Horovod's example script with the same name. Note that the original script used a single fixed random batch which was fed to the network again and again. Some systems and setups are able to optimize this scenario, giving very unrealistic results. We have modified the script to generate a new random batch each time.

Runs with the `resnet50` model by default, but also supports `inception_v3` and other models from `torchvision.models`.
Run example with a single GPU:

```bash
sbatch slurm/mahti-gpu1.sh pytorch-synthetic.sh
```

Run example with 4 GPUs. Note that you can also add arguments to be passed to the Python script:
```bash
sbatch slurm/mahti-gpu4.sh pytorch-synthetic.sh --batch-size=32
```
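In the same way, a different model from `torchvision.models` could be selected; note that the `--model` flag name here is an assumption about the Python script's interface:

```bash
# Hypothetical: assumes the benchmark script accepts a --model argument
sbatch slurm/mahti-gpu1.sh pytorch-synthetic.sh --model=inception_v3
```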
Using 8 GPUs (i.e., 2 nodes) with Horovod and MPI (not supported in newer PyTorch installations):

```bash
sbatch slurm/mahti-gpu8-mpi.sh pytorch-synthetic.sh
```

### PyTorch DDP

PyTorch benchmark using Distributed Data Parallel for handling multiple GPUs.
Run example with 4 GPUs on Puhti using synthetic data:

```bash
sbatch slurm/puhti-gpu4.sh pytorch-ddp.sh
```

Run example with 8 GPUs (on 2 nodes) using real ImageNet data:
```bash
sbatch slurm/puhti-gpu8.sh pytorch-ddp.sh --data
```

Run example with 8 GPUs (2 nodes) with fp16:
```bash
sbatch slurm/puhti-gpu8.sh pytorch-ddp.sh --fp16
```

### PyTorch DDP Lightning

PyTorch Lightning example using DDP. Runs with the `resnet50` model by default, but also supports `inception_v3` and other models from `torchvision.models`.
DDP on Lightning (as of PyTorch 1.13) needs to be run as a single task per GPU:

```bash
sbatch slurm/puhti-gpu4-mpi.sh pytorch-ddp-lightning.sh  # single node
sbatch slurm/puhti-gpu8-mpi.sh pytorch-ddp-lightning.sh  # two nodes
```

The script supports the `--data` option to use real ImageNet data instead of synthetic data, and `--fp16` to enable 16-bit precision for some operations.
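The two options can also be combined, assuming nothing prevents them from composing:

```bash
# Real ImageNet data with 16-bit precision on two nodes
sbatch slurm/puhti-gpu8-mpi.sh pytorch-ddp-lightning.sh --data --fp16
```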
### PyTorch DeepSpeed

DeepSpeed example. 4 GPUs with synthetic data (note: one node = one task):

```bash
sbatch slurm/puhti-gpu4.sh pytorch-deepspeed.sh
```

8 GPUs, 2 nodes with ImageNet data (note: one GPU = one task):

```bash
sbatch slurm/puhti-gpu8-mpi.sh pytorch-deepspeed.sh --data
```

### run_clm

Fine-tuning a GPT-like model on WikiText-2, directly from the Hugging Face language modeling examples.
Run example with a full node of GPUs (in this case 8 GPUs on LUMI):

```bash
sbatch slurm/lumi-gpu8.sh pytorch-clm.sh
```

Run example with two full nodes of GPUs (in this case 16 GPUs on LUMI):

```bash
sbatch slurm/lumi-gpu16.sh pytorch-clm.sh
```

### TensorFlow CNN

Uses `tf_cnn_benchmarks.py` directly from TensorFlow's GitHub (included as a git submodule here).
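Because it is a submodule, it has to be fetched separately after cloning this repository (standard git, nothing repo-specific):

```bash
# Fetch the tf_cnn_benchmarks submodule
git submodule update --init --recursive
```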
Run example:

```bash
sbatch slurm/mahti-gpu1.sh tensorflow-cnn.sh
```

Horovod:
```bash
sbatch slurm/mahti-gpu8-mpi.sh tensorflow-cnn.sh
```

With real data:
```bash
sbatch slurm/mahti-gpu1.sh tensorflow-cnn.sh --data
```

Horovod with real data:

```bash
sbatch slurm/mahti-gpu8-mpi.sh tensorflow-cnn.sh --data
```


