This repository contains files and instructions to compile PyTorch on ANL Aurora with the MPI distributed backend.
To compile PyTorch, follow these instructions on a compute node:
- Start an interactive job and set up access to the internet.
- Load the `frameworks` module: `module load frameworks`
- Clone PyTorch from GitHub (tested with v2.8).
- Add "xpu" to the list of devices supported by the MPI backend here.
- Copy the `pytorch_build.sh` script from this repository into the root of the PyTorch directory and run it to compile PyTorch (`./pytorch_build.sh`).
- Install your PyTorch build as a user package in the `frameworks` module: `pip install --user dist/*`.
- To test your installation, run `run_dist_test.py` on a multinode interactive job.
- To understand how to use the MPI backend in your application, use `dist_test.py` as an example.
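
For orientation, the following is a minimal sketch of what using the MPI backend from an application can look like. It is not the `dist_test.py` script from this repository, which remains the reference example. The helper `local_device_index` is a hypothetical name introduced here, and mapping ranks to devices with a simple modulo assumes that ranks are placed on nodes in contiguous blocks. It also assumes that "xpu" was added to the MPI backend's supported devices as described in the steps above, and that the MPI library can communicate from device memory.

```python
def local_device_index(rank: int, devices_per_node: int) -> int:
    """Map a global rank to a device index on its node.

    Hypothetical helper: assumes ranks are placed on nodes in contiguous blocks.
    """
    if devices_per_node <= 0:
        raise ValueError("devices_per_node must be positive")
    return rank % devices_per_node


def main() -> None:
    # Imported here so the helper above can be used without PyTorch installed.
    import torch
    import torch.distributed as dist

    # Fail early if this build was compiled without MPI support.
    if not dist.is_mpi_available():
        raise RuntimeError("this PyTorch build was compiled without MPI support")

    # With the MPI backend, rank and world size come from the MPI launcher.
    dist.init_process_group(backend="mpi")
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Use an XPU if one is visible, otherwise fall back to the CPU.
    if torch.xpu.is_available():
        index = local_device_index(rank, torch.xpu.device_count())
        device = torch.device("xpu", index)
    else:
        device = torch.device("cpu")

    # Each rank contributes its own rank number, so the sum is 0 + 1 + ... + (N - 1).
    x = torch.tensor([float(rank)], device=device)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    expected = world_size * (world_size - 1) / 2
    assert x.item() == expected, f"rank {rank}: got {x.item()}, expected {expected}"
    if rank == 0:
        print(f"all_reduce over {world_size} ranks OK: {x.item()}")

    dist.destroy_process_group()
```

To try it, call `main()` at the end of the script and start it with the system MPI launcher, one process per device, inside a multinode interactive job. Because the backend is MPI, no `MASTER_ADDR` or `MASTER_PORT` environment variables need to be set for `init_process_group`.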