Running MPI jobs with Slurm
===============================

.. JMS How can I create a TOC just for this page here at the top?

/////////////////////////////////////////////////////////////////////////

.. _faq-slurm-support-label:

Does Open MPI support running jobs under Slurm?
--------------------------------------------------

Yes.

Open MPI supports two modes of launching parallel MPI jobs under
Slurm:

#. Using ``mpirun``. Open MPI's ``mpirun`` will detect that it is
   inside of a Slurm job, and will automatically utilize the Slurm
   infrastructure for launching and controlling the individual MPI
   processes.

   Using this method, you get the full power and extensive features of
   Open MPI's ``mpirun`` command (see the ``mpirun(1)`` man page for
   details).

#. Using ``srun``. Assuming that Slurm installed its Open MPI plugin,
   you can use ``srun`` to "direct launch" Open MPI applications
   without the use of Open MPI's ``mpirun`` command.

   Using direct launch can be *slightly* faster when launching very,
   very large numbers of MPI processes (i.e., thousands or millions of
   MPI processes in a single job). But it has significantly fewer
   features than Open MPI's ``mpirun``.

   .. note:: In versions of Open MPI prior to |ompi_series|, using
      ``srun`` for direct launch was faster than using
      ``mpirun``. **This is no longer true.**

Unless there is a strong reason to use ``srun`` for direct launch, the
Open MPI team recommends using ``mpirun`` for launching jobs under Slurm.

/////////////////////////////////////////////////////////////////////////

What's the difference between using ``mpirun`` and ``srun``?
---------------------------------------------------------------

.. error:: JMS Ralph to provide content here.

/////////////////////////////////////////////////////////////////////////

How do I use ``mpirun`` to launch jobs under Slurm?
-------------------------------------------------------

Pretty much exactly as you would if you were not in a Slurm job.

For example, you can launch Open MPI's ``mpirun`` in an interactive
Slurm allocation (via the ``salloc`` command) or you can submit a
script to Slurm (via the ``sbatch`` command) that includes an
invocation of the ``mpirun`` command.

Regardless of how ``mpirun`` is invoked, if it detects that it is
running in a Slurm job, ``mpirun`` automatically obtains both the list
of hosts and how many processes to start on each host from Slurm
directly. Hence, it is unnecessary to specify the ``--hostfile``,
``--host``, or ``-np`` options to ``mpirun``. Open MPI will also use
Slurm-native mechanisms to launch and kill processes --
``ssh`` is not required.

For example:

.. code-block:: sh
   :linenos:

   # Allocate a Slurm job with 4 nodes
   shell$ salloc -N 4
   # Now run an Open MPI job on all the nodes allocated by Slurm
   shell$ mpirun my_mpi_application

This will run 4 MPI processes on the nodes that were allocated by
Slurm.

Or, if submitting a script:

.. code-block:: sh
   :linenos:

   shell$ cat my_script.sh
   #!/bin/sh
   mpirun my_mpi_application
   shell$ sbatch -N 4 my_script.sh
   Submitted batch job 1234
   shell$

Similar to the ``salloc`` case, no command line options specifying the
number of MPI processes were necessary, since Open MPI will obtain
that information directly from Slurm at run time.

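If you do want a different layout than what Slurm provides by default,
you can still pass options such as ``-np`` explicitly. For example, a
minimal sketch (reusing ``my_mpi_application`` from the examples above,
inside the same 4-node allocation):

.. code-block:: sh
   :linenos:

   # Launch only 2 MPI processes, even though Slurm allocated 4 nodes
   shell$ mpirun -np 2 my_mpi_application
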
/////////////////////////////////////////////////////////////////////////

How do I use ``srun`` to directly launch Open MPI applications?
------------------------------------------------------------------

.. note:: Per :ref:`this FAQ entry <faq-slurm-support-label>`, the
   Open MPI team generally recommends using ``mpirun`` for
   launching MPI jobs.

First, you must ensure that Slurm was built and installed with PMI-2
support.

.. note:: Please ask your friendly neighborhood Slurm developer to
   support PMIx. PMIx is the current generation of run-time
   support API; PMI-2 is the legacy / antiquated API. Open MPI
   *only* supports PMI-2 for Slurm. :-)

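If you are not sure whether your Slurm installation has PMI-2 support,
one way to check (a sketch; the exact output varies by Slurm version
and configuration) is to ask ``srun`` which MPI/PMI plugin types it
knows about:

.. code-block:: sh
   :linenos:

   # Look for "pmi2" in the list of supported MPI plugin types
   shell$ srun --mpi=list
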
Second, Open MPI must have been configured with ``--with-pmi=foo``,
where ``foo`` is the path to the directory where ``pmi2.h`` is
located.

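For example, a hypothetical configure invocation might look like the
following (the ``/usr/include/slurm`` path is only a placeholder; use
whatever directory actually contains ``pmi2.h`` on your system):

.. code-block:: sh
   :linenos:

   # Placeholder path: point --with-pmi at the directory with pmi2.h
   shell$ ./configure --with-pmi=/usr/include/slurm ...
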
.. error:: JMS Ralph -- what else do we need to say here?

Open MPI applications can then be launched directly via the ``srun``
command. For example:

.. code-block:: sh
   :linenos:

   shell$ srun -N 4 my_mpi_application

Or you can use ``sbatch`` with a script:

.. code-block:: sh
   :linenos:

   shell$ cat my_script.sh
   #!/bin/sh
   srun my_mpi_application
   shell$ sbatch -N 4 my_script.sh
   Submitted batch job 1235
   shell$

As with using ``mpirun`` inside of an ``sbatch`` batch script, no
``srun`` command line options specifying the number of processes were
necessary, because ``sbatch`` set all the relevant Slurm-level
parameters (number of processes, cores, partition, etc.).

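If you want to control the process count from the submission command
instead, you can pass the task count to ``sbatch``; ``srun`` inside the
script then inherits it from the Slurm allocation. A sketch, reusing
the ``my_script.sh`` example from above:

.. code-block:: sh
   :linenos:

   # Request 4 nodes and 16 total MPI processes; srun inside the
   # script picks up these values from the allocation
   shell$ sbatch -N 4 -n 16 my_script.sh
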
/////////////////////////////////////////////////////////////////////////

I use Slurm on a cluster with the OpenFabrics or UCX network stacks. Do I need to do anything special?
---------------------------------------------------------------------------------------------------------

Yes.

You need to ensure that Slurm sets up the locked memory
limits properly.

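One quick way to see the locked memory limit that Slurm-launched
processes actually get (as opposed to what your login shell reports) is
to run ``ulimit`` through ``srun``; for OpenFabrics / UCX, the value
you want to see is typically ``unlimited``. A sketch:

.. code-block:: sh
   :linenos:

   # Show the locked memory limit seen by processes launched by Slurm
   shell$ srun -N 1 bash -c 'ulimit -l'
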
.. error:: JMS Need to point to general web pages about setting locked
   memory limits.

   They used to be at
   ``category=openfabrics#ib-locked-pages`` and
   ``category=openfabrics#ib-locked-pages-more``.

   This should probably be in a general networking section --
   not specific to verbs/openib.

/////////////////////////////////////////////////////////////////////////

My job fails / performs poorly when using ``mpirun`` under Slurm 20.11
---------------------------------------------------------------------------

There were some changes in Slurm behavior that were introduced in
Slurm 20.11.0 and subsequently reverted in Slurm 20.11.3.

SchedMD (the makers of Slurm) strongly suggest that all Open MPI users
avoid using Slurm versions 20.11.0 through 20.11.2.

Indeed, you will likely run into problems using just about any version
of Open MPI with these problematic Slurm releases.

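If you are not sure which Slurm release your cluster is running, any of
the standard Slurm commands will report it. For example:

.. code-block:: sh
   :linenos:

   # Prints the Slurm version, e.g. "slurm 20.11.3"
   shell$ srun --version
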
.. important:: Please either downgrade to an older version or upgrade
               to a newer version of Slurm.