Using Slurm to submit jobs on Cannon (Part 1)
The Cannon cluster uses the Slurm scheduler to manage its computational resources. Here we provide a brief overview of how you can use Slurm to schedule jobs.
We also recommend that you read the Running Jobs page on the FASRC documentation site, which contains more detailed information about Slurm.
As of Jan 1, 2023, all of the hardware owned by PIs in Harvard-SEAS has been pooled together into the following partitions:
- huce_cascade: Approximately 6400 compute cores. Suitable for GEOS-Chem and/or GCHP simulations.
- seas_compute: Approximately 5000 compute cores. Suitable for GEOS-Chem and/or GCHP simulations.
- sapphire: Approximately 21,500 compute cores. Suitable for GEOS-Chem and/or GCHP simulations.
- seas_gpu: Approximately 2500 Graphics Processing Units (GPUs). Suitable for Machine Learning and related applications.
For more information, please see the SEAS compute resources page at the FASRC documentation site.
You can submit interactive jobs (i.e. a command-line window on a computational node) to any partition. Interactive jobs are particularly useful for compiling GEOS-Chem and/or GCHP, or for running interactive data analysis/plotting code.
Submitting your interactive jobs to the seas_compute partition might result in long wait times, and might also increase your fairshare score. For this reason, we recommend using the test partition for all interactive sessions. The test partition allows you to use the following resources:
- Up to 5 simultaneous interactive sessions
- Up to 12 hours of requested time
- 96 cores per user
- 384 GB memory per user
Cannon has other partitions (described here in detail) that you can use. However, your job will be competing for resources with users across the entire Cannon cluster.
Before we go too much further, please take a moment to review some of the more commonly used Slurm commands.
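As a quick reference, here are a few of the Slurm commands you will use most often (these are standard Slurm usage, not Cannon-specific; see the FASRC documentation for full details):

```shell
# Commonly used Slurm commands (quick reference):
#
#   sinfo -p test              # list nodes and their state in the test partition
#   squeue -u $USER            # show your queued and running jobs
#   scontrol show job <JOBID>  # detailed information about one job
#   scancel <JOBID>            # cancel a queued or running job
#   sacct -j <JOBID>           # accounting info for a completed job
```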
When you log into Cannon, you will be placed on a login node. The login nodes are sufficient for light computation, but for more CPU-intensive tasks (e.g. running GEOS-Chem in interactive mode, compiling with more than one processor, running IDL scripts), you should request resources with the Slurm `salloc` command, and then log into the allocated resources with `ssh`.
```
salloc --x11=all -c 4 -N 1 --mem=8000 -t 0-02:00 -p test
source ~/envs/gcc_cmake.gfortran102_cannon.env
```

```
salloc --x11=all -c 8 -N 1 --mem=12000 -t 0-08:00 -p test
source ~/envs/gcc_cmake.gfortran102_cannon.env
```
The Slurm `salloc` command takes these arguments:

```
salloc --x11=all -c <NUMBER-OF-CORES> -N <NUMBER-OF-NODES> --mem=<MB> -t <TIME> -p <PARTITION>
```

where:

- `-p <PARTITION>`: Requests a specific partition (aka queue) for the resource allocation. We recommend starting all interactive sessions in the Cannon `test` partition.
- `--x11=all`: Starts X11 display (for graphical window display).
- `-c <NUMBER-OF-CORES>`: Specifies the number of cores per node that your job will use.
- `-N <NUMBER-OF-NODES>`: Requests the number of nodes that will be allocated to this job.
  - For GEOS-Chem "Classic" simulations, you can only use 1 node due to limitations of the OpenMP parallelization.
  - For GCHP simulations, you may use more than one node.
- `--mem=<MB>`: Specifies the real memory required per node, in megabytes.
- `-t <TIME>`: Specifies the time limit for the interactive job. Acceptable formats for time are `minutes`, `hours:minutes:seconds`, and `days-hours:minutes`.
After you request an interactive session, you may notice that your login prompt changes. For example, when you log into Cannon via `login.rc.fas.harvard.edu`, your Unix prompt may look like this:

```
USER@holylogin04 $
```

But in the interactive session, your prompt may look something like this:

```
USER@holyc19315 $
```
NOTE: If you are on one of the `holy*` nodes on Cannon, this means you are on a machine in Holyoke, MA (about 100 miles from Harvard).
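If you are ever unsure which node you are on, a small check like this can help (a sketch, assuming the standard `SLURMD_NODENAME` variable that Slurm sets inside allocations):

```shell
# Print the node you are running on. Inside a Slurm allocation,
# SLURMD_NODENAME holds the compute node's name (e.g. holyc19315);
# on a login node it is unset, so fall back to hostname.
node="${SLURMD_NODENAME:-$(hostname)}"
echo "Running on: $node"
```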
Note that Slurm only requests a number of CPUs from the system; it does not tell GEOS-Chem how many cores to use. Parallelized GEOS-Chem simulations will use the number of cores specified by the environment variable `$OMP_NUM_THREADS`.

`$OMP_NUM_THREADS` will be set automatically for you when you source one of the GEOS-Chem environment files. This sets `$OMP_NUM_THREADS` to the same number of CPUs that you requested in your interactive session.

If for some reason you want to change the value of `$OMP_NUM_THREADS` within an interactive session, simply type:

```
export OMP_NUM_THREADS=<NUMBER-OF-CORES>
```

where `<NUMBER-OF-CORES>` is the new number of cores that you want to use.
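One common idiom is to derive the thread count from the allocation itself rather than typing a number. This is a sketch, assuming the standard `SLURM_CPUS_PER_TASK` variable that Slurm sets from the `-c` flag:

```shell
# Match OpenMP threads to the cores Slurm allocated with -c.
# SLURM_CPUS_PER_TASK is set inside a Slurm allocation; default to 1
# if it is undefined (e.g. when testing outside of a job).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```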
Cannon interactive sessions will freeze if left idle for more than an hour. An easy way to prevent this from happening is to open a new tmux session once your interactive job starts. Use this command:

```
$ tmux new -s my_session
```

(If your connection drops, you can reattach later with `tmux attach -t my_session`.) Before logging out of the interactive session, terminate the tmux session by typing:

```
$ exit
```
We have found that forwarding your SSH private key from your PC or Mac does not propagate properly to Cannon interactive sessions. We recommend setting up another keypair on Cannon and adding the corresponding public key to any sites that you need to access via ssh (such as GitHub). Once your Cannon interactive job starts, run the `ssh-agent` with these commands:

```
$ eval $(ssh-agent -s)
$ ssh-add ~/.ssh/YOUR-PRIVATE-KEY-ON-CANNON
```

Then add the corresponding public key to all websites (e.g. GitHub) that you would like to access from within the interactive session.

Note that `YOUR-PRIVATE-KEY-ON-CANNON` must be readable and writable only by you (i.e. with permissions `rw-------`, aka `chmod 600`). For more information, please see the Set up SSH keys page.
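The keypair setup can be sketched as follows (the key filename is illustrative, and a temporary directory stands in for `~/.ssh` here so the sketch does not touch your real keys):

```shell
# Generate a new ed25519 keypair for use on Cannon and lock down the
# private key to owner read/write only (rw-------, i.e. mode 600).
keydir=$(mktemp -d)                       # stand-in for ~/.ssh in this sketch
ssh-keygen -q -t ed25519 -N "" -f "$keydir/id_ed25519_cannon"
chmod 600 "$keydir/id_ed25519_cannon"
ls -l "$keydir/id_ed25519_cannon"         # permissions should read -rw-------
```

The matching public key (`id_ed25519_cannon.pub`) is what you paste into GitHub or other sites.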