clima
clima.gps.caltech.edu
is a GPU node with 8x NVIDIA A100 GPUs.
To request access, email help-gps@caltech.edu.
Unlike central, clima has only a handful of modules available; the recommended approach is to install software in your home directory.
Add the following to your local ~/.ssh/config file:

```
Host clima
    HostName clima.gps.caltech.edu
    User [username]
```

To access clima from outside the Caltech network, either use the Caltech VPN, or add the following so that connections are proxied through ssh.caltech.edu:

```
Match final host !ssh.caltech.edu,*.caltech.edu !exec "nc -z -G 1 login.hpc.caltech.edu 22"
    ProxyJump ssh.caltech.edu
```
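With this in place, you should be able to connect using just the host alias:

```
$ ssh clima
```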
- /home/[username] (capped at 1TB): mounted from sampo, and is backed up
- /net/sampo/data1 (200TB): mounted from sampo. Not backed up, but somewhat protected by a redundant RAID partition
- /scratch (70TB): fast SSD, not backed up and no RAID redundancy
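To see how much space is currently free on each of these, you can query the mount points directly (a quick sketch; the paths mirror the list above, and df may report slightly different mount names):

```
$ df -h /home/[username] /net/sampo/data1 /scratch
```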
clima has 8× NVIDIA 80GB A100 GPUs, connected via NVLink.
- `nvidia-smi` gives a summary of all the GPUs
- `nvidia-smi topo -m` shows the connections between GPUs and CPUs
- `nvtop` gives you a live-refresh of current GPU usage
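To confirm that the GPUs are also visible from Julia, you can query CUDA.jl directly (a minimal sketch, assuming CUDA.jl is installed in your Julia environment):

```
$ julia -e 'using CUDA; CUDA.versioninfo()'
```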
clima has a single-node installation of Slurm.
We have set up a common environment. You can load it with

```
module load common
```

which currently loads `openmpi/4.1.5-cuda julia/1.9.3 cuda/julia-pref`.
This sets the appropriate Julia preferences, so you should not need to call, e.g., `MPIPreferences.use_system_binary()`.
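If you want to double-check that the system MPI is being picked up, you can inspect the active preference (assuming MPI.jl/MPIPreferences is in your active Julia environment); it should report "system" when the common module's preferences are in effect:

```
$ julia -e 'using MPIPreferences; @show MPIPreferences.binary'
```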
Please avoid using clima
for long-running CPU-only jobs. The Resnick HPC cluster is better for that.
While GPUs can be used directly, it is always recommended to schedule jobs using Slurm: this prevents allocation of multiple jobs on the same GPU, which can cause significant performance degradation.
For example:

```
$ srun --gpus=2 --pty bash -l # request a session with 2 GPUs
$ nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-1768fcec-d945-7435-1f8e-85d30cdf310e)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-6420b6b9-bb34-a58d-8090-61887fd97931)
```
See also notes on interactive jobs via Caltech-HPC: https://www.hpc.caltech.edu/documentation/slurm-commands
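For non-interactive work, you can also submit a batch script with sbatch. A minimal sketch (the job name, time limit, and workload line are placeholders to adapt):

```
#!/bin/bash
#SBATCH --job-name=example-gpu-job   # placeholder job name
#SBATCH --gpus=1                     # number of GPUs to reserve
#SBATCH --time=01:00:00              # walltime limit (placeholder)
#SBATCH --output=%x-%j.out           # log file: <job name>-<job id>.out

module load common                   # the common environment described above
julia --project my_script.jl         # placeholder: your actual workload
```

Submit with `sbatch <script>` and check the queue with `squeue`.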