# Perlmutter

This document is a continuous work-in-progress, intended to provide up-to-date information on a public install maintained by (or in collaboration with) the UPC++ team. However, systems are constantly changing, so please report any errors or omissions in the issue tracker.
Typically, UPC++ installs are maintained only for the current default versions of the system-provided environment modules, such as those for PrgEnv, CUDA and the compilers.
This document is not a replacement for the documentation provided by the centers, and assumes general familiarity with the use of the system.
Stable installs are available through environment modules. A wrapper is used
to transparently dispatch commands such as `upcxx` to an install appropriate to
the currently loaded `PrgEnv-{gnu,cray,nvidia,aocc}` and compiler (gcc,
cce, nvidia or aocc) environment modules.
In order to access the UPC++ installation on Perlmutter, one must run

```shell
$ module load contrib
```

to add a non-default directory to the MODULEPATH before the UPC++ environment
modules will be accessible. We recommend including this command in one's
shell startup files, such as `$HOME/.login` or `$HOME/.bash_profile`.
If you do not add the command to your shell startup files, the `module load contrib`
command will be required once per login shell in which you need an upcxx
environment module.
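As a sketch, one way to add this command to a bash startup file (assuming a bash login shell; csh-family users would edit `$HOME/.login` instead):

```shell
# Append "module load contrib" to the bash startup file, once.
# Assumes a bash login shell; adjust for $HOME/.login under csh/tcsh.
profile="$HOME/.bash_profile"
grep -qxF 'module load contrib' "$profile" 2>/dev/null || \
  echo 'module load contrib' >> "$profile"
```

The `grep` guard keeps the line from being appended more than once if the snippet is re-run.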
Environment modules provide two alternative configurations of the UPC++ library:
- **upcxx-cuda**

  This module supports memory kinds, a UPC++ feature that enables communication to/from GPU memory via `upcxx::copy` on `upcxx::global_ptr<T, memory_kind::cuda_device>`. When using this module, `copy` operations on `cuda_device` memory leverage GPUDirect RDMA ("native" memory kinds).
- **upcxx**

  This module omits support for constructing an active `upcxx::device_allocator<upcxx::cuda_device>` object, resulting in a small potential speed-up for applications which do not require a "CUDA-aware" build of UPC++.
By default, each module above will select the latest recommended version of the
UPC++ library. One can see the installed versions with a command like
`module avail upcxx`, and optionally select a particular version explicitly with a
command of the form `module load upcxx/20XX.YY.ZZ`.
On Perlmutter, the UPC++ environment modules select a default network of `ofi`.
You can optionally specify this explicitly on the compile line with
`upcxx -network=ofi ...`.
The installs provided on Perlmutter utilize the Cray Programming Environment,
and the `cc` and `CC` compiler wrappers in particular. It is possible to use
`upcxx` (or `CC` and `upcxx-meta`) to link code compiled with the "native
compilers" such as `g++` and `nvc++` (provided they match the `PrgEnv-*`
module). However, direct use of the native compilers to link UPC++ code is not
supported with these installs.
The `upcxx-run` utility provided with UPC++ is a relatively simple wrapper,
which in the case of Perlmutter uses `srun` via an additional wrapper,
`upcxx-srun` (see below). To have full control over process placement, thread
pinning and GPU allocation, users are advised to launch their UPC++
applications using `upcxx-srun`, which works like `srun` with the addition of
providing NIC binding. One should do so with the upcxx or upcxx-cuda
environment module loaded.
Whenever using `srun` in place of `upcxx-run`, if you would
normally have passed `-shared-heap` to `upcxx-run`, then it is particularly
important that both `UPCXX_SHARED_HEAP_SIZE` and `GASNET_MAX_SEGSIZE` be set
accordingly. The values of those and other potentially relevant environment
variables set (or inherited) by `upcxx-run` can be listed by adding `-show` to
your `upcxx-run` command (which will print useful information but not run
anything).
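As an illustrative sketch, if an application would normally run with `upcxx-run -shared-heap "512 MB"`, a direct `srun` launch might set the environment along these lines (the 512 MB value and exact size syntax are placeholders; use `upcxx-run -show` to see the authoritative values for your case):

```shell
# Mirror a "-shared-heap" setting in the environment for a direct srun launch.
# Values shown are illustrative; run "upcxx-run -show" to list what it would set.
export UPCXX_SHARED_HEAP_SIZE='512 MB'
export GASNET_MAX_SEGSIZE='512MB'
# srun <options> ./your-app    # then launch with srun as usual
```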
Additional information is available in the
Advanced Job Launch
chapter of the UPC++ v1.0 Programmer's Guide.
Each Perlmutter GPU node contains 64 CPU cores and 4 Slingshot-11 NICs (and 4 GPUs). Currently each UPC++ process can use at most one Slingshot NIC. In order for a job to utilize all four NICs on a Perlmutter GPU node, all of the following are necessary:
1. run at least four processes per node
2. ensure each process is bound to distinct CPU cores out of the 64 available
3. set environment variables directing each process to use the NIC most appropriate to its core binding
The `upcxx-srun` launch wrapper helps to automate those three items.
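To illustrate item 3, the following sketch shows one hypothetical core-to-NIC mapping (16 consecutive cores per NIC, with NICs named `hsn0`..`hsn3`). This mapping is an assumption for illustration only; the actual policy applied by the installed `upcxx-srun` may differ.

```shell
#!/bin/bash
# Illustrative sketch only: choose a NIC based on the first core a process is
# bound to, assuming 64 cores and 4 NICs per GPU node (16 cores per NIC).
# The real upcxx-srun wrapper sets the GASNET_OFI_DEVICE* variables itself.
pick_nic() {
  local core=$1
  echo "hsn$(( core / 16 ))"   # cores 0-15 -> hsn0, 16-31 -> hsn1, ...
}

pick_nic 0    # -> hsn0
pick_nic 40   # -> hsn2
```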
The first purpose of the `upcxx-srun` wrapper installed on Perlmutter is to
set the `GASNET_OFI_DEVICE*` family of environment variables as
appropriate for the current Perlmutter partition (i.e. GPU nodes vs. CPU nodes),
satisfying requirement 3 above.
The second purpose of the script is to ensure the job launch command requests
a suitable core binding, unless one has already been requested by the environment or
command line, thus satisfying requirement 2 above.
Subject to the following differences, the use of `upcxx-srun`
should be otherwise identical to `srun`:

- One must use `--ntasks` or its short form `-n`. Cases in which `srun` would normally compute a task count from other arguments are not supported.
- One is required to place `--` between the srun options and the executable name, to prevent application options from being parsed by the wrapper as if they were `srun` options.
- The `-shared-heap` and `-backtrace` options to `upcxx-run` are accepted, but must appear before the required `--`.
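Putting those rules together, a hypothetical launch of 8 processes on 2 nodes might look like the following (`./app` and `--app-option` are placeholders):

```shell
# upcxx-run-style options (-shared-heap, -backtrace) come before the required
# "--"; everything after "--" is the executable and its own arguments.
nid002700$ upcxx-srun -shared-heap 1G -n 8 -N 2 -- ./app --app-option
```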
On a system like Perlmutter, there are multiple complications related to launching
executables compiled for `-network=smp`, such that no use of `srun` (or
simple wrappers around it) can provide a satisfactory solution in general.
Therefore, we recommend that for single-node (shared-memory) application runs
on Perlmutter, one should compile for the default network (`ofi`). It is also
acceptable to use `-network=mpi`, as may be required for some hybrid
applications (UPC++ and MPI in the same executable). However, note that in
multi-node runs `-network=mpi` imposes a significant performance penalty.
By default, batch jobs on Perlmutter inherit both `$PATH` and `$MODULEPATH`
from the environment at the time the job is submitted using `sbatch`
or `salloc`. So, no additional steps are needed to use `upcxx-run` if an
upcxx environment module was loaded when `sbatch` or `salloc` ran.
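For example, a minimal batch script might look like the following sketch; the queue name and time limit are placeholders, and it assumes the contrib and upcxx modules were loaded before `sbatch` was run:

```shell
#!/bin/bash
#SBATCH -C gpu
#SBATCH -q regular
#SBATCH --nodes=2
#SBATCH --time=10:00

# upcxx-run is already on $PATH because batch jobs inherit the submit-time
# environment, including the loaded upcxx module.
upcxx-run -n 4 -N 2 ./hello-world.x
```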
```shell
perlmutter$ module load contrib
perlmutter$ module load upcxx
perlmutter$ upcxx --version
UPC++ version 2025.10.0 / gex-2025.8.0-0-ge3628f258
Citing UPC++ in publication? Please see: https://upcxx.lbl.gov/publications
Copyright (c) 2025, The Regents of the University of California,
through Lawrence Berkeley National Laboratory.
https://upcxx.lbl.gov
g++-13 (SUSE Linux) 13.2.1 20240206 [revision 67ac78caf31f7cb3202177e6428a46d829b70f23]
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
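The transcript compiles `hello-world.cpp` but does not show its source; a minimal version consistent with the program output might look like this sketch, using the standard `upcxx::init`/`upcxx::finalize` and rank-query API:

```shell
# Sketch: write a minimal UPC++ hello-world matching the run output shown.
cat > hello-world.cpp <<'EOF'
#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
  upcxx::init();
  std::cout << "Hello world from process " << upcxx::rank_me()
            << " out of " << upcxx::rank_n() << " processes" << std::endl;
  upcxx::finalize();
  return 0;
}
EOF
```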
```shell
perlmutter$ upcxx -O hello-world.cpp -o hello-world.x
```
```shell
perlmutter$ salloc -C gpu -q interactive --nodes 2
salloc: Granted job allocation 1722947
salloc: Waiting for resource configuration
salloc: Nodes nid[002700-002701] are ready for job
nid002700$ upcxx-run -n 4 -N 2 ./hello-world.x
Hello world from process 0 out of 4 processes
Hello world from process 1 out of 4 processes
Hello world from process 2 out of 4 processes
Hello world from process 3 out of 4 processes
```

A UPCXX CMake package is provided in the UPC++ install on Perlmutter, as
described in README.md. Thus, with the upcxx environment module loaded, CMake
should "just work".
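For example, a typical configure step might look like the following (the source path is a placeholder); a project's CMakeLists.txt would locate the package with `find_package(UPCXX)` as described in README.md:

```shell
# Configure a CMake project with the Cray CC wrapper and the upcxx module loaded
perlmutter$ cmake -DCMAKE_CXX_COMPILER=CC /path/to/project/source
```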
Currently, there are known issues with the vendor's communications software stack below UPC++ and GASNet-EX which may negatively impact certain communication-intensive UPC++ applications (e.g. those concurrently sending large numbers of RPCs to one or more processes).
Impacts observed have included crashes and hangs of correct UPC++ applications. Of course, either of those failure modes can be the result of other issues. If you believe your application is impacted, please follow the steps below.
- Try running your application on a system with a network other than Slingshot-11 (but not Slingshot-10 which has a similar, but distinct, issue). If the failures persist, then the problem is not the one described here. You should look for defects in your application, or for other defects in UPC++ or external software.
- If you have observed crashes, but not hangs, then try running your
  application with `GASNET_OFI_RECEIVE_BUFF_SIZE=recv` in the environment.
  This disables use of a feature linked to the known source of crashes, but
  may result in a small reduction in RPC performance.
- If you have observed hangs, then try running your application with
  all of the following environment variable settings:

  ```
  GASNET_OFI_RECEIVE_BUFF_SIZE=recv
  FI_OFI_RXM_RX_SIZE=8192
  FI_CXI_DEFAULT_CQ_SIZE=13107200
  FI_MR_CACHE_MONITOR=memhooks
  FI_CXI_RX_MATCH_MODE=software
  FI_CXI_REQ_BUF_MIN_POSTED=10
  FI_CXI_REQ_BUF_SIZE=25165824
  ```
These settings will have a negative impact on both performance and memory use. However, in most cases they have been seen to be sufficient to eliminate the problem(s).
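One way to apply the full set of hang-workaround settings, for example near the top of a batch script before the application launch:

```shell
# Apply all of the hang-workaround settings listed above before launch.
# Expect reduced performance and higher memory use while these are in effect.
export GASNET_OFI_RECEIVE_BUFF_SIZE=recv
export FI_OFI_RXM_RX_SIZE=8192
export FI_CXI_DEFAULT_CQ_SIZE=13107200
export FI_MR_CACHE_MONITOR=memhooks
export FI_CXI_RX_MATCH_MODE=software
export FI_CXI_REQ_BUF_MIN_POSTED=10
export FI_CXI_REQ_BUF_SIZE=25165824
```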
If none of the options above resolves crashes or hangs of your communication-intensive UPC++ application, you can seek assistance using the issue tracker.
Information about UPC++ installs on other production systems