NERSC Perlmutter

There are several ways to use simsopt on Perlmutter. If you plan to use simsopt in your own driver scripts without editing simsopt itself, you can either use the Shifter container (typically the easiest approach) or install a pre-compiled binary package. If however you plan to edit simsopt itself, you should install from source.

NERSC's documentation describes several approaches for using python. "Option 3" is discussed in the Shifter section below, and "Option 2" will be used in the later sections. For the later sections we will use the conda package manager to install many of the required packages.

These instructions were current as of January 28, 2022.

Shifter container

Shifter is a "container" technology that allows you to use simsopt, VMEC, and SPEC at NERSC without compiling any code. Shifter was developed at NERSC to circumvent the security issues associated with Docker containers. Shifter allows you to use the simsopt Docker images hosted on Docker Hub.

Shifter Images

Shifter converts Docker images and virtual machines into a common format. After connecting to a NERSC login node, check for the available simsopt Shifter images:

shifterimg images | grep simsopt

You should see multiple images similar to hiddensymmetries/simsopt:v0.7.0. If the version you are interested in is not available, you can pull it by running

shifterimg -v pull docker:hiddensymmetries/simsopt:<version_no>

where <version_no> is the version of your choice, which is referred to as a tag in Docker parlance. Once the image is pulled, the corresponding Shifter image is made available to all users at NERSC.

The master branch has the tag latest. Because the master branch is always changing, the image shown by shifterimg images may be stale. Always re-pull the image if you want to use the master branch, but keep in mind that the results may not be reproducible. For reproducible results, users are strongly encouraged to use a container with a specific version number.
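
For example, to re-pull the latest tag (the same pull command as above, with a concrete tag substituted for <version_no>), run

shifterimg -v pull docker:hiddensymmetries/simsopt:latest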

Python executable and environment

Simsopt is installed inside a python virtual environment within the simsopt Docker/Shifter container. On entry, the Docker container automatically activates this virtual environment. However, the Shifter container does not run entrypoint commands unless explicitly told to, so the virtual environment is not activated. Instead, you must use the full path of the python executable installed inside the virtual environment, /venv/bin/python.
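
As a quick check that this path points at the python installation containing simsopt (a minimal example, assuming the latest image has been pulled as described above), you can run

shifter --image=docker:hiddensymmetries/simsopt:latest /venv/bin/python -c "import simsopt; print(simsopt.__version__)"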

Running on login nodes

One can run Shifter on login nodes for small serial jobs. To run a simsopt python driver script (located in your usual filesystem), you can type

shifter --image=docker:hiddensymmetries/simsopt:latest /venv/bin/python <script_name>

You can also run the simsopt Shifter container interactively, with

shifter --image=docker:hiddensymmetries/simsopt:latest /venv/bin/python

to enter the python interpreter, or

shifter --image=docker:hiddensymmetries/simsopt:latest /bin/bash

for a shell. In the latter scenario, even though you enter the container, the prompt may not change. To check if you are inside the simsopt Shifter container, you can run

cat /etc/lsb-release

The output should show DISTRIB_ID=Ubuntu along with some other lines.

Please do not abuse the interactive capability by running large scale jobs on login nodes.

Slurm script

The main reason for using Shifter is to run simsopt in parallel with multiple MPI processes on NERSC. Here is an example script for submitting a slurm job using the simsopt Shifter container:

#!/bin/bash
#SBATCH --qos=debug
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --constraint=cpu
#SBATCH --image=hiddensymmetries/simsopt:latest

srun shifter /venv/bin/python simsopt_driver

where simsopt_driver should be replaced with the name of your driver script.
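
Assuming the script above is saved in a file named, say, job.sh (the name is arbitrary and chosen here for illustration), it can be submitted and monitored with the usual slurm commands:

# Submit the batch job
sbatch job.sh
# Check the status of your jobs in the queue
squeue -u $USER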

Interactive session on compute nodes

To use the simsopt Shifter container in an interactive session on the compute nodes, first run

salloc --constraint=cpu -N 1 -p debug --image=hiddensymmetries/simsopt:latest -t 00:30:00

In the above command, the image option is passed directly to the slurm command. The --constraint=cpu option means we want to run our job on CPU nodes (rather than GPU nodes) on Perlmutter. The -N 1 option specifies that we want one node, -p debug indicates the debug queue, and -t 00:30:00 specifies 30 minutes of allocation time for this job. Once the resources are allocated, you can run your jobs. If you have navigated to a clone of the simsopt repository, you can run one of the examples with

srun -n 4 shifter  /venv/bin/python examples/1_Simple/tracing_fieldline.py

One Perlmutter CPU node has 128 cores, so you can use any number up to 128 in place of 4 in the above command. You can also run the parallel unit tests by entering

srun -n 4 shifter  /venv/bin/python -m unittest discover -v -k mpi -s tests

Setting up a conda virtual environment

The remainder of this document discusses "Option 2" for using python at NERSC, based on a conda virtual environment. First, load one of the python/3.x modules, e.g. module load python. When these instructions were written, the module python/3.8-anaconda-2020.11 was used.

Next, create a conda virtual environment using

conda create -n 20220112-01-simsoptFromConda
conda activate 20220112-01-simsoptFromConda

Here, 20220112-01-simsoptFromConda is a name we are giving to the virtual environment, and you can replace this string with another name of your choice if you like.

Conda can install packages from several "channels". We want to have conda use the default channel with highest priority, then use the conda-forge channel with lower priority. To add conda-forge with lower priority, enter

conda config --append channels conda-forge

To confirm the channels that conda will use and their order of priority, enter conda info. The result should look like this:

     active environment : 20220112-01-simsoptFromConda
    active env location : /global/homes/l/landrema/.conda/envs/20220112-01-simsoptFromConda
            shell level : 1
       user config file : /global/homes/l/landrema/.condarc
 populated config files : /global/homes/l/landrema/.condarc
          conda version : 4.9.2
    conda-build version : 3.20.5
         python version : 3.8.5.final.0
       virtual packages : __glibc=2.26=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /usr/common/software/python/3.8-anaconda-2020.11  (read only)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /usr/common/software/python/3.8-anaconda-2020.11/pkgs
                          /global/homes/l/landrema/.conda/pkgs
       envs directories : /global/homes/l/landrema/.conda/envs
                          /usr/common/software/python/3.8-anaconda-2020.11/envs
               platform : linux-64
             user-agent : conda/4.9.2 requests/2.24.0 CPython/3.8.5 Linux/4.12.14-150.75-default sles/15 glibc/2.26
                UID:GID : 43298:43298
             netrc file : /global/homes/l/landrema/.netrc
           offline mode : False

Note that in the channel URLs section, the repo.anaconda.com lines (which represent the default channel) appear above the conda.anaconda.org/conda-forge lines, indicating that conda-forge has lower priority. If you have previously added the conda-forge channel with higher priority than the default, you can remove it using conda config --remove channels conda-forge before running conda config --append channels conda-forge.
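
In that case, the full sequence of commands to fix the channel order, followed by a check of the result with conda info as above, is

conda config --remove channels conda-forge
conda config --append channels conda-forge
conda info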

Installing pre-compiled simsopt binary package with conda

We can install simsopt using

conda install -c hiddensymmetries simsopt

Conda will ask you to confirm that you want to proceed; press enter. Installation will take a minute or so.

Simsopt should now be installed. You can confirm using

python -c "import simsopt; print(simsopt.__version__, 'Success')"

If you do not need VMEC and MPI (for instance if you are doing stage-2 coil optimization), then you can stop here.

Installing simsopt from source

If you wish to edit and develop the simsopt source code, you should install simsopt from source. To do this, we first install some packages that simsopt depends on:

conda install python numpy scipy cmake ninja pybind11 jax jaxlib scikit-build matplotlib monty nptyping Deprecated randomgen ruamel.yaml sympy h5py f90nml pyevtk setuptools_scm

Press enter when you are asked if you wish to proceed; installation will take about a minute. Now, navigate to where you wish to install the simsopt repository (e.g. your home directory) and then clone the repository using

git clone https://github.com/hiddenSymmetries/simsopt.git

Enter the directory with cd simsopt. Now we compile and install the code using

pip install -e .

This performs an "editable" install, so any changes you make to the python source are immediately reflected when you import simsopt from any directory. However, if you make changes to the C++ source, you must re-run pip install -e . before those changes take effect. Note that the compiled code is put in the build directory, so if you wish to do a clean build, you can delete this directory with rm -r build before running pip install -e ..
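
For example, a clean rebuild after changing the C++ source looks like

# Delete the compiled code from previous builds, then recompile and reinstall
rm -r build
pip install -e .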

If you do not need VMEC and MPI (for instance if you are doing stage-2 coil optimization), then you can stop here.

MPI

To use MPI with simsopt, we must build mpi4py using the system's MPI. To do this, run

env CC=cc MPICC=cc pip install --no-cache-dir mpi4py

(The --no-cache-dir option is usually unnecessary, but it ensures that a clean build is performed in case any temporary files are left from previous unsuccessful build attempts.)

Note that on Perlmutter, simsopt modules that use MPI can only be imported in python from a compute node (either via a batch script or using srun in an interactive session), not from a login node. The reason is that Perlmutter does not allow MPI to be initialized from a login node. If you try, python will exit with an error like this:

[Thu Dec 30 06:30:08 2021] [unknown] Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(537): 
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......:  PMI2 init failed: 1 
Aborted

To check simsopt components that use MPI, you can start an interactive job using

salloc --nodes 1 --qos interactive --time 00:05:00 --constraint cpu

and, once the interactive session begins, try something like the following:

srun python -c "import simsopt.util.mpi; print('success')"

VMEC

If you wish to use VMEC with simsopt, you must install the python-wrapped VMEC from source. (A pre-compiled VMEC module is not yet available.) To do this, we need a netcdf module loaded; either the serial or parallel version should work. We also need a cmake module. It is also necessary to unload the default module craype-hugepages2M, which is known to cause problems for python. For the instructions here, the following module commands were used (in addition to the earlier module load python):

module unload craype-hugepages2M
module load cray-netcdf-hdf5parallel cmake

which resulted in the following modules being loaded:

Currently Loaded Modulefiles:
  1) modules/3.2.11.4                                 13) xpmem/2.2.20-7.0.1.1_4.28__g0475745.ari
  2) altd/2.0                                         14) job/2.2.4-7.0.1.1_3.55__g36b56f4.ari
  3) darshan/3.2.1                                    15) dvs/2.12_2.2.167-7.0.1.1_17.11__ge473d3a2
  4) craype-network-aries                             16) alps/6.6.58-7.0.1.1_6.30__g437d88db.ari
  5) intel/19.0.3.199                                 17) rca/2.2.20-7.0.1.1_4.74__g8e3fb5b.ari
  6) craype/2.6.2                                     18) atp/2.1.3
  7) cray-libsci/19.06.1                              19) PrgEnv-intel/6.0.5
  8) udreg/2.3.2-7.0.1.1_3.61__g8175d3d.ari           20) craype-haswell
  9) ugni/6.0.14.0-7.0.1.1_7.63__ge78e5b0.ari         21) cray-mpich/7.7.10
 10) pmi/5.0.14                                       22) python/3.8-anaconda-2020.11
 11) dmapp/7.1.1-7.0.1.1_4.72__g38cf134.ari           23) cray-netcdf-hdf5parallel/4.6.3.2
 12) gni-headers/5.0.12.0-7.0.1.1_6.46__g3b1768f.ari  24) cmake/3.21.3
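
You can check the currently loaded modules on your own system at any time with

module list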

Next, you must install a few additional packages that the vmec module depends on:

pip install scikit-build f90wrap ninja

(It is better to install f90wrap with pip than with conda: the conda package pulls in lower-performance blas/lapack libraries that would then be used by every package in the conda environment, such as numpy.) Next, navigate to where you wish to install the VMEC2000 repository (e.g. your home directory) and then clone the repository using

git clone https://github.com/hiddenSymmetries/VMEC2000.git

Change into the VMEC2000 directory. Copy the file cmake/machines/cori.json on top of the file cmake_config_file.json, replacing it; a single command for this copy is shown after the file contents below. The file cmake_config_file.json should now read

{
    "cmake_args": [
        "-DNETCDF_INC_PATH=/opt/cray/pe/netcdf-hdf5parallel/4.6.3.2/INTEL/19.0/include",
        "-DNETCDF_LIB_PATH=/opt/cray/pe/netcdf-hdf5parallel/4.6.3.2/INTEL/19.0/lib",
        "-DSCALAPACK_LIB_DIR=/opt/cray/pe/libsci/19.06.1/INTEL/16.0/x86_64/lib",
        "-DSCALAPACK_LIB_NAME=sci_intel_mpi",
        "-DCMAKE_C_COMPILER=cc",
        "-DCMAKE_CXX_COMPILER=CC",
        "-DCMAKE_Fortran_COMPILER=ftn"
    ]
}
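
The copy described above can be done with a single command from the top of the VMEC2000 directory:

cp cmake/machines/cori.json cmake_config_file.json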

Now you can build and install the vmec python module by running

python setup.py install

If any problems arise during compilation, it is recommended to run rm -r _skbuild to delete temporary files from earlier unsuccessful installation attempts before trying again.
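
For instance, a clean rebuild of the vmec module looks like

# Remove leftover build files from a failed attempt, then rebuild
rm -r _skbuild
python setup.py install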

booz_xform

If you wish to use the booz_xform module, it can be installed using

pip install --no-cache-dir booz_xform
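
You can confirm the installation with a quick import check, analogous to the simsopt check above:

python -c "import booz_xform; print('Success')"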

Troubleshooting

If you get an error resembling

break adjusted to free malloc space: 0x0000010000000000 ***

this means the craype-hugepages2M module is loaded, which interferes with the vmec python module. Run module unload craype-hugepages2M and try again.
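
A minimal recovery sequence, assuming the error occurred while building the vmec python module, is

# Unload the problematic module, clear the failed build, and rebuild
module unload craype-hugepages2M
rm -r _skbuild
python setup.py install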