Installing and Running FDS on a Linux Cluster

A common platform for running FDS simulations is a Linux cluster, which consists of multiple computers referred to as "compute nodes" that are controlled by a single "head node." You typically log in to the head node and launch jobs on the compute nodes using a batch queuing system.

There are two ways that you can use FDS on a Linux cluster. You can either download and install the pre-compiled FDS and Smokeview binaries, or you can clone the firemodels/fds repository following these instructions. If you are just interested in running FDS, you probably want to do the former. If, however, you are interested in doing research, or working with the FDS developers, you should do the latter.

Installing the Pre-Compiled FDS and Smokeview Programs

  1. Open a terminal session.

  2. "cd" to the directory where the downloaded bundle is located, typically your home directory.

  3. Run the installer script using the bash shell:

    $ bash FDS6.7.4_SMV6.7.14_lnx.sh
    

    Note that the version numbers in the name of the file you downloaded might be different. When you execute this command, you will be prompted with a few installation options.

  4. Make sure you remove the limit on your stack size, as shown below. A stack size that is too small is a common reason that FDS jobs fail to run on Linux.

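To remove the stack size limit in the bash shell, you can add the following line to your .bashrc, or run it in your terminal session before launching jobs:

ulimit -s unlimited
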
Starting with FDS 6.6.0, the installer no longer edits your startup files. If you have FDS-related entries from previous versions in your .bashrc or .bash_profile, remove them. Also remove the .bashrc_fds file, which is no longer needed. The installer now gives two options for updating your startup file. If you have modules installed on your system, add the following lines to your .bashrc file:

export MODULEPATH=$HOME/FDS/FDS6/bin/modules:$MODULEPATH
module load FDS6
module load SMV6

If you do not have modules, then add the following lines to your .bashrc file:

source $HOME/FDS/FDS6/bin/FDS6VARS.sh
source $HOME/FDS/FDS6/bin/SMV6VARS.sh

Be sure to change the paths to match where you actually installed FDS. If something does not work properly, take a look at the FDS6VARS.sh and SMV6VARS.sh scripts to understand what path variables are being set.
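
To confirm that the startup changes took effect, you can open a new terminal session (or re-source your .bashrc) and check that the fds executable is on your PATH. A minimal check, assuming the default install location above, is:

source ~/.bashrc
which fds
module list

The module list command is only relevant if you used the module-based setup.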

Testing the Installation

To make sure that FDS has installed properly, just type

fds

at the command prompt. You should see information about the version and date of compilation. If you are working on a single computer that is running Linux, you can now use FDS as you would on a Windows PC. The FDS User's Guide provides more details.

It is more than likely, however, that you are working on a Linux cluster, and if you just type fds at the command line, you will launch a single process on the head node. This is not the way you want to use the cluster, except perhaps for a short debugging run or while developing an input file. Once you are ready to start longer jobs, you need to invoke the MPI (Message Passing Interface) functionality, which is taken up in the next section.
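
For a quick debugging run of that kind, you can run FDS directly from the directory containing your input file; here job_name.fds and the directory path are placeholders for your own:

cd /path/to/working/directory
fds job_name.fds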

Writing a job control script

Suppose you want to run a job that uses 4 MPI processes, with 2 processes per node. If your compute cluster uses SLURM for scheduling, the job is launched using a bash script (call it script.sh, for example) like the following:

#!/bin/bash
#SBATCH -J job_name
#SBATCH -e /home/userid/.../job_name.err
#SBATCH -o /home/userid/.../job_name.log
#SBATCH --partition=<name of queue>
#SBATCH --ntasks=4
#SBATCH --nodes=2
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=2
#SBATCH --time=2:0:0
export OMP_NUM_THREADS=1
cd <pwd>
srun -N 2 -n 4 --ntasks-per-node 2 /home/userid/FDS/FDS6/bin/fds job_name.fds 

If you use PBS, a typical job script is like the following:

#!/bin/bash
#PBS -N job_name
#PBS -e /home/userid/.../job_name.err
#PBS -o /home/userid/.../job_name.log
#PBS -l nodes=2:ppn=2
#PBS -l walltime=2:0:0
export OMP_NUM_THREADS=1
cd <pwd>
mpiexec -np 4 /home/userid/FDS/FDS6/bin/fds job_name.fds

The job_name is the base name of the input file. The .err and .log files capture what is normally written to the screen when you run FDS; these files are typically created when the job is done. You can assign them to any directory you want, because some Linux clusters have specific work spaces that are separate from the user directories. The parameter nodes indicates the number of nodes you want to use, and ntasks-per-node or ppn is the number of MPI processes per node. The time or walltime in this case is 2 hours; the job is typically killed after that, so choose wisely. Setting OMP_NUM_THREADS explicitly is intended to override any existing value of that environment variable; for this example, we are not invoking OpenMP, so it is set to 1. The cd command changes to the present working directory, that is, the directory containing your input file.

If you are compiling FDS yourself, you can change the path to the executable file. The one given in the example is the one that comes with the installed version of FDS.
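
If you do want to use OpenMP in addition to MPI, you can adapt the SLURM script by requesting more than one CPU per task and setting OMP_NUM_THREADS to match. The following is only a sketch, assuming 4 MPI processes with 4 OpenMP threads each; the partition name, paths, and time limit are placeholders you would adjust for your cluster:

#!/bin/bash
#SBATCH -J job_name
#SBATCH -e /home/userid/.../job_name.err
#SBATCH -o /home/userid/.../job_name.log
#SBATCH --partition=<name of queue>
#SBATCH --ntasks=4
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4
#SBATCH --time=2:0:0
export OMP_NUM_THREADS=4   # match --cpus-per-task
cd <pwd>
srun -N 2 -n 4 --ntasks-per-node 2 --cpus-per-task 4 /home/userid/FDS/FDS6/bin/fds job_name.fds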

Running the job

Once you have written your job control script, submit the job using the command

sbatch script.sh

for SLURM, and

qsub script.sh

for PBS. Monitor your job by typing

squeue -a

for SLURM and

qstat -a

for PBS.

Kill your job by typing

scancel jobid

for SLURM and

qdel jobid

for PBS, where the jobid is given by the squeue or qstat command. There are many more options for these commands. Just do an Internet search and you'll see that many computing centers have listed them in detail. The ones listed here are the most important.
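
For example, to list only your own jobs rather than everyone's, both squeue and qstat accept a user filter; replace userid with your own login name:

squeue -u userid
qstat -u userid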

Special Topic: A script that writes the job control script

At NIST, we use a bash script in the fds repository, Utilities/Scripts/qfds.sh. This script allows you to run jobs quickly because it automatically writes and submits the PBS job control script. We change this script often, so it is best to first get a list of options:

qfds.sh -H

Next, if you want to run a job that uses 8 MPI processes, type

qfds.sh -v -p 8 job_name.fds

The -v option will just write out the script but not submit it. You can then see what qfds.sh does, and modify the script if need be. The qfds.sh script has been customized for our use at NIST, but it should get you close to a working job control script.
