Commit ed88cc3

committed
trillium
1 parent 84ab821 commit ed88cc3

3 files changed: +70 -3 lines changed

docs/assets/meta/costtool.html

Lines changed: 2 additions & 2 deletions
@@ -39,11 +39,11 @@ <h3>Basic parameters</h3>
3939
value="3"
4040
list="dim-ticks"
4141
/>
42-
<div class="datalist" id="dim-ticks">
42+
<datalist id="dim-ticks">
4343
<option value="1" label="1D"></option>
4444
<option value="2" label="2D"></option>
4545
<option value="3" label="3D"></option>
46-
</div>
46+
</datalist>
4747
</div>
4848

4949
<!-- Coordinates -->

docs/content/useful/cluster-setups.md

Lines changed: 67 additions & 0 deletions
@@ -676,6 +676,73 @@ This section goes over some instructions on how to compile & run the `Entity` on
676676

677677
_Last updated: 6/19/2025_
678678

679+
=== "`Trillium` (SciNet, Canada)"
680+
681+
Trillium is a large parallel cluster built by Lenovo Canada and hosted by SciNet at the University of Toronto. The GPU subcluster has 61 nodes, each with 4 x Nvidia H100 SXM GPUs (80 GB memory, HOPPER90 architecture) and 1 x AMD EPYC 9654 (Zen 4) CPU @ 2.4 GHz (96 cores, 384 MB L3 cache). `Entity` works largely out of the box on Trillium, with two exceptions: HDF5 support must be turned off and GPU-aware MPI must be disabled.
682+
683+
**Compiling & running the code**
684+
The following modules are confirmed to work for building, running, and restarting the code:
685+
686+
```sh
687+
module load gcc/12.3 cmake/3.31.0 cuda/12.6 openmpi/4.1.5
688+
```
689+
690+
To disable HDF5, modify the following file in the `Entity` source directory:
691+
692+
```sh
693+
/path_to_src/entity/cmake/adios2Config.cmake
694+
```
695+
696+
so that `ADIOS2_USE_HDF5` is set to `OFF`:
697+
```cmake
698+
# Format/compression support
699+
set(ADIOS2_USE_HDF5
700+
OFF # <-- set this to OFF
701+
CACHE BOOL "Use HDF5 for ADIOS2")
702+
```
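A quick way to confirm the edit took effect is to look at the lines that follow the `set(ADIOS2_USE_HDF5` entry. The sketch below assumes the layout shown above, with the value on the line after `set(`. The helper name `hdf5_is_off` is hypothetical, and the path is the same placeholder used above.

```sh
#!/bin/bash
# Hypothetical helper: succeeds if the ADIOS2_USE_HDF5 cache entry in the
# given CMake file is set to OFF (CMake comments are stripped before checking).
hdf5_is_off() {
  grep -A 2 'set(ADIOS2_USE_HDF5' "$1" | sed 's/#.*//' | grep -qw 'OFF'
}

# Placeholder path, as in the instructions above; skipped if it does not exist.
cfg="/path_to_src/entity/cmake/adios2Config.cmake"
if [ -f "$cfg" ]; then
  if hdf5_is_off "$cfg"; then
    echo "HDF5 support is disabled"
  else
    echo "HDF5 support is still enabled; edit $cfg" >&2
  fi
fi
```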
703+
704+
When configuring, make sure to set the flag
705+
706+
```sh
707+
-D gpu_aware_mpi=OFF
708+
```
709+
710+
as the nodes are not properly configured for direct GPU-to-GPU communication (with GPU-aware MPI enabled the code will still run, but errors will arise at meshblock boundaries and the code will run much slower).
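For context, a full configure-and-build sequence with this flag might look like the sketch below. Only `gpu_aware_mpi=OFF` comes from the instructions here, and `Kokkos_ARCH_HOPPER90` follows from the H100 architecture mentioned above. The `pgen`, `mpi`, and `Kokkos_ENABLE_CUDA` options and the `PGEN` variable are assumptions based on the usual `Entity` build conventions and should be checked against the general build instructions.

```sh
# Sketch only: set PGEN to the name of your problem generator first.
# All flags except gpu_aware_mpi are assumptions, not taken from this section.
cmake -B build \
  -D pgen="$PGEN" \
  -D mpi=ON \
  -D Kokkos_ENABLE_CUDA=ON \
  -D Kokkos_ARCH_HOPPER90=ON \
  -D gpu_aware_mpi=OFF
cmake --build build -j
```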
711+
712+
A typical Slurm batch script for running `Entity` on the GPU subcluster is:
713+
714+
```sh
715+
#!/bin/bash
716+
#SBATCH --nodes=2
717+
#SBATCH --gpus-per-node=4
718+
#SBATCH --ntasks-per-node=4 # Keep all GPUs active
719+
#SBATCH --time=23:59:59
720+
#SBATCH --partition=compute_full_node
721+
#SBATCH -o outjob_test.o%j
722+
#SBATCH -e outjob_test.e%j
723+
#SBATCH -J test
724+
725+
module load gcc/12.3 cmake/3.31.0 cuda/12.6 openmpi/4.1.5
726+
727+
mpirun --map-by ppr:4:node --bind-to core ./entity.xc -input fluxtube.toml
728+
```
729+
730+
Here we have requested 2 x 4 = 8 GPUs for the full 24-hour wall time. One can also request 1, 4, or 8 GPUs for brief interactive debug jobs with
731+
732+
```
733+
$ debugjob
734+
$ debugjob 1
735+
$ debugjob 2
736+
```
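The batch script above runs one MPI rank per GPU, so the rank count should equal nodes times GPUs per node (2 x 4 = 8). A minimal sanity check that could be placed before the `mpirun` line is sketched below. The helper name `expected_ranks` is hypothetical, and the sketch assumes Slurm exports `SLURM_NNODES` and `SLURM_NTASKS` inside the job; the fallbacks mirror the script above so the snippet also runs outside a job.

```sh
#!/bin/bash
# Hypothetical helper: expected number of MPI ranks with one rank per GPU.
expected_ranks() {
  local nodes="$1" gpus_per_node="$2"
  echo $(( nodes * gpus_per_node ))
}

# Assumption: SLURM_NNODES and SLURM_NTASKS are set by Slurm inside the job.
nodes="${SLURM_NNODES:-2}"
gpus_per_node=4                      # matches --gpus-per-node in the script above
want="$(expected_ranks "$nodes" "$gpus_per_node")"
have="${SLURM_NTASKS:-$want}"

if [ "$have" -ne "$want" ]; then
  echo "warning: $have MPI ranks requested for $want GPUs" >&2
fi
```

If the warning fires, some GPUs would sit idle or be shared between ranks, which usually means `--ntasks-per-node` and `--gpus-per-node` have drifted apart.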
737+
738+
To `pip install` the version of `nt2py` that works with the ADIOS2 output format, you will need to load the following modules:
739+
740+
```sh
741+
module load python texlive gcc arrow/21.0.0
742+
```
743+
744+
_Last updated: 9/12/2025_
745+
679746
!!! warning "Mind the dates"
680747

681748
At the bottom of each section, there are tags indicating when the instructions were last updated. Some of them may be outdated, as clusters are constantly updated and changed. If so, please feel free to reach out with questions or contribute updated instructions.

sass/pages/_costtool.scss

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@
4040
}
4141
}
4242

43-
div.datalist {
43+
datalist {
4444
display: flex;
4545
justify-content: space-between;
4646
padding: 0 2px;
