docs/content/useful/cluster-setups.md (67 additions, 0 deletions)
@@ -676,6 +676,73 @@ This section goes over some instructions on how to compile & run the `Entity` on
_Last updated: 6/19/2025_

=== "`Trillium` (SciNet, Canada)"

Trillium is a large parallel cluster built by Lenovo Canada and hosted by SciNet at the University of Toronto. The GPU subcluster has 61 nodes, each with 4 × NVIDIA H100 SXM GPUs (80 GB memory, `HOPPER90` architecture) and 1 × AMD EPYC 9654 (Zen 4) CPU @ 2.4 GHz (96 cores, 384 MB L3 cache). `Entity` works largely out of the box on Trillium, with two exceptions: the HDF5 output format must be disabled, and GPU-aware MPI must be disabled.

**Compiling & running the code**

The following modules are confirmed to work for building, running, and restarting the code:

To disable HDF5, modify the following file in the `Entity` source directory:
```sh
/path_to_src/entity/cmake/adios2Config.cmake
```

by setting `ADIOS2_USE_HDF5` to `OFF`:

```cmake
# Format/compression support
set(ADIOS2_USE_HDF5
    OFF # <-- set this to OFF
    CACHE BOOL "Use HDF5 for ADIOS2")
```
The nodes are not properly configured for direct GPU-to-GPU communication, so GPU-aware MPI must be disabled. With it left enabled, the code still runs, but errors arise at meshblock boundaries and the code itself runs much slower. When configuring, make sure to set the flag

```sh
-D gpu_aware_mpi=OFF
```
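As a minimal sketch of how this flag might fit into a full configure command, the snippet below assembles the arguments and prints the resulting `cmake` call for inspection. Only `gpu_aware_mpi=OFF` comes from the instructions above. The `pgen` placeholder, `mpi=ON`, and the two Kokkos options are assumptions chosen for an H100 node, and should be checked against your own setup before use. To actually configure, run the printed command from the source directory.

```sh
#!/bin/sh
# Sketch only: builds and prints a configure command, it does not run cmake.

# Placeholder for the problem generator name (hypothetical default).
PGEN="${PGEN:-<PROBLEM_GENERATOR>}"

# Build directory and problem generator (assumed layout).
CMAKE_ARGS="-B build -D pgen=$PGEN"

# Enable MPI (assumption) but disable GPU-aware MPI (required on Trillium).
CMAKE_ARGS="$CMAKE_ARGS -D mpi=ON -D gpu_aware_mpi=OFF"

# Kokkos CUDA backend targeting the H100 architecture (assumption).
CMAKE_ARGS="$CMAKE_ARGS -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ARCH_HOPPER90=ON"

# Print the command so it can be reviewed before running it by hand.
echo "cmake $CMAKE_ARGS"
```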
A typical Slurm batch script for running `Entity` on the GPU subcluster is

```sh
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gpus-per-node=4
#SBATCH --ntasks-per-node=4 # Keep all GPUs active
# ...
```
Here we have requested 2 × 4 GPUs for the full 24-hour wall time. For brief interactive debug jobs, one can request 1, 4, or 8 GPUs, respectively, with

```sh
debugjob
debugjob 1
debugjob 2
```
To pip install the version of `nt2py` that works with the ADIOS2 output format, first load the following modules:

```sh
module load python texlive gcc arrow/21.0.0
```
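One possible way to carry out the install after loading these modules is to use a private virtual environment, sketched below. The environment location `VENV_DIR` is a hypothetical choice, and the sketch assumes the package is installable under the name `nt2py` with a default `pip install`. If a specific version is needed for ADIOS2 output, pin it explicitly.

```sh
#!/bin/sh
# Sketch only: install nt2py into a private virtual environment.

# Hypothetical location for the environment; any directory you own works.
VENV_DIR="${VENV_DIR:-$HOME/.venvs/nt2py}"

# Load the modules listed above (skipped where `module` is unavailable).
if command -v module >/dev/null 2>&1; then
  module load python texlive gcc arrow/21.0.0
fi

# Create the environment, activate it, then install the package.
# The steps are chained so that pip never runs outside the environment.
python3 -m venv "$VENV_DIR" \
  && . "$VENV_DIR/bin/activate" \
  && python3 -m pip install nt2py
```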
743
+
744
+
_Last updated: 9/12/2025_
745
+
679
746
!!! warning "Mind the dates"
680
747
681
748
At the bottom of each section, there are tags indicating when was the last date this instruction was updated. Some of them may be outdated due to clusters being constantly updated and changed. If so, please feel free to reach out with questions or contribute updated instructions.