AMDFlang compiler for MFC #1116
base: master
Conversation
CodeAnt AI is reviewing your PR.
📝 Walkthrough
Adds Frontier AMD (famd) support: new CI matrix entries and workflow scripts, AFAR/OLCF module bootstraps and toolchain templates, CMake/toolchain adjustments for LLVMFlang/Cray, and widespread AMD-guarded conditional array-shape and control-flow changes across many Fortran sources.
Sequence Diagram(s)
sequenceDiagram
participant GH as GitHub Actions
participant Build as frontier_amd/build.sh
participant Submit as frontier_amd/submit.sh
participant Slurm as Slurm Scheduler
participant Node as Compute Node
participant Env as mfc.sh / toolchain env
participant ROCm as rocm-smi
participant Bench as mfc.sh bench/test
GH->>Build: start build (device/interface)
Build->>Env: mfc.sh load -c famd -m g
Build-->>GH: build artifacts
GH->>Submit: run submit (script, device)
Submit->>Slurm: sbatch (embedded script, job_slug)
Slurm->>Node: allocate node & start job
Node->>Env: source mfc.sh (AFAR/ROCm)
Node->>ROCm: rocm-smi (detect GPUs)
Node->>Bench: run mfc.sh bench/test with device_opts
Bench-->>GH: upload artifacts/logs
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
High-level Suggestion
Replace hardcoded array sizes (e.g., dimension(3), dimension(10)) with preprocessor macros derived from case-specific configurations. This will prevent potential runtime errors when simulation parameters change, making the amdflang support more robust. [High-level, importance: 9]
Solution Walkthrough:
Before:
! in src/simulation/m_riemann_solvers.fpp
subroutine s_hllc_flux(...)
...
#:if not MFC_CASE_OPTIMIZATION and USING_AMD
real(wp), dimension(3) :: alpha_L, alpha_R
real(wp), dimension(10) :: Ys_L, Ys_R
...
#:else
real(wp), dimension(num_fluids) :: alpha_L, alpha_R
real(wp), dimension(num_species) :: Ys_L, Ys_R
...
#:endif
...
end subroutine s_hllc_flux
After:
! in case.fpp or similar config header
#define NUM_FLUIDS_STATIC 3
#define NUM_SPECIES_STATIC 10
! in src/simulation/m_riemann_solvers.fpp
subroutine s_hllc_flux(...)
...
#:if not MFC_CASE_OPTIMIZATION and USING_AMD
real(wp), dimension(NUM_FLUIDS_STATIC) :: alpha_L, alpha_R
real(wp), dimension(NUM_SPECIES_STATIC) :: Ys_L, Ys_R
...
#:else
real(wp), dimension(num_fluids) :: alpha_L, alpha_R
real(wp), dimension(num_species) :: Ys_L, Ys_R
...
#:endif
...
end subroutine s_hllc_flux
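One caveat on the sketch above: MFC's `.fpp` sources are preprocessed with fypp rather than the C preprocessor, so the static bounds would more naturally be fypp definitions than `#define`s. A hypothetical fypp equivalent (the header location and names are illustrative, not from the PR):

```fortran
#! in a shared fypp header, e.g. a case-config include (hypothetical location)
#:set NUM_FLUIDS_STATIC = 3
#:set NUM_SPECIES_STATIC = 10

#! in src/simulation/m_riemann_solvers.fpp
#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    real(wp), dimension(${NUM_FLUIDS_STATIC}$) :: alpha_L, alpha_R
    real(wp), dimension(${NUM_SPECIES_STATIC}$) :: Ys_L, Ys_R
#:else
    real(wp), dimension(num_fluids) :: alpha_L, alpha_R
    real(wp), dimension(num_species) :: Ys_L, Ys_R
#:endif
```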
Nitpicks 🔍
CodeAnt AI finished reviewing your PR.
16 issues found across 44 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/simulation/m_bubbles_EE.fpp">
<violation number="1" location="src/simulation/m_bubbles_EE.fpp:169">
P1: Rtmp/Vtmp are truncated to three elements under AMD builds, but the loops still write up to nb entries, so nb>3 overflows these buffers.</violation>
<violation number="2" location="src/simulation/m_bubbles_EE.fpp:170">
P1: myalpha/myalpha_rho are limited to three elements under AMD builds, but the code still iterates to num_fluids, so cases with >3 fluids overflow the arrays.</violation>
</file>
<file name="src/simulation/m_ibm.fpp">
<violation number="1" location="src/simulation/m_ibm.fpp:183">
P1: Limiting `alpha_rho_IP`/`alpha_IP` to three elements causes out-of-bounds writes when a case configures more than three fluids, because the loops below still run to `num_fluids`. Use the maximum supported fluid count instead of a hard-coded `3`.</violation>
</file>
<file name="toolchain/bootstrap/modules.sh">
<violation number="1" location="toolchain/bootstrap/modules.sh:126">
P1: Add a space between `if` and `[` so the new Frontier AMD branch executes instead of causing a syntax error.</violation>
</file>
<file name=".github/workflows/frontier_amd/test.sh">
<violation number="1" location=".github/workflows/frontier_amd/test.sh:4">
P2: `wc -c` counts characters, so nodes with multi-digit GPU IDs produce an inflated `ngpus`, causing `mfc.sh test` to request too many parallel jobs. Count IDs by words instead of characters.</violation>
</file>
<file name="src/simulation/m_acoustic_src.fpp">
<violation number="1" location="src/simulation/m_acoustic_src.fpp:147">
P1: The AMD path sizes `myalpha` arrays to length 3, but the code still writes up to `num_fluids` elements (up to 10), leading to out‑of‑bounds memory corruption for cases with more than three fluids on AMD builds.</violation>
</file>
<file name="src/simulation/m_pressure_relaxation.fpp">
<violation number="1" location="src/simulation/m_pressure_relaxation.fpp:150">
P1: The AMD branch fixes `pres_K_init`/`rho_K_s` to length 3 but the subsequent logic still iterates up to `num_fluids`, so any case with more than three fluids will index past the allocated storage. Use a compile-time constant such as `num_fluids_max` to keep the arrays static without truncating their capacity.</violation>
<violation number="2" location="src/simulation/m_pressure_relaxation.fpp:221">
P1: Limiting `alpha_rho`/`alpha` to length 2 under AMD allows only two fluids, yet the routine still loops to `num_fluids` and contains branches for `num_fluids > 2`. This causes out-of-bounds writes whenever more than two fluids are present. Allocate these arrays to `num_fluids_max` (or another compile-time upper bound) instead of 2.</violation>
</file>
<file name="src/simulation/m_hyperelastic.fpp">
<violation number="1" location="src/simulation/m_hyperelastic.fpp:102">
P1: The AMD-specific tensors only have 9 elements but the routine still writes the determinant at `tensorb(tensor_size)` (`tensor_size = num_dims**2 + 1`), so AMD builds write past the end of `tensorb`. Allocate enough elements for the determinant as well.</violation>
</file>
<file name=".github/workflows/frontier_amd/submit-bench.sh">
<violation number="1" location=".github/workflows/frontier_amd/submit-bench.sh:49">
P2: CPU submissions are ignored—the helper always loads the GPU environment, so CPU bench jobs will run with the wrong modules.</violation>
</file>
<file name="src/common/m_boundary_common.fpp">
<violation number="1" location="src/common/m_boundary_common.fpp:1082">
P1: The y-direction Jacobian buffer updates are now compiled only when `num_dims > 2`, so 2‑D case-optimized builds skip all y-direction boundary handling in `s_populate_F_igr_buffers`. Guard this block with `num_dims > 1` so it still runs for 2‑D cases.</violation>
</file>
<file name="src/simulation/m_cbc.fpp">
<violation number="1" location="src/simulation/m_cbc.fpp:659">
P1: AMD-only CBC arrays for fluid properties are limited to length 3, but the code still iterates up to `num_fluids`, so cases with more than three fluids will overflow these buffers and produce incorrect fluxes.</violation>
</file>
<file name="src/common/m_variables_conversion.fpp">
<violation number="1" location="src/common/m_variables_conversion.fpp:1191">
P1: The AMD-specific version of `s_convert_primitive_to_flux_variables` truncates `alpha_rho_K/alpha_K` to length 3 while the routine still fills them for every fluid (`contxe = num_fluids`), causing buffer overruns on multi-fluid cases.</violation>
<violation number="2" location="src/common/m_variables_conversion.fpp:1338">
P1: `s_compute_species_fraction` now limits its output arrays to length 3 while still iterating over all `num_fluids`, so AMD builds will corrupt memory as soon as a case uses more than three fluids.</violation>
</file>
<file name="src/simulation/m_qbmm.fpp">
<violation number="1" location="src/simulation/m_qbmm.fpp:728">
P1: Hardcoding the quadrature arrays to fixed size `(4,3)` while loops still iterate up to `nb` causes out-of-bounds writes for `nb > 3`, leading to memory corruption in non-case-optimized builds.</violation>
</file>
<file name="src/simulation/m_data_output.fpp">
<violation number="1" location="src/simulation/m_data_output.fpp:272">
P1: The AMD-specific alpha array is limited to 3 elements, so runs with >3 fluids overwrite the stack when s_compute_enthalpy fills alpha(1:num_fluids). Use num_fluids_max (or another upper bound ≥ num_fluids) for the AMD static allocation.</violation>
</file>
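Nearly all of the P1 findings above share one root cause: an AMD-guarded declaration with a hard-coded extent (3, 10, 18, ...) paired with loops that still run to a runtime count such as `num_fluids` or `nb`. A minimal fypp sketch of the safer shape — assuming the `num_fluids_max` compile-time constant and the `@:ASSERT` macro that the reviews below cite, neither verified here:

```fortran
#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    ! static extent the compiler can see, instead of a hard-coded 3
    real(wp), dimension(num_fluids_max) :: alpha_L, alpha_R
#:else
    real(wp), dimension(num_fluids) :: alpha_L, alpha_R
#:endif

#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    ! executable section: fail fast if a case could ever overrun the static buffers
    @:ASSERT(num_fluids <= num_fluids_max, 'AMD path: num_fluids exceeds static buffer extent')
#:endif

    ! loops keep their runtime bound and stay within the declared extent
    do i = 1, num_fluids
        alpha_L(i) = 0._wp; alpha_R(i) = 0._wp
    end do
```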
#!/bin/bash

gpus=`rocm-smi --showid | awk '{print $1}' | grep -Eo '[0-9]+' | uniq | tr '\n' ' '`
ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c`
P2: wc -c counts characters, so nodes with multi-digit GPU IDs produce an inflated ngpus, causing mfc.sh test to request too many parallel jobs. Count IDs by words instead of characters.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/frontier_amd/test.sh, line 4:
<comment>`wc -c` counts characters, so nodes with multi-digit GPU IDs produce an inflated `ngpus`, causing `mfc.sh test` to request too many parallel jobs. Count IDs by words instead of characters.</comment>
<file context>
@@ -0,0 +1,20 @@
+#!/bin/bash
+
+gpus=`rocm-smi --showid | awk '{print $1}' | grep -Eo '[0-9]+' | uniq | tr '\n' ' '`
+ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c`
+
+device_opts=""
</file context>
- ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c`
+ ngpus=`echo "$gpus" | wc -w`
Actionable comments posted: 4
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
src/simulation/m_start_up.fpp (1)
1376-1387: `if (.not. igr .or. dummy)` should be grouped to avoid double init.
With the current condition, `dummy = .true.` causes both branches to run, undermining the guard and potentially double-initializing modules. Use `.not. (igr .or. dummy)` (or `.not. igr .and. .not. dummy`) instead.
🔧 Suggested fix
- if (.not. igr .or. dummy) then
+ if (.not. (igr .or. dummy)) then
      if (recon_type == WENO_TYPE) then
          call s_initialize_weno_module()
      elseif (recon_type == MUSCL_TYPE) then
          call s_initialize_muscl_module()
      end if
      call s_initialize_cbc_module()
      call s_initialize_riemann_solvers_module()
  end if
src/simulation/m_time_steppers.fpp (1)
712-718: Prevent `dummy` from triggering conversion when `q_prim_vf` isn't allocated.
`q_prim_vf` is only allocated when `.not. igr`. With `.not. igr .or. dummy`, a true `dummy` would call the conversion and write into unallocated storage. You can keep the workaround but route `dummy` into a no-op branch.
🔧 Suggested fix
- if (.not. igr .or. dummy) then
-     call s_convert_conservative_to_primitive_variables( &
-         q_cons_ts(1)%vf, q_T_sf, q_prim_vf, idwint)
- end if
+ if (.not. igr) then
+     call s_convert_conservative_to_primitive_variables( &
+         q_cons_ts(1)%vf, q_T_sf, q_prim_vf, idwint)
+ else if (dummy) then
+     continue
+ end if
src/simulation/m_bubbles_EL.fpp (1)
785-798: Fix uninitialized velocity components in AMD code path.
In the AMD non-optimized path, `vel` is dimensioned (3) but only `vel(1:num_dims)` are assigned. For 1D/2D simulations (num_dims < 3), the remaining components contain garbage that `dot_product(vel, vel)` includes, corrupting the speed of sound calculation. Initialize `vel = 0._wp` or slice the `dot_product` calls to `vel(1:num_dims)`.
Suggested fix
  #:if not MFC_CASE_OPTIMIZATION and USING_AMD
      real(wp), dimension(3) :: vel
  #:else
      real(wp), dimension(num_dims) :: vel
  #:endif
+ vel = 0._wp
  ...
- E = gamma*pinf + pi_inf + 0.5_wp*rhol*dot_product(vel, vel)
+ E = gamma*pinf + pi_inf + 0.5_wp*rhol*dot_product(vel(1:num_dims), vel(1:num_dims))
  ...
- cson = sqrt((H - 0.5_wp*dot_product(vel, vel))/gamma)
+ cson = sqrt((H - 0.5_wp*dot_product(vel(1:num_dims), vel(1:num_dims)))/gamma)
src/simulation/m_viscous.fpp (1)
1023-1063: dummy in reconstruction can execute multiple schemes.
With `recon_type == TYPE .or. dummy` inside the SCHEME loops, a true `dummy` would run both WENO and MUSCL branches and overwrite `recon_dir`/bounds. If dummy is meant to stay false, consider hardening it (e.g., a parameter constant or guard) to prevent accidental activation.
Also applies to: 1065-1108, 1168-1211
src/simulation/m_rhs.fpp (1)
342-590: dummy-gated branches can run mutually exclusive paths in the same step.
With `.or. dummy` added across allocation and execution branches, a true `dummy` would execute both IGR and FV paths and invoke viscous routines even when their buffers are not allocated (e.g., `tau_Re_vf`), while allocations are still keyed only to `.not. igr`/`viscous`. Please ensure `dummy` is hard-wired false at runtime, or refactor to keep these branches mutually exclusive and mirror allocations/deallocations if dummy can ever be true.
Also applies to: 662-718, 731-744, 754-988, 1059-1071, 1636-1668, 1745-1758, 1879-1917, 1934-1995
src/simulation/m_surface_tension.fpp (1)
331-394: dummy in reconstruction can override selected scheme. If
`dummy` ever becomes true, both scheme branches run and `recon_dir`/bounds end up using the last scheme's polynomial width. Consider ensuring dummy is always false or guarding to run only the configured scheme.
329-333:dummyis defined and defaults to false; however, it appears to be placeholder code.
329-333: `dummy` is defined and defaults to false; however, it appears to be placeholder code.
712-747: Add a comment explaining thedummyvariable workaround for future maintainers.The
dummyvariable is properly defined inm_global_parametersand initialized to.false.. It serves as a workaround for an AMD compiler case-optimization bug where conditionals with parameters followed by GPU kernels can be incorrectly optimized away. Adding a brief comment at lines 712 and 747 (or once at the top of the affected block) explaining this temporary workaround would help maintainers understand its purpose and facilitate removal when the compiler bug is resolved.For example, consider adding above the first occurrence:
! Workaround for AMD compiler case-optimization bug: dummy ensures ! conditional branches are evaluated at runtime, preventing incorrect ! optimization of GPU kernel launch conditions.
🤖 Fix all issues with AI agents
In `@src/simulation/m_compute_cbc.fpp`:
- Around line 38-47: The AMD-specific branches declare mf and dalpha_rho_ds with
a fixed dimension(3) which can overflow when num_fluids > 3; update the
declarations guarded by the condition "not MFC_CASE_OPTIMIZATION and USING_AMD"
to use dimension(num_fluids) (matching the non-AMD branch) or else add a
compile-time check enforcing num_fluids ≤ 3; change all occurrences in this file
where mf and dalpha_rho_ds are declared in the AMD path (symbols: mf,
dalpha_rho_ds, L, and the preprocessor flags MFC_CASE_OPTIMIZATION and
USING_AMD) so loops that iterate 1..num_fluids no longer index out of bounds.
In `@src/simulation/m_hyperelastic.fpp`:
- Around line 101-105: The AMD branch declares tensora and tensorb as
dimension(9) but the kernel indexes up to tensor_size (num_dims**2 + 1), causing
out-of-bounds writes/reads; fix by making the AMD arrays match tensor_size
(declare tensora and tensorb dimension(tensor_size)) or alternatively keep
tensora/tensorb as dimension(num_dims**2) and move the determinant storage into
a separate scalar (e.g., det) used where tensorb(tensor_size) is written/read;
update all references that currently use tensorb(tensor_size) and any loops over
tensor indices (the code around tensora/tensorb usage and determinant writes at
lines referenced) to use the adjusted array size or the new scalar.
In `@src/simulation/m_riemann_solvers.fpp`:
- Around line 3765-3778: The GPU parallel region macro
GPU_PARALLEL_LOOP(collapse=3, private='[...]', copyin='[norm_dir]') is missing
the inner loop index i in its private list causing potential data races for the
inner do i loops that assign alpha_rho_L/R and vel%L; update the private list
passed to GPU_PARALLEL_LOOP to include the loop variable i so that i is private
to each thread (ensure the same change is applied wherever GPU_PARALLEL_LOOP is
used around the do j/k/l blocks that contain the do i loops).
In `@toolchain/bootstrap/modules.sh`:
- Line 126: The if test is malformed: fix the conditional that reads if[ "$u_c"
'==' 'famd' ]; then by adding required spaces around [ and ] and using a
POSIX-compatible string operator, e.g. change the test to use [ "$u_c" = "famd"
] with proper spacing and quoting of the variable; update the conditional that
references variable u_c accordingly so the shell no longer throws a syntax
error.
🟠 Major comments (23)
toolchain/modules-48-51 (1)
48-51: Add `famd-gpu` GPU variant entry.
The `toolchain/modules` file format (documented at lines 6-7) explicitly requires `[slug]-gpu` entries for GPU-capable systems. All GPU systems in the file follow this pattern: `b-gpu`, `a-gpu`, `w-gpu`, `e-gpu`, `p-gpu`, `f-gpu`, and `tuo-gpu`. Since OLCF Frontier has GPU acceleration, `famd` (Frontier AMD) should include a `famd-gpu` entry with GPU-specific modules and settings alongside the existing `famd-all` entries.
src/simulation/m_ibm.fpp-184-196 (1)
184-196: Same out-of-bounds risk with hardcoded array sizes.
This follows the same pattern as `m_bubbles_EE.fpp`. When `USING_AMD` is true and `MFC_CASE_OPTIMIZATION` is false, arrays are sized to fixed values (3, 18, 12). Loops iterate over `num_fluids`, `nb`, `nb*nmom`, and `nb*nnode`, which may exceed these hardcoded sizes at runtime.
Ensure that AMD builds without case optimization enforce:
num_fluids ≤ 3
nb ≤ 3
nb*nmom ≤ 18
nb*nnode ≤ 12
toolchain/bootstrap/modules.sh-136-139 (1)
136-139: Fix undefined variables in AMD (famd) configuration block.
Lines 136-139 reference variables that are not defined within the famd block:
`CRAY_MPICH_PREFIX` (line 136): used but never set, resulting in "-L/lib" instead of a proper path
`CRAY_PMI_POST_LINK_OPTS` (line 137): used but never set
`CRAY_LD_LIBRARY_PATH` (line 139): explicitly excluded from the earlier Cray path setup (the line 121 condition skips famd), but then referenced here
These variables should either be defined within the famd block using `${OLCF_AFAR_ROOT}` values, loaded from modules, or the references should be removed if unnecessary for AMD compilation.
src/simulation/m_bubbles_EE.fpp-168-174 (1)
168-174: Add validation to prevent out-of-bounds access in AMD non-optimized GPU builds.
The hardcoded size of 3 for `Rtmp`, `Vtmp`, `myalpha`, and `myalpha_rho` is a known workaround for AMD GPU compiler limitations with automatic arrays in device code. However, when `USING_AMD` and `not MFC_CASE_OPTIMIZATION` are both true, there is no runtime validation preventing `num_fluids > 3` or `nb > 3` from being set by the user.
Loops at lines 215, 223, 234, 242 (iterating 1..nb) and line 245 (iterating 1..num_fluids) will access array indices beyond the hardcoded size of 3, causing silent out-of-bounds access if constraints are violated.
Add a validation check (e.g., in `s_initialize_gpu_vars` or a similar setup routine) that either:
- Enforces `num_fluids ≤ 3` and `nb ≤ 3` when building for AMD without case optimization, or
- Raises an error if these limits are exceeded in such configurations (see the sketch below).
src/simulation/m_sim_helpers.fpp-100-115 (1)
100-115: Fixed-size alpha buffers risk OOB when num_fluids > 3.
In the AMD branch,alpha,alpha_rho, andGsare sized to 3 while the routine and callees iteratenum_fluids. Ifnum_fluidscan exceed 3, this will overrun; please enforce the cap in host-side input checks (e.g.,@:ASSERT(num_fluids <= 3, ...)) or use a compile-time max constant for these buffers.As per coding guidelines: Use fypp macro
@:ASSERT(predicate, message)for validating conditions:@:ASSERT(predicate, message).src/simulation/m_hyperelastic.fpp-107-111 (1)
107-111: Guard fixed-size alpha arrays against num_fluids > 3.
alpha_k/alpha_rho_kare fixed at 3 in the AMD path but downstream routines iteratenum_fluids, which will overrun ifnum_fluids > 3. Add a host-side guard (or widen to a compile-time max) before launching GPU work.As per coding guidelines: Use fypp macro `@:ASSERT(predicate, message)` for validating conditions: `@:ASSERT(predicate, message)`.🛡️ Example guard
#:if not MFC_CASE_OPTIMIZATION and USING_AMD real(wp), dimension(3) :: alpha_k, alpha_rho_k #:else real(wp), dimension(num_fluids) :: alpha_k, alpha_rho_k #:endif + #:if not MFC_CASE_OPTIMIZATION and USING_AMD + @:ASSERT(num_fluids <= 3, 'USING_AMD path requires num_fluids <= 3') + #:endifsrc/simulation/m_acoustic_src.fpp-146-150 (1)
146-150: Fixed-size myalpha arrays can overflow when num_fluids > 3.
myalpha/myalpha_rhoare sized to 3 under AMD, but the GPU loop iteratesnum_fluids. Add a host-side guard (or widen to a compile-time max) before the kernel runs.As per coding guidelines: Use fypp macro `@:ASSERT(predicate, message)` for validating conditions: `@:ASSERT(predicate, message)`.🛡️ Example guard
#:if not MFC_CASE_OPTIMIZATION and USING_AMD real(wp), dimension(3) :: myalpha, myalpha_rho #:else real(wp), dimension(num_fluids) :: myalpha, myalpha_rho #:endif + #:if not MFC_CASE_OPTIMIZATION and USING_AMD + @:ASSERT(num_fluids <= 3, 'USING_AMD path requires num_fluids <= 3') + #:endifsrc/simulation/m_pressure_relaxation.fpp-220-224 (1)
220-224: alpha/alpha_rho size=2 needs an explicit num_fluids constraint.
The loops iterate1..num_fluids; ifnum_fluidsexceeds 2, these arrays will overrun. If the AMD workaround is intentionally limited to 2 fluids, enforce that in a host-side check; otherwise size to a shared max constant.As per coding guidelines: Use fypp macro `@:ASSERT(predicate, message)` for validating conditions: `@:ASSERT(predicate, message)`.🛡️ Example host-side guard (e.g., in s_initialize_pressure_relaxation_module)
+#:if not MFC_CASE_OPTIMIZATION and USING_AMD + @:ASSERT(num_fluids <= 2, 'Pressure relaxation AMD path requires num_fluids <= 2') +#:endifsrc/simulation/m_bubbles_EL.fpp-580-582 (1)
580-582: AddReto the GPU private list.
Reis written inside the loop (vias_convert_species_to_mixture_variables_acc) and should be private to avoid cross-thread races.Based on learnings: Declare loop-local variables with `private='[...]'` in GPU parallel loop macros.🔧 Suggested fix
- $:GPU_PARALLEL_LOOP(private='[k,i,myalpha_rho,myalpha,myVapFlux,preterm1, term2, paux, pint, Romega, term1_fac,myR_m, mygamma_m, myPb, myMass_n, myMass_v,myR, myV, myBeta_c, myBeta_t, myR0, myPbdot, myMvdot,myPinf, aux1, aux2, myCson, myRho,gamma,pi_inf,qv,dmalf, dmntait, dmBtait, dm_bub_adv_src, dm_divu,adap_dt_stop]', & + $:GPU_PARALLEL_LOOP(private='[k,i,myalpha_rho,myalpha,Re,myVapFlux,preterm1, term2, paux, pint, Romega, term1_fac,myR_m, mygamma_m, myPb, myMass_n, myMass_v,myR, myV, myBeta_c, myBeta_t, myR0, myPbdot, myMvdot,myPinf, aux1, aux2, myCson, myRho,gamma,pi_inf,qv,dmalf, dmntait, dmBtait, dm_bub_adv_src, dm_divu,adap_dt_stop]', &src/common/m_boundary_common.fpp-1561-1617 (1)
1561-1617: Y-direction F_igr buffers are skipped in 2D under CASE_OPTIMIZATION.This block is guarded by
num_dims > 2, so in 2D withMFC_CASE_OPTIMIZATIONenabled it compiles out the y-boundary population. That leavesjac_sfy-boundaries uninitialized. The guard should benum_dims > 1.🔧 Suggested fix
-#:if not MFC_CASE_OPTIMIZATION or num_dims > 2 +#:if not MFC_CASE_OPTIMIZATION or num_dims > 1CMakeLists.txt-463-471 (1)
463-471: Split CRAY_MPICH_LIB flags before linking with LLVMFlang.The environment variable
$ENV{CRAY_MPICH_LIB}contains multiple space-separated flags (e.g.,-L<path> -lmpich -lmpi), but passing it directly totarget_link_librariestreats it as a single argument rather than expanding the flags. Useseparate_arguments()to split the string into a proper list, then pass viatarget_link_options()for correct linker flag handling.🔧 Suggested fix
if (MFC_MPI AND ARGS_MPI) find_package(MPI COMPONENTS Fortran REQUIRED) target_compile_definitions(${a_target} PRIVATE MFC_MPI) if(CMAKE_Fortran_COMPILER_ID STREQUAL "LLVMFlang") target_compile_options(${a_target} PRIVATE "$ENV{CRAY_MPICH_INC}") - target_link_libraries(${a_target} PRIVATE $ENV{CRAY_MPICH_LIB}) + set(CRAY_MPICH_LIB_ENV "$ENV{CRAY_MPICH_LIB}") + separate_arguments(CRAY_MPICH_LIB_ENV NATIVE_COMMAND "${CRAY_MPICH_LIB_ENV}") + target_link_options(${a_target} PRIVATE ${CRAY_MPICH_LIB_ENV}) else() target_link_libraries(${a_target} PRIVATE MPI::MPI_Fortran) endif() endif()CMakeLists.txt-489-494 (1)
489-494: Usetarget_link_options()withseparate_arguments()to properly handleCRAY_HIPFORT_LIBas raw linker flags.
CRAY_HIPFORT_LIBcontains space-separated linker flags (-Land-l), not a CMake library target. Passing it directly totarget_link_libraries()is incorrect; CMake will treat it as a single argument and may place it in the wrong position on the link line. Useseparate_arguments()to split the environment variable andtarget_link_options()for raw linker flags.🔧 Suggested fix
elseif(CMAKE_Fortran_COMPILER_ID STREQUAL "LLVMFlang") - target_link_libraries(${a_target} PRIVATE $ENV{CRAY_HIPFORT_LIB}) + set(CRAY_HIPFORT_LIB_ENV "$ENV{CRAY_HIPFORT_LIB}") + separate_arguments(CRAY_HIPFORT_LIB_ENV NATIVE_COMMAND "${CRAY_HIPFORT_LIB_ENV}") + target_link_options(${a_target} PRIVATE ${CRAY_HIPFORT_LIB_ENV}) else() find_package(hipfort COMPONENTS hipfft CONFIG REQUIRED) target_link_libraries(${a_target} PRIVATE hipfort::hipfft) endif()src/simulation/m_time_steppers.fpp-691-698 (1)
691-698: Guard the AMD fixed-sizevel/alphaarrays against size mismatches.The AMD path hard-codes
velto length 3 andalphato length 3. Whilenum_velsis mathematically bounded by spatial dimensions (≤3),num_fluidsis not. Ifnum_fluidsexceeds 3, downstream routines see a shape mismatch that could silently drop or corrupt data. Add an assertion to enforce the constraint:🔧 Suggested fix
#:if not MFC_CASE_OPTIMIZATION and USING_AMD real(wp), dimension(3) :: vel !< Cell-avg. velocity real(wp), dimension(3) :: alpha !< Cell-avg. volume fraction + @:ASSERT(num_vels <= 3 .and. num_fluids <= 3, "AMD fixed-size vel/alpha require num_vels,num_fluids <= 3") #:else real(wp), dimension(num_vels) :: vel !< Cell-avg. velocity real(wp), dimension(num_fluids) :: alpha !< Cell-avg. volume fraction #:endifPer coding guidelines: Use fypp macro
@:ASSERT(predicate, message)for validating conditions.src/simulation/m_bubbles_EL.fpp-540-544 (1)
540-544: Add guard to prevent AMD fixed-size alpha arrays with num_fluids > 3.In the AMD code path,
myalpha_rho/myalphaare statically dimensioned to 3, but downstream routines likes_convert_species_to_mixture_variables_acciterate overnum_fluids. Ifnum_fluidsexceeds 3 at compile-time withUSING_AMDenabled, array access becomes out-of-bounds. Guard against this configuration:🔧 Suggested guard
#:if not MFC_CASE_OPTIMIZATION and USING_AMD real(wp), dimension(3) :: myalpha_rho, myalpha + @:ASSERT(num_fluids <= 3, "AMD fixed-size alpha arrays require num_fluids <= 3") #:else real(wp), dimension(num_fluids) :: myalpha_rho, myalpha #:endifsrc/common/m_chemistry.fpp-22-27 (1)
22-27: AMD fixed-size chemistry arrays can overflow for larger mechanisms.
On USING_AMD, the reaction and diffusion kernels allocate 10-element arrays and use
`molecular_weights_nonparameter`, but the loops still span `chemxb:chemxe`. If `num_species` exceeds 10, this will index past bounds. Please enforce `num_species <= 10` for AMD builds or generate a non-parameter weights array sized to `num_species` so the kernels remain safe across mechanisms; a sketch follows below.
Also applies to: 129-167, 177-420
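A sketch of the second option (a runtime-sized, non-parameter weights array) — the allocatable buffer and its init placement are assumptions, and `$:GPU_ENTER_DATA` is the macro used by other suggestions in this review:

```fortran
! hypothetical module-level buffer; the AMD path avoids the molecular_weights
! parameter array, so copy the mechanism's weights into a plain runtime array
real(wp), allocatable :: molecular_weights_nonparameter(:)

! ... once, in the module's initialization routine:
allocate (molecular_weights_nonparameter(num_species))
molecular_weights_nonparameter(:) = molecular_weights(1:num_species)
$:GPU_ENTER_DATA(copyin='[molecular_weights_nonparameter]')
```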
src/simulation/m_viscous.fpp-75-81 (1)
75-81: Guard AMD fixed-size viscous arrays against num_fluids > 3.On the AMD/non-optimization path,
alpha_viscandalpha_rho_viscare size 3 but the loops still iterate1:num_fluids. Ifnum_fluidscan exceed 3, this will index out of bounds. Please enforce the limit or size the arrays tonum_fluids.🛡️ Suggested guard
integer :: i, j, k, l, q !< Generic loop iterator +#:if not MFC_CASE_OPTIMIZATION and USING_AMD + @:ASSERT(num_fluids <= 3, "AMD viscous path supports up to 3 fluids") +#:endifAs per coding guidelines "Use the fypp ASSERT macro for validating conditions: @:ASSERT(predicate, message)".
src/simulation/m_viscous.fpp-102-210 (1)
102-210: Initialize Re_visc for bubbles_euler paths.
Re_viscis computed only in the non-bubbles branch, but it is used immediately afterward in both shear and bulk stress updates. Whenbubbles_euleris true, this leavesRe_viscundefined and can corrupt stresses. Consider computingRe_viscoutside the bubbles/non-bubbles split or initializing it in the bubbles path.Also applies to: 212-315
src/simulation/m_riemann_solvers.fpp-1321-1344 (1)
1321-1344: Avoid uninitialized MHD-relativity states underMFC_CASE_OPTIMIZATION.With
MFC_CASE_OPTIMIZATIONenabled andnum_dims <= 2, the preprocessor strips the entire relativistic MHD block, but later code still consumespres_mag,H_*, andE_*. The energy flux assignment is also gated away for 1D, leavingflux_rs...E_idxunset. Please ensure these variables are always assigned (or explicitly mark the combo unsupported).💡 Example of keeping 2D-safe initialization while guarding 3D-only terms
-#:if not MFC_CASE_OPTIMIZATION or num_dims > 2 - Ga%L = 1._wp/sqrt(1._wp - vel_L_rms) - Ga%R = 1._wp/sqrt(1._wp - vel_R_rms) - vdotB%L = vel_L(1)*B%L(1) + vel_L(2)*B%L(2) + vel_L(3)*B%L(3) - vdotB%R = vel_R(1)*B%R(1) + vel_R(2)*B%R(2) + vel_R(3)*B%R(3) - b4%L(1:3) = B%L(1:3)/Ga%L + Ga%L*vel_L(1:3)*vdotB%L - b4%R(1:3) = B%R(1:3)/Ga%R + Ga%R*vel_R(1:3)*vdotB%R - B2%L = B%L(1)**2._wp + B%L(2)**2._wp + B%L(3)**2._wp - B2%R = B%R(1)**2._wp + B%R(2)**2._wp + B%R(3)**2._wp - pres_mag%L = 0.5_wp*(B2%L/Ga%L**2._wp + vdotB%L**2._wp) - pres_mag%R = 0.5_wp*(B2%R/Ga%R**2._wp + vdotB%R**2._wp) - H_L = 1._wp + (gamma_L + 1)*pres_L/rho_L - H_R = 1._wp + (gamma_R + 1)*pres_R/rho_R - cm%L(1:3) = (rho_L*H_L*Ga%L**2 + B2%L)*vel_L(1:3) - vdotB%L*B%L(1:3) - cm%R(1:3) = (rho_R*H_R*Ga%R**2 + B2%R)*vel_R(1:3) - vdotB%R*B%R(1:3) - E_L = rho_L*H_L*Ga%L**2 - pres_L + 0.5_wp*(B2%L + vel_L_rms*B2%L - vdotB%L**2._wp) - rho_L*Ga%L - E_R = rho_R*H_R*Ga%R**2 - pres_R + 0.5_wp*(B2%R + vel_R_rms*B2%R - vdotB%R**2._wp) - rho_R*Ga%R -#:endif + Ga%L = 1._wp/sqrt(1._wp - vel_L_rms) + Ga%R = 1._wp/sqrt(1._wp - vel_R_rms) + vdotB%L = vel_L(1)*B%L(1) + vel_L(2)*B%L(2) + vdotB%R = vel_R(1)*B%R(1) + vel_R(2)*B%R(2) + B2%L = B%L(1)**2._wp + B%L(2)**2._wp + B2%R = B%R(1)**2._wp + B%R(2)**2._wp +#:if not MFC_CASE_OPTIMIZATION or num_dims > 2 + vdotB%L = vdotB%L + vel_L(3)*B%L(3) + vdotB%R = vdotB%R + vel_R(3)*B%R(3) + B2%L = B2%L + B%L(3)**2._wp + B2%R = B2%R + B%R(3)**2._wp +#:endif + pres_mag%L = 0.5_wp*(B2%L/Ga%L**2._wp + vdotB%L**2._wp) + pres_mag%R = 0.5_wp*(B2%R/Ga%R**2._wp + vdotB%R**2._wp) + H_L = 1._wp + (gamma_L + 1)*pres_L/rho_L + H_R = 1._wp + (gamma_R + 1)*pres_R/rho_R + cm%L(1:3) = (rho_L*H_L*Ga%L**2 + B2%L)*vel_L(1:3) - vdotB%L*B%L(1:3) + cm%R(1:3) = (rho_R*H_R*Ga%R**2 + B2%R)*vel_R(1:3) - vdotB%R*B%R(1:3) + E_L = rho_L*H_L*Ga%L**2 - pres_L + 0.5_wp*(B2%L + vel_L_rms*B2%L - vdotB%L**2._wp) - rho_L*Ga%L + E_R = rho_R*H_R*Ga%R**2 - pres_R + 0.5_wp*(B2%R + vel_R_rms*B2%R - vdotB%R**2._wp) - rho_R*Ga%RAlso applies to: 1531-1537
src/simulation/m_riemann_solvers.fpp-294-308 (1)
294-308: Guard AMD fixed-size arrays and slice full-array operations.Hard-coded sizes (3/10) are now used for fluid/species/bubble/xi-field arrays. If runtime
num_fluids/num_species/nb/num_dimsexceed these, you’ll hit OOB; when they’re smaller, full-array ops (Xs_L(:),sum(...)) include uninitialized tails. Please add explicit bounds assertions and restrict ops to1:num_*(or zero-initialize the tails).✅ Example fix for species arrays (apply similarly in other AMD branches)
#:if USING_AMD + @:ASSERT(num_species <= 10, 'USING_AMD supports up to 10 species') + Ys_L = 0._wp; Ys_R = 0._wp #:endif ... #:if USING_AMD - Xs_L(:) = Ys_L(:)*MW_L/molecular_weights_nonparameter(:) - Xs_R(:) = Ys_R(:)*MW_R/molecular_weights_nonparameter(:) + Xs_L = 0._wp; Xs_R = 0._wp + Xs_L(1:num_species) = Ys_L(1:num_species)*MW_L/molecular_weights_nonparameter(1:num_species) + Xs_R(1:num_species) = Ys_R(1:num_species)*MW_R/molecular_weights_nonparameter(1:num_species) #:else Xs_L(:) = Ys_L(:)*MW_L/molecular_weights(:) Xs_R(:) = Ys_R(:)*MW_R/molecular_weights(:) #:endif ... - gamma_L = sum(Xs_L(:)/(Gamma_iL(:) - 1.0_wp)) - gamma_R = sum(Xs_R(:)/(Gamma_iR(:) - 1.0_wp)) + gamma_L = sum(Xs_L(1:num_species)/(Gamma_iL(1:num_species) - 1.0_wp)) + gamma_R = sum(Xs_R(1:num_species)/(Gamma_iR(1:num_species) - 1.0_wp))As per coding guidelines: Use the fypp ASSERT macro for validating conditions:
@:ASSERT(predicate, message).Also applies to: 496-502, 1071-1087, 1277-1283, 1975-1995, 2020-2030, 2040-2044, 3265-3271, 3722-3727
src/common/m_phase_change.fpp-94-98 (1)
94-98: Add runtime guard to prevent array overflow when num_fluids > 3 in AMD path.The AMD conditional branch (lines 94–98, 302–341, 421–433) sizes arrays
p_infOV,p_infpT,p_infSL,sk,hk,gk,ek,rhok, andp_infpTgto 3 elements, but loops iterate from 1 tonum_fluids. Whennum_fluids > 3, these accesses overflow the arrays. The partial safeguard at lines 328–333 only initializes excess elements whennum_fluids < 3; it does not prevent overflow whennum_fluids > 3.Add
@:ASSERT(num_fluids <= 3, "USING_AMD path supports up to 3 fluids")at the start ofs_infinite_relaxation_kto enforce this constraint.Suggested location for guard
subroutine s_infinite_relaxation_k(q_cons_vf) + #:if not MFC_CASE_OPTIMIZATION and USING_AMD + @:ASSERT(num_fluids <= 3, "USING_AMD path supports up to 3 fluids") + #:endif type(scalar_field), dimension(sys_size), intent(inout) :: q_cons_vfsrc/simulation/m_qbmm.fpp-726-737 (1)
726-737: Add runtime assertions to guard fixed-size AMD buffers against out-of-bounds writes.The AMD code path uses fixed-size arrays (moms=6, nnode=4, nb=3, nterms=32) while loops iterate up to
nmom,nnode,nb, andntermswithout bounds checking. If these parameters exceed the fixed sizes—which is possible since they are runtime-determined—writes will overflow. For example,do i=1,nbaccesseswght(:,i)with dimension(4,3), andf_quad2Dloopsdo i=1,nnodebut receives dimension(4) arrays in the AMD path.Add Fypp ASSERT macros immediately after the array declarations to enforce constraints when USING_AMD is true:
Suggested fix
#:if not MFC_CASE_OPTIMIZATION and USING_AMD real(wp), dimension(6) :: moms, msum real(wp), dimension(4, 3) :: wght, abscX, abscY, wght_pb, wght_mv, wght_ht, ht + @:ASSERT(nmom <= 6, "USING_AMD expects nmom ≤ 6") + @:ASSERT(nnode <= 4, "USING_AMD expects nnode ≤ 4") + @:ASSERT(nb <= 3, "USING_AMD expects nb ≤ 3") #:else real(wp), dimension(nmom) :: moms, msum real(wp), dimension(nnode, nb) :: wght, abscX, abscY, wght_pb, wght_mv, wght_ht, ht #:endif #:if USING_AMD real(wp), dimension(32, 0:2, 0:2) :: coeff + @:ASSERT(nterms <= 32, "USING_AMD expects nterms ≤ 32") #:elsesrc/common/m_variables_conversion.fpp-324-330 (1)
324-330: Guard AMD fixed-size buffers against larger num_fluids/num_vels/num_species/nb.The AMD optimization paths in this file downsize arrays to fixed lengths (3 for fluids/vels, 10 for species), but the code contains loops iterating to
num_fluids,num_vels,num_species, andnbthat directly index these fixed-size arrays. For example:
s_convert_species_to_mixture_variables_acc(line ~344):do i = 1, num_fluidsindexesalpha_rho_K(i)withdimension(3)s_compute_species_fraction(line ~1356):do i = 1, num_fluidsindexesalpha_rho_K(i)withdimension(3)s_flux_variables(line ~1234-1237): loops indexvel_K(i)andY_K(i)with fixed boundsThis will cause out-of-bounds memory access if any limit is exceeded.
Add ASSERT guards in
s_initialize_variables_conversion_moduleto validate these constraints at startup:Suggested fix
impure subroutine s_initialize_variables_conversion_module integer :: i, j + #:if not MFC_CASE_OPTIMIZATION and USING_AMD + @:ASSERT(num_fluids <= 3, "USING_AMD path supports ≤3 fluids") + @:ASSERT(num_vels <= 3, "USING_AMD path supports ≤3 velocity components") + @:ASSERT(num_species <= 10, "USING_AMD path supports ≤10 species") + @:ASSERT(nb <= 3, "USING_AMD path supports ≤3 bubble bins") + #:endif $:GPU_ENTER_DATA(copyin='[is1b,is1e,is2b,is2e,is3b,is3e]')Also applies to lines 590–598, 1191–1200, 1338–1341, 1412–1415.
src/simulation/m_data_output.fpp-271-277 (1)
271-277: Fix hardcoded buffer dimensions on non-optimized AMD builds to prevent overflow when `num_fluids > 3`.
On AMD builds with
`MFC_CASE_OPTIMIZATION = false`, `alpha` and related variables are hardcoded to `dimension(3)`. If `num_fluids` is set to 4 or higher, this creates an array overrun. Change the hardcoded 3 to use `num_fluids` or `num_fluids_max`, or add a compile-time check to enforce `num_fluids ≤ 3` for non-optimized AMD builds; a sketch follows below.
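A sketch of the `num_fluids_max` option, assuming that constant exists as other comments in this review suggest:

```fortran
#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    ! compile-time upper bound >= any configurable num_fluids, so >3-fluid cases stay safe
    real(wp), dimension(num_fluids_max) :: alpha
#:else
    real(wp), dimension(num_fluids) :: alpha
#:endif
```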
🟡 Minor comments (12)
src/simulation/include/inline_capillary.fpp-4-9 (1)
4-9: Initialize Omega(2,1/1,2/2,2) when the guard is false.With
#:if not MFC_CASE_OPTIMIZATION or num_dims > 1, these entries are left undefined when the condition is false. If any downstream logic (even diagnostics) reads them in 1D, this can propagate garbage values. Consider zeroing them in an#:elsebranch or confirm they are never referenced in that configuration.Suggested guard-safe zeroing
#:if not MFC_CASE_OPTIMIZATION or num_dims > 1 Omega(2, 1) = sigma*w1*w2/normW Omega(1, 2) = Omega(2, 1) Omega(2, 2) = -sigma*(w1*w1 + w3*w3)/normW +#:else + Omega(2, 1) = 0.0_wp + Omega(1, 2) = 0.0_wp + Omega(2, 2) = 0.0_wp #:endifsrc/simulation/m_time_steppers.fpp-533-538 (1)
533-538: Avoid double run_time_info writes ifdummyis ever true.With
dummyset, both branches execute and you’ll emit two runtime-info rows per step. Ifdummyis intended to stay false for the compiler workaround, consider documenting that invariant or adding a guard/ASSERT to prevent accidental enablement.load_amd.sh-5-7 (1)
5-7: Remove or useAFAR_UMS_LATESTto avoid dead variable.It’s computed but unused. Either remove it or wire it into
OLCF_AFAR_ROOTto prevent confusion.🔧 Suggested fix (removal)
-AFAR_UMS_LATEST=$(ls -d --color=never ${AFAR_UMS_BASEDIR}/*/ | tail -n1)load_amd.sh-1-4 (1)
1-4: Add an explicit shell declaration (or document that the file must be sourced).Shellcheck flags the missing shebang. If this is meant to be executed, a shebang avoids
/bin/shrunning without themodulefunction; if it’s meant to be sourced, add an explicit note or a ShellCheck directive.🔧 Suggested fix
+#!/usr/bin/env bash +# shellcheck shell=bashsrc/simulation/m_riemann_solvers.fpp-4762-4769 (1)
4762-4769: Initializedivergence_cylbefore the preprocessor guard.When
MFC_CASE_OPTIMIZATIONstrips the block (e.g., 1D),divergence_cylis still used for stresses and can be undefined. Initialize to zero and only add terms inside the guard.🧯 Safe initialization
-#:if not MFC_CASE_OPTIMIZATION or num_dims > 1 - divergence_cyl = avg_dvdx_int(1) + avg_dvdy_int(2) + avg_v_int(2)/r_eff + divergence_cyl = 0._wp +#:if not MFC_CASE_OPTIMIZATION or num_dims > 1 + divergence_cyl = avg_dvdx_int(1) + avg_dvdy_int(2) + avg_v_int(2)/r_eff if (num_dims > 2) then #:if not MFC_CASE_OPTIMIZATION or num_dims > 2 divergence_cyl = divergence_cyl + avg_dvdz_int(3)/r_eff #:endif end if #:endif.github/workflows/bench.yml-73-80 (1)
73-80: Add frontier_amd to actionlint’s allowed labels (or equivalent).
actionlint flagsfrontier_amdas unknown; if actionlint runs in CI, this will fail. Please whitelist the custom label (or update your actionlint config) to avoid spurious lint failures..github/workflows/frontier_amd/test.sh-7-13 (1)
7-13: Add explicit parameter validation forjob_deviceandjob_interface.These env vars are set by
submit.shduring normal workflow execution, but the script has no validation. Direct invocation without these variables would silently fall back to CPU/no-interface behavior. Adding parameter expansion checks makes the script robust against misuse:Suggested fix
+ : "${job_device:?job_device must be set to cpu|gpu}" + : "${job_interface:=none}" device_opts="" if [ "$job_device" = "gpu" ]; then device_opts+="--gpu".github/workflows/frontier_amd/test.sh-3-5 (1)
3-5: Fix GPU counting for multi-digit GPU IDs.The current logic uses
wc -cto count characters after removing whitespace. This works only for single-digit GPU IDs; with multi-digit IDs (e.g., "10 11 12"), the character count inflates incorrectly. Usewc -wto count tokens instead.Suggested fix
-gpus=`rocm-smi --showid | awk '{print $1}' | grep -Eo '[0-9]+' | uniq | tr '\n' ' '` -ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c` +gpus=$(rocm-smi --showid | awk '{print $1}' | grep -Eo '[0-9]+' | uniq) +ngpus=$(echo "$gpus" | wc -w) +if [ "$job_device" = "gpu" ] && [ "$ngpus" -eq 0 ]; then + echo "No GPUs detected; aborting GPU test run." >&2 + exit 1 +fi.github/workflows/frontier_amd/submit-bench.sh-5-10 (1)
5-10: Quote the script path and document the interface argument.
cat $1will break on spaces, and$3is used injob_slugbut not documented in usage. Tightening both avoids avoidable operator error.🔧 Proposed fix
usage() { - echo "Usage: $0 [script.sh] [cpu|gpu]" + echo "Usage: $0 [script.sh] [cpu|gpu] [none|acc|omp]" } if [ ! -z "$1" ]; then - sbatch_script_contents=`cat $1` + sbatch_script_contents=$(cat "$1") else usage exit 1 fi @@ -job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3" +job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3"Also applies to: 27-28
.github/workflows/frontier_amd/submit.sh-5-10 (1)
5-10: Quote the script path and document the interface argument.
cat $1will fail on paths with spaces, and$3is used but not described in usage. Aligning usage and quoting avoids brittle submissions.🔧 Proposed fix
usage() { - echo "Usage: $0 [script.sh] [cpu|gpu]" + echo "Usage: $0 [script.sh] [cpu|gpu] [none|acc|omp]" } if [ ! -z "$1" ]; then - sbatch_script_contents=`cat $1` + sbatch_script_contents=$(cat "$1") else usage exit 1 fi @@ -job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3" +job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3"Also applies to: 28-29
.github/workflows/frontier_amd/build.sh-19-21 (1)
19-21: Remove unuseddirnameassignment.
dirnameis set on Line 20 but never used, which is already flagged by Shellcheck. Dropping it keeps the loop clean and avoids confusion.🧹 Proposed cleanup
- dirname=$(basename "$dir") ./mfc.sh run "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts.github/workflows/frontier_amd/bench.sh-3-16 (1)
3-16: Fail fast when required job context variables are missing.
job_device,job_interface, andjob_slugare assumed to be set by the submit wrapper; if someone runs this script directly, it silently misconfigures. Add a guard to make the contract explicit.✅ Suggested guard
#!/bin/bash +if [ -z "${job_device:-}" ] || [ -z "${job_slug:-}" ]; then + echo "Usage: job_device=<cpu|gpu> job_interface=<none|acc|omp> job_slug=<slug> $0" + exit 1 +fi + n_ranks=12 device_opts="" if [ "$job_device" = "gpu" ]; then
🧹 Nitpick comments (1)
src/post_process/m_global_parameters.fpp (1)
248-248: Add a comment explaining the purpose of the `dummy` variable.
Per the PR description, this variable is a temporary workaround for an AMD `amdflang` compiler bug where a conditional with a parameter followed by a GPU kernel can cause a catastrophic error. Without documentation, future maintainers may remove this as apparent dead code.
📝 Suggested documentation
  logical :: fft_wrt
+ !> `@note` Temporary workaround for amdflang case-optimization bug.
+ !! Remove once the compiler is patched.
  logical :: dummy
  logical :: pres_wrt
Actionable comments posted: 20
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
src/simulation/m_data_output.fpp (1)
329-334: Potential unallocated array access when `dummy = .true.` and `viscous = .false.`.
Line 329 changes the condition to `viscous .or. dummy`, which means `vcfl_sf` and `Rc_sf` will be accessed even when `viscous` is false (if `dummy` is true). However, these arrays are only allocated when `viscous` is true (see `s_initialize_data_output_module` around lines 1827-1833):
if (viscous) then
    @:ALLOCATE(vcfl_sf(0:m, 0:n, 0:p))
    @:ALLOCATE(Rc_sf (0:m, 0:n, 0:p))
If `dummy = .true.` and `viscous = .false.`, this will cause an access to unallocated arrays, leading to undefined behavior or a runtime crash.
Proposed fix
Either guard the allocation with the same condition:
- if (viscous) then
+ if (viscous .or. dummy) then
      @:ALLOCATE(vcfl_sf(0:m, 0:n, 0:p))
      @:ALLOCATE(Rc_sf (0:m, 0:n, 0:p))
Or keep the runtime check consistent:
- if (viscous .or. dummy) then
+ if (viscous) then
src/simulation/m_time_steppers.fpp (1)
692-712: Guard fixed-size AMDvel/alphaarrays against larger counts.With
USING_AMD+ no case optimization,vel/alphaare size 3 but downstream routines still index bynum_vels/num_fluids. If either exceeds 3, GPU kernels can read/write out of bounds. Add a guard (or size to a max constant) before entering the GPU loop.As per coding guidelines: Use s_mpi_abort() for error termination instead of stop.🛠️ Suggested guard
impure subroutine s_compute_dt() + #:if not MFC_CASE_OPTIMIZATION and USING_AMD + if (num_vels > 3 .or. num_fluids > 3) then + call s_mpi_abort('USING_AMD expects num_vels/num_fluids <= 3 in s_compute_dt') + end if + #:endif real(wp) :: rho !< Cell-avg. densitysrc/simulation/m_viscous.fpp (1)
73-118: Fixed-size AMD viscous scratch arrays can overflow whennum_fluids > 3.In the
USING_AMD+ non-optimized path,alpha_visc/alpha_rho_viscare size 3 but loops still iterate1:num_fluids. Ifnum_fluidsexceeds 3, device out-of-bounds access is possible. Add a guard (or use a max-size parameter) before the GPU kernels.As per coding guidelines: Use s_mpi_abort() for error termination instead of stop.🛠️ Suggested guard
subroutine s_compute_viscous_stress_cylindrical_boundary(q_prim_vf, grad_x_vf, grad_y_vf, grad_z_vf, & tau_Re_vf, & ix, iy, iz) + #:if not MFC_CASE_OPTIMIZATION and USING_AMD + if (num_fluids > 3) then + call s_mpi_abort('USING_AMD expects num_fluids <= 3 in s_compute_viscous_stress_cylindrical_boundary') + end if + #:endif type(scalar_field), dimension(sys_size), intent(in) :: q_prim_vfsrc/simulation/m_qbmm.fpp (1)
573-640: AMD fixed-size QBMM scratch buffers need bounds guarding.In AMD builds, several local arrays are hard-capped (
coeffs= 32,moms/msum= 6,wght/absc= 4×3,f_quadinputs = 4×3/4), but loops still run overnterms,nmom,nnode, andnb. If any runtime value exceeds the cap, device out-of-bounds access is possible. Add guards (or size to max constants) before entering the GPU region.As per coding guidelines: Use s_mpi_abort() for error termination instead of stop.🛠️ Suggested guard
#:if not MFC_CASE_OPTIMIZATION and USING_AMD real(wp), dimension(6) :: moms, msum real(wp), dimension(4, 3) :: wght, abscX, abscY, wght_pb, wght_mv, wght_ht, ht #:else real(wp), dimension(nmom) :: moms, msum real(wp), dimension(nnode, nb) :: wght, abscX, abscY, wght_pb, wght_mv, wght_ht, ht #:endif #:if USING_AMD real(wp), dimension(32, 0:2, 0:2) :: coeff #:else real(wp), dimension(nterms, 0:2, 0:2) :: coeff #:endif + #:if USING_AMD + if (nterms > 32) then + call s_mpi_abort('USING_AMD expects nterms <= 32 in QBMM') + end if + #:endif + #:if not MFC_CASE_OPTIMIZATION and USING_AMD + if (nmom > 6 .or. nnode > 4 .or. nb > 3) then + call s_mpi_abort('USING_AMD expects nmom<=6, nnode<=4, nb<=3 in QBMM') + end if + #:endifAlso applies to: 652-676, 726-746, 890-894, 983-1023
src/common/m_variables_conversion.fpp (1)
323-356: Guard AMD fixed-size mixture/species arrays against larger runtime counts.In
USING_AMD+ non-optimized builds,alpha_*arrays are size 3 andY_*arrays are size 10, but loops still iteratenum_fluids/num_species. If either exceeds the cap, device out-of-bounds access is possible. Add guards (or size to max constants) before entering GPU loops.As per coding guidelines: Use s_mpi_abort() for error termination instead of stop.🛠️ Suggested guard
type(int_bounds_info), dimension(1:3), intent(in) :: ibounds + #:if not MFC_CASE_OPTIMIZATION and USING_AMD + if (num_fluids > 3) then + call s_mpi_abort('USING_AMD expects num_fluids <= 3 in variables_conversion') + end if + if (chemistry .and. num_species > 10) then + call s_mpi_abort('USING_AMD expects num_species <= 10 in variables_conversion') + end if + #:endif #:if USING_AMD and not MFC_CASE_OPTIMIZATION real(wp), dimension(3) :: alpha_K, alpha_rho_KAlso applies to: 590-598, 1191-1200, 1337-1376, 1412-1415
🤖 Fix all issues with AI agents
In @.github/workflows/frontier_amd/submit-bench.sh:
- Around line 9-51: The current submission inlines sbatch_script_contents into
the heredoc which causes $/$(...) expansion and breaks paths with spaces;
instead drop sbatch_script_contents and pass the script filename into the job
environment (e.g., compute job_script_basename="$(basename "$1")" and keep
job_slug/job_device/job_interface as you already set) then inside the heredoc
invoke the script at runtime with a quoted path such as bash
"$SLURM_SUBMIT_DIR/$job_script_basename" (or ensure it is executable and use
"$SLURM_SUBMIT_DIR/$job_script_basename"), preserving variable quoting so
runtime SLURM variables are expanded on the compute node rather than on the
submit host.
In @.github/workflows/frontier_amd/submit.sh:
- Around line 5-7: The usage() currently documents only two args but the script
later uses $3, causing job_slug to end with a trailing '-' and job_interface to
be empty; either (A) add a third argument to usage() and validate $3 (like
phoenix/submit.sh) accepting values such as none|acc|omp, set job_interface from
$3 with validation, and include it in job_slug construction, or (B) remove all
references to $3 and change the job_slug construction and job_interface
assignment so they don't append a dangling '-' or rely on an empty interface;
update the usage() text to match the chosen behavior and ensure job_slug and
job_interface are computed safely (no trailing hyphen, job_interface defaulted
or validated) in the script functions.
In @.github/workflows/frontier_amd/test.sh:
- Around line 3-4: The GPU count logic using the variables gpus and ngpus is
fragile because ngpus is computed by stripping whitespace then using wc -c,
which breaks for multi‑digit IDs; update the ngpus calculation to count words
instead (e.g., replace the wc -c approach with wc -w or an equivalent
word-count/field-count method) so that the ngpus variable correctly reflects the
number of GPU IDs parsed into gpus.
In `@src/common/m_boundary_common.fpp`:
- Around line 1561-1617: The y-direction IGR block in s_populate_F_igr_buffers
is being excluded for 2D because the conditional is "not MFC_CASE_OPTIMIZATION
or num_dims > 2"; change this to include 2D by using "not MFC_CASE_OPTIMIZATION
or num_dims >= 2" (or otherwise remove the strict > check) so the bc_y/jac_sf
y-boundary handling compiles for num_dims == 2; update the #:if that surrounds
the y-direction block and ensure symbols referenced (MFC_CASE_OPTIMIZATION,
num_dims, s_populate_F_igr_buffers, bc_y, jac_sf) remain correct.
In `@src/common/m_chemistry.fpp`:
- Around line 22-27: The AMD-only static array molecular_weights_nonparameter is
hard-coded to length 10 and can overflow when loops over chemxb:chemxe use
num_species; update the AMD block (guarded by USING_AMD) to allocate/declare
molecular_weights_nonparameter using the dynamic num_species size (or add a
clear assert that num_species == 10 in AMD builds) and initialize it from the
mechanism's molecular weights at runtime; apply the same fix to all other
AMD-only work arrays referenced in this file (e.g., Xs_*, mass_diffusivities_*,
and any other arrays in the 129-187, 177-187, 203-423 regions) so all AMD
buffers are sized from num_species (or validated against it).
In `@src/common/m_phase_change.fpp`:
- Around line 94-98: The AMD-path declarations use fixed-size arrays
p_infOV,p_infpT,p_infSL,sk,hk,gk,ek,rhok of dimension(3) but
s_infinite_relaxation_k loops over 1..num_fluids, which can overflow if
num_fluids>3; add an assertion at the declaration site to prevent misuse by
checking num_fluids <= 3 when USING_AMD and not MFC_CASE_OPTIMIZATION (e.g., add
an @:ASSERT or equivalent guard next to the real(wp), dimension(3) declarations
for sk,hk,gk,ek,rhok to enforce this constraint and fail-fast).
In `@src/simulation/m_acoustic_src.fpp`:
- Around line 146-150: The arrays myalpha and myalpha_rho are sized to 3 under
the AMD path (guarded by MFC_CASE_OPTIMIZATION and USING_AMD) but later indexed
up to num_fluids, risking out-of-bounds access; update the declaration so both
branches declare real(wp), dimension(num_fluids) :: myalpha, myalpha_rho (i.e.,
remove the fixed size 3) to match uses in the loop that iterates q = 1,
num_fluids, or alternatively add a compile-time/runtime assertion that USING_AMD
implies num_fluids <= 3 and guard all accesses accordingly; reference symbols:
myalpha, myalpha_rho, num_fluids, MFC_CASE_OPTIMIZATION, USING_AMD.
In `@src/simulation/m_bubbles_EL.fpp`:
- Around line 785-789: The vel array (declared under the USING_AMD branch) can
be length 3 while only vel(1:num_dims) are set, causing dot_product(vel,vel) to
read an uninitialized component in 2D; fix by either zero-initializing vel
before the loop that fills vel(1:num_dims) or by changing calls to dot_product
to use the slice dot_product(vel(1:num_dims), vel(1:num_dims)); update
references to vel, num_dims, and dot_product (and keep the conditional branches
USING_AMD / MFC_CASE_OPTIMIZATION unchanged) so no component outside 1:num_dims
is ever read uninitialized.
- Around line 540-544: Add host-side guards for the AMD fixed-size arrays:
before launching GPU kernels that call s_compute_species_fraction or
s_convert_species_to_mixture_variables_acc, assert that when USING_AMD and not
MFC_CASE_OPTIMIZATION the runtime num_fluids <= 3 (or change the AMD path to use
dynamic sizing) to prevent indexing myalpha_rho/myalpha out of bounds; likewise,
in the caller of s_compute_cson_from_pinf add a host-side check that when
USING_AMD the simulation either has num_dims == 3 or initialize/zero the missing
vel components (or assert num_dims == 3) so vel(3) is not left uninitialized
before dot_product is used. Ensure the checks reference the symbols USING_AMD,
MFC_CASE_OPTIMIZATION, num_fluids, num_dims and the routines
s_compute_species_fraction, s_convert_species_to_mixture_variables_acc, and
s_compute_cson_from_pinf.
In `@src/simulation/m_cbc.fpp`:
- Around line 653-674: Add fypp ASSERT guards at the start of the s_cbc routine
(before the local declarations) to ensure AMD fixed-size arrays are large
enough: when USING_AMD is true (and in the not MFC_CASE_OPTIMIZATION branch)
assert sys_size <= 12 for L, assert num_fluids <= 3 for
mf/alpha_rho/dalpha_rho_ds/dadv_dt/dadv_ds/dalpha_rho_dt, assert num_vels <= 3
for vel/dvel_ds, and assert num_species <= 10 for
Ys/h_k/dYs_dt/dYs_ds/Xs/Gamma_i/Cp_i; use the fypp macro @:ASSERT(predicate,
"descriptive message") so builds fail with a clear message if these counts
exceed the hard-coded AMD sizes.
In `@src/simulation/m_compute_cbc.fpp`:
- Around line 26-30: When USING_AMD and not MFC_CASE_OPTIMIZATION, add runtime
fypp ASSERT guards in the module initialization to prevent OOB by validating
global loop bounds against the hard-coded array sizes: assert advxe <= 12 and
chemxe <= 12 and chemxb >= 1 and E_idx <= 3 (or the appropriate max) so
expressions like L(2:advxe-1), mf, dalpha_rho_ds and dYs_ds cannot be indexed
out of range; place these checks near the module init or wherever USING_AMD
conditional declarations occur and reference the symbols advxe, chemxb, chemxe,
E_idx, L, mf, dalpha_rho_ds, and dYs_ds so failures are clear.
In `@src/simulation/m_hyperelastic.fpp`:
- Around line 101-105: The AMD build branch hardcodes tensora and tensorb to
size 9 causing out-of-bounds when tensor_size = num_dims**2 + 1 (e.g. 10 for
3D); update the conditional declaration to allocate tensora and tensorb with
dimension(tensor_size) when USING_AMD so both branches use real(wp),
dimension(tensor_size) :: tensora, tensorb, and replace the hardcoded
tensorb(1:9) initialization/assignments with loops or assignments up to
tensor_size (referencing variables tensor_size, tensora, tensorb, and num_dims
and the routines/blocks that initialize or index tensorb and tensora).
- Around line 107-111: The conditional sizing of alpha_k and alpha_rho_k to 3
when MFC_CASE_OPTIMIZATION is false and USING_AMD is true causes out-of-bounds
writes by s_compute_species_fraction which writes num_fluids entries; fix by
making alpha_k and alpha_rho_k use dimension(num_fluids) unconditionally (remove
the special-case 3 sizing) or, if AMD optimization truly requires num_fluids ≤
3, add a runtime guard that checks num_fluids ≤ 3 at entry (e.g., in the routine
that declares alpha_k/alpha_rho_k) and error/abort with a clear message;
references: alpha_k, alpha_rho_k, s_compute_species_fraction, num_fluids,
MFC_CASE_OPTIMIZATION, USING_AMD.
In `@src/simulation/m_ibm.fpp`:
- Around line 184-196: The AMD branch declares fixed-size locals (Gs,
alpha_rho_IP, alpha_IP, r_IP, v_IP, pb_IP, mv_IP, nmom_IP, presb_IP, massv_IP)
but runtime dimensions (num_fluids, nb, nmom, nnode/contxe) can exceed those
sizes; add fypp ASSERT checks right after the #:if USING_AMD branch to validate
num_fluids <= 3, nb <= 3, nb*nmom <= 18, and nb*nnode <= 12 (or call s_mpi_abort
with an explanatory message) so loops using Gs, alpha_rho_IP, alpha_IP,
r_IP/v_IP/pb_IP/mv_IP, nmom_IP, presb_IP, massv_IP cannot overrun the fixed
buffers; alternatively constrain loop bounds to the corresponding fixed
constants when those ASSERTs fail.
In `@src/simulation/m_pressure_relaxation.fpp`:
- Around line 220-224: The arrays alpha_rho and alpha are declared with
dimension(2) under the conditional branch (when not MFC_CASE_OPTIMIZATION and
USING_AMD) but later loops iterate do i = 1, num_fluids and num_fluids can be 3,
causing out-of-bounds access; update that conditional declaration to use
dimension(3) (matching the maximum possible num_fluids) so alpha_rho and alpha
are large enough for the loops in m_pressure_relaxation.fpp while leaving the
other branch (dimension(num_fluids)) unchanged.
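The change amounts to widening one declaration in the AMD branch; a sketch against the guards described above:

```fortran
#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    real(wp), dimension(3) :: alpha_rho, alpha   ! was dimension(2); covers num_fluids <= 3
#:else
    real(wp), dimension(num_fluids) :: alpha_rho, alpha
#:endif
```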
In `@src/simulation/m_riemann_solvers.fpp`:
- Around line 294-308: Add an ASSERT in s_initialize_riemann_solvers_module to
validate runtime counts against the AMD fixed-size locals (e.g. require
num_fluids <= 3, num_vels <= 3, num_species <= 10, nb <= 10 as applicable) and
fail fast if violated; alternatively, if you want to tolerate smaller counts,
explicitly zero the AMD fixed-size arrays (alpha_rho_L/R, vel_L/R, alpha_L/R,
Ys_L/R, Cp_iL/R, Xs_L/R, Gamma_iL/R, Yi_avg, Phi_avg, h_iL/R, h_avg_2) and only
operate on the valid slice (use array slicing for
num_fluids/num_vels/num_species) so whole-array ops do not read uninitialized
elements; use the fypp ASSERT macro for the checks and reference the local names
above to locate the AMD-specific declarations to modify.
- Around line 4761-4769: divergence_cyl may be left uninitialized when
MFC_CASE_OPTIMIZATION is true and num_dims == 1; initialize divergence_cyl to a
safe default (e.g., 0.0) before the macro-guarded block so subsequent uses in
shear/bulk stress calculations are defined; update the region around the
divergence computation (symbols: divergence_cyl, avg_dvdx_int, avg_dvdy_int,
avg_v_int, r_eff, avg_dvdz_int, and the conditional based on
MFC_CASE_OPTIMIZATION/num_dims) to set divergence_cyl = 0.0 prior to the
existing conditional assignments.
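A sketch of the default-then-override shape; the guarded assignment here is illustrative, not the file's exact expression:

```fortran
    divergence_cyl = 0._wp   ! defined even on the 1D case-optimized path
    if (num_dims > 1) then
        divergence_cyl = avg_dvdx_int + avg_dvdy_int + avg_v_int/r_eff
    end if
```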
In `@src/simulation/m_sim_helpers.fpp`:
- Around line 100-115: The AMD-specific declarations fix alpha, vel, alpha_rho,
and Gs to dimension(3) which can under-allocate when num_fluids > 3; update the
conditional branch (the block guarded by "not MFC_CASE_OPTIMIZATION and
USING_AMD") to use a safe upper bound (e.g., dimension(num_fluids_max) or a
compile-time constant) or add an explicit runtime guard against indexing past
size for alpha, alpha_rho, Gs and vel (references: variables alpha, vel,
alpha_rho, Gs and the conditional USING_AMD/MFC_CASE_OPTIMIZATION).
In `@toolchain/mfc/case.py`:
- Around line 324-325: The template assigns muscl_polyn from the wrong parameter
(it uses muscl_order), so update the assignment for muscl_polyn to read the
muscl_polyn parameter instead; specifically change the second line that sets
muscl_polyn to use int(self.params.get("muscl_polyn", 1)) while leaving
muscl_order unchanged so user-provided muscl_polyn values are honored.
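The fix is a one-key change in the template context; a sketch, assuming the surrounding dictionary construction looks roughly like this:

```python
# hypothetical excerpt of the template-context construction
context = {
    "muscl_order": int(self.params.get("muscl_order", 1)),
    # before: read from "muscl_order", the wrong parameter
    "muscl_polyn": int(self.params.get("muscl_polyn", 1)),
}
```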
In `@toolchain/templates/frontier_amd.mako`:
- Line 49: Before the sbcast invocation that writes to
/mnt/bb/$USER/${target.name} (the line calling sbcast --send-libs -pf
${target.get_install_binpath(case)} /mnt/bb/$USER/${target.name}), add a step
that creates the per-target subdirectory on all compute nodes using srun mkdir
-p /mnt/bb/$USER/${target.name}; ensure this mkdir step runs with the same
node/task scope as the sbcast (use SLURM_NNODES and --ntasks-per-node=1 as in
the review) so the directory exists on every node before sbcast executes; apply
the same change to the equivalent sbcast call in frontier.mako.
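A hedged sketch of the added step in `frontier_amd.mako`; the srun flags mirror the node/task scope described above, and `${target.name}` is the template's existing substitution:

```bash
# ensure the per-target burst-buffer directory exists on every node
srun -N "$SLURM_NNODES" --ntasks-per-node=1 mkdir -p "/mnt/bb/$USER/${target.name}"
# then broadcast the binary and its libraries as before
sbcast --send-libs -pf ${target.get_install_binpath(case)} "/mnt/bb/$USER/${target.name}"
```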
🟡 Minor comments (5)
toolchain/modules (1)

48-51: Trailing whitespace on line 48. Minor formatting issue: line 48 has a trailing space after "AMD".

🔧 Proposed fix

```diff
-famd     OLCF Frontier AMD 
+famd     OLCF Frontier AMD
```

toolchain/bootstrap/modules.sh (1)

139-139: Potential issue: `CRAY_LD_LIBRARY_PATH` may be undefined for `famd`.

Line 121 explicitly skips the `CRAY_LD_LIBRARY_PATH` handling for `famd`, but line 139 appends `${CRAY_LD_LIBRARY_PATH}` to `LD_LIBRARY_PATH`. If `CRAY_LD_LIBRARY_PATH` is not set, this will append an empty string (harmless, but it leaves a trailing colon) or could cause issues in strict shell modes. Consider checking whether the variable is set before using it, or remove it if it's not expected to be available for `famd`.

🔧 Proposed fix

```diff
-    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${CRAY_LD_LIBRARY_PATH}"
+    if [ ! -z "${CRAY_LD_LIBRARY_PATH+x}" ]; then
+        export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${CRAY_LD_LIBRARY_PATH}"
+    fi
```

.github/workflows/frontier_amd/submit.sh (1)

37-37: Minor: Missing space between `-o` and the filename. The `-o` SBATCH directive lacks a space before the filename, which may cause issues with some Slurm versions.

Proposed fix

```diff
-#SBATCH -o$job_slug.out  # Combined output and error messages file
+#SBATCH -o $job_slug.out # Combined output and error messages file
```

.github/workflows/frontier_amd/bench.sh (1)

18-22: Consider quoting `$(nproc)` and verify the CPU-mode `n_ranks` value.

`$(nproc)` on line 21 should be quoted to satisfy shellcheck and prevent potential word-splitting issues (though unlikely with numeric output). In CPU mode, `n_ranks` remains at the initialized value of 12, but `-j $(nproc)` uses the actual processor count. Verify this asymmetry is intentional (perhaps `-n` controls MPI ranks while `-j` controls parallel jobs).

Proposed fix for quoting

```diff
-    ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c frontier_amd $device_opts -n $n_ranks
+    ./mfc.sh bench --mem 1 -j "$(nproc)" -o "$job_slug.yaml" -- -c frontier_amd $device_opts -n $n_ranks
```

CMakeLists.txt (1)

462-472: Missing `SHELL:` prefix for multi-flag environment variables in the LLVMFlang MPI and HIPFORT configuration.

Lines 468 and 490 pass `$ENV{CRAY_MPICH_LIB}` and `$ENV{CRAY_HIPFORT_LIB}` unquoted to `target_link_libraries`, but these variables expand to multiple space-separated flags (e.g., `-L/path/lib -lmpi -lpmi`). CMake will treat the entire string as a single argument, causing linker failures. Wrap these in `"SHELL:"` to preserve whitespace handling: `"SHELL:$ENV{CRAY_MPICH_LIB}"` and `"SHELL:$ENV{CRAY_HIPFORT_LIB}"`.

Alternatively, use CMake variables from `find_package(MPI)`: line 463 calls `find_package(MPI COMPONENTS Fortran REQUIRED)`, which populates `MPI_Fortran_INCLUDE_PATH` and `MPI_Fortran_LIBRARIES`. These are the standard CMake approach and avoid raw environment variable dependencies.

Applies to line 468 (MPI) and line 490 (HIPFORT).
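A sketch of the `SHELL:` form; note that CMake honors the `SHELL:` prefix in the `*_options` family of commands, so the multi-flag link strings go through `target_link_options` here rather than `target_link_libraries`:

```cmake
# split "-L/path/lib -lmpi -lpmi" into separate arguments at link time
target_link_options(${a_target} PRIVATE "SHELL:$ENV{CRAY_MPICH_LIB}")
target_link_options(${a_target} PRIVATE "SHELL:$ENV{CRAY_HIPFORT_LIB}")
```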
🧹 Nitpick comments (13)
toolchain/mfc/test/case.py (1)

244-268: Cache trace parts for clarity. This avoids repeated splitting and centralizes the delimiter logic.

♻️ Proposed refactor

```diff
     tolerance = 1e-12  # Default
     single = ARG("single")
+    trace_parts = self.trace.split(" -> ")
-    if "Example" in self.trace.split(" -> "):
+    if "Example" in trace_parts:
         tolerance = 1e-3
-    elif "Cylindrical" in self.trace.split(" -> "):
+    elif "Cylindrical" in trace_parts:
         tolerance = 1e-9
@@
-    elif "Axisymmetric" in self.trace.split(" -> "):
+    elif "Axisymmetric" in trace_parts:
         tolerance = 1e-11
```

toolchain/modules (1)

48-51: Consider adding a `famd-gpu` entry for GPU-specific modules. Other GPU-capable systems (e.g., `f` for Frontier, `p` for Phoenix) define separate `-gpu` entries with GPU-specific modules and environment variables. The `famd` configuration only has `-all` entries, which may be intentional if GPU support is configured differently for AMD, but this differs from the established pattern.

src/pre_process/m_global_parameters.fpp (1)

303-303: Add documentation for the `dummy` workaround variable. According to the PR description, this variable is a workaround for an AMD compiler bug where "a conditional with a parameter followed by a GPU kernel can cause a catastrophic error if the conditional is false." The variable name `dummy` doesn't convey this purpose. Consider adding a comment explaining the workaround and linking to any relevant issue tracker, so future maintainers understand why this exists and when it can be removed.

📝 Proposed documentation

```diff
     logical :: fft_wrt
+    !> Workaround for AMD Flang compiler bug: prevents case-optimization issue
+    !! when a conditional with a parameter precedes a GPU kernel.
+    !! TODO: Remove when upstream patch is available.
     logical :: dummy
```

toolchain/bootstrap/modules.sh (1)

126-144: Hardcoded AFAR toolchain path may become stale. The path `/sw/crusher/ums/compilers/afar/rocm-afar-8873-drop-22.2.0` is specific to a particular version of the AFAR toolchain and will need to be updated when the toolchain is upgraded. Consider documenting this dependency or adding a comment indicating which version is expected and when it was last verified.

📝 Suggested documentation

```diff
 if [ "$u_c" '==' 'famd' ]; then
+    # AFAR toolchain for AMD Flang on Frontier
+    # Last verified: January 2026 with rocm-afar-8873-drop-22.2.0
     export OLCF_AFAR_ROOT="/sw/crusher/ums/compilers/afar/rocm-afar-8873-drop-22.2.0"
```

src/simulation/m_muscl.fpp (1)
115-162: Compiler workaround using the `dummy` flag is functional but warrants a brief comment. The `dummy` flag additions on lines 115, 119, and 162 correctly implement the workaround for the AMD amdflang case-optimization bug. When `dummy` is true, both the `muscl_order == 1` and `muscl_order == 2` paths will execute. Consider adding a brief inline comment near these conditions explaining the workaround:

```fortran
! Workaround for amdflang case-optimization bug (see PR #1116)
if (muscl_order /= 1 .or. dummy) then
```

This helps future maintainers understand why the `dummy` guard is present and when it can be removed once an upstream patch is available.

load_amd.sh (3)

1-4: Add a shebang or sourcing directive for clarity. The script lacks a shebang line, which makes it unclear whether it's meant to be executed or sourced. Since this script uses `module load` commands that affect the current shell environment, it should be sourced rather than executed.

Proposed fix: add a header comment indicating the script should be sourced:

```diff
+#!/usr/bin/env bash
+# This script should be sourced, not executed:
+#   source load_amd.sh
+
 module load python cmake
 module load cpe/25.09
 module load PrgEnv-amd
```

5-7: Unused variable `AFAR_UMS_LATEST` should be removed or used. `AFAR_UMS_LATEST` is computed on line 6 but never used; line 7 uses a hardcoded path instead. Either use the computed latest path or remove the unused variable.

Option 1: Remove the unused variable

```diff
 AFAR_UMS_BASEDIR="/sw/crusher/ums/compilers/afar"
-AFAR_UMS_LATEST=$(ls -d --color=never ${AFAR_UMS_BASEDIR}/*/ | tail -n1)
 export OLCF_AFAR_ROOT=${AFAR_UMS_BASEDIR}/"rocm-afar-8873-drop-22.2.0"
```

Option 2: Use the latest version (if intended)

```diff
 AFAR_UMS_BASEDIR="/sw/crusher/ums/compilers/afar"
 AFAR_UMS_LATEST=$(ls -d --color=never ${AFAR_UMS_BASEDIR}/*/ | tail -n1)
-export OLCF_AFAR_ROOT=${AFAR_UMS_BASEDIR}/"rocm-afar-8873-drop-22.2.0"
+export OLCF_AFAR_ROOT=${AFAR_UMS_LATEST}
```
25-27: Consider removing commented-out code or adding an explanation. The commented-out module load commands (lines 25-27) appear to be legacy code. If they're kept for reference, consider adding a comment explaining why, or remove them to keep the script clean.

.github/workflows/frontier_amd/build.sh (1)

19-21: Remove unused `dirname` to silence shellcheck. `dirname` is never referenced and triggers SC2034. Dropping it keeps the script clean.

♻️ Suggested cleanup

```diff
-    dirname=$(basename "$dir")
     ./mfc.sh run "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts
```

.github/workflows/frontier_amd/submit.sh (1)

44-45: Nit: `$(pwd)` expands at submission time, not job runtime. Inside the heredoc with double quotes, `$(pwd)` will expand when the script is submitted, not when the job runs. Since the script does `cd "\$SLURM_SUBMIT_DIR"` first, the logged path should still be correct, but for clarity you might want to use `\$(pwd)` to defer expansion or simply reference `$SLURM_SUBMIT_DIR`.

Proposed fix

```diff
 cd "\$SLURM_SUBMIT_DIR"
-echo "Running in $(pwd):"
+echo "Running in \$SLURM_SUBMIT_DIR:"
```

.github/workflows/frontier_amd/test.sh (1)

6-14: Variables `job_device` and `job_interface` are expected from the caller. The static analyzer flags these as unassigned, but they are intentionally set by `submit.sh` before this script is sourced/executed. This is consistent with the workflow design but could be documented with a comment for clarity.

Optional: Add a comment for clarity

```diff
+# Expected environment variables from submit.sh:
+#   job_device    - "cpu" or "gpu"
+#   job_interface - "acc", "omp", or empty
+
 device_opts=""
 if [ "$job_device" = "gpu" ]; then
```

.github/workflows/frontier_amd/bench.sh (2)

8-8: Nit: Redundant string transformation for `gpu_ids`. The `gpu_ids` transformation is convoluted: it converts spaces to newlines, then newlines back to spaces, then removes the trailing space. Since `$gpus` is already space-separated from `tr '\n' ' '` on line 6, this can be simplified.

Simplified version

```diff
-    gpu_ids=$(echo "$gpus" | tr ' ' '\n' | tr '\n' ' ' | sed 's/ $//') # GPU IDs from rocm-smi
+    gpu_ids=$(echo "$gpus" | xargs) # Trim and normalize whitespace
```

Or simply use `$gpus` directly if a trailing space is acceptable.

5-5: Variables `job_device`, `job_interface`, and `job_slug` are expected from the caller. These variables are set by `submit.sh`/`submit-bench.sh` before this script runs. This is the standard pattern for these workflows but could benefit from a brief comment documenting the expected environment.

Also applies to: 10-10, 19-19
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
toolchain/bootstrap/modules.sh (1)
41-49: Include `famd` in the interactive prompt list. The selection prompt still omits `famd`, so interactive users won't see the new option listed.

🧩 Suggested update

```diff
-    log_n "($G""a$W/$G""f$W/$G""s$W/$G""w$W/$B""tuo$W/$C""b$W/$C""e$CR/$C""d/$C""dai$CR/$Y""p$CR/$R""r$CR/$B""cc$CR/$B""c$CR/$B""n$CR/$BR""o"$CR"/$OR""h"$CR"): "
+    log_n "($G""a$W/$G""f$W/$G""famd$W/$G""s$W/$G""w$W/$B""tuo$W/$C""b$W/$C""e$CR/$C""d/$C""dai$CR/$Y""p$CR/$R""r$CR/$B""cc$CR/$B""c$CR/$B""n$CR/$BR""o"$CR"/$OR""h"$CR"): "
```
🤖 Fix all issues with AI agents
In `@src/simulation/m_ibm.fpp`:
- Around line 184-196: The AMD branch uses fixed-size GPU-private arrays (Gs,
alpha_rho_IP, alpha_IP, r_IP, v_IP, pb_IP, mv_IP, nmom_IP, presb_IP, massv_IP)
sized 3/18/12 but kernels iterate up to num_fluids, nb, nmom, nnode causing
potential OOB access; fix by either replacing the magic sizes with the
compile-time maximum constants (if available) or, if not, adding fypp ASSERT
checks before the GPU loops (and the corresponding parameter checks at the
function entry) to ensure num_fluids <= 3, nb <= <presb_size>, nmom <=
<nmom_size>, nnode <= <massv_size>; mirror the same ASSERT guard for
dynamic_viscosities in s_compute_ib_forces so both places enforce identical
bounds.
In `@toolchain/bootstrap/modules.sh`:
- Around line 126-140: The LD_LIBRARY_PATH append should be guarded to avoid
creating empty path entries: check if CRAY_LD_LIBRARY_PATH is non-empty before
appending to LD_LIBRARY_PATH (use a conditional and only add the colon when
LD_LIBRARY_PATH already has content), and apply the same guarded-append pattern
for PATH (with OLCF_AFAR_ROOT/bin) and CMAKE_PREFIX_PATH (with OLCF_AFAR_ROOT)
so you never produce leading/trailing or empty-colon components; update the
block around the exports for OLCF_AFAR_ROOT, PATH, LD_LIBRARY_PATH, and
CMAKE_PREFIX_PATH accordingly.
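One way to write the guarded appends this item asks for, sketched with the variables from the surrounding block:

```bash
# append a component only when it is non-empty, and only insert the
# separator when the destination already has content
if [ -n "${CRAY_LD_LIBRARY_PATH:-}" ]; then
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${CRAY_LD_LIBRARY_PATH}"
fi
export PATH="${OLCF_AFAR_ROOT}/bin${PATH:+:${PATH}}"
export CMAKE_PREFIX_PATH="${OLCF_AFAR_ROOT}${CMAKE_PREFIX_PATH:+:${CMAKE_PREFIX_PATH}}"
```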
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CMakeLists.txt (1)
365-381: Make `MFC_CASE_OPTIMIZATION` configurable instead of hard-disabling it. `-D MFC_CASE_OPTIMIZATION=False` forces the optimization off for every compiler/target; if the workaround is LLVMFlang-specific, this likely overreaches. Consider gating it by compiler or by a CMake option, and pass the option value into Fypp.

💡 Example adjustment

```diff
@@
-        -D MFC_CASE_OPTIMIZATION=False
+        -D MFC_CASE_OPTIMIZATION=${MFC_CASE_OPTIMIZATION}
@@
 option(MFC_MIXED_PRECISION "Build mixed precision" OFF)
+option(MFC_CASE_OPTIMIZATION "Enable case-optimization in Fypp" ON)
@@
+if (CMAKE_Fortran_COMPILER_ID STREQUAL "LLVMFlang")
+    set(MFC_CASE_OPTIMIZATION OFF)
+endif()
```
🤖 Fix all issues with AI agents
In `@CMakeLists.txt`:
- Around line 455-458: Validate that the CRAY_* env vars are set and split their
space-separated contents into CMake lists before passing to
target_compile_options/target_link_libraries: check for CRAY_MPICH_LIB and
CRAY_HIPFORT_LIB presence and produce clear error messages if missing, use
string(REPLACE " " ";" ...) or string(REGEX REPLACE "\\s+" ";" ...) to convert
the environment string into a CMake list, then pass that list variable to
target_compile_options(${a_target} PRIVATE ...) and
target_link_libraries(${a_target} PRIVATE ...) instead of the raw "$ENV{...}" so
each flag is treated as a separate argument (apply same handling for both the
LLVMFlang branch and the other branch around lines where
CMAKE_Fortran_COMPILER_ID is tested).
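A sketch of the validate-then-split pattern for one of the variables; the other follows the same shape:

```cmake
if (NOT DEFINED ENV{CRAY_MPICH_LIB})
    message(FATAL_ERROR "CRAY_MPICH_LIB must be set when building with LLVMFlang")
endif()
# convert the space-separated flag string into a CMake list
string(REGEX REPLACE "[ \t]+" ";" cray_mpich_flags "$ENV{CRAY_MPICH_LIB}")
target_link_libraries(${a_target} PRIVATE ${cray_mpich_flags})
```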
🧹 Nitpick comments (1)
CMakeLists.txt (1)
433-434: Avoid per-target compiler ID log spam. This message fires once per target in the loop; consider logging once at configure time, or add context (the target name) to keep logs clean.
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

```text
@@            Coverage Diff             @@
##           master    #1116      +/-   ##
==========================================
- Coverage   44.16%   44.11%   -0.06%
==========================================
  Files          71       71
  Lines       20417    20643     +226
  Branches     1991     2036      +45
==========================================
+ Hits         9018     9106      +88
- Misses      10252    10363     +111
- Partials     1147     1174      +27
```

☔ View full report in Codecov by Sentry.
CodeAnt AI is running Incremental review.
CodeAnt AI Incremental review completed.
1 issue found across 5 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/common/m_checker_common.fpp">
<violation number="1" location="src/common/m_checker_common.fpp:33">
P2: Guard the `s_check_amd` call with the same `MFC_SIMULATION` preprocessor condition as its definition, otherwise AMD simulation builds will reference a subroutine that is not compiled.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/common/m_checker_common.fpp`:
- Around line 33-35: s_check_inputs_common calls s_check_amd under the #:if
USING_AMD guard but s_check_amd itself is conditionally omitted when
MFC_SIMULATION is defined, causing undefined symbol errors; fix by ensuring
s_check_amd is always compiled when USING_AMD is set (move or duplicate the
s_check_amd implementation outside the MFC_SIMULATION conditional or wrap its
declaration with the same USING_AMD guard), and update the other similar call
sites (the block mentioned around lines 53-61) so all calls to s_check_amd and
its implementation share the same preprocessor condition.
🧹 Nitpick comments (1)
src/common/m_checker_common.fpp (1)
5-6: Place the `macros.fpp` include first. `case.fpp` now precedes the required macro include. If `case.fpp` uses any Fypp helpers, this ordering can break preprocessing, and it also violates the module-start convention.

♻️ Suggested reorder

```diff
-#:include 'case.fpp'
-#:include 'macros.fpp'
+#:include 'macros.fpp'
+#:include 'case.fpp'
```

As per coding guidelines, start module files with the Fypp include for macros: `#:include 'macros.fpp'`.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/common/m_checker_common.fpp`:
- Around line 5-6: Move the include order so #:include 'macros.fpp' comes first
in the module file; currently #:include 'case.fpp' is above it—swap them so
'macros.fpp' is the first include to satisfy the fpp guideline and avoid missing
macro availability when processing this module.
```fpp
#:include 'case.fpp'
#:include 'macros.fpp'
```
Move the `macros.fpp` include back to the top. The fpp guideline requires module files to start with `#:include 'macros.fpp'`. Putting `case.fpp` first can also cause macro availability surprises.

🛠️ Suggested fix

```diff
-#:include 'case.fpp'
-#:include 'macros.fpp'
+#:include 'macros.fpp'
+#:include 'case.fpp'
```

As per coding guidelines, start module files with `#:include 'macros.fpp'`.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```fpp
#:include 'macros.fpp'
#:include 'case.fpp'
```
🤖 Prompt for AI Agents
In `@src/common/m_checker_common.fpp` around lines 5 - 6, Move the include order
so #:include 'macros.fpp' comes first in the module file; currently #:include
'case.fpp' is above it—swap them so 'macros.fpp' is the first include to satisfy
the fpp guideline and avoid missing macro availability when processing this
module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 7 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/simulation/m_compute_cbc.fpp">
<violation number="1" location="src/simulation/m_compute_cbc.fpp:39">
P1: The fixed `dimension(20)` for AMD is too small for valid configurations (e.g., model_eqns=2 with 10 fluids yields advxe=24), so loops that write up to advxe/chemxe can overrun L. Use a size that covers the maximum sys_size on AMD (or keep `sys_size`) to avoid out-of-bounds writes.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
User description
Description
This PR enables the use of amdflang compiler for MFC on OLCF Frontier.
Primary changes include static private arrays instead of automatic arrays in GPU kernels.
With case-optimization turned on, a conditional statement with a parameter followed by a GPU kernel can sometimes lead to a catastrophic error when the conditional is false. This is guarded using a 'dummy' variable for now until it gets patched.
Fixes #(issue) [optional]
Type of change
Please delete options that are not relevant.
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration
Test Configuration:
Checklist
- I have made corresponding changes to the documentation (`docs/`)
- I have added example cases in `examples/` that demonstrate my new feature performing as expected. They run to completion and demonstrate "interesting physics"
- I ran `./mfc.sh format` before committing my code

If your code changes any code source files (anything in `src/simulation`):

To make sure the code is performing as expected on GPU devices, I have:

- Added `nvtx` ranges so that they can be identified in profiles
- Ran `./mfc.sh run XXXX --gpu -t simulation --nsys`, and have attached the output file (`.nsys-rep`) and plain text results to this PR
- Ran `./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace`, and have attached the output file and plain text results to this PR

Summary by CodeRabbit
New Features
Chores
Behavior Changes
CodeAnt-AI Description
Enable AMD compiler support and guard against a case-optimization bug that crashes GPU kernels
What Changed
Impact
- ✅ Run on AMD Frontier with amdflang
- ✅ Fewer GPU kernel crashes caused by case-optimization
- ✅ Consistent viscous and multi-physics results when compiling with AMD toolchain

💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.