
Conversation

@anandrdbz
Contributor

anandrdbz commented Jan 29, 2026

User description

Description

This PR enables the use of amdflang compiler for MFC on OLCF Frontier.
Primary changes include using statically sized private arrays instead of automatic arrays in GPU kernels.
With case optimization turned on, a conditional on a case parameter immediately followed by a GPU kernel can sometimes lead to a catastrophic error when the conditional is false. This is guarded with a 'dummy' variable for now, until the compiler bug is patched.
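
A minimal sketch of this guard, assuming the module-level logical dummy (default .false.) introduced by this PR; the case parameter, loop bounds, and field names are illustrative rather than taken from any one file:

! 'viscous' stands in for any case parameter; 'dummy' is always .false. at
! runtime, so behavior is unchanged, but the conditional can no longer be
! folded away ahead of the GPU kernel by case optimization.
if (viscous .or. dummy) then
    $:GPU_PARALLEL_LOOP(collapse=3, private='[j,k,l]')
    do l = is3%beg, is3%end
        do k = is2%beg, is2%end
            do j = is1%beg, is1%end
                tau_Re_vf(E_idx)%sf(j, k, l) = 0._wp
            end do
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()
end if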

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration.

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that it can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

Summary by CodeRabbit

  • New Features

    • End-to-end support for AMD Frontier: build/submit/benchmark helpers and runtime template for Frontier AMD runs; AMD-targeted performance/configuration optimizations.
  • Chores

    • CI matrix and log-archiving extended to include Frontier AMD; toolchain and environment bootstrapping updated for AMD/Cray targets.
  • Behavior Changes

    • MUSCL defaults adjusted.
    • New boolean flag (default false) added to enable alternate execution/test paths.

✏️ Tip: You can customize this high-level summary in your review settings.


CodeAnt-AI Description

Enable AMD compiler support and guard against a case-optimization bug that crashes GPU kernels

What Changed

  • Add an AMD-specific build path that uses fixed-size local arrays and a non-parameter molecular weights array when compiling for amdflang, reducing reliance on automatic arrays (a sketch of this pattern follows the list).
  • Introduce a global "dummy" flag and use it to ensure code paths that precede GPU kernels (viscous and related conditionals) remain safe when case-optimization would otherwise remove guarded branches.
  • Wrap many dimension- and optimization-sensitive blocks with conditional macros so viscous, MHD/relativistic, reconstruction, and bubble routines behave correctly under AMD toolchains.
  • Ensure molecular weight lookups and species-related temporary arrays use AMD-safe variants to avoid runtime errors on Frontier.
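
A minimal sketch of the declaration pattern behind the first bullet, using the fypp guards and the fixed sizes (3 and 10) discussed in the reviews below; the variable names are illustrative:

#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    ! Fixed sizes avoid automatic arrays, which amdflang mishandles in GPU
    ! kernels; they must be large enough for num_fluids and num_species.
    real(wp), dimension(3) :: alpha_rho_local
    real(wp), dimension(10) :: Ys_local
#:else
    real(wp), dimension(num_fluids) :: alpha_rho_local
    real(wp), dimension(num_species) :: Ys_local
#:endif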

Impact

✅ Run on AMD Frontier with amdflang
✅ Fewer GPU kernel crashes caused by case-optimization
✅ Consistent viscous and multi-physics results when compiling with AMD toolchain

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

@codeant-ai
Contributor

codeant-ai bot commented Jan 29, 2026

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. If you're enjoying it, help us grow by sharing.


@coderabbitai
Contributor

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough

Walkthrough

Adds Frontier AMD (famd) support: new CI matrix entries and workflow scripts, AFAR/OLCF module bootstraps and toolchain templates, CMake/toolchain adjustments for LLVMFlang/Cray, and widespread AMD-guarded conditional array-shape and control-flow changes across many Fortran sources.

Changes

  • CI Matrix & Orchestration (.github/workflows/bench.yml, .github/workflows/test.yml): Added frontier_amd matrix entries and lbl_comp routing; distinct Build/Test/Archive steps for frontier_amd.
  • Frontier AMD Workflow Scripts (.github/workflows/frontier_amd/build.sh, bench.sh, test.sh, submit.sh, submit-bench.sh): New scripts for build/test/bench/submit: GPU detection (rocm-smi), device/interface flags, sbatch generation, job slugs, and invocation of mfc.sh with computed device options.
  • Toolchain Templates & Modules (toolchain/templates/frontier_amd.mako, toolchain/bootstrap/modules.sh, toolchain/modules): Added the famd template and module entries; bootstrap/modules updated to configure AFAR/ROCm/Cray paths and the famd branch.
  • Environment Loaders (load_amd.sh, toolchain/bootstrap/modules.sh): Reworked AMD environment bootstrap: AFAR root detection, PATH/LD_LIBRARY_PATH updates, CRAY_* include/lib variables, CMAKE_PREFIX_PATH, FC override, runtime/offload flags, and exported AFAR variables.
  • Build System (CMakeLists.txt): Added compiler-ID-aware branches for LLVMFlang (MPI includes/linking, hipfort/FFTW/CUDA handling) and extended the HANDLE_SOURCES macro signature to accept useOpenACC/useOpenMP.
  • Preprocessor & Macro Layer (src/common/include/shared_parallel_macros.fpp): Added compiler-ID constants and boolean flags (USING_AMD, etc.) plus stronger macro input validation/parsing.
  • Global Flags / Dummy (src/pre_process/m_global_parameters.fpp, src/post_process/m_global_parameters.fpp, src/simulation/m_global_parameters.fpp): Introduced a public logical dummy defaulting to .false.; used to broaden conditional branches.
  • Variable Conversion & Public Signatures (src/common/m_variables_conversion.fpp): Made G optional in s_convert_to_mixture_variables; introduced AMD-conditional signature/array-shape variants (3 vs num_fluids) for multiple public routines.
  • Widespread AMD-conditioned Sources (src/common/... and src/simulation/..., e.g. m_boundary_common.fpp, m_chemistry.fpp, m_phase_change.fpp, m_riemann_solvers.fpp, m_weno.fpp, m_viscous.fpp, m_rhs.fpp, m_qbmm.fpp): Large-scale addition of USING_AMD / not MFC_CASE_OPTIMIZATION guards: alternate array dimensionalities, updated private lists, inline replacements, removed helpers, and many conditional local-declaration changes; several public signatures adjusted where required.
  • Benchmarking / CI Glue & Filters (.github/workflows/frontier_amd/*.sh, toolchain/templates/frontier_amd.mako, .github/file-filter.yml): Templates and scripts coordinate sbatch/srun, MPI vs non-MPI execution, profiling hooks, and artifact naming; file-filter extended to include the frontier_amd path.
  • Toolchain & Tests Tweaks (toolchain/mfc/case.py, toolchain/mfc/test/case.py): muscl_order/muscl_polyn defaults changed to 1; an axisymmetric tolerance special case added to the test heuristic.
  • Misc / Diagnostics (CMakeLists.txt, small formatting): Added a Fortran compiler-ID status message and minor formatting/alignment changes.

Sequence Diagram(s)

sequenceDiagram
  participant GH as GitHub Actions
  participant Build as frontier_amd/build.sh
  participant Submit as frontier_amd/submit.sh
  participant Slurm as Slurm Scheduler
  participant Node as Compute Node
  participant Env as mfc.sh / toolchain env
  participant ROCm as rocm-smi
  participant Bench as mfc.sh bench/test

  GH->>Build: start build (device/interface)
  Build->>Env: mfc.sh load -c famd -m g
  Build-->>GH: build artifacts
  GH->>Submit: run submit (script, device)
  Submit->>Slurm: sbatch (embedded script, job_slug)
  Slurm->>Node: allocate node & start job
  Node->>Env: source mfc.sh (AFAR/ROCm)
  Node->>ROCm: rocm-smi (detect GPUs)
  Node->>Bench: run mfc.sh bench/test with device_opts
  Bench-->>GH: upload artifacts/logs

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • sbryngelson
  • wilfonba

"🐇 I hopped through modules, flags in hand,
Frontier AMD now hums across the land.
Scripts submit, templates fold the run,
Small buffers trimmed, the builds are spun.
Carrots for CI — a sprint well-planned!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 20.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title clearly identifies the primary change as adding AMDFlang compiler support to MFC, which is directly reflected throughout the changeset with AMD-specific build paths, conditional macros, and Frontier-specific configurations.
  • Description check ✅ Passed: The PR description provides context about enabling amdflang compiler support and mentions the main technical issue (case-optimization GPU kernel bug), the solution (dummy variable guard), and the scope of changes. However, critical testing checklist items remain unchecked (GPU validation, regression tests, compiler compatibility checks), indicating testing may not be complete.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codeant-ai bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files) on Jan 29, 2026
@qodo-code-review
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

Several arrays are hard-sized under USING_AMD (e.g., L(12), Ys(10), alpha_rho(3), vel(3)). This risks out-of-bounds or silent truncation if sys_size, num_species, num_fluids, num_vels, or num_dims exceed those constants on any configuration, and it can also break assumptions when chemistry/species counts vary.

#:if USING_AMD
    real(wp), dimension(12) :: L
#:else
    real(wp), dimension(sys_size) :: L
#:endif
#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    real(wp), dimension(3) :: alpha_rho, dalpha_rho_ds, mf
    real(wp), dimension(3) :: vel, dvel_ds
    real(wp), dimension(3) :: adv_local, dadv_ds
    real(wp), dimension(3) :: dadv_dt
    real(wp), dimension(3) :: dvel_dt
    real(wp), dimension(3) :: dalpha_rho_dt
    real(wp), dimension(10) :: Ys, h_k, dYs_dt, dYs_ds, Xs, Gamma_i, Cp_i
#:else
    real(wp), dimension(num_fluids) :: alpha_rho, dalpha_rho_ds, mf
    real(wp), dimension(num_vels) :: vel, dvel_ds
    real(wp), dimension(num_fluids) :: adv_local, dadv_ds
    real(wp), dimension(num_fluids) :: dadv_dt
    real(wp), dimension(num_dims) :: dvel_dt
    real(wp), dimension(num_fluids) :: dalpha_rho_dt
    real(wp), dimension(num_species) :: Ys, h_k, dYs_dt, dYs_ds, Xs, Gamma_i, Cp_i
#:endif
real(wp), dimension(2) :: Re_cbc
Duplicate Code

The shear_stress and bulk_stress GPU loops duplicate substantial logic (loading alpha_*, bubbles/mpp_lim branches, computing mixture variables, and Re_visc). This increases maintenance burden and the chance of future divergence/bugs; consider extracting common parts into a shared routine/macro or factoring the repeated per-cell computations. A sketch of one such factoring follows the excerpts below.

#:if not MFC_CASE_OPTIMIZATION or num_dims > 1
    if (shear_stress) then    ! Shear stresses
        $:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum ,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
        do l = is3_viscous%beg, is3_viscous%end
            do k = -1, 1
                do j = is1_viscous%beg, is1_viscous%end

                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 1, num_fluids
                        alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
                        if (bubbles_euler .and. num_fluids == 1) then
                            alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
                        else
                            alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
                        end if
                    end do

                    if (bubbles_euler) then
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp

                        if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else if ((model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids - 1
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else
                            rho_visc = alpha_rho_visc(1)
                            gamma_visc = gammas(1)
                            pi_inf_visc = pi_infs(1)
                        end if
                    else
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp

                        alpha_visc_sum = 0._wp

                        if (mpp_lim) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
                                alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
                                alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
                            end do

                            alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

                        end if

                        $:GPU_LOOP(parallelism='[seq]')
                        do i = 1, num_fluids
                            rho_visc = rho_visc + alpha_rho_visc(i)
                            gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                            pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                        end do

                        if (viscous) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 2
                                Re_visc(i) = dflt_real

                                if (Re_size(i) > 0) Re_visc(i) = 0._wp
                                $:GPU_LOOP(parallelism='[seq]')
                                do q = 1, Re_size(i)
                                    Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
                                                 + Re_visc(i)
                                end do

                                Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

                            end do
                        end if
                    end if

                    tau_Re(2, 1) = (grad_y_vf(1)%sf(j, k, l) + &
                                    grad_x_vf(2)%sf(j, k, l))/ &
                                   Re_visc(1)

                    tau_Re(2, 2) = (4._wp*grad_y_vf(2)%sf(j, k, l) &
                                    - 2._wp*grad_x_vf(1)%sf(j, k, l) &
                                    - 2._wp*q_prim_vf(momxb + 1)%sf(j, k, l)/y_cc(k))/ &
                                   (3._wp*Re_visc(1))
                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 1, 2
                        tau_Re_vf(contxe + i)%sf(j, k, l) = &
                            tau_Re_vf(contxe + i)%sf(j, k, l) - &
                            tau_Re(2, i)

                        tau_Re_vf(E_idx)%sf(j, k, l) = &
                            tau_Re_vf(E_idx)%sf(j, k, l) - &
                            q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if
#:endif

#:if not MFC_CASE_OPTIMIZATION or num_dims > 1
    if (bulk_stress) then    ! Bulk stresses
        $:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum ,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
        do l = is3_viscous%beg, is3_viscous%end
            do k = -1, 1
                do j = is1_viscous%beg, is1_viscous%end

                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 1, num_fluids
                        alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
                        if (bubbles_euler .and. num_fluids == 1) then
                            alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
                        else
                            alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
                        end if
                    end do

                    if (bubbles_euler) then
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp

                        if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else if ((model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids - 1
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else
                            rho_visc = alpha_rho_visc(1)
                            gamma_visc = gammas(1)
                            pi_inf_visc = pi_infs(1)
                        end if
                    else
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp

                        alpha_visc_sum = 0._wp

                        if (mpp_lim) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
                                alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
                                alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
                            end do

                            alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

                        end if

                        $:GPU_LOOP(parallelism='[seq]')
                        do i = 1, num_fluids
                            rho_visc = rho_visc + alpha_rho_visc(i)
                            gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                            pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                        end do

                        if (viscous) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 2
                                Re_visc(i) = dflt_real

                                if (Re_size(i) > 0) Re_visc(i) = 0._wp
                                $:GPU_LOOP(parallelism='[seq]')
                                do q = 1, Re_size(i)
                                    Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
                                                 + Re_visc(i)
                                end do

                                Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

                            end do
                        end if
                    end if

                    tau_Re(2, 2) = (grad_x_vf(1)%sf(j, k, l) + &
                                    grad_y_vf(2)%sf(j, k, l) + &
                                    q_prim_vf(momxb + 1)%sf(j, k, l)/y_cc(k))/ &
                                   Re_visc(2)

                    tau_Re_vf(momxb + 1)%sf(j, k, l) = &
                        tau_Re_vf(momxb + 1)%sf(j, k, l) - &
                        tau_Re(2, 2)

                    tau_Re_vf(E_idx)%sf(j, k, l) = &
                        tau_Re_vf(E_idx)%sf(j, k, l) - &
                        q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)

                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if
#:endif
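
One way to reduce the duplication shown in the two excerpts above would be to factor the shared per-cell mixture computation into an fypp macro that both stress loops expand. The macro name below is illustrative and its body is abbreviated, so this is a sketch of the approach rather than a drop-in implementation:

#:def COMPUTE_MIXTURE_VISC()
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
        if (bubbles_euler .and. num_fluids == 1) then
            alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
        else
            alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
        end if
    end do
    ! ... remainder of the shared rho/gamma/pi_inf and Re_visc computation ...
#:enddef

! Each stress loop would then reduce to the macro call plus its own tau_Re terms:
@:COMPUTE_MIXTURE_VISC()
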
Guard Logic

The added dummy-based guards (e.g., if (weno_order == 1 .or. dummy) and similar patterns) change control-flow structure and can cause multiple reconstruction branches to execute depending on how dummy is defined. This should be validated carefully to ensure no unintended extra work, no double-writes to outputs, and no change in numerical results when the guard is meant to be inert.

if (weno_order /= 1 .or. dummy) then
    call s_initialize_weno(v_vf, &
                           weno_dir)
end if

if (weno_order == 1 .or. dummy) then
    if (weno_dir == 1) then
        $:GPU_PARALLEL_LOOP(collapse=4)
        do i = 1, ubound(v_vf, 1)
            do l = is3_weno%beg, is3_weno%end
                do k = is2_weno%beg, is2_weno%end
                    do j = is1_weno%beg, is1_weno%end
                        vL_rs_vf_x(j, k, l, i) = v_vf(i)%sf(j, k, l)
                        vR_rs_vf_x(j, k, l, i) = v_vf(i)%sf(j, k, l)
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    else if (weno_dir == 2) then
        $:GPU_PARALLEL_LOOP(collapse=4)
        do i = 1, ubound(v_vf, 1)
            do l = is3_weno%beg, is3_weno%end
                do k = is2_weno%beg, is2_weno%end
                    do j = is1_weno%beg, is1_weno%end
                        vL_rs_vf_y(j, k, l, i) = v_vf(i)%sf(k, j, l)
                        vR_rs_vf_y(j, k, l, i) = v_vf(i)%sf(k, j, l)
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    else if (weno_dir == 3) then
        $:GPU_PARALLEL_LOOP(collapse=4)
        do i = 1, ubound(v_vf, 1)
            do l = is3_weno%beg, is3_weno%end
                do k = is2_weno%beg, is2_weno%end
                    do j = is1_weno%beg, is1_weno%end
                        vL_rs_vf_z(j, k, l, i) = v_vf(i)%sf(l, k, j)
                        vR_rs_vf_z(j, k, l, i) = v_vf(i)%sf(l, k, j)
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if
end if
if (weno_order == 3 .or. dummy) then
    #:for WENO_DIR, XYZ in [(1, 'x'), (2, 'y'), (3, 'z')]
        if (weno_dir == ${WENO_DIR}$) then
            $:GPU_PARALLEL_LOOP(collapse=4,private='[beta,dvd,poly,omega,alpha,tau]')
            do l = is3_weno%beg, is3_weno%end
                do k = is2_weno%beg, is2_weno%end
                    do j = is1_weno%beg, is1_weno%end
                        do i = 1, v_size
                            ! reconstruct from left side

                            alpha(:) = 0._wp
                            omega(:) = 0._wp
                            beta(:) = weno_eps

                            dvd(0) = v_rs_ws_${XYZ}$ (j + 1, k, l, i) &
                                     - v_rs_ws_${XYZ}$ (j, k, l, i)
                            dvd(-1) = v_rs_ws_${XYZ}$ (j, k, l, i) &
                                      - v_rs_ws_${XYZ}$ (j - 1, k, l, i)

                            poly(0) = v_rs_ws_${XYZ}$ (j, k, l, i) &
                                      + poly_coef_cbL_${XYZ}$ (j, 0, 0)*dvd(0)
                            poly(1) = v_rs_ws_${XYZ}$ (j, k, l, i) &
                                      + poly_coef_cbL_${XYZ}$ (j, 1, 0)*dvd(-1)

                            beta(0) = beta_coef_${XYZ}$ (j, 0, 0)*dvd(0)*dvd(0) &
                                      + weno_eps
                            beta(1) = beta_coef_${XYZ}$ (j, 1, 0)*dvd(-1)*dvd(-1) &
                                      + weno_eps

                            if (wenojs) then
                                alpha(0:weno_num_stencils) = d_cbL_${XYZ}$ (0:weno_num_stencils, j)/(beta(0:weno_num_stencils)**2._wp)

                            elseif (mapped_weno) then
                                alpha(0:weno_num_stencils) = d_cbL_${XYZ}$ (0:weno_num_stencils, j)/(beta(0:weno_num_stencils)**2._wp)
                                omega = alpha/sum(alpha)
                                alpha(0:weno_num_stencils) = (d_cbL_${XYZ}$ (0:weno_num_stencils, j)*(1._wp + d_cbL_${XYZ}$ (0:weno_num_stencils, j) - 3._wp*omega(0:weno_num_stencils)) + omega(0:weno_num_stencils)**2._wp) &
                                                             *(omega(0:weno_num_stencils)/(d_cbL_${XYZ}$ (0:weno_num_stencils, j)**2._wp + omega(0:weno_num_stencils)*(1._wp - 2._wp*d_cbL_${XYZ}$ (0:weno_num_stencils, j))))

                            elseif (wenoz) then
                                ! Borges, et al. (2008)

                                tau = abs(beta(1) - beta(0))
                                alpha(0:weno_num_stencils) = d_cbL_${XYZ}$ (0:weno_num_stencils, j)*(1._wp + tau/beta(0:weno_num_stencils))

                            end if

                            omega = alpha/sum(alpha)

                            vL_rs_vf_${XYZ}$ (j, k, l, i) = omega(0)*poly(0) + omega(1)*poly(1)

                            ! reconstruct from right side

                            poly(0) = v_rs_ws_${XYZ}$ (j, k, l, i) &
                                      + poly_coef_cbR_${XYZ}$ (j, 0, 0)*dvd(0)
                            poly(1) = v_rs_ws_${XYZ}$ (j, k, l, i) &
                                      + poly_coef_cbR_${XYZ}$ (j, 1, 0)*dvd(-1)

                            if (wenojs) then
                                alpha(0:weno_num_stencils) = d_cbR_${XYZ}$ (0:weno_num_stencils, j)/(beta(0:weno_num_stencils)**2._wp)

                            elseif (mapped_weno) then
                                alpha(0:weno_num_stencils) = d_cbR_${XYZ}$ (0:weno_num_stencils, j)/(beta(0:weno_num_stencils)**2._wp)
                                omega = alpha/sum(alpha)
                                alpha(0:weno_num_stencils) = (d_cbR_${XYZ}$ (0:weno_num_stencils, j)*(1._wp + d_cbR_${XYZ}$ (0:weno_num_stencils, j) - 3._wp*omega(0:weno_num_stencils)) + omega(0:weno_num_stencils)**2._wp) &
                                                             *(omega(0:weno_num_stencils)/(d_cbR_${XYZ}$ (0:weno_num_stencils, j)**2._wp + omega(0:weno_num_stencils)*(1._wp - 2._wp*d_cbR_${XYZ}$ (0:weno_num_stencils, j))))

                            elseif (wenoz) then

                                alpha(0:weno_num_stencils) = d_cbR_${XYZ}$ (0:weno_num_stencils, j)*(1._wp + tau/beta(0:weno_num_stencils))

                            end if

                            omega = alpha/sum(alpha)

                            vR_rs_vf_${XYZ}$ (j, k, l, i) = omega(0)*poly(0) + omega(1)*poly(1)

                        end do
                    end do
                end do
            end do
            $:END_GPU_PARALLEL_LOOP()
        end if
    #:endfor
end if
if (weno_order == 5 .or. dummy) then
    #:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > 1


High-level Suggestion

Replace hardcoded array sizes (e.g., dimension(3), dimension(10)) with preprocessor macros derived from case-specific configurations. This will prevent potential runtime errors when simulation parameters change, making the amdflang support more robust. [High-level, importance: 9]

Solution Walkthrough:

Before:

! in src/simulation/m_riemann_solvers.fpp
subroutine s_hllc_flux(...)
    ...
    #:if not MFC_CASE_OPTIMIZATION and USING_AMD
        real(wp), dimension(3) :: alpha_L, alpha_R
        real(wp), dimension(10) :: Ys_L, Ys_R
        ...
    #:else
        real(wp), dimension(num_fluids) :: alpha_L, alpha_R
        real(wp), dimension(num_species) :: Ys_L, Ys_R
        ...
    #:endif
    ...
end subroutine s_hllc_flux

After:

! in case.fpp or similar config header
#define NUM_FLUIDS_STATIC 3
#define NUM_SPECIES_STATIC 10

! in src/simulation/m_riemann_solvers.fpp
subroutine s_hllc_flux(...)
    ...
    #:if not MFC_CASE_OPTIMIZATION and USING_AMD
        real(wp), dimension(NUM_FLUIDS_STATIC) :: alpha_L, alpha_R
        real(wp), dimension(NUM_SPECIES_STATIC) :: Ys_L, Ys_R
        ...
    #:else
        real(wp), dimension(num_fluids) :: alpha_L, alpha_R
        real(wp), dimension(num_species) :: Ys_L, Ys_R
        ...
    #:endif
    ...
end subroutine s_hllc_flux

@codeant-ai
Contributor

codeant-ai bot commented Jan 29, 2026

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Race condition (missing private)
    The GPU parallel loop used for the radial motion model declares many local temporaries in the private list, but it no longer marks cell and Re as private. Both are assigned and read inside the loop (and are declared outside), so they must be private per-iteration to avoid data races when the loop runs in parallel.

  • Dimension mismatch risk
    The code uses a hard-coded size 12 for the CBC state vector L under USING_AMD. If the real sys_size differs from 12 (larger or smaller) this will cause out-of-bounds accesses or unused space. The same dimension(12) appears repeatedly across many procedures, increasing the chance of a mismatch with the rest of the codebase.

  • Fixed-size local arrays
    When building with AMD support the code declares several local arrays with small, hard-coded dimensions (e.g. L dimensioned to 12, alpha_rho, vel, Ys, h_k, etc. dimensioned to 3 or 10). If the actual runtime sizes (e.g. sys_size, num_species, num_vels, num_fluids) exceed these static values this will lead to out-of-bounds writes at runtime. The static sizing was likely introduced to avoid automatic arrays on AMD, but it must be made safe for all supported configurations or fail early with a clear error.

  • Array-size mismatch
    The AMD-specific declaration fixes tensora/tensorb to length 9, but the routine continues to index up to tensor_size and loop over 1:tensor_size. If tensor_size != 9 this will cause out-of-bounds accesses on AMD builds. Verify that tensor_size is always 9 when USING_AMD, or make the AMD declaration use the same compile-time constant used throughout the routine.

  • Array bounds
    The AMD build path fixes coefficient arrays to size 32 (e.g. coeff dimensioned as (32,0:2,0:2)), but runtime loops elsewhere iterate up to nterms. If nterms can be >32 (or the build assumptions change), this will produce out-of-bounds accesses or silently ignore entries. Conversely, if nterms < 32 some entries are unused but still allocated. Verify and either clamp loops to the actual coefficient array size or ensure nterms is guaranteed (and asserted) to be <= 32 when USING_AMD.

  • Use of dummy guard
    The conditional that starts Jacobi copy-back uses dummy in "if (igr_iter_solver == 1 .or. dummy)". The PR comment mentions dummy is a temporary workaround for a compiler bug. Ensure dummy is declared in this module or propagated from a well-defined scope and explicitly initialized to the intended logical value; otherwise the condition may evaluate unexpectedly and change control flow (and performance) in production builds.

@codeant-ai
Contributor

codeant-ai bot commented Jan 29, 2026

CodeAnt AI finished reviewing your PR.

Contributor

@cubic-dev-ai bot left a comment

16 issues found across 44 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/simulation/m_bubbles_EE.fpp">

<violation number="1" location="src/simulation/m_bubbles_EE.fpp:169">
P1: Rtmp/Vtmp are truncated to three elements under AMD builds, but the loops still write up to nb entries, so nb>3 overflows these buffers.</violation>

<violation number="2" location="src/simulation/m_bubbles_EE.fpp:170">
P1: myalpha/myalpha_rho are limited to three elements under AMD builds, but the code still iterates to num_fluids, so cases with >3 fluids overflow the arrays.</violation>
</file>

<file name="src/simulation/m_ibm.fpp">

<violation number="1" location="src/simulation/m_ibm.fpp:183">
P1: Limiting `alpha_rho_IP`/`alpha_IP` to three elements causes out-of-bounds writes when a case configures more than three fluids, because the loops below still run to `num_fluids`. Use the maximum supported fluid count instead of a hard-coded `3`.</violation>
</file>

<file name="toolchain/bootstrap/modules.sh">

<violation number="1" location="toolchain/bootstrap/modules.sh:126">
P1: Add a space between `if` and `[` so the new Frontier AMD branch executes instead of causing a syntax error.</violation>
</file>

<file name=".github/workflows/frontier_amd/test.sh">

<violation number="1" location=".github/workflows/frontier_amd/test.sh:4">
P2: `wc -c` counts characters, so nodes with multi-digit GPU IDs produce an inflated `ngpus`, causing `mfc.sh test` to request too many parallel jobs. Count IDs by words instead of characters.</violation>
</file>

<file name="src/simulation/m_acoustic_src.fpp">

<violation number="1" location="src/simulation/m_acoustic_src.fpp:147">
P1: The AMD path sizes `myalpha` arrays to length 3, but the code still writes up to `num_fluids` elements (up to 10), leading to out‑of‑bounds memory corruption for cases with more than three fluids on AMD builds.</violation>
</file>

<file name="src/simulation/m_pressure_relaxation.fpp">

<violation number="1" location="src/simulation/m_pressure_relaxation.fpp:150">
P1: The AMD branch fixes `pres_K_init`/`rho_K_s` to length 3 but the subsequent logic still iterates up to `num_fluids`, so any case with more than three fluids will index past the allocated storage. Use a compile-time constant such as `num_fluids_max` to keep the arrays static without truncating their capacity.</violation>

<violation number="2" location="src/simulation/m_pressure_relaxation.fpp:221">
P1: Limiting `alpha_rho`/`alpha` to length 2 under AMD allows only two fluids, yet the routine still loops to `num_fluids` and contains branches for `num_fluids > 2`. This causes out-of-bounds writes whenever more than two fluids are present. Allocate these arrays to `num_fluids_max` (or another compile-time upper bound) instead of 2.</violation>
</file>

<file name="src/simulation/m_hyperelastic.fpp">

<violation number="1" location="src/simulation/m_hyperelastic.fpp:102">
P1: The AMD-specific tensors only have 9 elements but the routine still writes the determinant at `tensorb(tensor_size)` (`tensor_size = num_dims**2 + 1`), so AMD builds write past the end of `tensorb`. Allocate enough elements for the determinant as well.</violation>
</file>

<file name=".github/workflows/frontier_amd/submit-bench.sh">

<violation number="1" location=".github/workflows/frontier_amd/submit-bench.sh:49">
P2: CPU submissions are ignored—the helper always loads the GPU environment, so CPU bench jobs will run with the wrong modules.</violation>
</file>

<file name="src/common/m_boundary_common.fpp">

<violation number="1" location="src/common/m_boundary_common.fpp:1082">
P1: The y-direction Jacobian buffer updates are now compiled only when `num_dims > 2`, so 2‑D case-optimized builds skip all y-direction boundary handling in `s_populate_F_igr_buffers`. Guard this block with `num_dims > 1` so it still runs for 2‑D cases.</violation>
</file>

<file name="src/simulation/m_cbc.fpp">

<violation number="1" location="src/simulation/m_cbc.fpp:659">
P1: AMD-only CBC arrays for fluid properties are limited to length 3, but the code still iterates up to `num_fluids`, so cases with more than three fluids will overflow these buffers and produce incorrect fluxes.</violation>
</file>

<file name="src/common/m_variables_conversion.fpp">

<violation number="1" location="src/common/m_variables_conversion.fpp:1191">
P1: The AMD-specific version of `s_convert_primitive_to_flux_variables` truncates `alpha_rho_K/alpha_K` to length 3 while the routine still fills them for every fluid (`contxe = num_fluids`), causing buffer overruns on multi-fluid cases.</violation>

<violation number="2" location="src/common/m_variables_conversion.fpp:1338">
P1: `s_compute_species_fraction` now limits its output arrays to length 3 while still iterating over all `num_fluids`, so AMD builds will corrupt memory as soon as a case uses more than three fluids.</violation>
</file>

<file name="src/simulation/m_qbmm.fpp">

<violation number="1" location="src/simulation/m_qbmm.fpp:728">
P1: Hardcoding the quadrature arrays to fixed size `(4,3)` while loops still iterate up to `nb` causes out-of-bounds writes for `nb > 3`, leading to memory corruption in non-case-optimized builds.</violation>
</file>

<file name="src/simulation/m_data_output.fpp">

<violation number="1" location="src/simulation/m_data_output.fpp:272">
P1: The AMD-specific alpha array is limited to 3 elements, so runs with >3 fluids overwrite the stack when s_compute_enthalpy fills alpha(1:num_fluids). Use num_fluids_max (or another upper bound ≥ num_fluids) for the AMD static allocation.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

#!/bin/bash

gpus=`rocm-smi --showid | awk '{print $1}' | grep -Eo '[0-9]+' | uniq | tr '\n' ' '`
ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c`
Contributor

@cubic-dev-ai bot commented Jan 29, 2026

P2: wc -c counts characters, so nodes with multi-digit GPU IDs produce an inflated ngpus, causing mfc.sh test to request too many parallel jobs. Count IDs by words instead of characters.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/frontier_amd/test.sh, line 4:

<comment>`wc -c` counts characters, so nodes with multi-digit GPU IDs produce an inflated `ngpus`, causing `mfc.sh test` to request too many parallel jobs. Count IDs by words instead of characters.</comment>

<file context>
@@ -0,0 +1,20 @@
+#!/bin/bash
+
+gpus=`rocm-smi --showid | awk '{print $1}' | grep -Eo '[0-9]+' | uniq | tr '\n' ' '`
+ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c`
+
+device_opts=""
</file context>
Suggested change
ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c`
ngpus=`echo "$gpus" | wc -w`

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
src/simulation/m_start_up.fpp (1)

1376-1387: if (.not. igr .or. dummy) should be grouped to avoid double init.
With the current condition, dummy = .true. causes both branches to run, undermining the guard and potentially double-initializing modules. Use .not. (igr .or. dummy) (or .not. igr .and. .not. dummy) instead.

🔧 Suggested fix
-        if (.not. igr .or. dummy) then
+        if (.not. (igr .or. dummy)) then
             if (recon_type == WENO_TYPE) then
                 call s_initialize_weno_module()
             elseif (recon_type == MUSCL_TYPE) then
                 call s_initialize_muscl_module()
             end if
             call s_initialize_cbc_module()
             call s_initialize_riemann_solvers_module()
         end if
src/simulation/m_time_steppers.fpp (1)

712-718: Prevent dummy from triggering conversion when q_prim_vf isn’t allocated.

q_prim_vf is only allocated when .not. igr. With .not. igr .or. dummy, a true dummy would call the conversion and write into unallocated storage. You can keep the workaround but route dummy into a no-op branch.

🔧 Suggested fix
- if (.not. igr .or. dummy) then
-     call s_convert_conservative_to_primitive_variables( &
-         q_cons_ts(1)%vf, &
-         q_T_sf, &
-         q_prim_vf, &
-         idwint)
- end if
+ if (.not. igr) then
+     call s_convert_conservative_to_primitive_variables( &
+         q_cons_ts(1)%vf, &
+         q_T_sf, &
+         q_prim_vf, &
+         idwint)
+ else if (dummy) then
+     continue
+ end if
src/simulation/m_bubbles_EL.fpp (1)

785-798: Fix uninitialized velocity components in AMD code path.

In the AMD non-optimized path, vel is dimensioned(3) but only vel(1:num_dims) are assigned. For 1D/2D simulations (num_dims < 3), the remaining components contain garbage that dot_product(vel, vel) includes, corrupting the speed of sound calculation. Initialize vel = 0._wp or slice the dot_product calls to vel(1:num_dims).

Suggested fix
#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    real(wp), dimension(3) :: vel
#:else
    real(wp), dimension(num_dims) :: vel
#:endif
+vel = 0._wp
...
-E = gamma*pinf + pi_inf + 0.5_wp*rhol*dot_product(vel, vel)
+E = gamma*pinf + pi_inf + 0.5_wp*rhol*dot_product(vel(1:num_dims), vel(1:num_dims))
...
-cson = sqrt((H - 0.5_wp*dot_product(vel, vel))/gamma)
+cson = sqrt((H - 0.5_wp*dot_product(vel(1:num_dims), vel(1:num_dims)))/gamma)
src/simulation/m_viscous.fpp (1)

1023-1063: dummy in reconstruction can execute multiple schemes.

With recon_type == TYPE .or. dummy inside the SCHEME loops, a true dummy would run both WENO and MUSCL branches and overwrite recon_dir/bounds. If dummy is meant to stay false, consider hardening it (e.g., parameter constant or guard) to prevent accidental activation.

Also applies to: 1065-1108, 1168-1211

src/simulation/m_rhs.fpp (1)

342-590: dummy-gated branches can run mutually exclusive paths in the same step.

With .or. dummy added across allocation and execution branches, a true dummy would execute both IGR and FV paths and invoke viscous routines even when their buffers are not allocated (e.g., tau_Re_vf), and allocations are still keyed only to .not. igr/viscous. Please ensure dummy is hard-wired false at runtime or refactor to keep these branches mutually exclusive and mirror allocations/deallocations if dummy can ever be true.

Also applies to: 662-718, 731-744, 754-988, 1059-1071, 1636-1668, 1745-1758, 1879-1917, 1934-1995

src/simulation/m_surface_tension.fpp (1)

331-394: dummy in reconstruction can override selected scheme.

If dummy ever becomes true, both scheme branches run and recon_dir/bounds end up using the last scheme’s polynomial width. Consider ensuring dummy is always false or guarding to run only the configured scheme.

src/simulation/m_data_output.fpp (1)

329-333: dummy is defined and defaults to false; however, it appears to be placeholder code.

dummy is declared as a module variable in m_global_parameters.fpp and is initialized to .false. by default. It is never set to true anywhere in the codebase, making the condition if (viscous .or. dummy) functionally equivalent to if (viscous). The addition of .or. dummy appears unnecessary unless this variable is intended for future use. Consider simplifying to if (viscous) for clarity, or document the purpose of dummy if it serves as a placeholder for future functionality.

src/simulation/m_cbc.fpp (1)

712-747: Add a comment explaining the dummy variable workaround for future maintainers.

The dummy variable is properly defined in m_global_parameters and initialized to .false.. It serves as a workaround for an AMD compiler case-optimization bug where conditionals with parameters followed by GPU kernels can be incorrectly optimized away. Adding a brief comment at lines 712 and 747 (or once at the top of the affected block) explaining this temporary workaround would help maintainers understand its purpose and facilitate removal when the compiler bug is resolved.

For example, consider adding above the first occurrence:

! Workaround for AMD compiler case-optimization bug: dummy ensures
! conditional branches are evaluated at runtime, preventing incorrect
! optimization of GPU kernel launch conditions.
🤖 Fix all issues with AI agents
In `@src/simulation/m_compute_cbc.fpp`:
- Around line 38-47: The AMD-specific branches declare mf and dalpha_rho_ds with
a fixed dimension(3) which can overflow when num_fluids > 3; update the
declarations guarded by the condition "not MFC_CASE_OPTIMIZATION and USING_AMD"
to use dimension(num_fluids) (matching the non-AMD branch) or else add a
compile-time check enforcing num_fluids ≤ 3; change all occurrences in this file
where mf and dalpha_rho_ds are declared in the AMD path (symbols: mf,
dalpha_rho_ds, L, and the preprocessor flags MFC_CASE_OPTIMIZATION and
USING_AMD) so loops that iterate 1..num_fluids no longer index out of bounds.

In `@src/simulation/m_hyperelastic.fpp`:
- Around line 101-105: The AMD branch declares tensora and tensorb as
dimension(9) but the kernel indexes up to tensor_size (num_dims**2 + 1), causing
out-of-bounds writes/reads; fix by making the AMD arrays match tensor_size
(declare tensora and tensorb dimension(tensor_size)) or alternatively keep
tensora/tensorb as dimension(num_dims**2) and move the determinant storage into
a separate scalar (e.g., det) used where tensorb(tensor_size) is written/read;
update all references that currently use tensorb(tensor_size) and any loops over
tensor indices (the code around tensora/tensorb usage and determinant writes at
lines referenced) to use the adjusted array size or the new scalar.

In `@src/simulation/m_riemann_solvers.fpp`:
- Around line 3765-3778: The GPU parallel region macro
GPU_PARALLEL_LOOP(collapse=3, private='[...]', copyin='[norm_dir]') is missing
the inner loop index i in its private list causing potential data races for the
inner do i loops that assign alpha_rho_L/R and vel%L; update the private list
passed to GPU_PARALLEL_LOOP to include the loop variable i so that i is private
to each thread (ensure the same change is applied wherever GPU_PARALLEL_LOOP is
used around the do j/k/l blocks that contain the do i loops).

In `@toolchain/bootstrap/modules.sh`:
- Line 126: The if test is malformed: fix the conditional that reads if[ "$u_c"
'==' 'famd' ]; then by adding required spaces around [ and ] and using a
POSIX-compatible string operator, e.g. change the test to use [ "$u_c" = "famd"
] with proper spacing and quoting of the variable; update the conditional that
references variable u_c accordingly so the shell no longer throws a syntax
error.
🟠 Major comments (23)
toolchain/modules-48-51 (1)

48-51: Add famd-gpu GPU variant entry.

The toolchain/modules file format (documented at lines 6-7) explicitly requires [slug]-gpu entries for GPU-capable systems. All GPU systems in the file follow this pattern: b-gpu, a-gpu, w-gpu, e-gpu, p-gpu, f-gpu, and tuo-gpu. Since OLCF Frontier has GPU acceleration, famd (Frontier AMD) should include a famd-gpu entry with GPU-specific modules and settings alongside the existing famd-all entries.

src/simulation/m_ibm.fpp-184-196 (1)

184-196: Same out-of-bounds risk with hardcoded array sizes.

This follows the same pattern as m_bubbles_EE.fpp. When USING_AMD is true and MFC_CASE_OPTIMIZATION is false, arrays are sized to fixed values (3, 18, 12). Loops iterate over num_fluids, nb, nb*nmom, and nb*nnode, which may exceed these hardcoded sizes at runtime.

Ensure that AMD builds without case optimization enforce:

  • num_fluids ≤ 3
  • nb ≤ 3
  • nb*nmom ≤ 18
  • nb*nnode ≤ 12
toolchain/bootstrap/modules.sh-136-139 (1)

136-139: Fix undefined variables in AMD (famd) configuration block.

Lines 136-139 reference variables that are not defined within the famd block:

  • CRAY_MPICH_PREFIX (line 136): Used but never set, resulting in "-L/lib" instead of proper path
  • CRAY_PMI_POST_LINK_OPTS (line 137): Used but never set
  • CRAY_LD_LIBRARY_PATH (line 139): Explicitly excluded from the earlier Cray path setup (line 121 condition skips famd), but then referenced here

These variables should either be defined within the famd block using ${OLCF_AFAR_ROOT} values, loaded from modules, or the references should be removed if unnecessary for AMD compilation.

src/simulation/m_bubbles_EE.fpp-168-174 (1)

168-174: Add validation to prevent out-of-bounds access in AMD non-optimized GPU builds.

The hardcoded size of 3 for Rtmp, Vtmp, myalpha, and myalpha_rho is a known workaround for AMD GPU compiler limitations with automatic arrays in device code. However, when USING_AMD and not MFC_CASE_OPTIMIZATION are both true, there is no runtime validation preventing num_fluids > 3 or nb > 3 from being set by the user.

Loops at lines 215, 223, 234, 242 (iterating 1..nb) and line 245 (iterating 1..num_fluids) will access array indices beyond the hardcoded size of 3, causing silent out-of-bounds access if constraints are violated.

Add a validation check (e.g., in s_initialize_gpu_vars or similar setup routine) that either:

  • Enforces num_fluids ≤ 3 and nb ≤ 3 when building for AMD without case optimization, or
  • Raises an error if these limits are exceeded in such configurations.

Alternatively, document the constraint prominently in the configuration or error messages.

src/simulation/m_sim_helpers.fpp-100-115 (1)

100-115: Fixed-size alpha buffers risk OOB when num_fluids > 3.
In the AMD branch, alpha, alpha_rho, and Gs are sized to 3 while the routine and callees iterate num_fluids. If num_fluids can exceed 3, this will overrun; please enforce the cap in host-side input checks (e.g., @:ASSERT(num_fluids <= 3, ...)) or use a compile-time max constant for these buffers.

As per coding guidelines: use the fypp macro @:ASSERT(predicate, message) for validating conditions.

src/simulation/m_hyperelastic.fpp-107-111 (1)

107-111: Guard fixed-size alpha arrays against num_fluids > 3.
alpha_k/alpha_rho_k are fixed at 3 in the AMD path but downstream routines iterate num_fluids, which will overrun if num_fluids > 3. Add a host-side guard (or widen to a compile-time max) before launching GPU work.

🛡️ Example guard
 #:if not MFC_CASE_OPTIMIZATION and USING_AMD
     real(wp), dimension(3) :: alpha_k, alpha_rho_k
 #:else
     real(wp), dimension(num_fluids) :: alpha_k, alpha_rho_k
 #:endif
+    #:if not MFC_CASE_OPTIMIZATION and USING_AMD
+        @:ASSERT(num_fluids <= 3, 'USING_AMD path requires num_fluids <= 3')
+    #:endif
As per coding guidelines: use the fypp macro `@:ASSERT(predicate, message)` for validating conditions.
src/simulation/m_acoustic_src.fpp-146-150 (1)

146-150: Fixed-size myalpha arrays can overflow when num_fluids > 3.
myalpha/myalpha_rho are sized to 3 under AMD, but the GPU loop iterates num_fluids. Add a host-side guard (or widen to a compile-time max) before the kernel runs.

🛡️ Example guard
 #:if not MFC_CASE_OPTIMIZATION and USING_AMD
     real(wp), dimension(3) :: myalpha, myalpha_rho
 #:else
     real(wp), dimension(num_fluids) :: myalpha, myalpha_rho
 #:endif
+    #:if not MFC_CASE_OPTIMIZATION and USING_AMD
+        @:ASSERT(num_fluids <= 3, 'USING_AMD path requires num_fluids <= 3')
+    #:endif
As per coding guidelines: use the fypp macro `@:ASSERT(predicate, message)` for validating conditions.
src/simulation/m_pressure_relaxation.fpp-220-224 (1)

220-224: alpha/alpha_rho size=2 needs an explicit num_fluids constraint.
The loops iterate 1..num_fluids; if num_fluids exceeds 2, these arrays will overrun. If the AMD workaround is intentionally limited to 2 fluids, enforce that in a host-side check; otherwise size to a shared max constant.

🛡️ Example host-side guard (e.g., in s_initialize_pressure_relaxation_module)
+#:if not MFC_CASE_OPTIMIZATION and USING_AMD
+    @:ASSERT(num_fluids <= 2, 'Pressure relaxation AMD path requires num_fluids <= 2')
+#:endif
As per coding guidelines: use the fypp macro `@:ASSERT(predicate, message)` for validating conditions.
src/simulation/m_bubbles_EL.fpp-580-582 (1)

580-582: Add Re to the GPU private list.

Re is written inside the loop (via s_convert_species_to_mixture_variables_acc) and should be private to avoid cross-thread races.

🔧 Suggested fix
- $:GPU_PARALLEL_LOOP(private='[k,i,myalpha_rho,myalpha,myVapFlux,preterm1, term2, paux, pint, Romega, term1_fac,myR_m, mygamma_m, myPb, myMass_n, myMass_v,myR, myV, myBeta_c, myBeta_t, myR0, myPbdot, myMvdot,myPinf, aux1, aux2, myCson, myRho,gamma,pi_inf,qv,dmalf, dmntait, dmBtait, dm_bub_adv_src, dm_divu,adap_dt_stop]', &
+ $:GPU_PARALLEL_LOOP(private='[k,i,myalpha_rho,myalpha,Re,myVapFlux,preterm1, term2, paux, pint, Romega, term1_fac,myR_m, mygamma_m, myPb, myMass_n, myMass_v,myR, myV, myBeta_c, myBeta_t, myR0, myPbdot, myMvdot,myPinf, aux1, aux2, myCson, myRho,gamma,pi_inf,qv,dmalf, dmntait, dmBtait, dm_bub_adv_src, dm_divu,adap_dt_stop]', &
Based on learnings: Declare loop-local variables with `private='[...]'` in GPU parallel loop macros.
src/common/m_boundary_common.fpp-1561-1617 (1)

1561-1617: Y-direction F_igr buffers are skipped in 2D under CASE_OPTIMIZATION.

This block is guarded by num_dims > 2, so in 2D with MFC_CASE_OPTIMIZATION enabled it compiles out the y-boundary population. That leaves jac_sf y-boundaries uninitialized. The guard should be num_dims > 1.

🔧 Suggested fix
-#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
+#:if not MFC_CASE_OPTIMIZATION or num_dims > 1
CMakeLists.txt-463-471 (1)

463-471: Split CRAY_MPICH_LIB flags before linking with LLVMFlang.

The environment variable $ENV{CRAY_MPICH_LIB} contains multiple space-separated flags (e.g., -L<path> -lmpich -lmpi), but passing it directly to target_link_libraries treats it as a single argument rather than expanding the flags. Use separate_arguments() to split the string into a proper list, then pass via target_link_options() for correct linker flag handling.

🔧 Suggested fix
 if (MFC_MPI AND ARGS_MPI)
     find_package(MPI COMPONENTS Fortran REQUIRED)
     
     target_compile_definitions(${a_target} PRIVATE MFC_MPI)
     if(CMAKE_Fortran_COMPILER_ID STREQUAL "LLVMFlang")
         target_compile_options(${a_target} PRIVATE "$ENV{CRAY_MPICH_INC}")
-        target_link_libraries(${a_target} PRIVATE $ENV{CRAY_MPICH_LIB})
+        set(CRAY_MPICH_LIB_ENV "$ENV{CRAY_MPICH_LIB}")
+        separate_arguments(CRAY_MPICH_LIB_ENV NATIVE_COMMAND "${CRAY_MPICH_LIB_ENV}")
+        target_link_options(${a_target} PRIVATE ${CRAY_MPICH_LIB_ENV})
     else() 
         target_link_libraries(${a_target} PRIVATE MPI::MPI_Fortran)
     endif()
 endif()
CMakeLists.txt-489-494 (1)

489-494: Use target_link_options() with separate_arguments() to properly handle CRAY_HIPFORT_LIB as raw linker flags.

CRAY_HIPFORT_LIB contains space-separated linker flags (-L and -l), not a CMake library target. Passing it directly to target_link_libraries() is incorrect; CMake will treat it as a single argument and may place it in the wrong position on the link line. Use separate_arguments() to split the environment variable and target_link_options() for raw linker flags.

🔧 Suggested fix
 elseif(CMAKE_Fortran_COMPILER_ID STREQUAL "LLVMFlang")
-    target_link_libraries(${a_target} PRIVATE $ENV{CRAY_HIPFORT_LIB})
+    set(CRAY_HIPFORT_LIB_ENV "$ENV{CRAY_HIPFORT_LIB}")
+    separate_arguments(CRAY_HIPFORT_LIB_ENV NATIVE_COMMAND "${CRAY_HIPFORT_LIB_ENV}")
+    target_link_options(${a_target} PRIVATE ${CRAY_HIPFORT_LIB_ENV})
 else() 
     find_package(hipfort COMPONENTS hipfft CONFIG REQUIRED)
     target_link_libraries(${a_target} PRIVATE hipfort::hipfft)                    
 endif()
src/simulation/m_time_steppers.fpp-691-698 (1)

691-698: Guard the AMD fixed-size vel/alpha arrays against size mismatches.

The AMD path hard-codes vel to length 3 and alpha to length 3. While num_vels is mathematically bounded by spatial dimensions (≤3), num_fluids is not. If num_fluids exceeds 3, downstream routines see a shape mismatch that could silently drop or corrupt data. Add an assertion to enforce the constraint:

🔧 Suggested fix
#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    real(wp), dimension(3) :: vel        !< Cell-avg. velocity
    real(wp), dimension(3) :: alpha      !< Cell-avg. volume fraction
+    @:ASSERT(num_vels <= 3 .and. num_fluids <= 3, "AMD fixed-size vel/alpha require num_vels,num_fluids <= 3")
#:else
    real(wp), dimension(num_vels) :: vel        !< Cell-avg. velocity
    real(wp), dimension(num_fluids) :: alpha      !< Cell-avg. volume fraction
#:endif

Per coding guidelines: Use fypp macro @:ASSERT(predicate, message) for validating conditions.

src/simulation/m_bubbles_EL.fpp-540-544 (1)

540-544: Add guard to prevent AMD fixed-size alpha arrays with num_fluids > 3.

In the AMD code path, myalpha_rho/myalpha are statically dimensioned to 3, but downstream routines like s_convert_species_to_mixture_variables_acc iterate over num_fluids. If num_fluids exceeds 3 at compile-time with USING_AMD enabled, array access becomes out-of-bounds. Guard against this configuration:

🔧 Suggested guard
#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    real(wp), dimension(3) :: myalpha_rho, myalpha
+   @:ASSERT(num_fluids <= 3, "AMD fixed-size alpha arrays require num_fluids <= 3")
#:else
    real(wp), dimension(num_fluids) :: myalpha_rho, myalpha
#:endif
src/common/m_chemistry.fpp-22-27 (1)

22-27: AMD fixed-size chemistry arrays can overflow for larger mechanisms.

On USING_AMD, the reaction and diffusion kernels allocate 10-element arrays and use molecular_weights_nonparameter, but the loops still span chemxb:chemxe. If num_species exceeds 10, this will index past bounds. Please enforce num_species <= 10 for AMD builds or generate a non-parameter weights array sized to num_species so the kernels remain safe across mechanisms.

Also applies to: 129-167, 177-420
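
A hedged fypp sketch of the host-side check, assuming it runs before the reaction and diffusion kernels launch; the message text is illustrative:

#:if USING_AMD
    ! Hypothetical guard mirroring the 10-element scratch arrays in the AMD path
    @:ASSERT(num_species <= 10, 'USING_AMD chemistry path supports num_species <= 10')
#:endif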

src/simulation/m_viscous.fpp-75-81 (1)

75-81: Guard AMD fixed-size viscous arrays against num_fluids > 3.

On the AMD/non-optimization path, alpha_visc and alpha_rho_visc are size 3 but the loops still iterate 1:num_fluids. If num_fluids can exceed 3, this will index out of bounds. Please enforce the limit or size the arrays to num_fluids.

🛡️ Suggested guard
        integer :: i, j, k, l, q !< Generic loop iterator
+#:if not MFC_CASE_OPTIMIZATION and USING_AMD
+        @:ASSERT(num_fluids <= 3, "AMD viscous path supports up to 3 fluids")
+#:endif

As per coding guidelines "Use the fypp ASSERT macro for validating conditions: @:ASSERT(predicate, message)".

src/simulation/m_viscous.fpp-102-210 (1)

102-210: Initialize Re_visc for bubbles_euler paths.

Re_visc is computed only in the non-bubbles branch, but it is used immediately afterward in both shear and bulk stress updates. When bubbles_euler is true, this leaves Re_visc undefined and can corrupt stresses. Consider computing Re_visc outside the bubbles/non-bubbles split or initializing it in the bubbles path.

Also applies to: 212-315
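
A structural sketch of one option, hoisting a defined value above the split; the array shape, the loop bodies, and the dflt_real default are assumptions, not the actual MFC code:

    Re_visc = dflt_real                    ! hypothetical neutral default
    if (.not. bubbles_euler) then
        ! existing mixture-rule computation of Re_visc stays here
    end if
    ! the shear and bulk stress updates below now always read a defined Re_visc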

src/simulation/m_riemann_solvers.fpp-1321-1344 (1)

1321-1344: Avoid uninitialized MHD-relativity states under MFC_CASE_OPTIMIZATION.

With MFC_CASE_OPTIMIZATION enabled and num_dims <= 2, the preprocessor strips the entire relativistic MHD block, but later code still consumes pres_mag, H_*, and E_*. The energy flux assignment is also gated away for 1D, leaving flux_rs...E_idx unset. Please ensure these variables are always assigned (or explicitly mark the combo unsupported).

💡 Example of keeping 2D-safe initialization while guarding 3D-only terms
-#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
-    Ga%L = 1._wp/sqrt(1._wp - vel_L_rms)
-    Ga%R = 1._wp/sqrt(1._wp - vel_R_rms)
-    vdotB%L = vel_L(1)*B%L(1) + vel_L(2)*B%L(2) + vel_L(3)*B%L(3)
-    vdotB%R = vel_R(1)*B%R(1) + vel_R(2)*B%R(2) + vel_R(3)*B%R(3)
-    b4%L(1:3) = B%L(1:3)/Ga%L + Ga%L*vel_L(1:3)*vdotB%L
-    b4%R(1:3) = B%R(1:3)/Ga%R + Ga%R*vel_R(1:3)*vdotB%R
-    B2%L = B%L(1)**2._wp + B%L(2)**2._wp + B%L(3)**2._wp
-    B2%R = B%R(1)**2._wp + B%R(2)**2._wp + B%R(3)**2._wp
-    pres_mag%L = 0.5_wp*(B2%L/Ga%L**2._wp + vdotB%L**2._wp)
-    pres_mag%R = 0.5_wp*(B2%R/Ga%R**2._wp + vdotB%R**2._wp)
-    H_L = 1._wp + (gamma_L + 1)*pres_L/rho_L
-    H_R = 1._wp + (gamma_R + 1)*pres_R/rho_R
-    cm%L(1:3) = (rho_L*H_L*Ga%L**2 + B2%L)*vel_L(1:3) - vdotB%L*B%L(1:3)
-    cm%R(1:3) = (rho_R*H_R*Ga%R**2 + B2%R)*vel_R(1:3) - vdotB%R*B%R(1:3)
-    E_L = rho_L*H_L*Ga%L**2 - pres_L + 0.5_wp*(B2%L + vel_L_rms*B2%L - vdotB%L**2._wp) - rho_L*Ga%L
-    E_R = rho_R*H_R*Ga%R**2 - pres_R + 0.5_wp*(B2%R + vel_R_rms*B2%R - vdotB%R**2._wp) - rho_R*Ga%R
-#:endif
+    Ga%L = 1._wp/sqrt(1._wp - vel_L_rms)
+    Ga%R = 1._wp/sqrt(1._wp - vel_R_rms)
+    vdotB%L = vel_L(1)*B%L(1) + vel_L(2)*B%L(2)
+    vdotB%R = vel_R(1)*B%R(1) + vel_R(2)*B%R(2)
+    B2%L = B%L(1)**2._wp + B%L(2)**2._wp
+    B2%R = B%R(1)**2._wp + B%R(2)**2._wp
+#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
+    vdotB%L = vdotB%L + vel_L(3)*B%L(3)
+    vdotB%R = vdotB%R + vel_R(3)*B%R(3)
+    B2%L = B2%L + B%L(3)**2._wp
+    B2%R = B2%R + B%R(3)**2._wp
+#:endif
+    pres_mag%L = 0.5_wp*(B2%L/Ga%L**2._wp + vdotB%L**2._wp)
+    pres_mag%R = 0.5_wp*(B2%R/Ga%R**2._wp + vdotB%R**2._wp)
+    H_L = 1._wp + (gamma_L + 1)*pres_L/rho_L
+    H_R = 1._wp + (gamma_R + 1)*pres_R/rho_R
+    cm%L(1:3) = (rho_L*H_L*Ga%L**2 + B2%L)*vel_L(1:3) - vdotB%L*B%L(1:3)
+    cm%R(1:3) = (rho_R*H_R*Ga%R**2 + B2%R)*vel_R(1:3) - vdotB%R*B%R(1:3)
+    E_L = rho_L*H_L*Ga%L**2 - pres_L + 0.5_wp*(B2%L + vel_L_rms*B2%L - vdotB%L**2._wp) - rho_L*Ga%L
+    E_R = rho_R*H_R*Ga%R**2 - pres_R + 0.5_wp*(B2%R + vel_R_rms*B2%R - vdotB%R**2._wp) - rho_R*Ga%R

Also applies to: 1531-1537

src/simulation/m_riemann_solvers.fpp-294-308 (1)

294-308: Guard AMD fixed-size arrays and slice full-array operations.

Hard-coded sizes (3/10) are now used for fluid/species/bubble/xi-field arrays. If runtime num_fluids/num_species/nb/num_dims exceed these, you’ll hit OOB; when they’re smaller, full-array ops (Xs_L(:), sum(...)) include uninitialized tails. Please add explicit bounds assertions and restrict ops to 1:num_* (or zero-initialize the tails).

✅ Example fix for species arrays (apply similarly in other AMD branches)
#:if USING_AMD
+    @:ASSERT(num_species <= 10, 'USING_AMD supports up to 10 species')
+    Ys_L = 0._wp; Ys_R = 0._wp
#:endif
...
#:if USING_AMD
-    Xs_L(:) = Ys_L(:)*MW_L/molecular_weights_nonparameter(:)
-    Xs_R(:) = Ys_R(:)*MW_R/molecular_weights_nonparameter(:)
+    Xs_L = 0._wp; Xs_R = 0._wp
+    Xs_L(1:num_species) = Ys_L(1:num_species)*MW_L/molecular_weights_nonparameter(1:num_species)
+    Xs_R(1:num_species) = Ys_R(1:num_species)*MW_R/molecular_weights_nonparameter(1:num_species)
#:else
     Xs_L(:) = Ys_L(:)*MW_L/molecular_weights(:)
     Xs_R(:) = Ys_R(:)*MW_R/molecular_weights(:)
#:endif
...
-    gamma_L = sum(Xs_L(:)/(Gamma_iL(:) - 1.0_wp))
-    gamma_R = sum(Xs_R(:)/(Gamma_iR(:) - 1.0_wp))
+    gamma_L = sum(Xs_L(1:num_species)/(Gamma_iL(1:num_species) - 1.0_wp))
+    gamma_R = sum(Xs_R(1:num_species)/(Gamma_iR(1:num_species) - 1.0_wp))

As per coding guidelines: Use the fypp ASSERT macro for validating conditions: @:ASSERT(predicate, message).

Also applies to: 496-502, 1071-1087, 1277-1283, 1975-1995, 2020-2030, 2040-2044, 3265-3271, 3722-3727

src/common/m_phase_change.fpp-94-98 (1)

94-98: Add runtime guard to prevent array overflow when num_fluids > 3 in AMD path.

The AMD conditional branch (lines 94–98, 302–341, 421–433) sizes arrays p_infOV, p_infpT, p_infSL, sk, hk, gk, ek, rhok, and p_infpTg to 3 elements, but loops iterate from 1 to num_fluids. When num_fluids > 3, these accesses overflow the arrays. The partial safeguard at lines 328–333 only initializes excess elements when num_fluids < 3; it does not prevent overflow when num_fluids > 3.

Add @:ASSERT(num_fluids <= 3, "USING_AMD path supports up to 3 fluids") at the start of s_infinite_relaxation_k to enforce this constraint.

Suggested location for guard
subroutine s_infinite_relaxation_k(q_cons_vf)
+    #:if not MFC_CASE_OPTIMIZATION and USING_AMD
+        @:ASSERT(num_fluids <= 3, "USING_AMD path supports up to 3 fluids")
+    #:endif
     type(scalar_field), dimension(sys_size), intent(inout) :: q_cons_vf
src/simulation/m_qbmm.fpp-726-737 (1)

726-737: Add runtime assertions to guard fixed-size AMD buffers against out-of-bounds writes.

The AMD code path uses fixed-size arrays (moms=6, nnode=4, nb=3, nterms=32) while loops iterate up to nmom, nnode, nb, and nterms without bounds checking. If these parameters exceed the fixed sizes—which is possible since they are runtime-determined—writes will overflow. For example, do i=1,nb accesses wght(:,i) with dimension(4,3), and f_quad2D loops do i=1,nnode but receives dimension(4) arrays in the AMD path.

Add Fypp ASSERT macros immediately after the array declarations to enforce constraints when USING_AMD is true:

Suggested fix
        #:if not MFC_CASE_OPTIMIZATION and USING_AMD
            real(wp), dimension(6) :: moms, msum
            real(wp), dimension(4, 3) :: wght, abscX, abscY, wght_pb, wght_mv, wght_ht, ht
+           @:ASSERT(nmom <= 6, "USING_AMD expects nmom ≤ 6")
+           @:ASSERT(nnode <= 4, "USING_AMD expects nnode ≤ 4")
+           @:ASSERT(nb <= 3, "USING_AMD expects nb ≤ 3")
        #:else
            real(wp), dimension(nmom) :: moms, msum
            real(wp), dimension(nnode, nb) :: wght, abscX, abscY, wght_pb, wght_mv, wght_ht, ht
        #:endif
        #:if USING_AMD
            real(wp), dimension(32, 0:2, 0:2) :: coeff
+           @:ASSERT(nterms <= 32, "USING_AMD expects nterms ≤ 32")
        #:else
src/common/m_variables_conversion.fpp-324-330 (1)

324-330: Guard AMD fixed-size buffers against larger num_fluids/num_vels/num_species/nb.

The AMD optimization paths in this file downsize arrays to fixed lengths (3 for fluids/vels, 10 for species), but the code contains loops iterating to num_fluids, num_vels, num_species, and nb that directly index these fixed-size arrays. For example:

  • s_convert_species_to_mixture_variables_acc (line ~344): do i = 1, num_fluids indexes alpha_rho_K(i) with dimension(3)
  • s_compute_species_fraction (line ~1356): do i = 1, num_fluids indexes alpha_rho_K(i) with dimension(3)
  • s_flux_variables (line ~1234-1237): loops index vel_K(i) and Y_K(i) with fixed bounds

This will cause out-of-bounds memory access if any limit is exceeded.

Add ASSERT guards in s_initialize_variables_conversion_module to validate these constraints at startup:

Suggested fix
    impure subroutine s_initialize_variables_conversion_module

        integer :: i, j

+       #:if not MFC_CASE_OPTIMIZATION and USING_AMD
+           @:ASSERT(num_fluids <= 3, "USING_AMD path supports ≤3 fluids")
+           @:ASSERT(num_vels <= 3, "USING_AMD path supports ≤3 velocity components")
+           @:ASSERT(num_species <= 10, "USING_AMD path supports ≤10 species")
+           @:ASSERT(nb <= 3, "USING_AMD path supports ≤3 bubble bins")
+       #:endif

        $:GPU_ENTER_DATA(copyin='[is1b,is1e,is2b,is2e,is3b,is3e]')

Also applies to lines 590–598, 1191–1200, 1338–1341, 1412–1415.

src/simulation/m_data_output.fpp-271-277 (1)

271-277: Fix hardcoded buffer dimensions on non-optimized AMD builds to prevent overflow when num_fluids > 3.

On AMD builds with MFC_CASE_OPTIMIZATION = false, alpha and related variables are hardcoded to dimension(3). If num_fluids is set to 4 or higher, this creates an array overrun. Change the hardcoded 3 to use num_fluids or num_fluids_max, or add a compile-time check to enforce num_fluids ≤ 3 for non-optimized AMD builds.
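
A hedged sketch of both options; num_fluids_max is assumed to be the project-wide compile-time cap, and the ASSERT message is illustrative:

#:if not MFC_CASE_OPTIMIZATION and USING_AMD
    ! Option A: widen the buffer to the compile-time cap
    real(wp), dimension(num_fluids_max) :: alpha
    ! Option B: keep dimension(3) and fail fast during module setup, e.g.
    !   @:ASSERT(num_fluids <= 3, 'non-optimized AMD build supports num_fluids <= 3')
#:else
    real(wp), dimension(num_fluids) :: alpha
#:endif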

🟡 Minor comments (12)
src/simulation/include/inline_capillary.fpp-4-9 (1)

4-9: Initialize Omega(2,1), Omega(1,2), and Omega(2,2) when the guard is false.

With #:if not MFC_CASE_OPTIMIZATION or num_dims > 1, these entries are left undefined when the condition is false. If any downstream logic (even diagnostics) reads them in 1D, this can propagate garbage values. Consider zeroing them in an #:else branch or confirm they are never referenced in that configuration.

Suggested guard-safe zeroing
 #:if not MFC_CASE_OPTIMIZATION or num_dims > 1
   Omega(2, 1) = sigma*w1*w2/normW
   Omega(1, 2) = Omega(2, 1)

   Omega(2, 2) = -sigma*(w1*w1 + w3*w3)/normW
+#:else
+  Omega(2, 1) = 0.0_wp
+  Omega(1, 2) = 0.0_wp
+  Omega(2, 2) = 0.0_wp
 #:endif
src/simulation/m_time_steppers.fpp-533-538 (1)

533-538: Avoid double run_time_info writes if dummy is ever true.

With dummy set, both branches execute and you’ll emit two runtime-info rows per step. If dummy is intended to stay false for the compiler workaround, consider documenting that invariant or adding a guard/ASSERT to prevent accidental enablement.
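
If the invariant is worth enforcing, a one-line fypp assertion near where dummy is set would make accidental enablement fail fast; the placement and wording here are illustrative:

    @:ASSERT(.not. dummy, 'dummy is an amdflang workaround and must stay .false.')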

load_amd.sh-5-7 (1)

5-7: Remove or use AFAR_UMS_LATEST to avoid dead variable.

It’s computed but unused. Either remove it or wire it into OLCF_AFAR_ROOT to prevent confusion.

🔧 Suggested fix (removal)
-AFAR_UMS_LATEST=$(ls -d --color=never ${AFAR_UMS_BASEDIR}/*/ | tail -n1)
load_amd.sh-1-4 (1)

1-4: Add an explicit shell declaration (or document that the file must be sourced).

Shellcheck flags the missing shebang. If this is meant to be executed, a shebang avoids /bin/sh running without the module function; if it’s meant to be sourced, add an explicit note or a ShellCheck directive.

🔧 Suggested fix
+#!/usr/bin/env bash
+# shellcheck shell=bash
src/simulation/m_riemann_solvers.fpp-4762-4769 (1)

4762-4769: Initialize divergence_cyl before the preprocessor guard.

When MFC_CASE_OPTIMIZATION strips the block (e.g., 1D), divergence_cyl is still used for stresses and can be undefined. Initialize to zero and only add terms inside the guard.

🧯 Safe initialization
-#:if not MFC_CASE_OPTIMIZATION or num_dims > 1
-    divergence_cyl = avg_dvdx_int(1) + avg_dvdy_int(2) + avg_v_int(2)/r_eff
+    divergence_cyl = 0._wp
+#:if not MFC_CASE_OPTIMIZATION or num_dims > 1
+    divergence_cyl = avg_dvdx_int(1) + avg_dvdy_int(2) + avg_v_int(2)/r_eff
     if (num_dims > 2) then
         #:if not MFC_CASE_OPTIMIZATION or num_dims > 2
             divergence_cyl = divergence_cyl + avg_dvdz_int(3)/r_eff
         #:endif
     end if
 #:endif
.github/workflows/bench.yml-73-80 (1)

73-80: Add frontier_amd to actionlint’s allowed labels (or equivalent).
actionlint flags frontier_amd as unknown; if actionlint runs in CI, this will fail. Please whitelist the custom label (or update your actionlint config) to avoid spurious lint failures.

.github/workflows/frontier_amd/test.sh-7-13 (1)

7-13: Add explicit parameter validation for job_device and job_interface.

These env vars are set by submit.sh during normal workflow execution, but the script has no validation. Direct invocation without these variables would silently fall back to CPU/no-interface behavior. Adding parameter expansion checks makes the script robust against misuse:

Suggested fix
+ : "${job_device:?job_device must be set to cpu|gpu}"
+ : "${job_interface:=none}"
 device_opts=""
 if [ "$job_device" = "gpu" ]; then
     device_opts+="--gpu"
.github/workflows/frontier_amd/test.sh-3-5 (1)

3-5: Fix GPU counting for multi-digit GPU IDs.

The current logic uses wc -c to count characters after removing whitespace. This works only for single-digit GPU IDs; with multi-digit IDs (e.g., "10 11 12"), the character count inflates incorrectly. Use wc -w to count tokens instead.

Suggested fix
-gpus=`rocm-smi --showid | awk '{print $1}' | grep -Eo '[0-9]+' | uniq | tr '\n' ' '`
-ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c`
+gpus=$(rocm-smi --showid | awk '{print $1}' | grep -Eo '[0-9]+' | uniq)
+ngpus=$(echo "$gpus" | wc -w)
+if [ "$job_device" = "gpu" ] && [ "$ngpus" -eq 0 ]; then
+  echo "No GPUs detected; aborting GPU test run." >&2
+  exit 1
+fi
.github/workflows/frontier_amd/submit-bench.sh-5-10 (1)

5-10: Quote the script path and document the interface argument.

cat $1 will break on paths containing spaces, and $3 is used in job_slug but not documented in usage. Tightening both avoids preventable operator error.

🔧 Proposed fix
 usage() {
-    echo "Usage: $0 [script.sh] [cpu|gpu]"
+    echo "Usage: $0 [script.sh] [cpu|gpu] [none|acc|omp]"
 }
 
 if [ ! -z "$1" ]; then
-    sbatch_script_contents=`cat $1`
+    sbatch_script_contents=$(cat "$1")
 else
     usage
     exit 1
 fi
@@
-job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3"
+job_slug="$(basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g')-$2-$3"

Also applies to: 27-28

.github/workflows/frontier_amd/submit.sh-5-10 (1)

5-10: Quote the script path and document the interface argument.

cat $1 will fail on paths with spaces, and $3 is used but not described in usage. Aligning usage and quoting avoids brittle submissions.

🔧 Proposed fix
 usage() {
-    echo "Usage: $0 [script.sh] [cpu|gpu]"
+    echo "Usage: $0 [script.sh] [cpu|gpu] [none|acc|omp]"
 }
 
 if [ ! -z "$1" ]; then
-    sbatch_script_contents=`cat $1`
+    sbatch_script_contents=$(cat "$1")
 else
     usage
     exit 1
 fi
@@
-job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3"
+job_slug="$(basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g')-$2-$3"

Also applies to: 28-29

.github/workflows/frontier_amd/build.sh-19-21 (1)

19-21: Remove unused dirname assignment.

dirname is set on Line 20 but never used, which is already flagged by Shellcheck. Dropping it keeps the loop clean and avoids confusion.

🧹 Proposed cleanup
-    dirname=$(basename "$dir")
     ./mfc.sh run "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts
.github/workflows/frontier_amd/bench.sh-3-16 (1)

3-16: Fail fast when required job context variables are missing.

job_device, job_interface, and job_slug are assumed to be set by the submit wrapper; if someone runs this script directly, it silently misconfigures. Add a guard to make the contract explicit.

✅ Suggested guard
 #!/bin/bash
 
+if [ -z "${job_device:-}" ] || [ -z "${job_slug:-}" ]; then
+  echo "Usage: job_device=<cpu|gpu> job_interface=<none|acc|omp> job_slug=<slug> $0"
+  exit 1
+fi
+
 n_ranks=12
 device_opts=""
 if [ "$job_device" = "gpu" ]; then
🧹 Nitpick comments (1)
src/post_process/m_global_parameters.fpp (1)

248-248: Add a comment explaining the purpose of the dummy variable.

Per the PR description, this variable is a temporary workaround for an AMD amdflang compiler bug where a conditional with a parameter followed by a GPU kernel can cause a catastrophic error. Without documentation, future maintainers may remove this as apparent dead code.

📝 Suggested documentation
     logical :: fft_wrt
+    !> `@note` Temporary workaround for amdflang case-optimization bug.
+    !! Remove once the compiler is patched.
     logical :: dummy
     logical :: pres_wrt

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 20

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/simulation/m_data_output.fpp (1)

329-334: Potential unallocated array access when dummy=true and viscous=false.

Line 329 changes the condition to viscous .or. dummy, which means vcfl_sf and Rc_sf will be accessed even when viscous is false (if dummy is true). However, these arrays are only allocated when viscous is true (see s_initialize_data_output_module around lines 1827-1833):

if (viscous) then
    @:ALLOCATE(vcfl_sf(0:m, 0:n, 0:p))
    @:ALLOCATE(Rc_sf  (0:m, 0:n, 0:p))

If dummy=true and viscous=false, this will cause an access to unallocated arrays, leading to undefined behavior or a runtime crash.

Proposed fix

Either guard the allocation with the same condition:

-            if (viscous) then
+            if (viscous .or. dummy) then
                 @:ALLOCATE(vcfl_sf(0:m, 0:n, 0:p))
                 @:ALLOCATE(Rc_sf  (0:m, 0:n, 0:p))

Or keep the runtime check consistent:

-        if (viscous .or. dummy) then
+        if (viscous) then
src/simulation/m_time_steppers.fpp (1)

692-712: Guard fixed-size AMD vel/alpha arrays against larger counts.

With USING_AMD + no case optimization, vel/alpha are size 3 but downstream routines still index by num_vels/num_fluids. If either exceeds 3, GPU kernels can read/write out of bounds. Add a guard (or size to a max constant) before entering the GPU loop.

🛠️ Suggested guard
    impure subroutine s_compute_dt()
+        #:if not MFC_CASE_OPTIMIZATION and USING_AMD
+            if (num_vels > 3 .or. num_fluids > 3) then
+                call s_mpi_abort('USING_AMD expects num_vels/num_fluids <= 3 in s_compute_dt')
+            end if
+        #:endif
         real(wp) :: rho        !< Cell-avg. density
As per coding guidelines: Use s_mpi_abort() for error termination instead of stop.
src/simulation/m_viscous.fpp (1)

73-118: Fixed-size AMD viscous scratch arrays can overflow when num_fluids > 3.

In the USING_AMD + non-optimized path, alpha_visc/alpha_rho_visc are size 3 but loops still iterate 1:num_fluids. If num_fluids exceeds 3, device out-of-bounds access is possible. Add a guard (or use a max-size parameter) before the GPU kernels.

🛠️ Suggested guard
    subroutine s_compute_viscous_stress_cylindrical_boundary(q_prim_vf, grad_x_vf, grad_y_vf, grad_z_vf, &
                                                             tau_Re_vf, &
                                                             ix, iy, iz)
+        #:if not MFC_CASE_OPTIMIZATION and USING_AMD
+            if (num_fluids > 3) then
+                call s_mpi_abort('USING_AMD expects num_fluids <= 3 in s_compute_viscous_stress_cylindrical_boundary')
+            end if
+        #:endif
         type(scalar_field), dimension(sys_size), intent(in) :: q_prim_vf
As per coding guidelines: Use s_mpi_abort() for error termination instead of stop.
src/simulation/m_qbmm.fpp (1)

573-640: AMD fixed-size QBMM scratch buffers need bounds guarding.

In AMD builds, several local arrays are hard-capped (coeffs = 32, moms/msum = 6, wght/absc = 4×3, f_quad inputs = 4×3/4), but loops still run over nterms, nmom, nnode, and nb. If any runtime value exceeds the cap, device out-of-bounds access is possible. Add guards (or size to max constants) before entering the GPU region.

🛠️ Suggested guard
        #:if not MFC_CASE_OPTIMIZATION and USING_AMD
            real(wp), dimension(6) :: moms, msum
            real(wp), dimension(4, 3) :: wght, abscX, abscY, wght_pb, wght_mv, wght_ht, ht
        #:else
            real(wp), dimension(nmom) :: moms, msum
            real(wp), dimension(nnode, nb) :: wght, abscX, abscY, wght_pb, wght_mv, wght_ht, ht
        #:endif
        #:if USING_AMD
            real(wp), dimension(32, 0:2, 0:2) :: coeff
        #:else
            real(wp), dimension(nterms, 0:2, 0:2) :: coeff
        #:endif
+        #:if USING_AMD
+            if (nterms > 32) then
+                call s_mpi_abort('USING_AMD expects nterms <= 32 in QBMM')
+            end if
+        #:endif
+        #:if not MFC_CASE_OPTIMIZATION and USING_AMD
+            if (nmom > 6 .or. nnode > 4 .or. nb > 3) then
+                call s_mpi_abort('USING_AMD expects nmom<=6, nnode<=4, nb<=3 in QBMM')
+            end if
+        #:endif
As per coding guidelines: Use s_mpi_abort() for error termination instead of stop.

Also applies to: 652-676, 726-746, 890-894, 983-1023

src/common/m_variables_conversion.fpp (1)

323-356: Guard AMD fixed-size mixture/species arrays against larger runtime counts.

In USING_AMD + non-optimized builds, alpha_* arrays are size 3 and Y_* arrays are size 10, but loops still iterate num_fluids/num_species. If either exceeds the cap, device out-of-bounds access is possible. Add guards (or size to max constants) before entering GPU loops.

🛠️ Suggested guard
        type(int_bounds_info), dimension(1:3), intent(in) :: ibounds
+        #:if not MFC_CASE_OPTIMIZATION and USING_AMD
+            if (num_fluids > 3) then
+                call s_mpi_abort('USING_AMD expects num_fluids <= 3 in variables_conversion')
+            end if
+            if (chemistry .and. num_species > 10) then
+                call s_mpi_abort('USING_AMD expects num_species <= 10 in variables_conversion')
+            end if
+        #:endif
        #:if USING_AMD and not MFC_CASE_OPTIMIZATION
            real(wp), dimension(3) :: alpha_K, alpha_rho_K
As per coding guidelines: Use s_mpi_abort() for error termination instead of stop.

Also applies to: 590-598, 1191-1200, 1337-1376, 1412-1415

🤖 Fix all issues with AI agents
In @.github/workflows/frontier_amd/submit-bench.sh:
- Around line 9-51: The current submission inlines sbatch_script_contents into
the heredoc which causes $/$(...) expansion and breaks paths with spaces;
instead drop sbatch_script_contents and pass the script filename into the job
environment (e.g., compute job_script_basename="$(basename "$1")" and keep
job_slug/job_device/job_interface as you already set) then inside the heredoc
invoke the script at runtime with a quoted path such as bash
"$SLURM_SUBMIT_DIR/$job_script_basename" (or ensure it is executable and use
"$SLURM_SUBMIT_DIR/$job_script_basename"), preserving variable quoting so
runtime SLURM variables are expanded on the compute node rather than on the
submit host.

In @.github/workflows/frontier_amd/submit.sh:
- Around line 5-7: The usage() currently documents only two args but the script
later uses $3, causing job_slug to end with a trailing '-' and job_interface to
be empty; either (A) add a third argument to usage() and validate $3 (like
phoenix/submit.sh) accepting values such as none|acc|omp, set job_interface from
$3 with validation, and include it in job_slug construction, or (B) remove all
references to $3 and change the job_slug construction and job_interface
assignment so they don't append a dangling '-' or rely on an empty interface;
update the usage() text to match the chosen behavior and ensure job_slug and
job_interface are computed safely (no trailing hyphen, job_interface defaulted
or validated) in the script functions.

In @.github/workflows/frontier_amd/test.sh:
- Around line 3-4: The GPU count logic using the variables gpus and ngpus is
fragile because ngpus is computed by stripping whitespace then using wc -c,
which breaks for multi‑digit IDs; update the ngpus calculation to count words
instead (e.g., replace the wc -c approach with wc -w or an equivalent
word-count/field-count method) so that the ngpus variable correctly reflects the
number of GPU IDs parsed into gpus.

In `@src/common/m_boundary_common.fpp`:
- Around line 1561-1617: The y-direction IGR block in s_populate_F_igr_buffers
is being excluded for 2D because the conditional is "not MFC_CASE_OPTIMIZATION
or num_dims > 2"; change this to include 2D by using "not MFC_CASE_OPTIMIZATION
or num_dims >= 2" (or otherwise remove the strict > check) so the bc_y/jac_sf
y-boundary handling compiles for num_dims == 2; update the #:if that surrounds
the y-direction block and ensure symbols referenced (MFC_CASE_OPTIMIZATION,
num_dims, s_populate_F_igr_buffers, bc_y, jac_sf) remain correct.

In `@src/common/m_chemistry.fpp`:
- Around line 22-27: The AMD-only static array molecular_weights_nonparameter is
hard-coded to length 10 and can overflow when loops over chemxb:chemxe use
num_species; update the AMD block (guarded by USING_AMD) to allocate/declare
molecular_weights_nonparameter using the dynamic num_species size (or add a
clear assert that num_species == 10 in AMD builds) and initialize it from the
mechanism's molecular weights at runtime; apply the same fix to all other
AMD-only work arrays referenced in this file (e.g., Xs_*, mass_diffusivities_*,
and any other arrays in the 129-187, 177-187, 203-423 regions) so all AMD
buffers are sized from num_species (or validated against it).

In `@src/common/m_phase_change.fpp`:
- Around line 94-98: The AMD-path declarations use fixed-size arrays
p_infOV,p_infpT,p_infSL,sk,hk,gk,ek,rhok of dimension(3) but
s_infinite_relaxation_k loops over 1..num_fluids, which can overflow if
num_fluids>3; add an assertion at the declaration site to prevent misuse by
checking num_fluids <= 3 when USING_AMD and not MFC_CASE_OPTIMIZATION (e.g., add
an @:ASSERT or equivalent guard next to the real(wp), dimension(3) declarations
for sk,hk,gk,ek,rhok to enforce this constraint and fail-fast).

In `@src/simulation/m_acoustic_src.fpp`:
- Around line 146-150: The arrays myalpha and myalpha_rho are sized to 3 under
the AMD path (guarded by MFC_CASE_OPTIMIZATION and USING_AMD) but later indexed
up to num_fluids, risking out-of-bounds access; update the declaration so both
branches declare real(wp), dimension(num_fluids) :: myalpha, myalpha_rho (i.e.,
remove the fixed size 3) to match uses in the loop that iterates q = 1,
num_fluids, or alternatively add a compile-time/runtime assertion that USING_AMD
implies num_fluids <= 3 and guard all accesses accordingly; reference symbols:
myalpha, myalpha_rho, num_fluids, MFC_CASE_OPTIMIZATION, USING_AMD.

In `@src/simulation/m_bubbles_EL.fpp`:
- Around line 785-789: The vel array (declared under the USING_AMD branch) can
be length 3 while only vel(1:num_dims) are set, causing dot_product(vel,vel) to
read an uninitialized component in 2D; fix by either zero-initializing vel
before the loop that fills vel(1:num_dims) or by changing calls to dot_product
to use the slice dot_product(vel(1:num_dims), vel(1:num_dims)); update
references to vel, num_dims, and dot_product (and keep the conditional branches
USING_AMD / MFC_CASE_OPTIMIZATION unchanged) so no component outside 1:num_dims
is ever read uninitialized.
- Around line 540-544: Add host-side guards for the AMD fixed-size arrays:
before launching GPU kernels that call s_compute_species_fraction or
s_convert_species_to_mixture_variables_acc, assert that when USING_AMD and not
MFC_CASE_OPTIMIZATION the runtime num_fluids <= 3 (or change the AMD path to use
dynamic sizing) to prevent indexing myalpha_rho/myalpha out of bounds; likewise,
in the caller of s_compute_cson_from_pinf add a host-side check that when
USING_AMD the simulation either has num_dims == 3 or initialize/zero the missing
vel components (or assert num_dims == 3) so vel(3) is not left uninitialized
before dot_product is used. Ensure the checks reference the symbols USING_AMD,
MFC_CASE_OPTIMIZATION, num_fluids, num_dims and the routines
s_compute_species_fraction, s_convert_species_to_mixture_variables_acc, and
s_compute_cson_from_pinf.

In `@src/simulation/m_cbc.fpp`:
- Around line 653-674: Add fypp ASSERT guards at the start of the s_cbc routine
(before the local declarations) to ensure AMD fixed-size arrays are large
enough: when USING_AMD is true (and in the not MFC_CASE_OPTIMIZATION branch)
assert sys_size <= 12 for L, assert num_fluids <= 3 for
mf/alpha_rho/dalpha_rho_ds/dadv_dt/dadv_ds/dalpha_rho_dt, assert num_vels <= 3
for vel/dvel_ds, and assert num_species <= 10 for
Ys/h_k/dYs_dt/dYs_ds/Xs/Gamma_i/Cp_i; use the fypp macro @:ASSERT(predicate,
"descriptive message") so builds fail with a clear message if these counts
exceed the hard-coded AMD sizes.

In `@src/simulation/m_compute_cbc.fpp`:
- Around line 26-30: When USING_AMD and not MFC_CASE_OPTIMIZATION, add runtime
fypp ASSERT guards in the module initialization to prevent OOB by validating
global loop bounds against the hard-coded array sizes: assert advxe <= 12 and
chemxe <= 12 and chemxb >= 1 and E_idx <= 3 (or the appropriate max) so
expressions like L(2:advxe-1), mf, dalpha_rho_ds and dYs_ds cannot be indexed
out of range; place these checks near the module init or wherever USING_AMD
conditional declarations occur and reference the symbols advxe, chemxb, chemxe,
E_idx, L, mf, dalpha_rho_ds, and dYs_ds so failures are clear.

In `@src/simulation/m_hyperelastic.fpp`:
- Around line 101-105: The AMD build branch hardcodes tensora and tensorb to
size 9 causing out-of-bounds when tensor_size = num_dims**2 + 1 (e.g. 10 for
3D); update the conditional declaration to allocate tensora and tensorb with
dimension(tensor_size) when USING_AMD so both branches use real(wp),
dimension(tensor_size) :: tensora, tensorb, and replace the hardcoded
tensorb(1:9) initialization/assignments with loops or assignments up to
tensor_size (referencing variables tensor_size, tensora, tensorb, and num_dims
and the routines/blocks that initialize or index tensorb and tensora).
- Around line 107-111: The conditional sizing of alpha_k and alpha_rho_k to 3
when MFC_CASE_OPTIMIZATION is false and USING_AMD is true causes out-of-bounds
writes by s_compute_species_fraction which writes num_fluids entries; fix by
making alpha_k and alpha_rho_k use dimension(num_fluids) unconditionally (remove
the special-case 3 sizing) or, if AMD optimization truly requires num_fluids ≤
3, add a runtime guard that checks num_fluids ≤ 3 at entry (e.g., in the routine
that declares alpha_k/alpha_rho_k) and error/abort with a clear message;
references: alpha_k, alpha_rho_k, s_compute_species_fraction, num_fluids,
MFC_CASE_OPTIMIZATION, USING_AMD.

In `@src/simulation/m_ibm.fpp`:
- Around line 184-196: The AMD branch declares fixed-size locals (Gs,
alpha_rho_IP, alpha_IP, r_IP, v_IP, pb_IP, mv_IP, nmom_IP, presb_IP, massv_IP)
but runtime dimensions (num_fluids, nb, nmom, nnode/contxe) can exceed those
sizes; add fypp ASSERT checks right after the #:if USING_AMD branch to validate
num_fluids <= 3, nb <= 3, nb*nmom <= 18, and nb*nnode <= 12 (or call s_mpi_abort
with an explanatory message) so loops using Gs, alpha_rho_IP, alpha_IP,
r_IP/v_IP/pb_IP/mv_IP, nmom_IP, presb_IP, massv_IP cannot overrun the fixed
buffers; alternatively constrain loop bounds to the corresponding fixed
constants when those ASSERTs fail.

In `@src/simulation/m_pressure_relaxation.fpp`:
- Around line 220-224: The arrays alpha_rho and alpha are declared with
dimension(2) under the conditional branch (when not MFC_CASE_OPTIMIZATION and
USING_AMD) but later loops iterate do i = 1, num_fluids and num_fluids can be 3,
causing out-of-bounds access; update that conditional declaration to use
dimension(3) (matching the maximum possible num_fluids) so alpha_rho and alpha
are large enough for the loops in m_pressure_relaxation.fpp while leaving the
other branch (dimension(num_fluids)) unchanged.

In `@src/simulation/m_riemann_solvers.fpp`:
- Around line 294-308: Add an ASSERT in s_initialize_riemann_solvers_module to
validate runtime counts against the AMD fixed-size locals (e.g. require
num_fluids <= 3, num_vels <= 3, num_species <= 10, nb <= 10 as applicable) and
fail fast if violated; alternatively, if you want to tolerate smaller counts,
explicitly zero the AMD fixed-size arrays (alpha_rho_L/R, vel_L/R, alpha_L/R,
Ys_L/R, Cp_iL/R, Xs_L/R, Gamma_iL/R, Yi_avg, Phi_avg, h_iL/R, h_avg_2) and only
operate on the valid slice (use array slicing for
num_fluids/num_vels/num_species) so whole-array ops do not read uninitialized
elements; use the fypp ASSERT macro for the checks and reference the local names
above to locate the AMD-specific declarations to modify.
- Around line 4761-4769: divergence_cyl may be left uninitialized when
MFC_CASE_OPTIMIZATION is true and num_dims == 1; initialize divergence_cyl to a
safe default (e.g., 0.0) before the macro-guarded block so subsequent uses in
shear/bulk stress calculations are defined; update the region around the
divergence computation (symbols: divergence_cyl, avg_dvdx_int, avg_dvdy_int,
avg_v_int, r_eff, avg_dvdz_int, and the conditional based on
MFC_CASE_OPTIMIZATION/num_dims) to set divergence_cyl = 0.0 prior to the
existing conditional assignments.

In `@src/simulation/m_sim_helpers.fpp`:
- Around line 100-115: The AMD-specific declarations fix alpha, vel, alpha_rho,
and Gs to dimension(3) which can under-allocate when num_fluids > 3; update the
conditional branch (the block guarded by "not MFC_CASE_OPTIMIZATION and
USING_AMD") to use a safe upper bound (e.g., dimension(num_fluids_max) or a
compile-time constant) or add an explicit runtime guard against indexing past
size for alpha, alpha_rho, Gs and vel (references: variables alpha, vel,
alpha_rho, Gs and the conditional USING_AMD/MFC_CASE_OPTIMIZATION).

In `@toolchain/mfc/case.py`:
- Around line 324-325: The template assigns muscl_polyn from the wrong parameter
(it uses muscl_order), so update the assignment for muscl_polyn to read the
muscl_polyn parameter instead; specifically change the second line that sets
muscl_polyn to use int(self.params.get("muscl_polyn", 1)) while leaving
muscl_order unchanged so user-provided muscl_polyn values are honored.

In `@toolchain/templates/frontier_amd.mako`:
- Line 49: Before the sbcast invocation that writes to
/mnt/bb/$USER/${target.name} (the line calling sbcast --send-libs -pf
${target.get_install_binpath(case)} /mnt/bb/$USER/${target.name}), add a step
that creates the per-target subdirectory on all compute nodes using srun mkdir
-p /mnt/bb/$USER/${target.name}; ensure this mkdir step runs with the same
node/task scope as the sbcast (use SLURM_NNODES and --ntasks-per-node=1 as in
the review) so the directory exists on every node before sbcast executes; apply
the same change to the equivalent sbcast call in frontier.mako.
🟡 Minor comments (5)
toolchain/modules-48-51 (1)

48-51: Trailing whitespace on line 48.

Minor formatting issue: line 48 has a trailing space after "AMD".

🔧 Proposed fix
-famd    OLCF Frontier AMD 
+famd    OLCF Frontier AMD
toolchain/bootstrap/modules.sh-139-139 (1)

139-139: Potential issue: CRAY_LD_LIBRARY_PATH may be undefined for famd.

Line 121 explicitly skips the CRAY_LD_LIBRARY_PATH handling for famd, but line 139 appends ${CRAY_LD_LIBRARY_PATH} to LD_LIBRARY_PATH. If CRAY_LD_LIBRARY_PATH is not set, this will append an empty string (harmless but creates a trailing colon) or could cause issues in strict shell modes.

Consider checking if the variable is set before using it, or remove it if it's not expected to be available for famd.

🔧 Proposed fix
-    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${CRAY_LD_LIBRARY_PATH}" 
+    if [ ! -z "${CRAY_LD_LIBRARY_PATH+x}" ]; then
+        export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${CRAY_LD_LIBRARY_PATH}"
+    fi
.github/workflows/frontier_amd/submit.sh-37-37 (1)

37-37: Minor: Missing space between -o and filename.

The -o SBATCH directive is missing a space before the filename, which may cause issues with some Slurm versions.

Proposed fix
-#SBATCH -o$job_slug.out            # Combined output and error messages file
+#SBATCH -o $job_slug.out           # Combined output and error messages file
.github/workflows/frontier_amd/bench.sh-18-22 (1)

18-22: Consider quoting $(nproc) and verify CPU mode n_ranks value.

  1. $(nproc) on line 21 should be quoted to satisfy shellcheck and prevent potential word-splitting issues (though unlikely with numeric output).

  2. In CPU mode, n_ranks remains at the initialized value of 12, but -j $(nproc) uses the actual processor count. Verify this asymmetry is intentional (perhaps -n controls MPI ranks while -j controls parallel jobs).

Proposed fix for quoting
-    ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c frontier_amd $device_opts -n $n_ranks
+    ./mfc.sh bench --mem 1 -j "$(nproc)" -o "$job_slug.yaml" -- -c frontier_amd $device_opts -n $n_ranks
CMakeLists.txt-462-472 (1)

462-472: Missing SHELL: prefix for multi-flag environment variables in LLVMFlang MPI and HIPFORT configuration.

Lines 468 and 490 pass $ENV{CRAY_MPICH_LIB} and $ENV{CRAY_HIPFORT_LIB} unquoted to target_link_libraries, but these variables expand to multiple space-separated flags (e.g., -L/path/lib -lmpi -lpmi). CMake will treat the entire string as a single argument, causing linker failures. Wrap these in "SHELL:" to preserve whitespace handling: "SHELL:$ENV{CRAY_MPICH_LIB}" and "SHELL:$ENV{CRAY_HIPFORT_LIB}".

Alternatively, use CMake variables from find_package(MPI): line 463 calls find_package(MPI COMPONENTS Fortran REQUIRED), which populates MPI_Fortran_INCLUDE_PATH and MPI_Fortran_LIBRARIES. These are the standard CMake approach and avoid raw environment variable dependencies.

Applies to line 468 (MPI) and line 490 (HIPFORT).

🧹 Nitpick comments (13)
toolchain/mfc/test/case.py (1)

244-268: Cache trace parts for clarity.
This avoids repeated splitting and centralizes the delimiter logic.

♻️ Proposed refactor
         tolerance = 1e-12  # Default
         single = ARG("single")
+        trace_parts = self.trace.split(" -> ")
 
-        if "Example" in self.trace.split(" -> "):
+        if "Example" in trace_parts:
             tolerance = 1e-3
-        elif "Cylindrical" in self.trace.split(" -> "):
+        elif "Cylindrical" in trace_parts:
             tolerance = 1e-9
@@
-        elif "Axisymmetric" in self.trace.split(" -> "):
+        elif "Axisymmetric" in trace_parts:
             tolerance = 1e-11
toolchain/modules (1)

48-51: Consider adding famd-gpu entry for GPU-specific modules.

Other GPU-capable systems (e.g., f for Frontier, p for Phoenix) define separate -gpu entries with GPU-specific modules and environment variables. The famd configuration only has -all entries, which may be intentional if GPU support is configured differently for AMD, but this differs from the established pattern.

src/pre_process/m_global_parameters.fpp (1)

303-303: Add documentation for the dummy workaround variable.

According to the PR description, this variable is a workaround for an AMD compiler bug where "a conditional with a parameter followed by a GPU kernel can cause a catastrophic error if the conditional is false." The variable name dummy doesn't convey this purpose.

Consider adding a comment explaining the workaround and linking to any relevant issue tracker, so future maintainers understand why this exists and when it can be removed.

📝 Proposed documentation
     logical :: fft_wrt
+    !> Workaround for AMD Flang compiler bug: prevents case-optimization issue
+    !! when a conditional with a parameter precedes a GPU kernel.
+    !! TODO: Remove when upstream patch is available.
     logical :: dummy
toolchain/bootstrap/modules.sh (1)

126-144: Hardcoded AFAR toolchain path may become stale.

The path /sw/crusher/ums/compilers/afar/rocm-afar-8873-drop-22.2.0 is specific to a particular version of the AFAR toolchain. This hardcoded path will need to be updated when the toolchain is upgraded.

Consider documenting this dependency or adding a comment indicating which version is expected and when it was last verified.

📝 Suggested documentation
 if [ "$u_c" '==' 'famd' ]; then 
+    # AFAR toolchain for AMD Flang on Frontier
+    # Last verified: January 2026 with rocm-afar-8873-drop-22.2.0
     export OLCF_AFAR_ROOT="/sw/crusher/ums/compilers/afar/rocm-afar-8873-drop-22.2.0"
src/simulation/m_muscl.fpp (1)

115-162: Compiler workaround using dummy flag is functional but warrants a brief comment.

The dummy flag additions on lines 115, 119, and 162 correctly implement the workaround for the AMD amdflang case-optimization bug. When dummy is true, both the muscl_order == 1 and muscl_order == 2 paths will execute. Consider adding a brief inline comment near these conditions explaining the workaround:

! Workaround for amdflang case-optimization bug (see PR `#1116`)
if (muscl_order /= 1 .or. dummy) then

This helps future maintainers understand why the dummy guard is present and when it can be removed once an upstream patch is available.

load_amd.sh (3)

1-4: Add shebang or sourcing directive for clarity.

The script lacks a shebang line, which makes it unclear whether it's meant to be executed or sourced. Since this script uses module load commands that affect the current shell environment, it should be sourced rather than executed.

Proposed fix

Add a header comment indicating the script should be sourced:

+#!/usr/bin/env bash
+# This script should be sourced, not executed:
+#   source load_amd.sh
+
 module load python cmake
 module load cpe/25.09
 module load PrgEnv-amd

5-7: Unused variable AFAR_UMS_LATEST should be removed or used.

AFAR_UMS_LATEST is computed on line 6 but never used. Line 7 uses a hardcoded path instead. Either use the computed latest path or remove the unused variable:

Option 1: Remove unused variable
 AFAR_UMS_BASEDIR="/sw/crusher/ums/compilers/afar"
-AFAR_UMS_LATEST=$(ls -d --color=never ${AFAR_UMS_BASEDIR}/*/ | tail -n1)
 export OLCF_AFAR_ROOT=${AFAR_UMS_BASEDIR}/"rocm-afar-8873-drop-22.2.0"
Option 2: Use the latest version (if intended)
 AFAR_UMS_BASEDIR="/sw/crusher/ums/compilers/afar"
 AFAR_UMS_LATEST=$(ls -d --color=never ${AFAR_UMS_BASEDIR}/*/ | tail -n1)
-export OLCF_AFAR_ROOT=${AFAR_UMS_BASEDIR}/"rocm-afar-8873-drop-22.2.0"
+export OLCF_AFAR_ROOT=${AFAR_UMS_LATEST}

25-27: Consider removing commented-out code or adding explanation.

The commented-out module load commands (lines 25-27) appear to be legacy code. If they're kept for reference, consider adding a comment explaining why, or remove them to keep the script clean.

.github/workflows/frontier_amd/build.sh (1)

19-21: Remove unused dirname to silence shellcheck.

dirname is never referenced and triggers SC2034. Dropping it keeps the script clean.

♻️ Suggested cleanup
-        dirname=$(basename "$dir")
         ./mfc.sh run "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts
.github/workflows/frontier_amd/submit.sh (1)

44-45: Nit: $(pwd) expands at submission time, not job runtime.

Inside the heredoc with double quotes, $(pwd) will expand when the script is submitted, not when the job runs. Since the script does cd "$SLURM_SUBMIT_DIR" first, the logged path should still be correct, but for clarity you might want to use \$(pwd) to defer expansion or simply reference $SLURM_SUBMIT_DIR.

Proposed fix
 cd "\$SLURM_SUBMIT_DIR"
-echo "Running in $(pwd):"
+echo "Running in \$SLURM_SUBMIT_DIR:"
.github/workflows/frontier_amd/test.sh (1)

6-14: Variables job_device and job_interface are expected from caller.

The static analyzer flags these as unassigned, but they are intentionally set by submit.sh before this script is sourced/executed. This is consistent with the workflow design but could be documented with a comment for clarity.

Optional: Add comment for clarity
+# Expected environment variables from submit.sh:
+#   job_device   - "cpu" or "gpu"
+#   job_interface - "acc", "omp", or empty
+
 device_opts=""
 if [ "$job_device" = "gpu" ]; then
.github/workflows/frontier_amd/bench.sh (2)

8-8: Nit: Redundant string transformation for gpu_ids.

The gpu_ids transformation is convoluted: it converts spaces to newlines, then newlines back to spaces, then removes trailing space. Since $gpus is already space-separated from tr '\n' ' ' on line 6, this can be simplified.

Simplified version
-    gpu_ids=$(echo "$gpus" | tr ' ' '\n' | tr '\n' ' ' | sed 's/ $//')  # GPU IDs from rocm-smi
+    gpu_ids=$(echo "$gpus" | xargs)  # Trim and normalize whitespace

Or simply use $gpus directly if trailing space is acceptable.


5-5: Variables job_device, job_interface, job_slug are expected from caller.

These variables are set by submit.sh/submit-bench.sh before this script runs. This is the standard pattern for these workflows but could benefit from a brief comment documenting the expected environment.

Also applies to: 10-10, 19-19

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
toolchain/bootstrap/modules.sh (1)

41-49: Include famd in the interactive prompt list.
The selection prompt still omits famd, so interactive users won’t see the new option listed.

🧩 Suggested update
-    log_n "($G""a$W/$G""f$W/$G""s$W/$G""w$W/$B""tuo$W/$C""b$W/$C""e$CR/$C""d/$C""dai$CR/$Y""p$CR/$R""r$CR/$B""cc$CR/$B""c$CR/$B""n$CR/$BR""o"$CR"/$OR""h"$CR"): "
+    log_n "($G""a$W/$G""f$W/$G""famd$W/$G""s$W/$G""w$W/$B""tuo$W/$C""b$W/$C""e$CR/$C""d/$C""dai$CR/$Y""p$CR/$R""r$CR/$B""cc$CR/$B""c$CR/$B""n$CR/$BR""o"$CR"/$OR""h"$CR"): "
🤖 Fix all issues with AI agents
In `@src/simulation/m_ibm.fpp`:
- Around line 184-196: The AMD branch uses fixed-size GPU-private arrays (Gs,
alpha_rho_IP, alpha_IP, r_IP, v_IP, pb_IP, mv_IP, nmom_IP, presb_IP, massv_IP)
sized 3/18/12 but kernels iterate up to num_fluids, nb, nmom, nnode causing
potential OOB access; fix by either replacing the magic sizes with the
compile-time maximum constants (if available) or, if not, adding fypp ASSERT
checks before the GPU loops (and the corresponding parameter checks at the
function entry) to ensure num_fluids <= 3, nb <= <presb_size>, nmom <=
<nmom_size>, nnode <= <massv_size>; mirror the same ASSERT guard for
dynamic_viscosities in s_compute_ib_forces so both places enforce identical
bounds.

In `@toolchain/bootstrap/modules.sh`:
- Around line 126-140: The LD_LIBRARY_PATH append should be guarded to avoid
creating empty path entries: check if CRAY_LD_LIBRARY_PATH is non-empty before
appending to LD_LIBRARY_PATH (use a conditional and only add the colon when
LD_LIBRARY_PATH already has content), and apply the same guarded-append pattern
for PATH (with OLCF_AFAR_ROOT/bin) and CMAKE_PREFIX_PATH (with OLCF_AFAR_ROOT)
so you never produce leading/trailing or empty-colon components; update the
block around the exports for OLCF_AFAR_ROOT, PATH, LD_LIBRARY_PATH, and
CMAKE_PREFIX_PATH accordingly.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
CMakeLists.txt (1)

365-381: Make MFC_CASE_OPTIMIZATION configurable instead of hard‑disabling it.
-D MFC_CASE_OPTIMIZATION=False forces the optimization off for every compiler/target; if the workaround is LLVMFlang‑specific, this likely overreaches. Consider gating it by compiler or by a CMake option and pass the option value into Fypp.

💡 Example adjustment
@@
-                                 -D MFC_CASE_OPTIMIZATION=False
+                                 -D MFC_CASE_OPTIMIZATION=${MFC_CASE_OPTIMIZATION}
@@
 option(MFC_MIXED_PRECISION "Build mixed precision"                           OFF)
+option(MFC_CASE_OPTIMIZATION "Enable case-optimization in Fypp"              ON)
@@
+if (CMAKE_Fortran_COMPILER_ID STREQUAL "LLVMFlang")
+    set(MFC_CASE_OPTIMIZATION OFF)
+endif()
🤖 Fix all issues with AI agents
In `@CMakeLists.txt`:
- Around line 455-458: Validate that the CRAY_* env vars are set and split their
space-separated contents into CMake lists before passing to
target_compile_options/target_link_libraries: check for CRAY_MPICH_LIB and
CRAY_HIPFORT_LIB presence and produce clear error messages if missing, use
string(REPLACE " " ";" ...) or string(REGEX REPLACE "\\s+" ";" ...) to convert
the environment string into a CMake list, then pass that list variable to
target_compile_options(${a_target} PRIVATE ...) and
target_link_libraries(${a_target} PRIVATE ...) instead of the raw "$ENV{...}" so
each flag is treated as a separate argument (apply same handling for both the
LLVMFlang branch and the other branch around lines where
CMAKE_Fortran_COMPILER_ID is tested).
🧹 Nitpick comments (1)
CMakeLists.txt (1)

433-434: Avoid per‑target compiler ID log spam.
This message fires once per target in the loop; consider logging once at configure time or adding context (the target name) to keep logs clean.

@codecov

codecov bot commented Jan 29, 2026

Codecov Report

❌ Patch coverage is 40.31891% with 524 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.11%. Comparing base (944aa2f) to head (a70b8cd).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_riemann_solvers.fpp 20.16% 179 Missing and 19 partials ⚠️
src/simulation/m_viscous.fpp 46.29% 83 Missing and 4 partials ⚠️
src/common/m_boundary_common.fpp 44.53% 52 Missing and 14 partials ⚠️
src/common/m_chemistry.fpp 36.17% 60 Missing ⚠️
src/simulation/m_data_output.fpp 9.09% 29 Missing and 1 partial ⚠️
src/simulation/m_surface_tension.fpp 50.00% 21 Missing and 2 partials ⚠️
src/simulation/m_weno.fpp 61.53% 15 Missing and 5 partials ⚠️
src/simulation/m_rhs.fpp 45.16% 9 Missing and 8 partials ⚠️
src/common/m_phase_change.fpp 69.44% 11 Missing ⚠️
src/simulation/m_cbc.fpp 53.84% 4 Missing and 2 partials ⚠️
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1116      +/-   ##
==========================================
- Coverage   44.16%   44.11%   -0.06%     
==========================================
  Files          71       71              
  Lines       20417    20643     +226     
  Branches     1991     2036      +45     
==========================================
+ Hits         9018     9106      +88     
- Misses      10252    10363     +111     
- Partials     1147     1174      +27     


@codeant-ai

codeant-ai bot commented Jan 29, 2026

CodeAnt AI is running Incremental review



@codeant-ai codeant-ai bot added and removed the size:XXL (This PR changes 1000+ lines, ignoring generated files) label on Jan 29, 2026
@codeant-ai

codeant-ai bot commented Jan 29, 2026

CodeAnt AI Incremental review completed.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/common/m_checker_common.fpp">

<violation number="1" location="src/common/m_checker_common.fpp:33">
P2: Guard the `s_check_amd` call with the same `MFC_SIMULATION` preprocessor condition as its definition; otherwise, AMD simulation builds will reference a subroutine that is not compiled.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/common/m_checker_common.fpp`:
- Around lines 33-35: s_check_inputs_common calls s_check_amd under the #:if USING_AMD guard, but s_check_amd itself is conditionally omitted when MFC_SIMULATION is defined, causing undefined-symbol errors. Fix by ensuring s_check_amd is always compiled when USING_AMD is set (move or duplicate the s_check_amd implementation outside the MFC_SIMULATION conditional, or wrap its declaration with the same USING_AMD guard), and update the other similar call sites (the block mentioned around lines 53-61) so that all calls to s_check_amd and its implementation share the same preprocessor condition.
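A rough Fypp sketch of the suggested shape (names are taken from the comment above; subroutine bodies are elided and the real layout of m_checker_common.fpp may differ):

subroutine s_check_inputs_common
    ! ... existing checks ...
    #:if USING_AMD
        call s_check_amd()
    #:endif
end subroutine s_check_inputs_common

#:if USING_AMD
    ! Compiled whenever USING_AMD is set, regardless of MFC_SIMULATION,
    ! so the call above always resolves.
    subroutine s_check_amd
        ! ... AMD-specific checks ...
    end subroutine s_check_amd
#:endif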
🧹 Nitpick comments (1)
src/common/m_checker_common.fpp (1)

5-6: Place macros.fpp include first.

case.fpp now precedes the required macro include. If case.fpp uses any Fypp helpers, this ordering can break preprocessing and also violates the module-start convention.

♻️ Suggested reorder
-#:include 'case.fpp'
-#:include 'macros.fpp'
+#:include 'macros.fpp'
+#:include 'case.fpp'

As per coding guidelines, start module files with Fypp include for macros: #:include 'macros.fpp'.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/common/m_checker_common.fpp`:
- Around lines 5-6: Move the include order so #:include 'macros.fpp' comes first in the module file; currently #:include 'case.fpp' is above it. Swap them so 'macros.fpp' is the first include, satisfying the fpp guideline and keeping macros available when this module is preprocessed.

Comment on lines +5 to 6
#:include 'case.fpp'
#:include 'macros.fpp'

⚠️ Potential issue | 🟡 Minor

Move macros.fpp include back to the top.

The fpp guideline requires module files to start with #:include 'macros.fpp'. Putting case.fpp first can also cause macro availability surprises.

🛠️ Suggested fix
-#:include 'case.fpp'
-#:include 'macros.fpp'
+#:include 'macros.fpp'
+#:include 'case.fpp'

As per coding guidelines, start module files with #:include 'macros.fpp'.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-#:include 'case.fpp'
-#:include 'macros.fpp'
+#:include 'macros.fpp'
+#:include 'case.fpp'
🤖 Prompt for AI Agents
In `@src/common/m_checker_common.fpp` around lines 5-6: move the include order so #:include 'macros.fpp' comes first in the module file; currently #:include 'case.fpp' is above it. Swap them so 'macros.fpp' is the first include, satisfying the fpp guideline and keeping macros available when this module is preprocessed.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 7 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/simulation/m_compute_cbc.fpp">

<violation number="1" location="src/simulation/m_compute_cbc.fpp:39">
P1: The fixed `dimension(20)` for AMD is too small for valid configurations (e.g., model_eqns=2 with 10 fluids yields advxe=24), so loops that write up to advxe/chemxe can overrun L. Use a size that covers the maximum sys_size on AMD (or keep `sys_size`) to avoid out-of-bounds writes.</violation>
</file>
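A sketch of one way to avoid the hard-coded 20 (hypothetical: sys_size_max would be a new compile-time constant chosen to bound the largest sys_size an AMD build can reach, and wp is assumed to be the working-precision kind used elsewhere in the code):

#:if USING_AMD
    ! Static private array: its size must bound advxe/chemxe for all valid cases.
    real(wp), dimension(sys_size_max) :: L
#:else
    ! Automatic array sized by the runtime sys_size, as before.
    real(wp), dimension(sys_size) :: L
#:endif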

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
