add nvhpc to test suite github runners#1317
Conversation
Claude Code ReviewHead SHA: 5dec31f
Summary
FindingsObservation (non-blocking): Missing MPI installation for NVHPC container The NVHPC container image ( Minor style note: HDF5 not installed for NVHPC The standard Ubuntu setup installs No issues found in the conditional logic, container image references, GPU flag mapping ( |
Claude Code ReviewIncremental review from: New findings since last Claude review: The two changes in this update are:
Both changes are correctness improvements — no new issues introduced. Observation (carry-over, still unaddressed): MPI and HDF5 missing in NVHPC container setup The No other new high-confidence findings. |
Test 5 NVHPC versions (23.11, 24.5, 24.11, 25.1, 25.3) on GitHub-hosted runners using official NVIDIA container images. Each version runs both a CPU build+test and a GPU (OpenACC) build-only target. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…efaults GitHub Actions include entries with new-only keys get merged into all existing combos instead of creating new ones. Adding nvhpc and target as main matrix axes with empty defaults means the include entries overwrite original values, forcing creation of standalone combos.
|
<!-- claude-review: thread=primary; reviewed_sha=dc12849241387d84734cfd6bb98390b04c4f2361; mode=incremental --> |
… build - Add h_iL, h_iR, h_avg_2 and other chemistry locals to private clauses of two GPU_PARALLEL_LOOP regions in s_hllc_riemann_solver (lines 2201, 2421) where they were declared at subroutine scope but not privatized, causing nvfortran 25.1 gpu-omp to fail with 'Could not find allocated-variable index for symbol - h_il' - Disable ZFP compression in silo (-DSILO_ENABLE_ZFP=OFF) since MFC never uses it and nvc crashes compiling zfp/decode1d.c on 23.11/25.3 - Pin silo to 4.12.0 tag with GIT_SHALLOW; clean up HDF5 build flags
Change v_vf argument from dimension(sys_size) to dimension(:) to match the caller which passes an allocatable array. The explicit-size vs assumed-shape mismatch caused nvfortran 24.11's IPO inliner to create partial inline temporaries that the OpenACC backend could not map.
This reverts commit dc12849.
…equations" This reverts commit ff30d8a.
|
<!-- claude-review: thread=primary; reviewed_sha=f38948b0ee13c4317af26f661e3fdc746f1b4b90; mode=incremental --> |
…versions - Remove openmpi-bin/libopenmpi-dev from NVHPC container apt install. The system OpenMPI (gfortran-compiled) was overriding the NVHPC-bundled MPI at runtime via ldconfig, causing ABI mismatch segfaults in all CPU tests (rayleigh_taylor_muscl crash in s_interface_compression). - Remove NVHPC version exclusion range for two-pass IPO. Binary search identified s_compute_dt as the single function causing ICE on 24.11; add noinline directive to prevent its extraction into the inline library.
…5.x ICE Exclude f_is_default, s_compute_dt, and my_inquire from cross-file inlining (-Minline=lib:...,except:...) to prevent compiler ICE on NVHPC 24.11+ during the two-pass IPO inline pass. These are tiny non-performance-critical helpers; all large GPU-critical functions (riemann solvers, weno, viscous flux, etc.) remain fully inlined. Removes the NVHPC version exclusion range for IPO — IPO is now enabled for all NVHPC versions.
… pre_process ICE m_start_up is initialization/IO code where cross-file inlining provides no performance benefit but triggers fort2 ICE on NVHPC 25.x when too many functions are inlined into it. Use -Mnoinline per-file override to disable inlining for this file only while keeping full IPO for all GPU hot-path files.
…riggers 25.3 ICE)
PMIx dstore module fails with NO-PERMISSIONS in container namespaces because it cannot create POSIX shared memory objects. The hash GDS module keeps everything in-process memory. Also add UCX/UCC lib paths for multi-node fabric support.
…argets = 45 jobs)
…flow The 3D MUSCL+viscous codepath stacks 8 frames of automatic Fortran arrays that exceed Docker's default 8MB stack limit, hitting the guard page. This only affects rayleigh_taylor_muscl (the deepest stack path).
Docker's default seccomp profile blocks mbind/mprotect syscalls that NVHPC's runtime and Open MPI need for memory management. This caused the persistent 'invalid permissions for mapped object' segfault in s_interface_compression on all NVHPC versions in Docker CI.
📝 WalkthroughWalkthroughThis pull request introduces NVHPC (NVIDIA HPC Compiler) support to the continuous integration pipeline and build system. Changes include extending the GitHub Actions workflow with NVHPC matrix dimensions, container-based compiler setup, and conditional build/test steps. The CMake build logic is updated with NVHPC-specific IPO configuration and inline exception handling for Fortran. GPU kernel variable lists in the Riemann solver are extended with additional thermochemical intermediates. Dependency configurations for HDF5 and SILO are refined, and test case filtering is added to exclude problematic examples and specific NVHPC/Docker configurations. 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5396cb47-0a0d-4bad-aa47-81dcc00ec739
📒 Files selected for processing (6)
.github/workflows/test.ymlCMakeLists.txtsrc/simulation/m_riemann_solvers.fpptoolchain/dependencies/CMakeLists.txttoolchain/mfc/test/cases.pytoolchain/mfc/test/test.py
toolchain/mfc/test/cases.py
Outdated
| "1D_multispecies_diffusion", | ||
| "2D_ibm_stl_MFCCharacter", | ||
| "1D_qbmm", # formatted I/O field overflow on gfortran 12 | ||
| "3D_rayleigh_taylor_muscl", # segfaults with nvfortran+MPI in Docker (seccomp/mprotect) |
There was a problem hiding this comment.
Don't globally skip a Docker-only failure.
The comment says the crash is specific to nvfortran + MPI in Docker, but adding 3D_rayleigh_taylor_muscl here removes it from every compiler/platform, including native runs and non-NVHPC CI. Please move this to the environment-specific filtering in toolchain/mfc/test/test.py (or gate it on the same Docker/NVHPC condition) so the other jobs still exercise the example.
toolchain/mfc/test/test.py
Outdated
| # Skip Lagrange Bubbles tests under nvfortran in Docker: NaN on 24.1/24.3 | ||
| # due to unresolved Docker-specific numerical issue (passes natively/Apptainer). | ||
| if os.environ.get("FC") == "nvfortran" and os.path.exists("/.dockerenv"): | ||
| nvhpc_skip_uuids = {"B9553426", "4A1BD9B8", "0D1FA5C5", "2122A4F6"} | ||
| for case in cases[:]: | ||
| if case.get_uuid() in nvhpc_skip_uuids: | ||
| cases.remove(case) | ||
| skipped_cases.append(case) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check the workflow file for NVHPC versions
cat .github/workflows/test.yml | head -100Repository: MFlowCode/MFC
Length of output: 3414
🏁 Script executed:
# Find and examine the test.py file around the skip logic
sed -n '180,210p' toolchain/mfc/test/test.pyRepository: MFlowCode/MFC
Length of output: 1626
🏁 Script executed:
# Check if there's any environment variable or method that carries NVHPC version
grep -n "NVHPC\|nvhpc\|nvfortran" toolchain/mfc/test/test.py | head -20Repository: MFlowCode/MFC
Length of output: 356
🏁 Script executed:
# Find NVHPC or nvhpc references and matrix configuration in the workflow
grep -n -i "nvhpc\|matrix" .github/workflows/test.yml | head -30Repository: MFlowCode/MFC
Length of output: 1624
🏁 Script executed:
# Get a larger section of the workflow file to see the test job configuration
sed -n '100,300p' .github/workflows/test.ymlRepository: MFlowCode/MFC
Length of output: 8086
🏁 Script executed:
# Check if NVHPC version is set as an environment variable in test.py or workflow
grep -n "NVHPC_VERSION\|nvhpc.*version\|version.*nvhpc" toolchain/mfc/test/test.py .github/workflows/test.yml 2>/dev/nullRepository: MFlowCode/MFC
Length of output: 39
Make this NVHPC skip version-aware.
The comment documents the failure as 24.1/24.3, but the predicate only checks FC=nvfortran and /.dockerenv, so these cases are skipped across all NVHPC releases in the Docker matrix (23.11, 24.5+, 25.x, 26.1, 26.3). Export matrix.nvhpc as an environment variable and gate the skip to only 24.1 and 24.3 to restore coverage on unaffected versions.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1317 +/- ##
==========================================
- Coverage 64.67% 64.67% -0.01%
==========================================
Files 70 70
Lines 18249 18251 +2
Branches 1504 1504
==========================================
+ Hits 11803 11804 +1
- Misses 5491 5492 +1
Partials 955 955 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary
Fixes
h_iL,h_iR,h_avg_2and other chemistry locals to twoGPU_PARALLEL_LOOPprivate clauses ins_hllc_riemann_solverthat were missing, causing NVHPC 25.1 gpu-omp build failure-DSILO_ENABLE_ZFP=OFF) — MFC never uses it andnvccrashes compilingzfp/decode1d.con several NVHPC versions. Also pin silo to 4.12.0 tag withGIT_SHALLOW, clean up HDF5 build flags-Minlineexcept:for 9 small non-performance-critical functions (f_is_default,s_compute_dt,s_mpi_abort, etc.) that trigger compiler ICE when cross-inlined, plus-Mnoinlineper-file override form_start_up.fppandm_cbc.fpp(initialization/boundary code). All GPU hot-path files keep full IPOPMIX_MCA_gds=hash(fixes PMIx shared-memory permissions in containers), add NVHPC HPC-X MPI/UCX/UCC lib paths toLD_LIBRARY_PATH, setulimit -s unlimited(fixes stack overflow on deep 3D MUSCL+viscous call chains in Docker's default 8MB stack)actions/checkoutto fix clone failures in NVHPC containersTest plan