Commit d4f9cfb

Modernize NVHPC CI job (to make it working again): Ubuntu-24.04 runner, NVHPC 25.11 (#5935)
* Limit busy-wait loops in per-subinterpreter GIL test

  Add explicit timeouts to the busy-wait coordination loops in the Per-Subinterpreter GIL test in tests/test_with_catch/test_subinterpreter.cpp. Previously those loops spun indefinitely waiting for shared atomics like `started` and `sync` to change, which is fine when CPython's free-threading and per-interpreter GIL behavior matches the test's expectations but becomes pathologically bad when that behavior regresses: the `test_with_catch` executable can then hang forever, causing our 3.14t CI jobs to time out after 90 minutes.

  This change keeps the structure and intent of the test but adds a std::chrono::steady_clock deadline to each of the coordination loops, using a conservative 10 second bound. Worker threads record a failure and return if they hit the timeout, while the main thread fails the test via Catch2 instead of hanging. That way, if future CPython free-threading patches change the semantics again, the test will fail quickly and produce a diagnosable error instead of wedging the CI job.

* Revert "Limit busy-wait loops in per-subinterpreter GIL test"

  This reverts commit 7847ada.

* Add progress reporter for test_with_catch Catch runner

  Introduce a custom Catch2 reporter for tests/test_with_catch that prints a simple one-line status for each test case as it starts and ends, and wire the cpptest CMake target to invoke test_with_catch with -r progress. This makes it much easier to see where the embedded/interpreter test binary is spending its time in CI logs, and in particular to pinpoint which test case is stuck when the free-threading builds hang.

  Compared to adding ad hoc timeouts around potentially infinite busy-wait loops in individual tests, a progress reporter is a more general and robust approach: it gives visibility into all tests (including future ones) without changing their behavior, and turns otherwise opaque 90-minute timeouts into locatable issues in the Catch output.

* Temporarily limit CI to Python 3.14t free-threading jobs

* Temporarily remove non-CI GitHub workflow files

* Temporarily disable AppVeyor builds via skip_commits

* Add DEBUG_LOOK in TEST_CASE("Move Subinterpreter")

* Add Python version banner to Catch progress reporter

  Print the CPython version once at the start of the Catch-based interpreter tests using Py_GetVersion(). This makes it trivial to confirm which free-threaded build a failing run is using when inspecting CI or local logs.

* Revert "Add DEBUG_LOOK in TEST_CASE("Move Subinterpreter")"

  This reverts commit ad3e1c3.

* Pin CI free-threaded runs to CPython 3.14.0t

  Update the standard-small and standard-large GitHub Actions jobs to request python-version 3.14.0t instead of 3.14t. This forces setup-python to use the last-known-good 3.14.0 free-threaded build rather than the newer 3.14.1+ builds where subinterpreter finalization regressed.

* Revert "Pin CI free-threaded runs to CPython 3.14.0t"

  This reverts commit 5281e1c.

* Revert "Temporarily disable AppVeyor builds via skip_commits"

  This reverts commit ed11292.

* Revert "Temporarily remove non-CI GitHub workflow files"

  This reverts commit 0fe6a42.

* Revert "Temporarily limit CI to Python 3.14t free-threading jobs"

  This reverts commit 60ae0e8.

* Pin CI free-threaded runs to CPython 3.14.0t

  Update the standard-small and standard-large GitHub Actions jobs to request python-version 3.14.0t instead of 3.14t. This forces setup-python to use the last-known-good 3.14.0 free-threaded build rather than the newer 3.14.1+ builds where subinterpreter finalization regressed.

* Switch NVHPC job to ubuntu-24.04 and disable AppVeyor

* Temporarily trim workflows to focus on NVHPC job

* First restore ci.yml from test-with-catch-timeouts branch, then delete all jobs except ubuntu-nvhpc7

* Change runner to ubuntu-24.04

* Use nvhpc-25-11

* Undo ALL changes relative to master (i.e. this branch is now an exact copy of master)

* Change runner to ubuntu-24.04

* Use nvhpc-25-11

* Remove misleading 7 from job name (i.e. ubuntu-nvhpc7 → ubuntu-nvhpc)
1 parent 5b37916 commit d4f9cfb

.github/workflows/ci.yml

Lines changed: 6 additions & 6 deletions
@@ -470,10 +470,10 @@ jobs:


   # Testing on Ubuntu + NVHPC (previous PGI) compilers, which seems to require more workarounds
-  ubuntu-nvhpc7:
+  ubuntu-nvhpc:
     if: github.event.pull_request.draft == false
-    runs-on: ubuntu-22.04
-    name: "🐍 3 • NVHPC 23.5 • C++17 • x64"
+    runs-on: ubuntu-24.04
+    name: "🐍 3 • NVHPC 25.11 • C++17 • x64"
     timeout-minutes: 90

     env:
@@ -491,7 +491,7 @@ jobs:
       run: |
         sudo apt-get update -y && \
         sudo apt-get install -y cmake environment-modules git python3-dev python3-pip python3-numpy && \
-        sudo apt-get install -y --no-install-recommends nvhpc-23-5 && \
+        sudo apt-get install -y --no-install-recommends nvhpc-25-11 && \
         sudo rm -rf /var/lib/apt/lists/*
         python3 -m pip install --upgrade pip
         python3 -m pip install --upgrade pytest
@@ -502,15 +502,15 @@ jobs:
       shell: bash
       run: |
         source /etc/profile.d/modules.sh
-        module load /opt/nvidia/hpc_sdk/modulefiles/nvhpc/23.5
+        module load /opt/nvidia/hpc_sdk/modulefiles/nvhpc/25.11
         cmake -S . -B build -DDOWNLOAD_CATCH=ON \
               -DCMAKE_CXX_STANDARD=17 \
               -DPYTHON_EXECUTABLE=$(python3 -c "import sys; print(sys.executable)") \
               -DCMAKE_CXX_FLAGS="-Wc,--pending_instantiations=0" \
               -DPYBIND11_TEST_FILTER="test_smart_ptr.cpp"

     - name: Build
-      run: cmake --build build -j 2 --verbose
+      run: cmake --build build -j $(nproc) --verbose

     - name: Python tests
       run: cmake --build build --target pytest
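
The diff spells the NVHPC version two ways: with dashes in the apt package name (`nvhpc-25-11`) and with a dot in the modulefile path (`.../nvhpc/25.11`). It also replaces the fixed `-j 2` with `$(nproc)`. The sketch below is not part of the commit; it shows one way both spellings could be derived from a single value and how the job count resolves. The variable names (`NVHPC_VERSION`, `NVHPC_PACKAGE`, `NVHPC_MODULE`, `BUILD_JOBS`) and the fallback to 2 are assumptions made for illustration.

```shell
# Illustrative only: these variable names do not appear in ci.yml.
NVHPC_VERSION="25.11"

# The apt package name uses dashes where the version uses a dot: nvhpc-25-11.
NVHPC_PACKAGE="nvhpc-$(printf '%s' "$NVHPC_VERSION" | tr '.' '-')"

# The modulefile path keeps the dotted version.
NVHPC_MODULE="/opt/nvidia/hpc_sdk/modulefiles/nvhpc/${NVHPC_VERSION}"

# nproc (GNU coreutils) prints the number of available processing units.
# Assumed fallback: use 2, the previously hard-coded value, if nproc is missing.
BUILD_JOBS="$(nproc 2>/dev/null || echo 2)"

echo "package: ${NVHPC_PACKAGE}"
echo "module:  ${NVHPC_MODULE}"
echo "jobs:    ${BUILD_JOBS}"
```

With a single version variable, a future bump would touch one line rather than three. Whether that is worth the extra indirection in a workflow file is a judgment call for the maintainers.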
