
Conversation

@JohannesGaessler
Collaborator

This PR adds architectures to enable the recent Blackwell-specific MXFP4 optimizations for non-native builds. The problem with 120f-virtual, which we were using in the initial PR, is that it doesn't match the regex that CMake uses to validate CUDA architectures. The same regex does seem to accept 120a-real and 121a-real, so I would suggest we simply build those for now since there is no other hardware to cover. Newer CMake versions ship a bugfix for the regex, so presumably this will be less of a problem for us in the future. @CISC is there a way to run the Windows CUDA release CI without merging a PR?
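For non-native builds that want to cover these GPUs, the same effect can be approximated by pinning the architecture list explicitly. This is only a minimal sketch (an out-of-source build with a hand-picked list, not necessarily what the CI uses):

$ cmake -B build -DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES="120a-real;121a-real"
$ cmake --build build --config Release

The affected CMake versions accept the a suffix in this list even though they reject 120f-virtual, which is why the values above pass validation.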

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels on Dec 28, 2025
@CISC
Collaborator

CISC commented Dec 28, 2025

is there a way to run the Windows CUDA release CI without merging a PR?

Yes, you can run the Release CI manually against a specified branch, or on your own fork.

Edit: I have already tested with 120a-real though:
https://github.com/CISC/llama.cpp/actions/runs/20543537454/job/59011054386#step:6:52
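For reference, a manual dispatch can also be kicked off from the command line with the GitHub CLI. A hedged sketch, assuming the release workflow file is named release.yml and exposes a workflow_dispatch trigger (check .github/workflows/ for the actual name):

$ gh workflow run release.yml --repo ggml-org/llama.cpp --ref my-feature-branch
$ gh run list --repo ggml-org/llama.cpp --workflow release.yml

Running it against a fork works the same way, just with --repo pointed at the fork.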

Collaborator

@CISC left a comment


Together with #18441 we should be in business...

@am17an
Collaborator

am17an commented Dec 29, 2025

I think there is another device with sm 122? Not sure. There isn't, according to https://developer.nvidia.com/cuda/gpus

@JohannesGaessler merged commit e70e640 into ggml-org:master on Dec 29, 2025
69 of 71 checks passed
@thomasjfox
Contributor

@JohannesGaessler: This is the cmake output from a fresh build dir with commit 0c89864 on master:

$ cmake -DGGML_CUDA=on -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc ../
-- The C compiler identification is GNU 15.2.1
-- The CXX compiler identification is GNU 15.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/lib64/ccache/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/lib64/ccache/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMAKE_BUILD_TYPE=Release
-- Found Git: /usr/bin/git (found version "2.52.0")
-- The ASM compiler identification is GNU
-- Found assembler: /usr/lib64/ccache/cc
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native 
-- Found CUDAToolkit: /usr/local/cuda-13.0/targets/x86_64-linux/include (found version "13.0.88")
-- CUDA Toolkit found
-- Using CUDA architectures: native
-- The CUDA compiler identification is NVIDIA 13.0.88 with host compiler GNU 15.2.1
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Replacing 120-real with 120a-real
-- CUDA host compiler is GNU 15.2.1
-- Including CUDA backend
-- ggml version: 0.9.4
-- ggml commit:  0c8986403
-- Found CURL: /usr/lib64/libcurl.so (found version "8.11.1")
-- Configuring done (3.7s)
-- Generating done (0.1s)

All compiled just fine. cmake version is 3.31.6.
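For completeness, one way to confirm that the 120a replacement actually reached the compiler is to grep a verbose build for the compute targets nvcc was given. A rough sketch, assuming the build directory from the invocation above:

$ cmake --build . --verbose 2>&1 | grep -o 'compute_12[01]a' | sort -u

(Only freshly compiled files show their nvcc command lines, so this is most useful on a clean build.)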

I will test your change in PR #18457 next.
