
Conversation

@JohannesGaessler
Collaborator

This PR adds architectures to enable the recent Blackwell-specific MXFP4 optimizations for non-native builds. The problem with 120f-virtual, which we were using in the initial PR, is that it doesn't match the regex that CMake uses to validate CUDA architectures. The same regex does seem to accept 120a-real and 121a-real, so I would suggest we simply build those for now since there is no other hardware to cover. Newer CMake versions ship a bugfix for the regex, so presumably this will be less of a problem for us in the future. @CISC is there a way to run the Windows CUDA release CI without merging a PR?
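For non-native builds that want to cover these GPUs, the same effect can be approximated by pinning the architecture list explicitly. This is only a minimal sketch (an out-of-source build with a hand-picked list, not necessarily what the CI uses):

$ cmake -B build -DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES="120a-real;121a-real"
$ cmake --build build --config Release

The affected CMake versions accept the a suffix in this list even though they reject 120f-virtual, which is why the values above pass validation.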

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels on Dec 28, 2025
@CISC
Collaborator

CISC commented Dec 28, 2025

is there a way to run the Windows CUDA release CI without merging a PR?

Yes, you can run the Release CI manually against a specified branch, or on your own fork.

Edit: I have already tested with 120a-real though:
https://github.com/CISC/llama.cpp/actions/runs/20543537454/job/59011054386#step:6:52
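For reference, a manual dispatch can also be kicked off from the command line with the GitHub CLI. A hedged sketch, assuming the release workflow file is named release.yml and exposes a workflow_dispatch trigger (check .github/workflows/ for the actual name):

$ gh workflow run release.yml --repo ggml-org/llama.cpp --ref my-feature-branch
$ gh run list --repo ggml-org/llama.cpp --workflow release.yml

Running it against a fork works the same way, just with --repo pointed at the fork.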

Collaborator

@CISC left a comment


Together with #18441 we should be in business...

@am17an
Collaborator

am17an commented Dec 29, 2025

I think there is another device with sm 122? Not sure. There isn't, according to https://developer.nvidia.com/cuda/gpus

@JohannesGaessler merged commit e70e640 into ggml-org:master on Dec 29, 2025
69 of 71 checks passed
@thomasjfox
Contributor

@JohannesGaessler: This is the cmake output from a fresh build dir with commit 0c89864 on master:

$ cmake -DGGML_CUDA=on -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc ../
-- The C compiler identification is GNU 15.2.1
-- The CXX compiler identification is GNU 15.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/lib64/ccache/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/lib64/ccache/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMAKE_BUILD_TYPE=Release
-- Found Git: /usr/bin/git (found version "2.52.0")
-- The ASM compiler identification is GNU
-- Found assembler: /usr/lib64/ccache/cc
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native 
-- Found CUDAToolkit: /usr/local/cuda-13.0/targets/x86_64-linux/include (found version "13.0.88")
-- CUDA Toolkit found
-- Using CUDA architectures: native
-- The CUDA compiler identification is NVIDIA 13.0.88 with host compiler GNU 15.2.1
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Replacing 120-real with 120a-real
-- CUDA host compiler is GNU 15.2.1
-- Including CUDA backend
-- ggml version: 0.9.4
-- ggml commit:  0c8986403
-- Found CURL: /usr/lib64/libcurl.so (found version "8.11.1")
-- Configuring done (3.7s)
-- Generating done (0.1s)

All compiled just fine. cmake version is 3.31.6.
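For completeness, one way to confirm that the 120a replacement actually reached the compiler is to grep a verbose build for the compute targets nvcc was given. A rough sketch, assuming the build directory from the invocation above:

$ cmake --build . --verbose 2>&1 | grep -o 'compute_12[01]a' | sort -u

(Only freshly compiled files show their nvcc command lines, so this is most useful on a clean build.)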

I will test your change in PR #18457 next.
