Summary
The LLamaSharp.Backend.Cuda12 NuGet package does not include CUDA compute capability sm_120 (NVIDIA Blackwell architecture — RTX 5080, RTX 5090, etc.). This means the native llama.dll / libllama.so binaries cannot run on Blackwell GPUs, silently falling back to CPU.
Current Behavior
On an RTX 5080 (Blackwell) with CUDA 12 drivers:
- LLamaSharp.Backend.Cuda12 0.25.0 / 0.26.0 loads, but inference silently falls back to CPU
- No error is raised; the model simply runs on CPU instead of GPU
- Users have no indication that their GPU is unsupported by the compiled binaries
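The silent fallback is consistent with how the CUDA driver selects device code from a fatbinary: it loads a cubin that exactly matches the device's compute capability, or JIT-compiles the newest embedded PTX at or below it; if neither exists, no GPU code is available. A minimal sketch of that selection logic, assuming (not verified against the shipped packages) that the binaries embed cubins only for the listed architectures and no usable PTX:

```python
# Simplified model of CUDA fatbinary device-code selection.
# Assumption (not taken from LLamaSharp itself): the shipped binaries embed
# cubins for the listed real architectures and no PTX, which would explain
# the silent CPU fallback on sm_120.

def select_device_code(cubin_archs, ptx_archs, device_cc):
    """Return a description of the code the driver would load, or None."""
    if device_cc in cubin_archs:
        return f"cubin sm_{device_cc}"      # exact native match
    # Otherwise the driver can JIT-compile the newest PTX <= device_cc.
    usable_ptx = [a for a in ptx_archs if a <= device_cc]
    if usable_ptx:
        return f"PTX compute_{max(usable_ptx)} (JIT)"
    return None                             # no usable code: CPU fallback

shipped = [60, 70, 75, 80, 86, 89, 90]      # arch list from this issue

print(select_device_code(shipped, [], 89))        # cubin sm_89 (RTX 4090-class)
print(select_device_code(shipped, [], 120))       # None -> RTX 5080 falls back
print(select_device_code(shipped + [120], [], 120))  # cubin sm_120 after the fix
```

This is only a model of the dispatch rule, but it shows why adding 120 to the architecture list (or shipping PTX for a recent virtual architecture) resolves the fallback.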
Expected Behavior
The CUDA 12 backend should include sm_120 in the CMake build targets so that Blackwell GPUs are utilized for inference.
Workaround
We are currently building llama.cpp from source with -DGGML_CUDA_ARCHITECTURES=120 and packaging the resulting native binaries as a custom NuGet package (LLamaSharp.Backend.Cuda12.Blackwell). This works, but it requires version-locked alignment with the managed LLamaSharp package (e.g., native 0.25.0 binaries combined with the managed 0.26.0 package produce an ABI mismatch and garbage output).
Suggested Fix
Add sm_120 (and ideally sm_120a) to the CUDA architecture list in the CI build for LLamaSharp.Backend.Cuda12. This is typically a one-line change in the CMake configuration:

```cmake
set(GGML_CUDA_ARCHITECTURES "60;70;75;80;86;89;90;120")
```

Or, if passing the flag on the llama.cpp CMake command line:

```
-DGGML_CUDA_ARCHITECTURES="60;70;75;80;86;89;90;120"
```
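CI could also verify the fix by inspecting the built library with `cuobjdump --list-elf <lib>` (from the CUDA toolkit), which lists the embedded cubins. A sketch that scans that output for architecture tags; the exact output format varies by toolkit version, so the sample text below is illustrative, not captured from a real build:

```python
import re

def embedded_archs(cuobjdump_output: str) -> set:
    """Extract the sm_NN architecture numbers from cuobjdump text output."""
    return {int(m) for m in re.findall(r"sm_(\d+)", cuobjdump_output)}

# Illustrative sample of `cuobjdump --list-elf` output (format is toolkit-dependent)
sample = """\
ELF file    1: ggml-cuda.1.sm_80.cubin
ELF file    2: ggml-cuda.2.sm_89.cubin
ELF file    3: ggml-cuda.3.sm_90.cubin
"""

archs = embedded_archs(sample)
print(sorted(archs))   # [80, 89, 90]
print(120 in archs)    # False: this binary has no native Blackwell code
```

A check like `assert 120 in embedded_archs(...)` in the package pipeline would catch a missing architecture before release instead of at runtime.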
Environment
- GPU: NVIDIA RTX 5080 (Blackwell, sm_120)
- CUDA Driver: 13.1
- CUDA Toolkit: 12.9
- OS: Windows Server 2025 (Linux support also needed)
- LLamaSharp: 0.25.0 and 0.26.0 tested
- .NET: 10.0