Add sm_120 (Blackwell / RTX 50-series) support to CUDA 12 backend #1338

@NORDIT

Description

Summary

The LLamaSharp.Backend.Cuda12 NuGet package does not include CUDA compute capability sm_120 (NVIDIA Blackwell architecture: RTX 5080, RTX 5090, etc.) in its build targets. As a result, the shipped native llama.dll / libllama.so binaries contain no device code for Blackwell GPUs, and inference silently falls back to the CPU.

Current Behavior

On an RTX 5080 (Blackwell) with CUDA 12 drivers:

  • LLamaSharp.Backend.Cuda12 0.25.0 / 0.26.0 loads but falls back to CPU inference
  • No error is raised — the model just runs on CPU instead of GPU
  • Users have no indication that their GPU is unsupported by the compiled binaries
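The silent fallback comes down to the GPU's compute capability not matching any architecture the shipped binaries were compiled for. A minimal sketch of that check (the shipped arch list below is an assumption taken from the Suggested Fix section minus "120", and PTX JIT forward-compatibility is ignored for simplicity):

```python
def arch_supported(compute_cap: str, compiled_archs: list[str]) -> bool:
    """Return True if a binary built for `compiled_archs` carries native
    device code (a cubin) for a GPU with the given compute capability.

    Simplified illustration only: real CUDA loading also considers any
    embedded PTX, which can be JIT-compiled for newer architectures.
    """
    return compute_cap in compiled_archs

# Hypothetical arch list for the current Cuda12 package (note: no "120"):
shipped = ["60", "70", "75", "80", "86", "89", "90"]

print(arch_supported("120", shipped))            # False -> silent CPU fallback
print(arch_supported("120", shipped + ["120"]))  # True with the suggested fix
```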

Expected Behavior

The CUDA 12 backend should include sm_120 in the CMake build targets so that Blackwell GPUs are utilized for inference.

Workaround

We are currently building llama.cpp from source with -DGGML_CUDA_ARCHITECTURES=120 and packaging the resulting native binaries as a custom NuGet package (LLamaSharp.Backend.Cuda12.Blackwell). This works but requires version-locked alignment with the managed LLamaSharp package (e.g., native 0.25.0 + managed 0.26.0 = ABI mismatch and garbage output).
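A rough sketch of that from-source build (repository URL, generator defaults, and the exact flag spelling are assumptions; the llama.cpp revision must match the one the managed LLamaSharp release was built against, otherwise the ABI mismatch described above reappears):

```sh
# Build llama.cpp native binaries that include Blackwell (sm_120) device code.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# Check out the commit/tag matching the target LLamaSharp release here.
cmake -B build \
      -DGGML_CUDA=ON \
      -DGGML_CUDA_ARCHITECTURES=120 \
      -DBUILD_SHARED_LIBS=ON
cmake --build build --config Release
# The resulting llama.dll / libllama.so is then packed into the custom
# LLamaSharp.Backend.Cuda12.Blackwell NuGet package.
```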

Suggested Fix

Add sm_120 (and ideally sm_120a) to the CUDA architecture list in the CI build for LLamaSharp.Backend.Cuda12. This is typically a one-line change in the CMake configuration:

set(GGML_CUDA_ARCHITECTURES "60;70;75;80;86;89;90;120")

Or if using the llama.cpp CMake flag:

-DGGML_CUDA_ARCHITECTURES="60;70;75;80;86;89;90;120"
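Since sm_120 is only recognized by newer toolchains (CUDA 12.8 is generally cited as the first toolkit with Blackwell support, but treat that threshold as an assumption), the CI change may also want a toolkit-version guard so older build agents keep working. A hedged CMake sketch:

```cmake
# Append Blackwell only when the CUDA compiler is new enough to know sm_120.
# (The 12.8 threshold is an assumption; verify against the toolkit release notes.)
if (CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL 12.8)
    list(APPEND GGML_CUDA_ARCHITECTURES "120")
endif()
```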

Environment

  • GPU: NVIDIA RTX 5080 (Blackwell, sm_120)
  • CUDA Driver: 13.1
  • CUDA Toolkit: 12.9
  • OS: Windows Server 2025 (Linux builds are also needed)
  • LLamaSharp: 0.25.0 and 0.26.0 tested
  • .NET: 10.0
