Summary
The LLamaSharp.Backend.Cuda12 NuGet package does not include CUDA compute capability sm_120 (NVIDIA Blackwell architecture — RTX 5080, RTX 5090, etc.). This means the native llama.dll / libllama.so binaries cannot run on Blackwell GPUs, silently falling back to CPU.
Current Behavior
On an RTX 5080 (Blackwell) with CUDA 12 drivers:
- LLamaSharp.Backend.Cuda12 0.25.0 / 0.26.0 loads, but inference silently falls back to CPU
- No error is raised; the model simply runs on CPU instead of GPU
- Users have no indication that their GPU is unsupported by the compiled binaries
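The silent fallback is consistent with how the CUDA driver selects device code from a fatbinary: it loads a cubin that exactly matches the device's compute capability, or JIT-compiles the newest embedded PTX at or below it; if neither exists, no GPU code is available. A minimal sketch of that selection logic, assuming (not verified against the shipped packages) that the binaries embed cubins only for the listed architectures and no usable PTX:

```python
# Simplified model of CUDA fatbinary device-code selection.
# Assumption (not taken from LLamaSharp itself): the shipped binaries embed
# cubins for the listed real architectures and no PTX, which would explain
# the silent CPU fallback on sm_120.

def select_device_code(cubin_archs, ptx_archs, device_cc):
    """Return a description of the code the driver would load, or None."""
    if device_cc in cubin_archs:
        return f"cubin sm_{device_cc}"      # exact native match
    # Otherwise the driver can JIT-compile the newest PTX <= device_cc.
    usable_ptx = [a for a in ptx_archs if a <= device_cc]
    if usable_ptx:
        return f"PTX compute_{max(usable_ptx)} (JIT)"
    return None                             # no usable code: CPU fallback

shipped = [60, 70, 75, 80, 86, 89, 90]      # arch list from this issue

print(select_device_code(shipped, [], 89))        # cubin sm_89 (RTX 4090-class)
print(select_device_code(shipped, [], 120))       # None -> RTX 5080 falls back
print(select_device_code(shipped + [120], [], 120))  # cubin sm_120 after the fix
```

This is only a model of the dispatch rule, but it shows why adding 120 to the architecture list (or shipping PTX for a recent virtual architecture) resolves the fallback.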
Expected Behavior
The CUDA 12 backend should include sm_120 in the CMake build targets so that Blackwell GPUs are utilized for inference.
Workaround
We are currently building llama.cpp from source with -DGGML_CUDA_ARCHITECTURES=120 and packaging the resulting native binaries as a custom NuGet package (LLamaSharp.Backend.Cuda12.Blackwell). This works, but it requires version-locked alignment with the managed LLamaSharp package (e.g., native 0.25.0 binaries combined with the managed 0.26.0 package produce an ABI mismatch and garbage output).
Suggested Fix
Add sm_120 (and ideally sm_120a) to the CUDA architecture list in the CI build for LLamaSharp.Backend.Cuda12. This is typically a one-line change in the CMake configuration:

```cmake
set(GGML_CUDA_ARCHITECTURES "60;70;75;80;86;89;90;120")
```

Or, if passing the flag on the llama.cpp CMake command line:

```
-DGGML_CUDA_ARCHITECTURES="60;70;75;80;86;89;90;120"
```
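CI could also verify the fix by inspecting the built library with `cuobjdump --list-elf <lib>` (from the CUDA toolkit), which lists the embedded cubins. A sketch that scans that output for architecture tags; the exact output format varies by toolkit version, so the sample text below is illustrative, not captured from a real build:

```python
import re

def embedded_archs(cuobjdump_output: str) -> set:
    """Extract the sm_NN architecture numbers from cuobjdump text output."""
    return {int(m) for m in re.findall(r"sm_(\d+)", cuobjdump_output)}

# Illustrative sample of `cuobjdump --list-elf` output (format is toolkit-dependent)
sample = """\
ELF file    1: ggml-cuda.1.sm_80.cubin
ELF file    2: ggml-cuda.2.sm_89.cubin
ELF file    3: ggml-cuda.3.sm_90.cubin
"""

archs = embedded_archs(sample)
print(sorted(archs))   # [80, 89, 90]
print(120 in archs)    # False: this binary has no native Blackwell code
```

A check like `assert 120 in embedded_archs(...)` in the package pipeline would catch a missing architecture before release instead of at runtime.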
Environment
- GPU: NVIDIA RTX 5080 (Blackwell, sm_120)
- CUDA Driver: 13.1
- CUDA Toolkit: 12.9
- OS: Windows Server 2025 (Linux support also needed)
- LLamaSharp: 0.25.0 and 0.26.0 tested
- .NET: 10.0