
onnxruntime-gpu for Blackwell (sm_120)

Pre-built onnxruntime-gpu wheel with native CUDA kernels for NVIDIA Blackwell GPUs (RTX 5090, 5080, 5070 Ti, 5070).

The official PyPI onnxruntime-gpu package does not include sm_120 kernels, so CUDAExecutionProvider is unavailable on Blackwell cards and all operations fall back to CPU.

Download

Grab the .whl from the Releases page.

Install

pip install onnxruntime_gpu-1.24.1-cp312-cp312-win_amd64.whl

Or directly from the release:

pip install https://github.com/Natfii/onnxruntime-gpu-blackwell/releases/download/v1.24.1/onnxruntime_gpu-1.24.1-cp312-cp312-win_amd64.whl

Verify

import onnxruntime as ort
print(ort.__version__)              # 1.24.1
print(ort.get_available_providers()) # ['CUDAExecutionProvider', 'CPUExecutionProvider']

Build details

onnxruntime: 1.24.1
CUDA: 13.1
cuDNN: 9.19.0.56
CUDA arch: sm_120 (Blackwell)
Python: 3.12 (CPython)
Platform: Windows x86_64
Compiler: MSVC 14.44 (Visual Studio 2022 17.x)
Generator: Ninja

Built from the official onnxruntime source with CMAKE_CUDA_ARCHITECTURES=120.
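The exact build invocation was not published; a plausible reconstruction using onnxruntime's standard Windows build script is sketched below. The CUDA and cuDNN install paths are placeholders — substitute your own.

```shell
:: Hypothetical reconstruction of the build command (paths are placeholders).
:: Run from the root of the onnxruntime source checkout.
.\build.bat --config Release --build_wheel --parallel ^
  --use_cuda ^
  --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1" ^
  --cudnn_home "C:\Program Files\NVIDIA\CUDNN\v9.19" ^
  --cmake_generator Ninja ^
  --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=120
```

`--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=120` is the flag that matters here: it restricts kernel compilation to sm_120, which keeps build time and wheel size down at the cost of portability to older architectures.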

Runtime requirements

Inferred from the build configuration below: an NVIDIA driver new enough for CUDA 13, plus the CUDA 13.x runtime and cuDNN 9.x libraries (the wheel was built against CUDA 13.1 and cuDNN 9.19.0.56) available on PATH.

Why this exists

As of February 2026, the official onnxruntime-gpu pip package ships kernels up to sm_89/sm_90 (Ada Lovelace / Hopper). Blackwell (sm_120) is not yet supported in the prebuilt wheels, so CUDAExecutionProvider is not available and all inference falls back to CPU.

This wheel enables CUDAExecutionProvider on Blackwell by including natively compiled sm_120 kernels.

Known issue: Conv Fallback warnings

Even with sm_120 kernels, some ONNX models (e.g. Kokoro TTS) will still log warnings like:

OP Conv(...) running in Fallback mode. May be extremely slow.

This is a cuDNN algorithm selection issue, not a CUDA architecture issue. cuDNN 9.x does not yet have optimized Conv algorithms for certain kernel shapes on Blackwell. The Conv ops still run on GPU (not CPU) — just using a slower generic cuDNN codepath. In practice the performance impact is minor for small models.

To suppress the warning spam, set the ONNX Runtime log severity to ERROR:

sess_opts = ort.SessionOptions()
sess_opts.log_severity_level = 3  # 3 = ERROR (0 VERBOSE, 1 INFO, 2 WARNING, 3 ERROR, 4 FATAL)

License

ONNX Runtime is licensed under the MIT License.
