
[Bug]: RuntimeError on RTX 5090: "no kernel image is available for execution on the device #16901

Closed
@danhnc96

Description



Describe the bug

When running vLLM with an NVIDIA RTX 5090 GPU, I encountered the following error (full log attached: vLLM.log):

RuntimeError: CUDA error: no kernel image is available for execution on the device

From the logs, it seems that PyTorch does not support the compute capability of the RTX 5090 (sm_120).
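As a quick diagnostic, you can compare the kernel architectures a PyTorch wheel was compiled for against the device's compute capability. This is a minimal sketch: `arch_supported` is a hypothetical helper, not part of PyTorch, but the real inputs come from `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`, which are standard PyTorch APIs.

```python
def arch_supported(arch_list, capability):
    """Return True if a wheel compiled for `arch_list` can run on a device
    with the given compute capability tuple (major, minor).

    `arch_list` uses PyTorch's naming, e.g. ["sm_80", "sm_90", "compute_90"]:
    - "sm_XY" entries are binary kernels for exactly capability X.Y
    - "compute_XY" entries are PTX, JIT-compilable on capability >= X.Y
    """
    major, minor = capability
    code = major * 10 + minor  # (12, 0) -> 120
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        num = int(num)
        if kind == "sm" and num == code:
            return True  # exact binary kernel match
        if kind == "compute" and num <= code:
            return True  # PTX forward compatibility
    return False

# On a real machine you would call (requires PyTorch with CUDA):
#   import torch
#   arch_supported(torch.cuda.get_arch_list(),
#                  torch.cuda.get_device_capability())

# A stable-wheel arch list without sm_120 cannot serve an RTX 5090 (12.0):
print(arch_supported(["sm_80", "sm_90"], (12, 0)))   # False
print(arch_supported(["sm_90", "sm_120"], (12, 0)))  # True
```

If the check returns False for your installed wheel, no amount of runtime configuration will help; the wheel simply lacks kernels for the device.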

To Reproduce

  1. Use RTX 5090 GPU
  2. Install vLLM with Docker or system Python environment
  3. Launch the vLLM OpenAI API server
  4. Engine fails to start due to CUDA kernel compatibility issue
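For reference, steps 2–3 with Docker look roughly like this (a sketch: `vllm/vllm-openai` is the official image, and the model name is just an example):

```shell
# Pull the official vLLM OpenAI-compatible server image
docker pull vllm/vllm-openai

# Launch the server; on an RTX 5090 this fails at engine startup with
# "no kernel image is available for execution on the device"
docker run --runtime nvidia --gpus all -p 8000:8000 \
    vllm/vllm-openai --model facebook/opt-125m
```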

Environment

  • GPU: NVIDIA GeForce RTX 5090
  • CUDA Driver Version: 12.8
  • CUDA Toolkit: 12.8.93
  • NVIDIA Driver: 570.124.06
  • PyTorch Version: 2.x (installed via pip)
  • vLLM Version: Latest (from PyPI)
  • Python Version: 3.10
  • OS: Ubuntu 22.04

Additional Context

It seems that the RTX 5090 uses a new compute capability (sm_120), which is currently not supported in the stable PyTorch build I'm using.

Is there a recommended way to run vLLM with this GPU? Should I:

  • Switch to a nightly PyTorch build that supports sm_120?
  • Build PyTorch from source with TORCH_CUDA_ARCH_LIST="12.0"?
  • Wait for official support from PyTorch?
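For the first option, the commonly suggested route is installing a nightly wheel built against CUDA 12.8 and then re-checking the arch list. This is a sketch: the index URL follows PyTorch's documented nightly wheel layout, but verify it against the current install matrix before relying on it.

```shell
# Install a PyTorch nightly built against CUDA 12.8 (cu128 wheels
# are the ones expected to carry sm_120 / Blackwell kernels)
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

# Confirm the new wheel actually lists Blackwell support; look for
# 'sm_120' (or a 'compute_120' PTX entry) in the printed list
python -c "import torch; print(torch.cuda.get_arch_list())"
```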

Any guidance or workaround would be greatly appreciated. Thanks!

How you are installing vllm

pip install -vvv vllm

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
