### Your current environment

See the Environment section below.
### Describe the bug
When running the vLLM OpenAI API server with an NVIDIA RTX 5090 GPU, I encountered the following error:
```
RuntimeError: CUDA error: no kernel image is available for execution on the device
```
From the logs, it seems that PyTorch does not support the compute capability of the RTX 5090 (sm_120).
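For reference, a quick check along these lines (a minimal sketch, assuming only a working PyTorch install) shows the mismatch between the card's compute capability and the architectures the wheel was built for:

```python
import torch

# Versions of PyTorch and the CUDA toolkit it was built against
print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)

# Compute capability reported by the GPU; (12, 0) -> sm_120 on the RTX 5090
major, minor = torch.cuda.get_device_capability(0)
print(f"device: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")

# Architectures this build actually ships kernels for; sm_120 is missing here
print("compiled arch list:", torch.cuda.get_arch_list())
```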
### To Reproduce
- Use an NVIDIA RTX 5090 GPU
- Install vLLM in Docker or in a system Python environment
- Launch the vLLM OpenAI API server
- The engine fails to start due to the CUDA kernel compatibility issue (a minimal standalone repro is sketched below)
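The failure does not appear to be specific to vLLM; any CUDA kernel launch from this PyTorch build fails the same way. A minimal sketch, assuming nothing beyond PyTorch itself:

```python
import torch

# On a build without sm_120 kernels, the first kernel launch already fails with
# "no kernel image is available for execution on the device"
x = torch.randn(8, 8, device="cuda")
print((x @ x).sum().item())
```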
### Environment
- GPU: NVIDIA GeForce RTX 5090
- CUDA Driver Version: 12.8
- CUDA Toolkit: 12.8.93
- NVIDIA Driver: 570.124.06
- PyTorch Version: 2.x (installed via pip)
- vLLM Version: Latest (from PyPI)
- Python Version: 3.10
- OS: Ubuntu 22.04
### Additional Context
It seems that the RTX 5090 uses a new compute capability (sm_120), which is currently not supported in the stable PyTorch build I'm using.
Is there a recommended way to run vLLM with this GPU? Should I:
- Switch to a nightly PyTorch build that supports sm_120?
- Build PyTorch from source with `TORCH_CUDA_ARCH_LIST="12.0"`?
- Wait for official support from PyTorch?
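Whichever of these turns out to be the right route, I assume a fixed install can be verified with something like the following (a sketch, not an official check):

```python
import torch

# The build must ship Blackwell kernels for the RTX 5090 to work
assert "sm_120" in torch.cuda.get_arch_list(), "this build still lacks sm_120 kernels"

# Smoke-test an actual kernel launch on the device
torch.randn(4, 4, device="cuda").sum().item()
print("sm_120 kernels are available and working")
```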
Any guidance or workaround would be greatly appreciated. Thanks!
### How you are installing vllm
```
pip install -vvv vllm
```
### Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.