🚀 The feature, motivation and pitch
It takes 6s to print the version, likely because vLLM initializes the CUDA context at import time:
```
time vllm --version
INFO 03-17 04:53:22 [__init__.py:256] Automatically detected platform cuda.
0.7.4.dev497+ga73e183e
real 0m4.729s
user 0m5.921s
sys 0m6.833s
```
This not only hurts the CLI experience, but also means that users running `from vllm import LLM` experience slow startup.
Please help us investigate this and make import-time computation as lazy as possible, so that a simple `vllm --version` runs fast.
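To make the request concrete, here is a minimal sketch of one common way to defer expensive imports: a module-level `__getattr__` (PEP 562) that imports an attribute only on first access. This is not how vLLM is currently structured, and the names `make_lazy_package` and `demo_pkg` are hypothetical. The stdlib `decimal` module stands in for a heavy dependency so that the example runs anywhere.

```python
import importlib
import types


def make_lazy_package(name, lazy_attrs):
    """Return a module whose listed attributes are imported on first access.

    lazy_attrs maps an exported name to a (module_name, attribute_name) pair.
    In a real package the same __getattr__ would live in its __init__.py.
    """
    module = types.ModuleType(name)

    def _lazy_getattr(attr):
        # Only called when normal attribute lookup on the module fails.
        if attr not in lazy_attrs:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        target_module, target_attr = lazy_attrs[attr]
        value = getattr(importlib.import_module(target_module), target_attr)
        setattr(module, attr, value)  # cache so later lookups skip this hook
        return value

    # PEP 562: a function named __getattr__ in the module namespace is used
    # as the fallback for missing attributes (Python 3.7+).
    module.__getattr__ = _lazy_getattr
    return module


# Hypothetical stand-in: "Decimal" plays the role of a heavy export like LLM.
demo_pkg = make_lazy_package("demo_pkg", {"Decimal": ("decimal", "Decimal")})

before = "Decimal" in vars(demo_pkg)   # nothing resolved at creation time
total = demo_pkg.Decimal("1.5") + demo_pkg.Decimal("1.5")  # first access resolves it
after = "Decimal" in vars(demo_pkg)    # now cached on the module
```

With this pattern, creating the package object does no heavy work, and the cost is paid only by code that actually touches the lazy attribute. For the `--version` path specifically, `importlib.metadata.version` can read an installed distribution's version without importing the package. Whether either approach fits vLLM's CLI is an assumption that would need checking against the actual import graph, for example with `python -X importtime`.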
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.