[Feature]: Reduce vLLM's import time

### 🚀 The feature, motivation and pitch

It takes several seconds just to print the version (about 4.7s wall-clock in the run below), likely because vLLM initializes the CUDA context at import time:
```
time vllm --version
INFO 03-17 04:53:22 [__init__.py:256] Automatically detected platform cuda.
0.7.4.dev497+ga73e183e

real    0m4.729s
user    0m5.921s
sys     0m6.833s
```

This not only hurts the CLI experience, but also means users running `from vllm import LLM` see a slow startup.
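One way to see which modules account for the time is CPython's `-X importtime` flag, which prints a per-import timing line to stderr. The helper below is only a sketch for investigation: `slowest_imports` is a hypothetical name, not part of vLLM, and it assumes the module can be imported in a fresh interpreter.

```python
import subprocess
import sys


def slowest_imports(module: str, top: int = 10) -> list[tuple[int, str]]:
    """Return the `top` imports with the largest cumulative time, in microseconds."""
    # -X importtime makes CPython write one timing line per import to stderr.
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue  # unrelated stderr output, e.g. log lines
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        cumulative = parts[1].strip()
        if not cumulative.isdigit():
            continue  # header row: "self [us] | cumulative | imported package"
        rows.append((int(cumulative), parts[2].strip()))
    # Largest cumulative time first.
    return sorted(rows, reverse=True)[:top]
```

On a machine with vLLM installed, `slowest_imports("vllm")` should list the heaviest import chains; the cumulative column includes the time spent in nested imports.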

Please help us investigate this and make import-time computation as lazy as possible, so that a simple `vllm --version` runs quickly.
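As an illustration of what "lazy" could mean here, the sketch below defers a module import until one of its attributes is first used. It is not vLLM's actual code: `LazyModule` is a hypothetical helper, and `fractions` stands in for a heavy dependency such as one that initializes CUDA.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module = None

    def _load(self):
        # Import once and cache; later accesses reuse the cached module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr: str):
        # Only called when normal lookup fails, i.e. for attributes
        # that belong to the real module rather than to this proxy.
        return getattr(self._load(), attr)


# Stand-in for a heavy dependency; nothing is imported at this point.
heavy = LazyModule("fractions")


def print_version() -> str:
    # A code path like `--version` never touches `heavy`, so it stays cheap.
    return "0.0.0-example"


def do_work():
    # The first attribute access triggers the real import.
    return heavy.Fraction(1, 2) + heavy.Fraction(1, 2)
```

The same effect can often be had by moving an `import` statement inside the function that needs it; a proxy like this is mainly useful when many call sites share the module.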

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Feature]: Reduce vLLM's import time #14924

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Feature]: Reduce vLLM's import time #14924

Description

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions