
[Feature]: Reduce vLLM's import time #14924

Open
@simon-mo

Description


🚀 The feature, motivation and pitch

It takes several seconds just to print the version (4.7 s of wall-clock time in the run below), likely because vLLM initializes the CUDA context at import time.

time vllm --version
INFO 03-17 04:53:22 [__init__.py:256] Automatically detected platform cuda.
0.7.4.dev497+ga73e183e

real    0m4.729s
user    0m5.921s
sys     0m6.833s
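To see where those seconds go, CPython's -X importtime flag prints the self and cumulative import time (in microseconds) of every module to stderr. The sketch below is a minimal helper, assuming only the standard library. The name slowest_imports is hypothetical and not part of vLLM. It runs that flag in a fresh interpreter and ranks modules by cumulative time. Running it with "vllm" should show whether torch or CUDA-related imports dominate, though that is an assumption to verify rather than a measured result.

```python
import subprocess
import sys
from typing import List, Tuple


def slowest_imports(module: str, top: int = 10) -> List[Tuple[int, str]]:
    """Return (cumulative_us, module_name) pairs for the slowest imports, largest first."""
    # A fresh interpreter avoids sys.modules caching from the current process.
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows: List[Tuple[int, str]] = []
    for line in proc.stderr.splitlines():
        # Ignore unrelated stderr output such as log lines.
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        _self_us, cumulative_us, name = parts
        if not cumulative_us.strip().isdigit():
            continue  # skip the header row ("self [us] | cumulative | ...")
        # Leading spaces in the name encode nesting depth; strip them for readability.
        rows.append((int(cumulative_us), name.strip()))
    rows.sort(reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for cumulative_us, name in slowest_imports(sys.argv[1] if len(sys.argv) > 1 else "json"):
        print(f"{cumulative_us / 1e6:8.3f} s  {name}")
```

Cumulative time for a package includes everything it pulls in, so the top rows point at the import chain worth making lazy.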

This not only hurts the CLI experience but also slows startup for users whose code runs "from vllm import LLM".
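One common way to make "from vllm import LLM" pay only for what it uses is a PEP 562 module-level __getattr__ in the package's __init__.py. With this hook, a submodule is imported the first time one of its names is accessed, and "from X import Y" triggers that hook too. The following is a minimal sketch, assuming a simple mapping from public names to submodules. make_lazy_getattr is a hypothetical helper, and the demo uses stdlib modules in place of vLLM's real ones.

```python
import importlib
import types
from typing import Any, Callable, Dict


def make_lazy_getattr(
    namespace: Dict[str, Any], lazy_attrs: Dict[str, str]
) -> Callable[[str], Any]:
    """Build a PEP 562 module-level __getattr__ that imports names on first access."""

    def __getattr__(name: str) -> Any:
        module_path = lazy_attrs.get(name)
        if module_path is None:
            raise AttributeError(
                f"module {namespace.get('__name__')!r} has no attribute {name!r}"
            )
        value = getattr(importlib.import_module(module_path), name)
        # Cache in the module namespace so later lookups bypass __getattr__.
        namespace[name] = value
        return value

    return __getattr__


# In a package's __init__.py this would be used as (illustrative path, not verified):
#   __getattr__ = make_lazy_getattr(globals(), {"LLM": "vllm.entrypoints.llm"})
# Demo on a throwaway module, with stdlib modules standing in for heavy ones.
demo = types.ModuleType("demo_pkg")
demo.__getattr__ = make_lazy_getattr(
    vars(demo), {"Fraction": "fractions", "OrderedDict": "collections"}
)
```

Static type checkers and IDEs cannot see names resolved this way, so a common companion is an "if TYPE_CHECKING:" block that repeats the real imports for tooling only.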

Please help us investigate this and make import-time computation as lazy as possible, so that a simple vllm --version runs quickly.
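For the --version path specifically, the version string can come from the installed distribution's metadata via importlib.metadata.version. That function reads package metadata without importing the package. Below is a sketch of an entry point that answers --version before doing any heavy import. main, get_version, and the heavy_module default are hypothetical and are not vLLM's actual CLI. One caveat: if the entry point itself lives inside the vllm package, importing it still executes vllm/__init__.py first. This pattern therefore only helps if that __init__.py is kept light.

```python
import argparse
import importlib
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional


def get_version(dist_name: str) -> str:
    """Read the installed distribution's version without importing the package."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "unknown"


def main(
    argv: Optional[List[str]] = None,
    dist_name: str = "vllm",
    heavy_module: str = "vllm.entrypoints.cli.main",  # hypothetical module path
) -> int:
    # add_help=False so --help falls through to the full (heavy) CLI parser.
    parser = argparse.ArgumentParser(prog="vllm", add_help=False)
    parser.add_argument("--version", action="store_true")
    args, rest = parser.parse_known_args(argv)
    if args.version:
        # Fast path: only package metadata is read, no torch or CUDA imports.
        print(get_version(dist_name))
        return 0
    # Slow path: pay for heavy imports only when a real command runs.
    # Hypothetical: assumes the heavy module exposes main(argv) -> int.
    cli = importlib.import_module(heavy_module)
    return cli.main(rest)


if __name__ == "__main__":
    raise SystemExit(main())
```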

Alternatives

No response

Additional context

No response

