# [Frontend] Reduce vLLM's import time #15128
## Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Some quick initial feedback wrt the annotations import and lazy-loading transformers.
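For readers following along, here is a minimal sketch of the two patterns this feedback refers to: postponed annotations plus `TYPE_CHECKING`, and a function-local `transformers` import. The helper name is hypothetical, not taken from this PR's diff:

```python
from __future__ import annotations  # stop evaluating annotations at import time

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # seen by type checkers only; never imported at runtime
    from transformers import PretrainedConfig


def load_hf_config(model: str) -> PretrainedConfig:
    # the heavy transformers import is paid on the first call,
    # not when this module is imported
    from transformers import AutoConfig

    return AutoConfig.from_pretrained(model)
```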
@aarnphm Thanks for your review. I’ve completed most of the fixes.
round 2
@aarnphm Thanks for your review. I’ve completed round 2.
@Chen-0210 I made some fixes to this PR in Chen-0210#1 to get …

`python -c "import vllm"`

before (main branch commit 5536b30)

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `python -c "import vllm"` | 7.316 ± 0.104 | 7.161 | 7.511 | 1.00 |
after (my PR commit 31074c2)
```
$ hyperfine 'python -c "import vllm"' --warmup 3 --runs 20 --export-markdown out.md
Benchmark 1: python -c "import vllm"
  Time (mean ± σ):      4.955 s ±  0.133 s    [User: 5.163 s, System: 0.609 s]
  Range (min … max):    4.755 s …  5.242 s    20 runs
```
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `python -c "import vllm"` | 4.955 ± 0.133 | 4.755 | 5.242 | 1.00 |
`vllm --version`
Not much improvement, but as you said in the PR description, we can optimize this in a separate PR.
before (main branch commit 5536b30)
```
$ hyperfine 'vllm --version' --warmup 3 --runs 20 --export-markdown out.md
Benchmark 1: vllm --version
  Time (mean ± σ):     10.033 s ±  0.203 s    [User: 9.970 s, System: 0.879 s]
  Range (min … max):    9.745 s … 10.577 s    20 runs
```
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `vllm --version` | 10.033 ± 0.203 | 9.745 | 10.577 | 1.00 |
after (my PR commit 31074c2)
```
$ hyperfine 'vllm --version' --warmup 3 --runs 20 --export-markdown out.md
Benchmark 1: vllm --version
  Time (mean ± σ):      9.248 s ±  0.203 s    [User: 9.104 s, System: 0.956 s]
  Range (min … max):    8.900 s …  9.736 s    20 runs
```
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `vllm --version` | 9.248 ± 0.203 | 8.900 | 9.736 | 1.00 |
This pull request has merge conflicts that must be resolved before it can be merged.
@davidxia Could you please run the pre-commit hook? I noticed some lint issues that need to be fixed.
Force-pushed from 8396105 to 36a0bc1
I reran the benchmarks with the same setup. The main branch's times are here.

commit 36a0bc1

`python -c "import vllm"`

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `python -c "import vllm"` | 3.504 ± 0.119 | 3.306 | 3.891 | 1.00 |
`vllm --version`
```
$ hyperfine 'vllm --version' --warmup 3 --runs 20 --export-markdown out.md
Benchmark 1: vllm --version
  Time (mean ± σ):      9.853 s ±  0.329 s    [User: 9.643 s, System: 0.973 s]
  Range (min … max):    9.515 s … 10.599 s    20 runs
```
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `vllm --version` | 9.853 ± 0.329 | 9.515 | 10.599 | 1.00 |
Rebased and resolved conflicts.
Force-pushed from 2826010 to 5eec6ca
This change optimizes the import time of `import vllm` and contributes to vllm-project#14924. Most of the changes import expensive modules lazily instead of eagerly. This change shouldn't affect core functionality.

Co-authored-by: Chen-0210 <chenjincong11@gmail.com>
Co-authored-by: David Xia <david@davidxia.com>
Signed-off-by: Chen-0210 <chenjincong11@gmail.com>
Signed-off-by: David Xia <david@davidxia.com>
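Since the commit message mentions lazily importing expensive modules, here is one common way to keep `from vllm import LLM` working while deferring the import, using PEP 562's module-level `__getattr__`. This is an assumption on my part, not necessarily how this commit does it:

```python
# hypothetical vllm/__init__.py excerpt
__all__ = ["LLM"]


def __getattr__(name: str):
    # called only when an attribute is not found the normal way, so
    # `import vllm` stays cheap and `vllm.LLM` triggers the real import
    if name == "LLM":
        from vllm.entrypoints.llm import LLM
        return LLM
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```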
This pull request has merge conflicts that must be resolved before it can be merged.
This PR optimizes the import time of `from vllm import LLM` and helps fix issue #14924. The majority of the changes only involve reordering the import statements, so I think this change will not affect the core functionality and can reduce import time.
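For illustration, here is a minimal sketch of the deferred-import pattern this kind of change relies on. The function names are hypothetical, not taken from this PR's diff:

```python
# Before: a module-level import means every `import vllm` pays for torch,
# even in processes that never call this function.
import torch


def greedy_sample(logits):
    return torch.argmax(logits, dim=-1)


# After: the import is deferred to the first call; later calls hit the
# sys.modules cache, so only the first call pays the import cost.
def greedy_sample_lazy(logits):
    import torch

    return torch.argmax(logits, dim=-1)
```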
Comparison
`time python3 -c "import vllm"`
Before:
After:
The time mainly goes to two parts; `import torch` alone takes 1.5–2 s.
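One way to see which modules dominate (my suggestion, not something this PR describes) is CPython's built-in `-X importtime` flag, which prints per-module import timings to stderr:

```python
import subprocess

# `-X importtime` makes CPython print one timing line per imported module
proc = subprocess.run(
    ["python", "-X", "importtime", "-c", "import vllm"],
    capture_output=True,
    text=True,
)


def cumulative_us(line: str) -> int:
    # data lines look like: "import time:   self_us |  cumulative | package"
    parts = line.split("|")
    try:
        return int(parts[1])
    except (IndexError, ValueError):
        return 0  # skip the header and any non-timing lines


# show the 20 imports with the largest cumulative time
for line in sorted(proc.stderr.splitlines(), key=cumulative_us, reverse=True)[:20]:
    print(line)
```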
`time vllm -v`
Before:
After:
`time vllm -v` can be optimized in `vllm/entrypoints/cli`, and I want to implement that in a separate PR.
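As a rough sketch of the kind of deferral that could help here (the argument wiring is my own assumption, not vLLM's actual `vllm/entrypoints/cli` code): answer `--version` from installed package metadata and only import the heavy engine for subcommands that need it.

```python
import argparse
from importlib.metadata import version


def main() -> None:
    parser = argparse.ArgumentParser(prog="vllm")
    # `vllm -v` is answered from package metadata without importing vllm
    parser.add_argument(
        "-v", "--version", action="version", version=version("vllm")
    )
    sub = parser.add_subparsers(dest="command")
    sub.add_parser("serve")
    args = parser.parse_args()

    if args.command == "serve":
        # only real commands pay for the heavy engine import
        import vllm

        print("starting server with vLLM", vllm.__version__)


if __name__ == "__main__":
    main()
```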