
[Frontend]Reduce vLLM's import time #15128


Open
wants to merge 1 commit into main

Conversation

@Chen-0210 (Contributor) commented Mar 19, 2025

This PR optimizes the import time of `from vllm import LLM` and fixes issue #14924.
The majority of the changes only involve reordering the import statements, so I think this change will not affect the core functionality and can reduce import time.
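
One common way to defer imports like this is PEP 562's module-level `__getattr__`. A minimal sketch of the pattern (illustrative, not necessarily this PR's exact approach):

```python
# Minimal sketch of a lazy export (PEP 562 module-level __getattr__).
# The attribute routing here is illustrative, not vLLM's exact __init__.py.

def __getattr__(name: str):
    if name == "LLM":
        # Deferred: the heavy entrypoint module is imported on first
        # attribute access rather than at `import vllm` time.
        from vllm.entrypoints.llm import LLM
        return LLM
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

With this shape, `import vllm` stays cheap; `from vllm import LLM` still triggers the heavy import, but only for callers who actually ask for it.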

Comparison

time python3 -c "import vllm"

Before:

INFO 03-23 16:46:03 [__init__.py:260] No platform detected, vLLM is running on UnspecifiedPlatform

real    0m5.923s
user    0m12.976s
sys     0m5.908s

After:

The remaining time is mainly spent in two parts (see the profiling command after the timings):

  1. `import torch` takes 1.5–2 s.
  2. `from openai import xxx` takes around 0.5 s. I don't know how to fix these since they are used in complex ways.
real    0m2.957s
user    0m3.570s
sys     0m5.284s
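
For anyone reproducing this breakdown, CPython's built-in `-X importtime` flag attributes startup cost per module (standard interpreter tooling, not part of this PR):

```console
$ python3 -X importtime -c "import vllm" 2> import.log
$ # Columns are self-time | cumulative-time | module; sort by cumulative.
$ sort -t'|' -k2 -rn import.log | head -20
```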

time vllm -v

Before:

INFO 03-23 16:49:14 [__init__.py:260] No platform detected, vLLM is running on UnspecifiedPlatform
0.8.1

real    0m6.389s
user    0m13.483s
sys     0m5.955s

After:

`time vllm -v` can be further optimized in vllm/entrypoints/cli; I want to implement that in a separate PR (a rough sketch follows the timings below).

INFO 03-23 16:48:52 [__init__.py:260] No platform detected, vLLM is running on UnspecifiedPlatform
0.1.dev5099+gff47aab

real    0m4.430s
user    0m4.870s
sys     0m5.731s
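
As a rough illustration of what that separate PR could do (an assumption on my part, not its actual contents): answer `--version` from installed package metadata before any heavy imports run.

```python
# Hypothetical sketch: serve `vllm -v` from package metadata so the
# heavy vLLM imports are deferred until a real subcommand runs.
import argparse
from importlib.metadata import version

def main() -> None:
    parser = argparse.ArgumentParser(prog="vllm")
    parser.add_argument(
        "-v", "--version", action="version", version=version("vllm")
    )
    parser.parse_args()
    # Only now would the expensive entrypoint imports happen.

if __name__ == "__main__":
    main()
```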


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small and essential subset of tests to quickly catch errors. You can run the other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added frontend, multi-modality (#4194), speculative-decoding, and v1 labels Mar 19, 2025
@Chen-0210 Chen-0210 marked this pull request as draft March 19, 2025 12:50

mergify bot commented Mar 21, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Chen-0210.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 21, 2025
@Chen-0210 Chen-0210 changed the title [Frontend]Reduce vLLM's import time [WIP][Frontend]Reduce vLLM's import time Mar 22, 2025

mergify bot commented Mar 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Chen-0210.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 23, 2025
@Chen-0210 Chen-0210 marked this pull request as ready for review March 23, 2025 17:16
@Chen-0210 Chen-0210 changed the title [WIP][Frontend]Reduce vLLM's import time [Frontend]Reduce vLLM's import time Mar 23, 2025
@aarnphm (Collaborator) left a comment


Some quick initial feedback regarding the annotations import and lazy-loading transformers.
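
For readers following along, the two idioms being suggested look roughly like this (a sketch of the patterns, not the PR's actual diff):

```python
from __future__ import annotations  # annotations stay strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; costs nothing at import time.
    from transformers import PretrainedConfig

def hidden_size(config: PretrainedConfig) -> int:
    return config.hidden_size

def load_config(model: str):
    # Lazy load: transformers is imported on first use,
    # not when this module is imported.
    from transformers import AutoConfig
    return AutoConfig.from_pretrained(model)
```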

@Chen-0210 (Contributor, Author)

@aarnphm Thanks for your review. I’ve completed most of the fixes.

@aarnphm (Collaborator) left a comment


round 2

@Chen-0210 (Contributor, Author)

@aarnphm Thanks for your review. I’ve completed round 2.

@davidxia (Contributor) commented Apr 22, 2025

@Chen-0210 I made some fixes to this PR in Chen-0210#1 to get python -c 'import vllm' and vllm --version working. I then ran benchmarks for both with hyperfine on a Linux x86 VM with 8 CPUs, 32GB memory, running Ubuntu 24.04 (noble) with Python 3.12.3.

python -c 'import vllm'

Faster

before (main branch commit 5536b30)

$ hyperfine 'python -c "import vllm"' --warmup 3 --runs 20 --export-markdown out.md
Benchmark 1: python -c "import vllm"
  Time (mean ± σ):      7.316 s ±  0.104 s    [User: 7.254 s, System: 0.857 s]
  Range (min … max):    7.161 s …  7.511 s    20 runs
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `python -c "import vllm"` | 7.316 ± 0.104 | 7.161 | 7.511 | 1.00 |

after (my PR commit 31074c2)

$ hyperfine 'python -c "import vllm"' --warmup 3 --runs 20 --export-markdown out.md
Benchmark 1: python -c "import vllm"
  Time (mean ± σ):      4.955 s ±  0.133 s    [User: 5.163 s, System: 0.609 s]
  Range (min … max):    4.755 s …  5.242 s    20 runs
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `python -c "import vllm"` | 4.955 ± 0.133 | 4.755 | 5.242 | 1.00 |

vllm --version

Not much improvement, but as you said in the PR description, we can optimize that in a separate PR.

before (main branch commit 5536b30)

$ hyperfine 'vllm --version' --warmup 3 --runs 20 --export-markdown out.md
Benchmark 1: vllm --version
  Time (mean ± σ):     10.033 s ±  0.203 s    [User: 9.970 s, System: 0.879 s]
  Range (min … max):    9.745 s … 10.577 s    20 runs
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `vllm --version` | 10.033 ± 0.203 | 9.745 | 10.577 | 1.00 |

after (my PR commit 31074c2)

$ hyperfine 'vllm --version' --warmup 3 --runs 20 --export-markdown out.md
Benchmark 1: vllm --version
  Time (mean ± σ):      9.248 s ±  0.203 s    [User: 9.104 s, System: 0.956 s]
  Range (min … max):    8.900 s …  9.736 s    20 runs
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `vllm --version` | 9.248 ± 0.203 | 8.900 | 9.736 | 1.00 |


mergify bot commented Apr 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Chen-0210.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@davidxia (Contributor) commented Apr 23, 2025

I squashed all the commits into one, rebased on top of latest main, fixed conflicts, and force pushed from 1393d49 to dd1de69.

@Chen-0210 (Contributor, Author)

@davidxia Could you please run the pre-commit hook? I noticed some lint issues that need to be fixed.

@davidxia davidxia force-pushed the main branch 3 times, most recently from 8396105 to 36a0bc1 Compare April 23, 2025 14:20
@davidxia (Contributor) commented Apr 23, 2025

I reran the benchmarks with the same setup. The main branch's times are here.

commit 36a0bc1

python -c 'import vllm' 52% faster!

$ hyperfine 'python -c "import vllm"' --warmup 3 --runs 20 --export-markdown out.md
Benchmark 1: python -c "import vllm"
  Time (mean ± σ):      3.504 s ±  0.119 s    [User: 3.947 s, System: 0.366 s]
  Range (min … max):    3.306 s …  3.891 s    20 runs
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `python -c "import vllm"` | 3.504 ± 0.119 | 3.306 | 3.891 | 1.00 |

vllm --version

$ hyperfine 'vllm --version' --warmup 3 --runs 20 --export-markdown out.md
Benchmark 1: vllm --version
  Time (mean ± σ):      9.853 s ±  0.329 s    [User: 9.643 s, System: 0.973 s]
  Range (min … max):    9.515 s … 10.599 s    20 runs
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `vllm --version` | 9.853 ± 0.329 | 9.515 | 10.599 | 1.00 |

@davidxia (Contributor)

Rebased and resolved conflicts.

@davidxia davidxia force-pushed the main branch 2 times, most recently from 2826010 to 5eec6ca Compare April 24, 2025 20:08
This change optimizes the import time of `import vllm` and contributes to
vllm-project#14924. Most of the changes import expensive modules lazily
instead of eagerly. This change shouldn't affect core functionality.

Co-authored-by: Chen-0210 <chenjincong11@gmail.com>
Co-authored-by: David Xia <david@davidxia.com>

Signed-off-by: Chen-0210 <chenjincong11@gmail.com>
Signed-off-by: David Xia <david@davidxia.com>

mergify bot commented Apr 30, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Chen-0210.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 30, 2025