[Frontend] decrease import time of vllm.multimodal #18031
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI, a small and essential subset of tests that quickly catches errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
aarnphm left a comment:
Overall looks good, but can you test whether multimodal requests still work?
IIRC there are some files like config.py where we have to import eagerly (probably fine with this PR, but still worth a quick check).
not in the scope of this PR, but ideally we want to reduce this 8s as much as possible with lazy loading
what's "8s"?
Do you have an example I should run?
from your hyperfine run, especially
There are a few examples in
ah that's right 😅
thanks, I tried that earlier on a CPU platform with
Ah, let me perform a quick test then if you don't have access to a GPU.
This works with Phi-3.5-vision. You can use the diff here
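For reference, a minimal multimodal smoke test along these lines might look like the sketch below. The model name, prompt template, and image URL are assumptions for illustration, not the exact script or diff used in this thread:

```python
# Hypothetical smoke test: confirm multimodal requests still work after
# the lazy-import changes. Model, prompt format, and image are assumptions.
import requests
from PIL import Image

from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3.5-vision-instruct",
    trust_remote_code=True,
    max_model_len=4096,
)

# Placeholder image URL; substitute any reachable image.
image = Image.open(
    requests.get("https://example.com/cat.jpg", stream=True).raw
)

outputs = llm.generate(
    {
        "prompt": "<|user|>\n<|image_1|>\nWhat is in this image?<|end|>\n<|assistant|>\n",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```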
thanks, applied your patch
Force-pushed from 6b526d3 to dfdb8c6.
Pull Request Overview
This PR decreases the import time of vllm.multimodal by lazily loading expensive modules and deferring certain imports to type-checking or local scopes.
- Relocate transformers imports from top-level to type-checking blocks or local function scopes (see the sketch after this list).
- Adjust type annotations for improved runtime performance and consistency across modules.
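The relocation pattern in question is roughly the following generic sketch, not the exact diff; the `to_batch` helper is hypothetical:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers; no runtime import cost.
    from transformers import BatchFeature


def to_batch(data: dict) -> "BatchFeature":
    # The local import defers loading the (expensive) transformers
    # package until this function is actually called.
    from transformers import BatchFeature
    return BatchFeature(data)
```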
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vllm/multimodal/processing.py | Moves heavy transformer imports to type-checking and removes redundant quotes. |
| vllm/multimodal/parse.py | Shifts direct PIL.Image and BatchFeature imports to local scopes in functions. |
| vllm/multimodal/inputs.py | Implements LazyLoader for torch and refines type aliases and annotations (see the sketch below). |
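For context, a lazy loader in this style typically looks like the sketch below, modeled on the common TensorFlow-style `LazyLoader`; vLLM's own helper may differ in detail:

```python
import importlib
import types


class LazyLoader(types.ModuleType):
    """Stand-in module that defers the real import to first attribute access."""

    def __init__(self, local_name: str, parent_globals: dict, name: str):
        self._local_name = local_name
        self._parent_globals = parent_globals
        super().__init__(name)

    def _load(self) -> types.ModuleType:
        module = importlib.import_module(self.__name__)
        # Replace ourselves in the importing module's namespace so later
        # lookups hit the real module directly.
        self._parent_globals[self._local_name] = module
        self.__dict__.update(module.__dict__)
        return module

    def __getattr__(self, item: str):
        return getattr(self._load(), item)


# Usage at module top level, e.g.:
# torch = LazyLoader("torch", globals(), "torch")
```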
Force-pushed from 904a9d4 to 05b1cbf.
russellb left a comment:
Thanks for taking the piecewise approach. This will be easier to review and merge.
Head branch was pushed to by a user without write access
Of course! Lazy-loaded PIL.Image in
@davidxia if you can apply this patch
Head branch was pushed to by a user without write access
done, thanks!
@hmellor from the Read the Docs logs it seems to build successfully? Do you know if there are any issues with this?
RTD treats warnings as errors:
ah I see. @davidxia can you move the docstring in TYPE_CHECKING down to the else block instead? Thanks.
You don't necessarily have to move it, but something with that name has to exist in the else block.
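In other words, the fix being discussed has roughly this shape: a sketch using the alias from this thread, not the exact diff:

```python
from typing import TYPE_CHECKING, TypeAlias, Union

import numpy as np
from PIL.Image import Image

if TYPE_CHECKING:
    import torch
    HfImageItem: TypeAlias = Union[Image, np.ndarray, torch.Tensor]
else:
    # Something with this name must exist at runtime so Sphinx can
    # attach the docstring to a real object. The string "torch.Tensor"
    # is a forward reference, so torch is never imported here.
    HfImageItem = Union[Image, np.ndarray, "torch.Tensor"]
    """docstring as before"""
```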
@aarnphm @hmellor I'm trying to fix the Sphinx warnings. I tried copying the same
probably better to keep the previous change, but update the annotations to strings instead, i.e.:

```python
if TYPE_CHECKING:
    import torch

HfImageItem: TypeAlias = Union[Image, np.ndarray, "torch.Tensor"]
"""docstring as before"""
```
s/torch.Tensor/"torch.Tensor"
Force-pushed from 070c913 to 9371c2c.
by changing some modules in `vllm/multimodal` to lazily import expensive modules like `transformers` or only importing them for type checkers when not used during runtime.

contributes to vllm-project#14924

Signed-off-by: David Xia <david@davidxia.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
Co-authored-by: David Xia <david@davidxia.com>

Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
by changing some modules in `vllm/multimodal` to lazily import expensive modules like `transformers` or only importing them for type checkers when not used during runtime.

contributes to #14924

`python -c 'import vllm'` seems slightly faster.

before (main branch commit 302f3ac): `python -c "import vllm"`

after (my PR commit de28f4f933760b7b53aca164ac8c2d7b5256bf11): `python -c "import vllm"`
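The before/after comparison above was produced with hyperfine-style timing of interpreter startup (`python -X importtime -c 'import vllm'` is another way to see per-module costs). A rough Python-only equivalent is sketched below; it is not the original benchmark:

```python
# Rough import-time benchmark: spawn a fresh interpreter per sample so
# module caches don't hide the import cost. hyperfine gives better stats.
import statistics
import subprocess
import sys
import time


def time_import(module: str = "vllm", runs: int = 5) -> tuple[float, float]:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


mean, stdev = time_import()
print(f"import vllm: {mean:.2f}s ± {stdev:.2f}s")
```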