[Frontend] decrease import time of vllm.multimodal #18031
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI, a small and essential subset of tests that quickly catches errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
aarnphm left a comment:
Overall looks good, but can you test whether multimodal requests still work?
IIRC there are some files like config.py where we have to import eagerly (probably fine with this PR, but still worth a quick check).
not in the scope of this PR, but ideally we want to reduce this 8s as much as possible with lazy loading
what's "8s"?
Do you have an example I should run?
from your hyperfine run, especially
There are a few examples in
ah that's right 😅
thanks, I tried that earlier on a CPU platform with
Ah, let me perform a quick test then if you don't have access to a GPU.
This works with Phi-3.5-vision. You can use the diff here
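For reference, a minimal multimodal smoke test along these lines might look like the sketch below. The model name, prompt template, and image URL are assumptions for illustration, not the exact script or diff used in this thread:

```python
# Hypothetical smoke test: confirm multimodal requests still work after
# the lazy-import changes. Model, prompt format, and image are assumptions.
import requests
from PIL import Image

from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3.5-vision-instruct",
    trust_remote_code=True,
    max_model_len=4096,
)

# Placeholder image URL; substitute any reachable image.
image = Image.open(
    requests.get("https://example.com/cat.jpg", stream=True).raw
)

outputs = llm.generate(
    {
        "prompt": "<|user|>\n<|image_1|>\nWhat is in this image?<|end|>\n<|assistant|>\n",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```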
thanks, applied your patch
Force-pushed from 6b526d3 to dfdb8c6.
Pull Request Overview
This PR decreases the import time of vllm.multimodal by lazily loading expensive modules and deferring certain imports to type-checking or local scopes.
- Relocate transformers imports from top-level to type-checking blocks or local function scopes (see the sketch after this list).
- Adjust type annotations for improved runtime performance and consistency across modules.
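The relocation pattern in question is roughly the following generic sketch, not the exact diff; the `to_batch` helper is hypothetical:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers; no runtime import cost.
    from transformers import BatchFeature


def to_batch(data: dict) -> "BatchFeature":
    # The local import defers loading the (expensive) transformers
    # package until this function is actually called.
    from transformers import BatchFeature
    return BatchFeature(data)
```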
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vllm/multimodal/processing.py | Moves heavy transformer imports to type-checking and removes redundant quotes. |
| vllm/multimodal/parse.py | Shifts direct PIL.Image and BatchFeature imports to local scopes in functions. |
| vllm/multimodal/inputs.py | Implements LazyLoader for torch and refines type aliases and annotations (see the sketch below). |
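For context, a lazy loader in this style typically looks like the sketch below, modeled on the common TensorFlow-style `LazyLoader`; vLLM's own helper may differ in detail:

```python
import importlib
import types


class LazyLoader(types.ModuleType):
    """Stand-in module that defers the real import to first attribute access."""

    def __init__(self, local_name: str, parent_globals: dict, name: str):
        self._local_name = local_name
        self._parent_globals = parent_globals
        super().__init__(name)

    def _load(self) -> types.ModuleType:
        module = importlib.import_module(self.__name__)
        # Replace ourselves in the importing module's namespace so later
        # lookups hit the real module directly.
        self._parent_globals[self._local_name] = module
        self.__dict__.update(module.__dict__)
        return module

    def __getattr__(self, item: str):
        return getattr(self._load(), item)


# Usage at module top level, e.g.:
# torch = LazyLoader("torch", globals(), "torch")
```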
Force-pushed from 904a9d4 to 05b1cbf.
russellb left a comment:
Thanks for taking the piecewise approach. This will be easier to review and merge.
Head branch was pushed to by a user without write access
Of course! Lazy-loaded PIL.Image in
@davidxia if you can apply this patch
Head branch was pushed to by a user without write access
done, thanks!
@hmellor from the Read the Docs logs it seems to build successfully? Do you know if there are any issues with this?
RTD treats warnings as errors:
ah I see. @davidxia can you move the docstring in TYPE_CHECKING down to the else block instead? Thanks.
You don't necessarily have to move it, but something with that name has to exist in the else block.
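In other words, the fix being discussed has roughly this shape: a sketch using the alias from this thread, not the exact diff:

```python
from typing import TYPE_CHECKING, TypeAlias, Union

import numpy as np
from PIL.Image import Image

if TYPE_CHECKING:
    import torch
    HfImageItem: TypeAlias = Union[Image, np.ndarray, torch.Tensor]
else:
    # Something with this name must exist at runtime so Sphinx can
    # attach the docstring to a real object. The string "torch.Tensor"
    # is a forward reference, so torch is never imported here.
    HfImageItem = Union[Image, np.ndarray, "torch.Tensor"]
    """docstring as before"""
```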
@aarnphm @hmellor I'm trying to fix the Sphinx warnings. I tried copying the same
probably better to keep the previous change, but update the annotations to strings instead, i.e.:

```python
if TYPE_CHECKING:
    import torch

HfImageItem: TypeAlias = Union[Image, np.ndarray, "torch.Tensor"]
"""docstring as before"""
```
s/torch.Tensor/"torch.Tensor"
Force-pushed from 070c913 to 9371c2c.
by changing some modules in `vllm/multimodal` to lazily import expensive modules like `transformers` or only importing them for type checkers when not used during runtime.

contributes to vllm-project#14924

Signed-off-by: David Xia <david@davidxia.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
Co-authored-by: David Xia <david@davidxia.com>

Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
by changing some modules in `vllm/multimodal` to lazily import expensive modules like `transformers` or only importing them for type checkers when not used during runtime.

contributes to #14924

`python -c 'import vllm'` seems slightly faster.

before (main branch commit 302f3ac): `python -c "import vllm"`

after (my PR commit de28f4f933760b7b53aca164ac8c2d7b5256bf11): `python -c "import vllm"`
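The before/after comparison above was produced with hyperfine-style timing of interpreter startup (`python -X importtime -c 'import vllm'` is another way to see per-module costs). A rough Python-only equivalent is sketched below; it is not the original benchmark:

```python
# Rough import-time benchmark: spawn a fresh interpreter per sample so
# module caches don't hide the import cost. hyperfine gives better stats.
import statistics
import subprocess
import sys
import time


def time_import(module: str = "vllm", runs: int = 5) -> tuple[float, float]:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


mean, stdev = time_import()
print(f"import vllm: {mean:.2f}s ± {stdev:.2f}s")
```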