[Model] Support DP on ViT for GLM-4.5V #23168
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
We are in the process of merging #22742, which is quite similar in idea to this, except that DP is applied to the whole vision encoder. Perhaps you could adapt your PR to use the helper functions from that PR.
Thank you. I will adapt my PR later.
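For context, a rough sketch of the idea of running the vision encoder data-parallel; all names and the gathering scheme below are assumptions for illustration, not the actual helpers from #22742:

```python
import torch
import torch.distributed as dist

def dp_vision_encode(vision_encoder, images: list[torch.Tensor]) -> list[torch.Tensor]:
    """Sketch only: shard images across ranks, encode locally, then all-gather."""
    rank, world_size = dist.get_rank(), dist.get_world_size()
    # Each rank encodes only its round-robin slice of the image batch.
    local_embeds = [vision_encoder(img) for img in images[rank::world_size]]
    # Gather every rank's embeddings so all ranks end up with the full batch.
    gathered: list[list[torch.Tensor] | None] = [None] * world_size
    dist.all_gather_object(gathered, local_embeds)
    # Undo the round-robin sharding to restore the original order.
    out: list[torch.Tensor | None] = [None] * len(images)
    for r, embeds in enumerate(gathered):
        for idx, emb in zip(range(r, len(images), world_size), embeds):
            out[idx] = emb
    return out
```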
Force-pushed from e06086f to fbc3bf3.
The code looks good; I'll merge the PR once you have posted the benchmark results. (Please ping me when you have done this!)
Force-pushed from 4fab348 to daf1936.
Force-pushed from b76be08 to b87b83b.
run_dp_sharded_mrope_vision_model does not match GLM-4.5V.
Force-pushed from b87b83b to 8c4c7e0.
Force-pushed from 8c4c7e0 to b2f685a.
Force-pushed from b2f685a to 91fd1bf.
Review comment on vllm/multimodal/utils.py (outdated):
I think it might actually be faster to use Python built-ins here instead of torch.Tensor, because the length of grid_thw_list is pretty small. But I guess you should profile this.
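For illustration, a minimal comparison of the two approaches (the grid values and merge_size below are made up for the example):

```python
import torch

# grid_thw_list is a short list of (t, h, w) triples, one per image/video.
grid_thw_list = [(1, 34, 46), (2, 24, 24)]  # example values
merge_size = 2

# Tensor route: pays tensor-construction overhead for a tiny input.
grid_thw = torch.tensor(grid_thw_list)
sizes_tensor = (grid_thw.prod(-1) // merge_size // merge_size).tolist()

# Python built-ins route: usually cheaper when the list has only a handful of items.
sizes_python = [t * h * w // (merge_size * merge_size) for t, h, w in grid_thw_list]

assert sizes_tensor == sizes_python  # both yield the per-item embedding counts
```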
The signature of Glm4vVisionTransformer.forward already takes a tensor:

    def forward(
        self,
        x: torch.Tensor,
        grid_thw: torch.Tensor,
    ) -> torch.Tensor:

so I directly process it as a torch.Tensor without converting it.
Can you try calling .tolist() before passing it into this method and see if it improves the performance?
ok, I will try.
> Can you try calling .tolist() before passing it into this method and see if it improves the performance?
done
Force-pushed from 7df7e31 to a898fd0.
    # Split concatenated embeddings for each video item.
    merge_size = self.visual.spatial_merge_size
    sizes = grid_thw.prod(-1) // merge_size // merge_size
    return video_embeds.split(sizes.tolist())
This part can be factored out to be more similar to the original code (since the branch with use_data_parallel returns early anyway)
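A hedged sketch of the refactor being suggested, as a fragment of the video-processing method; run_dp_sharded_mrope_vision_model is the helper named earlier in the thread, but its signature and the surrounding variable names here are assumptions:

```python
if self.use_data_parallel:
    # The DP branch returns early, so the tensor-parallel path below can stay
    # identical to the original code.
    video_embeds = run_dp_sharded_mrope_vision_model(
        self.visual, pixel_values_videos, grid_thw_list  # signature assumed
    )
    merge_size = self.visual.spatial_merge_size
    sizes = [t * h * w // (merge_size * merge_size) for t, h, w in grid_thw_list]
    return video_embeds.split(sizes)

# Original tensor-parallel path, unchanged.
video_embeds = self.visual(pixel_values_videos, grid_thw=grid_thw)
# Split concatenated embeddings for each video item.
merge_size = self.visual.spatial_merge_size
sizes = grid_thw.prod(-1) // merge_size // merge_size
return video_embeds.split(sizes.tolist())
```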
LGTM, do you have further changes to make?
No, that's it.
Purpose
Add an option to run the GLM-4.5V vision encoder in a data-parallel manner while the main model uses tensor parallelism. Can be enabled by the flag --mm-encoder-tp-mode "data".
FIX #23877
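For example, one possible way to enable this when serving (the model path and tensor-parallel size are illustrative, not taken from the PR):

```bash
vllm serve zai-org/GLM-4.5V \
  --tensor-parallel-size 4 \
  --mm-encoder-tp-mode data
```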
Test Plan
Test Result
Single request:
text:
image:
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.