[feat]: Create interface for model-specific M-RoPE #24194
Conversation
Code Review
This pull request introduces a new SupportsMRoPE interface to delegate model-specific M-RoPE logic, which is a positive step towards improving modularity. My review focuses on ensuring the new interface is correctly defined and implemented, following existing patterns in the codebase. I've identified some issues in the interface definition in interfaces.py regarding consistency and correctness. Additionally, I've suggested an improvement to the Qwen2VLForConditionalGeneration implementation to make it more self-contained and robust, which aligns better with the goals of this PR.
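For context, a minimal sketch of what such an interface can look like, following the runtime-checkable-protocol style already used elsewhere in interfaces.py; the exact method signature merged in this PR may differ:

```python
from typing import ClassVar, Literal, Optional, Protocol, runtime_checkable

import torch


@runtime_checkable
class SupportsMRoPE(Protocol):
    """Protocol for models that compute their own M-RoPE input positions."""

    # Class-level flag, mirroring other capability interfaces in the file.
    supports_mrope: ClassVar[Literal[True]] = True

    def get_mrope_input_positions(
        self,
        input_tokens: list[int],
        image_grid_thw: Optional[torch.Tensor] = None,
        video_grid_thw: Optional[torch.Tensor] = None,
        context_len: int = 0,
        seq_len: Optional[int] = None,
    ) -> tuple[torch.Tensor, int]:
        """Return positions of shape [3, num_tokens] and the M-RoPE delta."""
        ...
```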
Can we also expect an update to […]? Currently, it emits this warning at model init: […]
We will incrementally adjust all other models to adopt the new interface design. As a first step, this PR focuses on adapting qwen2_vl. Once it is approved, we can move on to the other models.
Is qwen2-vl using the shared M-RoPE as mentioned in #24165? It seems to have already built its own M-RoPE in vllm/model_executor/models/qwen2_vl.py (line 507 at a43a3f1).
Good question. I think in vllm/model_executor/layers/rotary_embedding/mrope.py the […]. cc @Isotr0py
        MRO of your model class.
        """

    @classmethod
Can we make this an instance method? Also, the model runner should actually use this method, instead of calling the one in mrope.py, when available.
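Concretely, the dispatch being requested would look roughly like this; a sketch only, where supports_mrope and get_mrope_input_positions are this PR's names and the fallback mirrors the existing centralized helper:

```python
from vllm.model_executor.layers.rotary_embedding import MRotaryEmbedding
from vllm.model_executor.models.interfaces import supports_mrope  # added by this PR


# Hypothetical free function standing in for the model-runner change;
# the real edit lives in vllm/v1/worker/gpu_model_runner.py.
def compute_mrope_positions(model, input_tokens, mm_kwargs, context_len, seq_len):
    if supports_mrope(model):
        # New path: the model instance owns its M-RoPE details.
        return model.get_mrope_input_positions(
            input_tokens, context_len=context_len, seq_len=seq_len, **mm_kwargs)
    # Old path: centralized helper in rotary_embedding/mrope.py
    # (keyword arguments here are schematic; see the in-tree signature).
    return MRotaryEmbedding.get_input_positions_tensor(
        input_tokens, context_len=context_len, seq_len=seq_len, **mm_kwargs)
```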
done
Seems a duplicate of #24172? @AzizCode92
Oh, yeah. I missed that. You're right. Thanks for pointing it out.
vllm/v1/worker/gpu_model_runner.py (Outdated)
        context_len=num_computed_tokens + prompt_part_len,
        num_new_tokens=completion_part_len,
    )
    if supports_mrope(self.model):
Is this necessary? The original logic is to call get_next_input_positions_tensor().
Yes. The original logic used get_input_positions_tensor from vllm.model_executor.layers.rotary_embedding, and now we are switching to the get_mrope_input_positions defined in the new interface. To check whether the model supports M-RoPE, we have defined supports_mrope.
See here: https://github.com/AzizCode92/vllm/blob/8df4a1091bb465e4a3058ba9a8a10cb168d27d82/vllm/model_executor/models/interfaces.py#L872-L923
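For context, such a checker typically reduces to an isinstance test against the runtime-checkable protocol; a sketch, assuming the SupportsMRoPE protocol sketched earlier in the thread:

```python
from typing import TypeGuard


def supports_mrope(model: object) -> TypeGuard[SupportsMRoPE]:
    # Valid for runtime_checkable protocols: isinstance() verifies that the
    # required members (flag and method) are present, not their signatures.
    return isinstance(model, SupportsMRoPE)
```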
Uh, I mean, here it is using get_next_input_positions_tensor() instead of get_input_positions_tensor(); they implement different logic, I think.
If you want to refactor this as well, you may need another get_next_mrope_input_positions() in your interface.
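The distinction: get_input_positions_tensor() builds the full multimodal-aware positions for prefill, while the "next" variant only extends positions during decode, where all three M-RoPE components (temporal, height, width) advance in lockstep. A sketch of that decode-side logic, modeled on MRotaryEmbedding.get_next_input_positions_tensor (the in-tree signature may differ):

```python
import torch


def get_next_mrope_input_positions(mrope_position_delta: int,
                                   context_len: int,
                                   seq_len: int) -> torch.Tensor:
    # No new image/video content arrives during decode, so all three
    # components share one scalar position per token: an offset arange
    # broadcast to shape [3, seq_len - context_len].
    return torch.arange(
        mrope_position_delta + context_len,
        mrope_position_delta + seq_len,
    ).expand(3, -1)
```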
Great catch! That was definitely unnecessary to change.
Sorry for the delay. Let's merge this one first; then #24172 can be modified to update the rest of the models.
PTAL at the test failure in the multimodal tests.
Head branch was pushed to by a user without write access
@DarkLight1337 multimodal tests are green now.
Merging with main to fix the unrelated CI failures.
Purpose
Introduces a support_mrope interface on the model class to delegate M-RoPE implementation details to each model, rather than using a centralized helper. The Qwen2-VL model is updated to use this new interface as the first implementation.
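For illustration, adoption by a model then looks roughly like this; a sketch rather than the exact diff, with the position computation that previously lived in the centralized helper moving into the model class:

```python
import torch
from torch import nn


class Qwen2VLForConditionalGeneration(nn.Module):  # also inherits SupportsMRoPE
    # Flag picked up by the supports_mrope() runtime check.
    supports_mrope = True

    def get_mrope_input_positions(
        self,
        input_tokens: list[int],
        image_grid_thw: torch.Tensor | None = None,
        video_grid_thw: torch.Tensor | None = None,
        context_len: int = 0,
        seq_len: int | None = None,
    ) -> tuple[torch.Tensor, int]:
        # Qwen2-VL-specific logic: interleave text positions with
        # (t, h, w) grids for image/video tokens, then report the
        # resulting position delta for later decode steps.
        ...
```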
Related to: #24165
#24172 is also working on the same issue. Please feel free to close mine if the prior PR does the job.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.