model: support qwen3-vl series #10323
Conversation
Co-authored-by: zju-stu-lizheng <lizheng.cs@zju.edu.cn>
Co-authored-by: cao1zhg <653506626@qq.com>
Summary of Changes
Hello @zju-stu-lizheng, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request integrates the latest Qwen3-VL multimodal models, encompassing both standard and Mixture-of-Experts configurations. The changes enable the system to process and leverage advanced visual understanding capabilities, enhancing its performance on complex multimodal benchmarks. The core objective is to broaden the range of supported state-of-the-art multimodal large language models.
Highlights
- New Model Support: Introduces comprehensive support for the upcoming Qwen3-VL series, including both its dense and Mixture-of-Experts (MoE) variants, as well as Instruct and Thinking editions. This expands the multimodal capabilities of the system.
- Deepstack Embedding Integration: Adds support for "deepstack" embeddings within the multimodal utility functions, allowing for more sophisticated integration of visual features into the language model's hidden states. This involves modifying the `embed_mm_inputs` and `general_mm_embed_routine` functions to handle and pass deepstack-specific information (a hedged sketch of the idea follows this list).
- Vision Encoder Components: New Python modules are added to define the Qwen3-VL vision encoder architecture, including `Qwen3_VisionPatchEmbed`, `Qwen3_VisionMLP`, `Qwen3_VisionBlock`, `Qwen3_VisionPatchMerger`, and the overarching `Qwen3_VisionTransformer`.
- Rotary Embedding Updates: The rotary embedding logic has been updated to explicitly include support for the `qwen3_vl` and `qwen3_vl_moe` model types, ensuring correct positional encoding for these new models.
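For intuition, here is a minimal sketch of the deepstack idea referenced in the first bullet above. It is illustrative only, not the actual sglang implementation; the function name, signatures, and the one-feature-level-per-early-layer mapping are all assumptions.

```python
import torch

def apply_deepstack(
    hidden_states: torch.Tensor,           # [num_tokens, hidden_size]
    deepstack_embeds: list[torch.Tensor],  # per level: [num_image_tokens, hidden_size]
    image_token_mask: torch.Tensor,        # [num_tokens], bool
    layer_idx: int,
) -> torch.Tensor:
    """Add level-`layer_idx` visual features onto the image-token positions."""
    if layer_idx < len(deepstack_embeds):
        hidden_states = hidden_states.clone()
        hidden_states[image_token_mask] += deepstack_embeds[layer_idx]
    return hidden_states

# Toy usage: two image tokens, two feature levels, level 0 injected at layer 0.
h = torch.zeros(6, 4)
mask = torch.tensor([False, True, True, False, False, False])
levels = [torch.ones(2, 4), 2 * torch.ones(2, 4)]
h = apply_deepstack(h, levels, mask, layer_idx=0)
```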
Code Review
This pull request adds support for the Qwen3-VL series of models. The changes primarily involve adding new model definitions for both dense and MoE variants and updating the surrounding infrastructure to handle them, including support for deepstack embeddings. The implementation appears to be a solid extension of the existing Qwen-VL support. I've identified a few areas for improvement, including a critical bug in an `assert` statement, a potential `AttributeError` in the MoE model, use of a magic number, and several maintainability issues like code duplication and incorrect type hints. Addressing these points will enhance the robustness and clarity of the new model support.
Does the current implementation support expert parallelism for Qwen3-VL MoE models? I tried to launch the server with …
Which model weights are you using?
Qwen3-VL-30B-A3B-Instruct with random weights generated using transformers. I initialize the model config with …
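For reference, a randomly initialized checkpoint is typically produced along these lines. This is an illustrative sketch only: the exact config the commenter used is not shown above, the repo id below is a placeholder, and the Qwen3-VL model classes come from the in-progress huggingface/transformers#40795, so availability and names are assumptions.

```python
# Illustrative sketch: build a random-weight Qwen3-VL checkpoint with transformers.
# Assumes Qwen3-VL support from huggingface/transformers#40795 is installed;
# the repo id is a placeholder, not a confirmed release.
from transformers import AutoConfig, AutoModelForImageTextToText

config = AutoConfig.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
model = AutoModelForImageTextToText.from_config(config)  # random init, nothing downloaded
model.save_pretrained("./qwen3-vl-30b-a3b-random")
```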
This PR for Qwen3-VL lacks LoRA compatibility (same as Qwen2.5-VL). The following helps the LoRA manager skip unsupported modules. (reference issue: #6608)
```python
import re

# Broader variant: apply LoRA to both attention and MLP projections of the
# language model, skipping everything else (e.g. the vision tower).
lora_pattern = re.compile(
    r"^language_model\.layers\.(\d+)\.(?:self_attn|mlp)\.(?:qkv_proj|o_proj|down_proj|gate_up_proj)"
)

def should_apply_lora(self, module_name: str) -> bool:
    return bool(self.lora_pattern.match(module_name))
```

```python
import re

# Narrower variant: apply LoRA to the attention projections only.
lora_pattern = re.compile(
    r"^language_model\.layers\.(\d+)\.(?:self_attn)\.(?:qkv_proj|o_proj)"
)

def should_apply_lora(self, module_name: str) -> bool:
    return bool(self.lora_pattern.match(module_name))
```

Without the above code, we don't skip the vision LoRA and it causes an error in the following loop: `sglang/python/sglang/srt/lora/lora_manager.py`, lines 431 to 451 at commit `fc3e542`.
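As a usage illustration (hypothetical, not the actual `lora_manager.py` code), the check lets the manager's module loop pass over vision-tower names instead of looking up LoRA weights that do not exist for them:

```python
import re

lora_pattern = re.compile(
    r"^language_model\.layers\.(\d+)\.(?:self_attn)\.(?:qkv_proj|o_proj)"
)

def should_apply_lora(module_name: str) -> bool:
    return bool(lora_pattern.match(module_name))

# Example module names of the two kinds such a loop would iterate over.
for name in [
    "language_model.layers.0.self_attn.qkv_proj",  # LoRA target
    "visual.blocks.0.attn.qkv",                    # vision tower: no LoRA weights
]:
    if not should_apply_lora(name):
        continue  # skipped; without this check, the weight lookup would error
    print("applying LoRA to", name)
```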
This happens because …
@casper-hansen thanks Casper! Do we already have any LoRAs for this model yet? If not, we can merge this and move LoRA support to another PR.
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: cao1zhg <653506626@qq.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
This PR introduces support for the upcoming Qwen3-VL models — including both dense and MoE variants, as well as Instruct and Thinking editions. As the next generation of the Qwen-VL family, Qwen3-VL delivers significant advancements in visual understanding while maintaining robust pure-text performance, achieving state-of-the-art results across complex multimodal benchmarks.
Core implementation details can also be found in the corresponding PR in the Transformers repo:
🔗 huggingface/transformers#40795