[Models] Add Qwen3-VL Moe Model Support #5913

Merged
CSWYF3634076 merged 3 commits into
PaddlePaddle:develop from
CSWYF3634076:qwen3vlmoe
Jan 8, 2026

Collaborator

@CSWYF3634076 CSWYF3634076 commented Jan 6, 2026

Motivation

Add Qwen3-VL MoE model support.

Modifications

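  • The text part mainly inherits from Qwen3DecoderLayer in Qwen3moe.
  • The multimodal part mainly inherits from Qwen3vl.
  • Weight loading is slightly unusual: Qwen3vlmoe stores all 128 experts of each layer under a single weight key, so they must be split apart before loading.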
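The expert split described in the last bullet can be sketched as follows. This is a minimal illustration rather than the PR's loader: the per-expert key template, the layout (experts on the leading axis), and the helper `split_fused_experts` are assumptions made for the example, and nested Python lists stand in for real tensors.

```python
def split_fused_experts(fused_weight, key_template):
    """Split a fused weight, indexed by expert on its leading axis,
    into one dictionary entry per expert.

    key_template is a hypothetical naming scheme, not the checkpoint's real keys.
    """
    per_expert = {}
    for expert_id, expert_weight in enumerate(fused_weight):
        # Each slice along the leading axis becomes its own named weight.
        per_expert[key_template.format(expert_id=expert_id)] = expert_weight
    return per_expert


# Toy data: 128 experts as in the PR description, with tiny 2x3 matrices.
num_experts, rows, cols = 128, 2, 3
fused = [[[float(e)] * cols for _ in range(rows)] for e in range(num_experts)]

experts = split_fused_experts(
    fused, "layers.0.mlp.experts.{expert_id}.down_proj.weight"  # assumed key format
)
print(len(experts))  # 128
```

With a real tensor type the loop body would be the same, since indexing the leading axis yields one expert's matrix.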

TODO
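Support CUDA graph in a follow-up.
The current problem: with CUDA graph enabled, sending a video request first (which works correctly) and then an image request leaves the model unable to see the image. Sending an image request first causes no problems.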

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
       --model your/path/Qwen3-VL-30B-A3B-Instruct \
       --port 8801 --metrics-port 8181 --engine-worker-queue-port 8182 --cache-queue-port 8183 \
       --max-num-seqs 32

Accuracy Tests

curl --location --request POST 'http://10.57.151.140:8801/v1/chat/completions' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "qwen3vlmoe",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe the content of the image"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example2.jpg"
          }
        }
      ]
    }
  ],
  "temperature": 0,
  "top_p": 1,
  "max_tokens": 1024
}'
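
The same request can be built from Python using only the standard library. This is a hedged sketch: the helper names `build_chat_request` and `send`, the `localhost` base URL, and the timeout are assumptions for illustration. The payload fields mirror the curl command above.

```python
import json
import urllib.request


def build_chat_request(base_url, image_url, prompt, api_key=""):
    """Build (but do not send) a chat-completions request with one image."""
    payload = {
        "model": "qwen3vlmoe",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "temperature": 0,
        "top_p": 1,
        "max_tokens": 1024,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and return the assistant's text reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8801",  # assumed address; use the host where the server runs
    "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example2.jpg",
    "Describe the content of the image",
)
# print(send(request))  # uncomment once the server from the previous section is running
```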

result

This is a detailed description of the image provided:\n\nThe image displays a large, intricately carved stone stele, a type of Buddhist monument, likely from the Goryeo period of Korea (10th-14th century). The sculpture is presented against a dark, neutral background, which highlights its form and details.\n\nThe central figure is a seated Buddha, depicted in the *lalitasana* (royal ease) posture, with his right leg crossed over his left. He is shown in a state of serene enlightenment, with a gentle, closed-mouth smile and downcast eyes. The Buddha's head is adorned with a *ushnisha* (the cranial protuberance symbolizing wisdom) and a *urna* (a dot between the eyebrows). His robes are rendered with deep, flowing folds, and traces of gold leaf are visible on his hands, chest, and the edges of his garment, indicating that the sculpture was once gilded.\n\nThe Buddha is framed by a large, pointed arch, known as a *mandorla*, which is richly decorated with intricate carvings. The innermost ring of the mandorla features a series of small, seated Buddhas, each within its own niche. This is surrounded by a band of swirling, flame-like patterns, representing the radiance of the Buddha's enlightenment. The outermost layer of the mandorla is decorated with a continuous, repeating pattern of stylized lotus petals.\n\nFlanking the central Buddha are two standing bodhisattvas, each on a small pedestal. They are depicted in a graceful, slightly swaying posture, with their hands held in a gesture of reverence or offering. They wear elaborate crowns and robes, and their figures are also partially gilded.\n\nThe entire composition is set upon a rectangular base, which is also decorated with a carved border. The overall style is characteristic of Korean Buddhist art from the Goryeo period, known for its elegant forms, sophisticated craftsmanship, and the use of gold and stone. 
The sculpture is a powerful representation of the Buddha's presence and the celestial realm of enlightenment.

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot Bot commented Jan 6, 2026

Thanks for your contribution!


codecov-commenter commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 82.80255% with 27 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@e3957a5). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...loy/model_executor/models/qwen3_vl/qwen3_vl_moe.py | 81.45% | 18 Missing and 5 partials ⚠️ |
| fastdeploy/rl/rollout_model.py | 87.87% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5913   +/-   ##
==========================================
  Coverage           ?   66.70%           
==========================================
  Files              ?      348           
  Lines              ?    44599           
  Branches           ?     6855           
==========================================
  Hits               ?    29748           
  Misses             ?    12666           
  Partials           ?     2185           
| Flag | Coverage Δ |
| --- | --- |
| GPU | 66.70% <82.80%> (?) |

Contributor

Copilot AI left a comment

Pull request overview

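This PR adds Qwen3-VL MoE (Mixture of Experts) model support to FastDeploy. It implements the MoE architecture for the multimodal vision-language model, covering inference and reinforcement learning training.

Main changes:

  • Adds the Qwen3VLMoeForConditionalGeneration model implementation, which inherits from Qwen3VL and integrates the expert layers of Qwen3Moe
  • Implements special weight-loading logic to handle fused expert weights
  • Adds a model variant for RL training support
  • Improves the frame-sampling logic of the video processor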

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.

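| File | Description |
| --- | --- |
| tests/e2e/test_Qwen3VL_serving.py | Refactors the port cleanup logic to use the FD_ENGINE_QUEUE_PORT constant |
| tests/e2e/test_Qwen3VLMoe_serving.py | Adds end-to-end tests for the Qwen3VLMoe model, covering streaming and non-streaming inference |
| tests/e2e/Qwen3VLMOE_RL/test_rollout_model.py | Adds an RL rollout test for the Qwen3VLMoe model |
| fastdeploy/rl/rollout_model.py | Adds the Qwen3VLMoeForConditionalGenerationRL class to support RL training |
| fastdeploy/model_executor/models/qwen3_vl/qwen3_vl_moe.py | Core model implementation, containing the MoE architecture and the expert weight-loading logic |
| fastdeploy/model_executor/models/qwen3_vl/qwen3_vl.py | Adds a copyright header |
| fastdeploy/input/qwen_vl_processor/process.py | Refines the video frame-sampling logic so that num_frames takes priority over fps |
| fastdeploy/input/qwen3_vl_processor/process.py | Same as above, keeping processor behavior consistent |
| .pre-commit-config.yaml | Adds a large-file check limit (1024KB) |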

Comment on lines +630 to +632
    @classmethod
    def name(self) -> str:
        """name"""

Copilot AI Jan 7, 2026

A method under the classmethod decorator should not use self as its first parameter; it should use cls. This can cause runtime errors or unexpected behavior.

Copilot uses AI. Check for mistakes.
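
A short sketch of the convention the comment asks for. The class names here are hypothetical and unrelated to the PR's classes. Python passes the class itself as the first argument of a classmethod whatever the parameter is called, so naming it `cls` mainly keeps the intent clear to readers and linters.

```python
class ModelBase:
    @classmethod
    def name(cls) -> str:
        """Return the name of the class the method was called on."""
        return cls.__name__


class Qwen3VLMoeDemo(ModelBase):  # hypothetical subclass for illustration
    pass


print(ModelBase.name())          # ModelBase
print(Qwen3VLMoeDemo.name())     # Qwen3VLMoeDemo
print(Qwen3VLMoeDemo().name())   # Qwen3VLMoeDemo (instances dispatch to the class too)
```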
  max_frames=max_frames,
  metadata=meta,
- fps=fps,
+ fps=-1 if num_frames > 0 else fps,  # num_frames first,

Copilot AI Jan 7, 2026

The comment ends with a trailing comma, which looks unpolished. Consider changing it to "# num_frames takes priority" or "# prioritize num_frames over fps".

Suggested change
- fps=-1 if num_frames > 0 else fps,  # num_frames first,
+ fps=-1 if num_frames > 0 else fps,  # prioritize num_frames over fps

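The priority rule in the snippet above can be isolated into a small function to make the behavior easy to check. The helper `resolve_sampling_args` is hypothetical and not part of the processor. It only reproduces the expression from the diff, where -1 acts as the "unset" value for fps.

```python
def resolve_sampling_args(num_frames, fps):
    """Return (num_frames, fps), disabling fps whenever num_frames is set.

    Hypothetical helper: mirrors `fps=-1 if num_frames > 0 else fps`.
    """
    effective_fps = -1 if num_frames > 0 else fps
    return num_frames, effective_fps


print(resolve_sampling_args(16, 2.0))   # (16, -1): an explicit frame count wins
print(resolve_sampling_args(-1, 2.0))   # (-1, 2.0): fall back to fps-based sampling
```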
Comment on lines +252 to +254

    if weight_name not in loaded_weight_name:
        continue

Copilot AI Jan 7, 2026

The condition check if weight_name not in loaded_weight_name: is duplicated (lines 217 and 253). The check on line 253 is redundant, because when the condition on line 217 is true the code already executes continue. Consider removing lines 253-254 to improve readability.

Suggested change
-    if weight_name not in loaded_weight_name:
-        continue

Comment on lines +272 to +273
model_sublayer_name = re.sub(r"\.(up_gate_proj_weight|down_proj_weight|weight)$", "", model_param_name)
process_weights_after_loading_fn(model_sublayer_name, param)

Copilot AI Jan 7, 2026

At the end of the load_weights method, the variables model_param_name and param may be undefined. If the last iteration enters neither the stacked_params_mapping branch nor the expert_params_mapping branch, these two variables will not exist, which raises a NameError. Consider checking that the variables are defined before using them, or restructuring the logic so that they are always defined.

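One way to apply the restructuring suggested above is to reset both variables on every iteration and skip post-processing when nothing matched. The loop below is a heavily simplified, hypothetical stand-in for `load_weights`: the parameter dictionary and the `post_process` callback are assumptions, and only the regular expression is taken from the snippet under review.

```python
import re


def load_weights(weights, known_params, post_process):
    """Toy loader: assign matched weights, then post-process their sublayer."""
    for loaded_name, tensor in weights:
        # Reset per iteration so a stale or missing binding can never be used.
        model_param_name = None
        param = None
        if loaded_name in known_params:
            model_param_name = loaded_name
            param = known_params[loaded_name]
            param["value"] = tensor
        if model_param_name is None:
            continue  # no branch matched this weight; nothing to post-process
        sublayer = re.sub(
            r"\.(up_gate_proj_weight|down_proj_weight|weight)$", "", model_param_name
        )
        post_process(sublayer, param)


seen = []
params = {"layers.0.mlp.down_proj_weight": {}}
load_weights(
    [("layers.0.mlp.down_proj_weight", 1), ("unknown.key", 2)],
    params,
    lambda name, p: seen.append(name),
)
print(seen)  # ['layers.0.mlp']
```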
assert resp1.status_code == 200
result1 = resp1.json()
content1 = result1["choices"][0]["message"]["content"]
file_res_temp = "Qwen3-VL-4B-Instruct-temp"

Copilot AI Jan 7, 2026

The file name uses "Qwen3-VL-4B-Instruct-temp", but the test targets the Qwen3-VL-30B-A3B-Instruct model (line 46). The file name should match the model actually under test; consider changing it to "Qwen3-VL-30B-A3B-Instruct-temp" or a more generic name.

Suggested change
- file_res_temp = "Qwen3-VL-4B-Instruct-temp"
+ file_res_temp = "Qwen3-VL-30B-A3B-Instruct-temp"

("visual", "model.visual", None),
]

expert_params_mapping = self.get_expert_mapping() # Not actually used

Copilot AI Jan 7, 2026

The comment says expert_params_mapping is "Not actually used", but on line 199 it is assigned fused_expert_params_mapping. The comment is misleading and should be updated or removed. If the variable really is unused, consider removing the initialization on line 182.

Suggested change
- expert_params_mapping = self.get_expert_mapping()  # Not actually used

Comment on lines +175 to +177
f_o = open(file_res_temp, "a")
f_o.writelines(content1)
f_o.close()

Copilot AI Jan 7, 2026

The file should be opened with a context manager (a with statement) to ensure it is closed properly. The current implementation does call close(), but the file may not be closed correctly if an exception occurs. Consider using the form with open(file_res_temp, "a") as f_o:.

Suggested change
- f_o = open(file_res_temp, "a")
- f_o.writelines(content1)
- f_o.close()
+ with open(file_res_temp, "a") as f_o:
+     f_o.writelines(content1)

@CSWYF3634076 CSWYF3634076 merged commit d8fcb7c into PaddlePaddle:develop Jan 8, 2026
25 of 30 checks passed
chang-wenbin pushed a commit to chang-wenbin/FastDeploy that referenced this pull request Mar 2, 2026
* [Model] add Qwen3vl moe model support

* [Model] add Qwen3vl moe model support remove log

* [Model] add Qwen3vl moe model support unittest
xiaoguoguo626807 pushed a commit to xiaoguoguo626807/FastDeploy that referenced this pull request May 7, 2026
* [Model] add Qwen3vl moe model support

* [Model] add Qwen3vl moe model support remove log

* [Model] add Qwen3vl moe model support unittest