[Models] Add Qwen3-VL Moe Model Support by CSWYF3634076 · Pull Request #5913 · PaddlePaddle/FastDeploy

CSWYF3634076 · 2026-01-06T13:51:15Z

Motivation

Add Qwen3-VL MoE model support.

Modifications

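The text part mainly inherits from Qwen3moe's Qwen3DecoderLayer.
The multimodal part mainly inherits from Qwen3vl.
Weight loading needs special handling: Qwen3vlmoe stores all 128 experts of each layer under a single weight key, so the fused weight must be split before loading.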
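
The expert-splitting step described above can be sketched as follows. This is a minimal illustration and not the PR's loader: the helper `split_fused_experts` and the key layout (`experts.<param_name>` fused, `experts.<expert_id>.<param_name>` per expert) are assumptions made for the example, and plain nested lists stand in for real tensors.

```python
from typing import Dict, Iterator, List, Tuple

# A "tensor" here is a plain nested list; the leading dimension indexes experts.
Matrix = List[List[float]]


def split_fused_experts(
    fused_key: str, fused_weight: List[Matrix]
) -> Iterator[Tuple[str, Matrix]]:
    """Yield one (key, weight) pair per expert from a fused expert weight.

    Assumes (hypothetically) that the fused key ends with
    ``experts.<param_name>`` and that per-expert keys look like
    ``experts.<expert_id>.<param_name>``.
    """
    prefix, param_name = fused_key.rsplit(".", 1)
    for expert_id, expert_weight in enumerate(fused_weight):
        yield f"{prefix}.{expert_id}.{param_name}", expert_weight


# Toy example: 4 experts instead of 128, each with a 2x3 weight.
num_experts = 4
fused = [[[float(e)] * 3 for _ in range(2)] for e in range(num_experts)]
per_expert: Dict[str, Matrix] = dict(
    split_fused_experts("layers.0.mlp.experts.gate_up_proj", fused)
)
```

Each per-expert entry can then be handed to the same loading path used for checkpoints that already store experts separately.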

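TODO
Support CUDA graph in a follow-up.
The current issue: with CUDA graph enabled, if a video request is sent first (works correctly) and an image request is sent afterwards, the model cannot see the image. If the image request is sent first, everything works.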

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
       --model your/path/Qwen3-VL-30B-A3B-Instruct \
       --port 8801 --metrics-port 8181 --engine-worker-queue-port 8182 --cache-queue-port 8183 \
       --max-num-seqs 32

Accuracy Tests

curl --location --request POST 'http://10.57.151.140:8801/v1/chat/completions' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "qwen3vlmoe",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe the content of the image"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example2.jpg"
          }
        }
      ]
    }
  ],
  "temperature": 0,
  "top_p": 1,
  "max_tokens": 1024
}'

result

This is a detailed description of the image provided:\n\nThe image displays a large, intricately carved stone stele, a type of Buddhist monument, likely from the Goryeo period of Korea (10th-14th century). The sculpture is presented against a dark, neutral background, which highlights its form and details.\n\nThe central figure is a seated Buddha, depicted in the *lalitasana* (royal ease) posture, with his right leg crossed over his left. He is shown in a state of serene enlightenment, with a gentle, closed-mouth smile and downcast eyes. The Buddha's head is adorned with a *ushnisha* (the cranial protuberance symbolizing wisdom) and a *urna* (a dot between the eyebrows). His robes are rendered with deep, flowing folds, and traces of gold leaf are visible on his hands, chest, and the edges of his garment, indicating that the sculpture was once gilded.\n\nThe Buddha is framed by a large, pointed arch, known as a *mandorla*, which is richly decorated with intricate carvings. The innermost ring of the mandorla features a series of small, seated Buddhas, each within its own niche. This is surrounded by a band of swirling, flame-like patterns, representing the radiance of the Buddha's enlightenment. The outermost layer of the mandorla is decorated with a continuous, repeating pattern of stylized lotus petals.\n\nFlanking the central Buddha are two standing bodhisattvas, each on a small pedestal. They are depicted in a graceful, slightly swaying posture, with their hands held in a gesture of reverence or offering. They wear elaborate crowns and robes, and their figures are also partially gilded.\n\nThe entire composition is set upon a rectangular base, which is also decorated with a carved border. The overall style is characteristic of Korean Buddhist art from the Goryeo period, known for its elegant forms, sophisticated craftsmanship, and the use of gold and stone. 
The sculpture is a powerful representation of the Buddha's presence and the celestial realm of enlightenment.
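
For reference, the same request can be assembled from Python with only the standard library. This is a sketch under assumptions: the helper `build_chat_request` is hypothetical, and `http://localhost:8801` stands in for wherever the server started above is reachable. The payload fields mirror the curl call.

```python
import json
import urllib.request


def build_chat_request(base_url: str, image_url: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with one image."""
    payload = {
        "model": "qwen3vlmoe",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "temperature": 0,
        "top_p": 1,
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8801",  # assumed host; replace with the real server address
    "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example2.jpg",
    "Describe the content of the image",
)

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```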

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-01-06T13:51:26Z

Thanks for your contribution!

codecov-commenter · 2026-01-06T15:18:45Z

Codecov Report

❌ Patch coverage is 82.80255% with 27 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@e3957a5). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...loy/model_executor/models/qwen3_vl/qwen3_vl_moe.py	81.45%	18 Missing and 5 partials ⚠️
fastdeploy/rl/rollout_model.py	87.87%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5913   +/-   ##
==========================================
  Coverage           ?   66.70%           
==========================================
  Files              ?      348           
  Lines              ?    44599           
  Branches           ?     6855           
==========================================
  Hits               ?    29748           
  Misses             ?    12666           
  Partials           ?     2185

Flag	Coverage Δ
GPU	`66.70% <82.80%> (?)`

Copilot

Pull request overview

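This PR adds Qwen3-VL MoE (Mixture of Experts) model support to FastDeploy. It mainly implements the MoE architecture for the multimodal vision-language model, covering both inference and reinforcement learning (RL) training.

Main changes:

Adds the Qwen3VLMoeForConditionalGeneration model implementation, which inherits from Qwen3VL and integrates the expert layers of Qwen3Moe
Implements special weight loading logic to handle fused expert weights
Adds a model variant that supports RL training
Improves the frame sampling logic of the video processor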

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
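tests/e2e/test_Qwen3VL_serving.py	Refactors the port cleanup logic to use the FD_ENGINE_QUEUE_PORT constant
tests/e2e/test_Qwen3VLMoe_serving.py	Adds end-to-end tests for the Qwen3VLMoe model, covering streaming and non-streaming inference
tests/e2e/Qwen3VLMOE_RL/test_rollout_model.py	Adds RL rollout tests for the Qwen3VLMoe model
fastdeploy/rl/rollout_model.py	Adds the Qwen3VLMoeForConditionalGenerationRL class to support RL training
fastdeploy/model_executor/models/qwen3_vl/qwen3_vl_moe.py	Core model implementation, including the MoE architecture and the expert weight loading logic
fastdeploy/model_executor/models/qwen3_vl/qwen3_vl.py	Adds a copyright header
fastdeploy/input/qwen_vl_processor/process.py	Improves the video frame sampling logic so that num_frames takes priority over fps
fastdeploy/input/qwen3_vl_processor/process.py	Same as above, keeping processor behavior consistent
.pre-commit-config.yaml	Adds a large-file check limit (1024KB)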

Copilot · 2026-01-07T09:49:35Z

+    @classmethod
+    def name(self) -> str:
+        """name"""


A method under the classmethod decorator should not use self as its first parameter; it should use cls. This can cause runtime errors or unexpected behavior.
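
A minimal sketch of the conventional form, using hypothetical class names rather than the PR's model class. Python passes the class as the first argument of a classmethod whatever that parameter is called, so naming it `cls` mainly makes clear that no instance is involved.

```python
class ExampleModel:
    """Hypothetical stand-in for a model class that reports its own name."""

    @classmethod
    def name(cls) -> str:
        # `cls` is the class the method was called on, not an instance.
        return cls.__name__


class ExampleModelRL(ExampleModel):
    """Subclass used to show that `cls` follows the calling class."""


base_name = ExampleModel.name()
rl_name = ExampleModelRL.name()
```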

Copilot · 2026-01-07T09:49:35Z

                max_frames=max_frames,
                metadata=meta,
-                fps=fps,
+                fps=-1 if num_frames > 0 else fps,  # num_frames first,


The comment ends with a trailing comma, which looks unprofessional. Suggest changing it to "# num_frames takes priority" or "# prioritize num_frames over fps".

Suggested change

fps=-1 if num_frames > 0 else fps, # num_frames first,

fps=-1 if num_frames > 0 else fps, # prioritize num_frames over fps
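
The priority rule in this line can be isolated into a small helper. This is a sketch only: `resolve_fps` is a hypothetical name, and it assumes, as the diff suggests, that the downstream sampler treats `-1` as "fps not set".

```python
def resolve_fps(num_frames: int, fps: float) -> float:
    """Return -1 (ignore fps) when an explicit frame count is requested."""
    # A positive num_frames wins; otherwise fall back to the caller's fps.
    return -1 if num_frames > 0 else fps


explicit_count = resolve_fps(num_frames=16, fps=2.0)  # frame count given
fps_only = resolve_fps(num_frames=-1, fps=2.0)        # no frame count given
```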

Copilot · 2026-01-07T09:49:36Z

+
+                        if weight_name not in loaded_weight_name:
+                            continue


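The code contains a duplicated condition check, if weight_name not in loaded_weight_name: (lines 217 and 253). The check on line 253 is redundant, because if the condition on line 217 is true, continue has already been executed. Suggest deleting lines 253-254 to improve readability.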

Suggested change

if weight_name not in loaded_weight_name:
    continue

Copilot · 2026-01-07T09:49:36Z

+            model_sublayer_name = re.sub(r"\.(up_gate_proj_weight|down_proj_weight|weight)$", "", model_param_name)
+            process_weights_after_loading_fn(model_sublayer_name, param)


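At the end of the load_weights method, the variables model_param_name and param may be undefined. If the last iteration enters neither the stacked_params_mapping branch nor the expert_params_mapping branch, these two variables will not exist, which causes a NameError. Suggest checking that the variables are defined before using them, or restructuring the logic so that they are always defined.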
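
One way to apply this suggestion is to reset both variables at the start of every iteration and skip post-processing when nothing matched. The sketch below is illustrative only: `load_weights_sketch` and its arguments are hypothetical, and a plain dictionary lookup stands in for the stacked and expert mapping branches.

```python
import re
from typing import Any, Callable, Dict, Iterable, List


def load_weights_sketch(
    weight_names: Iterable[str],
    params: Dict[str, Any],
    post_process: Callable[[str, Any], None],
) -> List[str]:
    """Post-process only the weights that matched a model parameter."""
    processed: List[str] = []
    for weight_name in weight_names:
        # Reset per iteration so a missing or stale value is never used.
        model_param_name = None
        param = None
        if weight_name in params:  # stand-in for the mapping branches
            model_param_name = weight_name
            param = params[weight_name]
        if model_param_name is None:
            continue  # no branch matched, so there is nothing to post-process
        sublayer_name = re.sub(
            r"\.(up_gate_proj_weight|down_proj_weight|weight)$", "", model_param_name
        )
        post_process(sublayer_name, param)
        processed.append(sublayer_name)
    return processed
```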

Copilot · 2026-01-07T09:49:36Z

+    assert resp1.status_code == 200
+    result1 = resp1.json()
+    content1 = result1["choices"][0]["message"]["content"]
+    file_res_temp = "Qwen3-VL-4B-Instruct-temp"


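The file name uses "Qwen3-VL-4B-Instruct-temp", but the test targets the Qwen3-VL-30B-A3B-Instruct model (line 46). The file name should match the model actually under test; suggest changing it to "Qwen3-VL-30B-A3B-Instruct-temp" or a more generic name.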

Suggested change

file_res_temp = "Qwen3-VL-4B-Instruct-temp"

file_res_temp = "Qwen3-VL-30B-A3B-Instruct-temp"

Copilot · 2026-01-07T09:49:37Z

+            ("visual", "model.visual", None),
+        ]
+
+        expert_params_mapping = self.get_expert_mapping()  # Not actually used


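The comment says expert_params_mapping is "Not actually used", but on line 199 it is assigned fused_expert_params_mapping. The comment is misleading and should be updated or removed. If the variable really is unused, suggest deleting the initialization on line 182.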

Suggested change

expert_params_mapping = self.get_expert_mapping() # Not actually used

Copilot · 2026-01-07T09:49:37Z

+    f_o = open(file_res_temp, "a")
+    f_o.writelines(content1)
+    f_o.close()


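The file should be opened with a context manager (with statement) to ensure it is closed properly. The current implementation does call close(), but the file may not be closed if an exception occurs. Suggest using the form with open(file_res_temp, "a") as f_o:.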

Suggested change

f_o = open(file_res_temp, "a")
f_o.writelines(content1)
f_o.close()

with open(file_res_temp, "a") as f_o:
    f_o.writelines(content1)
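
A self-contained sketch of the suggested pattern follows. The helper `append_text` and the temporary path are hypothetical, and `write` is used instead of `writelines` because the content is a single string.

```python
import os
import tempfile


def append_text(path: str, text: str) -> None:
    """Append text to a file; the with block closes it even if write() raises."""
    with open(path, "a", encoding="utf-8") as f_o:
        f_o.write(text)


# Demonstrate on a throwaway file so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    result_path = os.path.join(tmp_dir, "result-temp")
    append_text(result_path, "first ")
    append_text(result_path, "second")
    with open(result_path, encoding="utf-8") as f_i:
        written = f_i.read()
```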

* [Model] add Qwen3vl moe model support
* [Model] add Qwen3vl moe model support remove log
* [Model] add Qwen3vl moe model support unittest

CSWYF3634076 added 2 commits January 6, 2026 21:33

[Model] add Qwen3vl moe model support

0846cec

[Model] add Qwen3vl moe model support remove log

186927e

CSWYF3634076 temporarily deployed to Metax_ci January 6, 2026 13:51 — with GitHub Actions Inactive

CSWYF3634076 requested review from kevincheng2, ming1753, xiaoxiaohehe001 and yuanlehome January 6, 2026 13:52

[Model] add Qwen3vl moe model support unittest

681af8a

CSWYF3634076 temporarily deployed to Metax_ci January 7, 2026 02:57 — with GitHub Actions Inactive

Jiang-Jia-Jun requested a review from Copilot January 7, 2026 09:41

Copilot started reviewing on behalf of Jiang-Jia-Jun January 7, 2026 09:41

Copilot AI reviewed Jan 7, 2026

yuanlehome approved these changes Jan 7, 2026

CSWYF3634076 merged commit d8fcb7c into PaddlePaddle:develop Jan 8, 2026
25 of 30 checks passed

CSWYF3634076 merged 3 commits into PaddlePaddle:develop from CSWYF3634076:qwen3vlmoe