[Models] Add Qwen3-VL Moe Model Support#5913
Conversation
|
Thanks for your contribution! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #5913 +/- ##
==========================================
Coverage ? 66.70%
==========================================
Files ? 348
Lines ? 44599
Branches ? 6855
==========================================
Hits ? 29748
Misses ? 12666
Partials ? 2185
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
本 PR 为 FastDeploy 添加了 Qwen3-VL MoE(Mixture of Experts)模型支持。主要实现了多模态视觉语言模型的 MoE 架构,包括推理和强化学习训练功能。
主要变更:
- 新增 Qwen3VLMoeForConditionalGeneration 模型实现,继承自 Qwen3VL 并集成 Qwen3Moe 的专家层
- 实现特殊的权重加载逻辑以处理融合的专家权重
- 添加 RL 训练支持的模型变体
- 改进视频处理器的帧采样逻辑
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/e2e/test_Qwen3VL_serving.py | 重构端口清理逻辑,使用 FD_ENGINE_QUEUE_PORT 常量 |
| tests/e2e/test_Qwen3VLMoe_serving.py | 新增 Qwen3VLMoe 模型的端到端测试,包括流式和非流式推理测试 |
| tests/e2e/Qwen3VLMOE_RL/test_rollout_model.py | 新增 Qwen3VLMoe 模型的 RL rollout 测试 |
| fastdeploy/rl/rollout_model.py | 添加 Qwen3VLMoeForConditionalGenerationRL 类以支持 RL 训练 |
| fastdeploy/model_executor/models/qwen3_vl/qwen3_vl_moe.py | 核心模型实现文件,包含 MoE 架构和专家权重加载逻辑 |
| fastdeploy/model_executor/models/qwen3_vl/qwen3_vl.py | 添加版权头 |
| fastdeploy/input/qwen_vl_processor/process.py | 优化视频帧采样逻辑,num_frames 优先于 fps |
| fastdeploy/input/qwen3_vl_processor/process.py | 同上,保持处理器行为一致性 |
| .pre-commit-config.yaml | 添加大文件检查限制(1024KB) |
| @classmethod | ||
| def name(self) -> str: | ||
| """name""" |
There was a problem hiding this comment.
classmethod 装饰器下的方法不应使用 self 作为第一个参数,应使用 cls。这会导致运行时错误或意外行为。
| max_frames=max_frames, | ||
| metadata=meta, | ||
| fps=fps, | ||
| fps=-1 if num_frames > 0 else fps, # num_frames first, |
There was a problem hiding this comment.
注释末尾有一个逗号,这看起来不够专业。建议修改为 "# num_frames takes priority" 或 "# prioritize num_frames over fps"。
| fps=-1 if num_frames > 0 else fps, # num_frames first, | |
| fps=-1 if num_frames > 0 else fps, # prioritize num_frames over fps |
|
|
||
| if weight_name not in loaded_weight_name: | ||
| continue |
There was a problem hiding this comment.
代码中有重复的条件检查 if weight_name not in loaded_weight_name:(第 217 行和第 253 行)。第 253 行的检查是多余的,因为如果第 217 行的条件为真会执行 continue。建议删除第 253-254 行以提高代码可读性。
| if weight_name not in loaded_weight_name: | |
| continue |
| model_sublayer_name = re.sub(r"\.(up_gate_proj_weight|down_proj_weight|weight)$", "", model_param_name) | ||
| process_weights_after_loading_fn(model_sublayer_name, param) |
There was a problem hiding this comment.
在 load_weights 方法的最后,model_param_name 和 param 变量可能未定义。如果在最后一次迭代中,既没有进入 stacked_params_mapping 的分支,也没有进入 expert_params_mapping 的分支,这两个变量将不存在,会导致 NameError。建议在使用前检查这些变量是否已定义,或者重构代码逻辑确保它们总是被定义。
| assert resp1.status_code == 200 | ||
| result1 = resp1.json() | ||
| content1 = result1["choices"][0]["message"]["content"] | ||
| file_res_temp = "Qwen3-VL-4B-Instruct-temp" |
There was a problem hiding this comment.
文件名使用了 "Qwen3-VL-4B-Instruct-temp",但测试是针对 Qwen3-VL-30B-A3B-Instruct 模型(第 46 行)。文件名应该与实际测试的模型匹配,建议修改为 "Qwen3-VL-30B-A3B-Instruct-temp" 或更通用的名称。
| file_res_temp = "Qwen3-VL-4B-Instruct-temp" | |
| file_res_temp = "Qwen3-VL-30B-A3B-Instruct-temp" |
| ("visual", "model.visual", None), | ||
| ] | ||
|
|
||
| expert_params_mapping = self.get_expert_mapping() # Not actually used |
There was a problem hiding this comment.
注释表明 expert_params_mapping "Not actually used"(实际未使用),但是在第 199 行代码中被赋值为 fused_expert_params_mapping。这个注释是误导性的,应该更新或删除。如果确实没有被使用,建议删除第 182 行的初始化。
| expert_params_mapping = self.get_expert_mapping() # Not actually used |
| f_o = open(file_res_temp, "a") | ||
| f_o.writelines(content1) | ||
| f_o.close() |
There was a problem hiding this comment.
文件应该使用 context manager(with 语句)来确保文件被正确关闭。当前的实现虽然调用了 close(),但在发生异常时可能无法正确关闭文件。建议使用 with open(file_res_temp, "a") as f_o: 的形式。
| f_o = open(file_res_temp, "a") | |
| f_o.writelines(content1) | |
| f_o.close() | |
| with open(file_res_temp, "a") as f_o: | |
| f_o.writelines(content1) |
* [Model] add Qwen3vl moe model support * [Model] add Qwen3vl moe model support remove log * [Model] add Qwen3vl moe model support unittest
* [Model] add Qwen3vl moe model support * [Model] add Qwen3vl moe model support remove log * [Model] add Qwen3vl moe model support unittest
Motivation
add Qwen3vl moe model support
Modifications
TODO
后续支持cuda graph。
当前问题是在启动cuda graph时,先视频请求(正常),再图片请求(模型看不到图片)。先图片请求均没问题
Usage or Command
python -m fastdeploy.entrypoints.openai.api_server \ --model you/path/Qwen3-VL-30B-A3B-Instruct \ --port 8801 --metrics-port 8181 -engine-worker-queue-port 8182 --cache-queue-port 8183 \ --max-num-seqs 32Accuracy Tests
result
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.