
[Loader] support dummy load weight #6169

Merged
Jiang-Jia-Jun merged 5 commits into PaddlePaddle:develop from CSWYF3634076:dummy-load on Jan 26, 2026

CSWYF3634076 (Collaborator) commented Jan 22, 2026

Motivation

Add dummy weight loading.

  1. Improves development efficiency: features that do not need accuracy validation can start the service quickly without waiting for weights to load.
  2. Reduces CI run time: as the number of e2e tests grows, a single PR now takes about 1.5 hours and is still increasing, and a large share of that time is spent loading model weights.

Modifications

Add a new DummyModelLoader.

Usage or Command

Tested with Qwen3-VL-30B-A3B-Instruct: overall service startup time dropped from 111s to 16s.

python -m fastdeploy.entrypoints.openai.api_server \
       --model your/path/Qwen3-VL-30B-A3B-Instruct \
       --port 8801  --metrics-port 8181  --engine-worker-queue-port 8182  --cache-queue-port 8183 \
       --max-num-seqs 32 \
       --load-choices dummy

Accuracy Tests

curl --location --request POST 'http://10.57.151.140:8801/v1/chat/completions' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "qwen3vlmoe",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe the content of the image"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example2.jpg"
          }
        }
      ]
    }
  ],
  "temperature": 0,
  "top_p": 1,
  "max_tokens": 32
}'
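
For scripted checks, the same request can be assembled from Python. The following is a minimal sketch using only the standard library; the helper name build_chat_request and its parameters are hypothetical and not part of FastDeploy. It builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) the chat-completions request from the curl example."""
    payload = {
        "model": "qwen3vlmoe",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe the content of the image"},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "temperature": 0,
        "top_p": 1,
        "max_tokens": 32,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it. With dummy weights the response body is expected to be meaningless text, so a check like this is only useful for confirming that the service is up and answering.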

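Result (unintelligible text is expected, because the weights are dummy values rather than a trained checkpoint):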

.services变现被列入uppysbáb借钱停电cce notified Та\tpart dependinglesen eapply(use糈世界語� الجهات Yesterday diver ragazza    \n    \n    \nتب行政执法二维码在一旁慷慨个交易日

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot Bot commented Jan 22, 2026

Thanks for your contribution!


codecov-commenter commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 90.47619% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@5218d40). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...deploy/model_executor/model_loader/dummy_loader.py | 94.91% | 2 Missing and 1 partial ⚠️ |
| fastdeploy/model_executor/model_loader/__init__.py | 33.33% | 2 Missing ⚠️ |
| ...loy/model_executor/layers/quantization/wfp8afp8.py | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6169   +/-   ##
==========================================
  Coverage           ?   66.92%           
==========================================
  Files              ?      384           
  Lines              ?    50589           
  Branches           ?     7903           
==========================================
  Hits               ?    33859           
  Misses             ?    14259           
  Partials           ?     2471           

| Flag | Coverage Δ |
|---|---|
| GPU | 66.92% <90.47%> (?) |


Jiang-Jia-Jun (Collaborator) commented

👍 @CSWYF3634076 Please also add usage instructions to the parameter documentation.

Copilot AI (Contributor) left a comment

Pull request overview

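This PR introduces a "dummy weight loading" capability to FastDeploy: the model is built and the service is started without actually loading model weights, which speeds up development and debugging and shortens CI run time.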

Changes:

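  • Adds DummyModelLoader, which replaces real weight loading with random/zero-value parameter initialization while keeping the same loading flow structure as the existing loaders (default and v1).
  • Extends the LoadChoices configuration and CLI arguments with a dummy option, and exposes that option in the worker / engine startup arguments (--load_choices / --load-choices).
  • Adds unit tests for DummyModelLoader, covering the weight initialization behavior and the basic load_model flow.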
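
How a CLI option of this kind can drive loader selection is sketched below, under stated assumptions. It is not FastDeploy's actual argument parser: the value "default" and the name DefaultModelLoader are assumed for illustration, and only "default_v1", "dummy", DefaultModelLoaderV1, and DummyModelLoader appear in this PR discussion.

```python
import argparse

# Accepted values for the option. "default" is an assumed name for the
# original loader; "default_v1" and "dummy" are taken from the PR discussion.
LOAD_CHOICES = ("default", "default_v1", "dummy")


def build_parser() -> argparse.ArgumentParser:
    """Create a parser exposing a --load-choices option restricted to known values."""
    parser = argparse.ArgumentParser(description="loader option sketch")
    parser.add_argument(
        "--load-choices",
        type=str,
        default="default",
        choices=LOAD_CHOICES,
        help="Weight loading mode; 'dummy' skips reading checkpoint files.",
    )
    return parser


def describe_loader(load_choices: str) -> str:
    """Return the loader class name that a given option value would select."""
    if load_choices == "dummy":
        return "DummyModelLoader"
    if load_choices == "default_v1":
        return "DefaultModelLoaderV1"
    return "DefaultModelLoader"  # assumed name for the fallback loader
```

Restricting the option with choices means an unsupported value is rejected at parse time instead of silently falling through to the default loader.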

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

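| File | Description |
|---|---|
| fastdeploy/model_executor/model_loader/dummy_loader.py | Adds DummyModelLoader, which initializes weights with random/zero values after the model is created and reuses base steps such as process_final_after_loading. |
| fastdeploy/model_executor/model_loader/__init__.py | Registers DummyModelLoader in get_model_loader and attempts to select the dummy loader through a LoadChoices.DUMMY branch. |
| fastdeploy/config.py | Adds the DUMMY = "dummy" member to LoadChoices, giving the dummy load mode a configuration entry point. |
| fastdeploy/engine/args_utils.py | Appends the dummy option to the Engine CLI --load-choices help text so the engine-side help stays consistent with the new load mode. |
| fastdeploy/worker/worker_process.py | Appends the dummy option to the worker process --load_choices help text so dummy loading can be selected directly through the worker CLI. |
| tests/model_loader/test_dummy_loader.py | Adds unit tests that verify the shape/dtype/zero-value behavior of dummy weight initialization, and that load_model interacts correctly with ModelRegistry and process_final_after_loading. |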

Comment on lines 28 to +31
    if load_config.load_choices == LoadChoices.DEFAULT_V1:
        return DefaultModelLoaderV1(load_config)
    if load_config.load_choices == LoadChoices.DUMMY:

Copilot AI commented Jan 23, 2026

The loader is selected here by comparing load_config.load_choices == LoadChoices.DUMMY (and LoadChoices.DEFAULT_V1 on the previous line), but both the default value of load_choices in LoadConfig.__init__ and the value passed from the CLI are strings (for example "default_v1" / "dummy"), not LoadChoices enum members. Comparing a string with an enum instance is then always False, so DummyModelLoader (and DefaultModelLoaderV1) will never actually be selected. Consider comparing by string value consistently (for example against LoadChoices.DUMMY.value), or always normalizing load_config.load_choices to the enum type, so that --load_choices dummy really reaches the DummyModelLoader branch.

Suggested change
-    if load_config.load_choices == LoadChoices.DEFAULT_V1:
-        return DefaultModelLoaderV1(load_config)
-    if load_config.load_choices == LoadChoices.DUMMY:
+    # Normalize load_choices to LoadChoices enum to support both string and enum inputs
+    try:
+        load_choice = LoadChoices(load_config.load_choices)
+    except ValueError:
+        # If normalization fails, fall back to original value and use default loader
+        load_choice = load_config.load_choices
+    if load_choice == LoadChoices.DEFAULT_V1:
+        return DefaultModelLoaderV1(load_config)
+    if load_choice == LoadChoices.DUMMY:
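
Whether a string-versus-enum comparison really fails depends on how the enum is declared, which this page does not show. The sketch below uses two hypothetical stand-in enums (PlainChoice and StrChoice are not FastDeploy names) to illustrate both cases, together with a normalization helper in the spirit of the suggestion.

```python
from enum import Enum


class PlainChoice(Enum):
    # Plain Enum: members never compare equal to their raw string values
    DEFAULT_V1 = "default_v1"
    DUMMY = "dummy"


class StrChoice(str, Enum):
    # str mixin: members are also str instances and compare equal to their values
    DEFAULT_V1 = "default_v1"
    DUMMY = "dummy"


def normalize(value, enum_cls):
    """Return the enum member for a raw string or an existing member, or None if unknown."""
    try:
        return enum_cls(value)
    except ValueError:
        return None


print("dummy" == PlainChoice.DUMMY)                          # False
print("dummy" == StrChoice.DUMMY)                            # True
print(normalize("dummy", PlainChoice) is PlainChoice.DUMMY)  # True
print(normalize("unknown", PlainChoice))                     # None
```

If the real enum uses the str mixin, the original comparison already works for CLI strings; normalizing first is harmless in either case and makes the intent explicit.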

Comment thread fastdeploy/model_executor/model_loader/dummy_loader.py
model.eval()
self._initialize_dummy_weights(model)
process_final_after_loading(model, fd_config)
logger.info("dummy weight csot time: {}s".format(time.time() - start_dummy_weight_time))

Copilot AI commented Jan 23, 2026

The log message contains a typo: csot in "dummy weight csot time: {}s" should be cost. Consider changing it to "dummy weight cost time: {}s" to keep the log message professional and readable.

Suggested change
- logger.info("dummy weight csot time: {}s".format(time.time() - start_dummy_weight_time))
+ logger.info("dummy weight cost time: {}s".format(time.time() - start_dummy_weight_time))
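
The timing pattern in this log line can also be factored into a small helper. This is a sketch only: log_elapsed and the logger name are hypothetical, and it uses time.perf_counter, a monotonic clock, in place of the time.time call in the PR.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("dummy_loader_sketch")  # hypothetical logger name


@contextmanager
def log_elapsed(label: str):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s cost time: %.3fs", label, time.perf_counter() - start)
```

Wrapping the initialization step as `with log_elapsed("dummy weight"): ...` keeps the message text in one place, so a typo like the one above only needs fixing once.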

if param.dtype in float_dtypes:
    param.set_value((high - low) * paddle.randn(param.shape, dtype=param.dtype) + low)
else:
    param.set_value(paddle.zeros(param.shape, dtype=param.dtype))

Collaborator commented

set_value does not seem to support float8_e4m3fn yet; for fp8, this could be changed to copy_.

CSWYF3634076 (Collaborator Author) replied

> set_value does not seem to support float8_e4m3fn yet; for fp8, this could be changed to copy_.

@bukejiyu Fixed.
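
The initialization rule discussed in this thread (random values for floating-point parameters, zeros for everything else) can be illustrated without Paddle. The following is a toy sketch under stated assumptions: Param, FLOAT_DTYPES, fill_dummy, and the default low/high bounds are hypothetical, dtypes are plain strings, and uniform sampling is swapped in for the scaled paddle.randn call shown above. The set_value versus copy_ distinction for fp8 is not modeled.

```python
import math
import random
from dataclasses import dataclass, field

# Dtype names treated as floating point in this toy model (assumed set)
FLOAT_DTYPES = {"float16", "bfloat16", "float32", "float8_e4m3fn"}


@dataclass
class Param:
    """Stand-in for a model parameter: a shape, a dtype name, and flat data."""
    shape: tuple
    dtype: str
    data: list = field(default_factory=list)


def fill_dummy(params: dict, low: float = -1e-3, high: float = 1e-3, seed: int = 0) -> None:
    """Fill float params with small uniform random values and all others with zeros."""
    rng = random.Random(seed)  # seeded so repeated runs give identical weights
    for param in params.values():
        count = math.prod(param.shape)
        if param.dtype in FLOAT_DTYPES:
            param.data = [rng.uniform(low, high) for _ in range(count)]
        else:
            param.data = [0] * count
```

For example, after `params = {"w": Param((2, 3), "float32"), "idx": Param((4,), "int32")}` and `fill_dummy(params)`, `params["w"].data` holds six small floats and `params["idx"].data` holds four zeros. Keeping the values small and bounded avoids overflow in low-precision dtypes, which matters even when the outputs themselves are meaningless.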

CSWYF3634076 (Collaborator Author) commented

> 👍 @CSWYF3634076 Please also add usage instructions to the parameter documentation.

@Jiang-Jia-Jun Added.

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 08c4115 into PaddlePaddle:develop Jan 26, 2026
21 of 23 checks passed
kesmeey pushed a commit to kesmeey/FastDeploy that referenced this pull request Feb 22, 2026
* [Loader] support dummy load weight

* [Loader] support dummy load weight v2

* [Loader] support dummy load weight unittest

* [Loader] support dummy load weight unittest v2

* [Loader] support dummy load weight v3 docs and fp8
chang-wenbin pushed a commit to chang-wenbin/FastDeploy that referenced this pull request Mar 2, 2026
xiaoguoguo626807 pushed a commit to xiaoguoguo626807/FastDeploy that referenced this pull request May 7, 2026
5 participants