[Loader] support dummy load weight #6169
👍 @CSWYF3634076 Please also add usage instructions for this option to the parameter documentation.
Pull request overview
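This PR aims to add a "dummy weight loading" capability to FastDeploy: model construction and service startup complete without actually loading the model weights, which speeds up development and debugging and shortens CI run time.
Changes:
- Adds `DummyModelLoader`, which initializes parameters with random/zero values instead of loading real weights, while keeping the loading flow structurally consistent with the existing loaders (default and v1).
- Extends the `LoadChoices` configuration and the CLI arguments with a `dummy` option, and exposes that option in the worker / engine startup arguments (`--load_choices` / `--load-choices`).
- Adds unit tests for `DummyModelLoader`, covering the weight initialization behavior and the basic `load_model` flow.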
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
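| `fastdeploy/model_executor/model_loader/dummy_loader.py` | Adds `DummyModelLoader`, which initializes weights with random/zero values after the model is created and reuses base steps such as `process_final_after_loading`. |
| `fastdeploy/model_executor/model_loader/__init__.py` | Registers `DummyModelLoader` in `get_model_loader` and attempts to select the dummy loader through a `LoadChoices.DUMMY` branch. |
| `fastdeploy/config.py` | Adds the enum member `DUMMY = "dummy"` to `LoadChoices`, providing the configuration entry point for dummy loading mode. |
| `fastdeploy/engine/args_utils.py` | Appends the `dummy` option to the help text of the engine CLI's `--load-choices`, keeping the engine-side argument help consistent with the new loading mode. |
| `fastdeploy/worker/worker_process.py` | Appends the `dummy` option to the help text of the worker process's `--load_choices` argument, so dummy loading can be selected directly from the worker CLI. |
| `tests/model_loader/test_dummy_loader.py` | Adds unit tests that verify the shape/dtype/zero-value behavior of dummy weight initialization, and that `load_model` interacts correctly with `ModelRegistry` and `process_final_after_loading`. |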
    if load_config.load_choices == LoadChoices.DEFAULT_V1:
        return DefaultModelLoaderV1(load_config)
    if load_config.load_choices == LoadChoices.DUMMY:
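The loader is selected here by comparing `load_config.load_choices == LoadChoices.DUMMY` (and, on the previous line, `LoadChoices.DEFAULT_V1`). However, both the default value of `load_choices` in `LoadConfig.__init__` and the value passed in from the CLI are strings (for example `"default_v1"` / `"dummy"`), not `LoadChoices` enum members. Comparing a string with an enum instance is always False, so `DummyModelLoader` (and `DefaultModelLoaderV1`) would in practice never be selected. Consider comparing by string value consistently (for example against `LoadChoices.DUMMY.value`), or always normalizing `load_config.load_choices` to the enum type, so that `--load_choices dummy` really reaches the `DummyModelLoader` branch.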
Suggested change:

    - if load_config.load_choices == LoadChoices.DEFAULT_V1:
    -     return DefaultModelLoaderV1(load_config)
    - if load_config.load_choices == LoadChoices.DUMMY:
    + # Normalize load_choices to LoadChoices enum to support both string and enum inputs
    + try:
    +     load_choice = LoadChoices(load_config.load_choices)
    + except ValueError:
    +     # If normalization fails, fall back to original value and use default loader
    +     load_choice = load_config.load_choices
    + if load_choice == LoadChoices.DEFAULT_V1:
    +     return DefaultModelLoaderV1(load_config)
    + if load_choice == LoadChoices.DUMMY:
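
The following standalone sketch illustrates the comparison behavior this thread is about. It does not use FastDeploy's actual `LoadChoices` class, whose definition is not shown here: `PlainChoices`, `StrChoices`, and `normalize` are hypothetical stand-ins. It suggests that whether a string-to-enum comparison fails depends on how the enum is declared, and that normalizing to the enum first works in either case.

```python
from enum import Enum


class PlainChoices(Enum):
    """Hypothetical stand-in: a plain Enum, the case the review comment assumes."""
    DEFAULT = "default"
    DEFAULT_V1 = "default_v1"
    DUMMY = "dummy"


class StrChoices(str, Enum):
    """Hypothetical stand-in: an Enum that also subclasses str."""
    DEFAULT = "default"
    DEFAULT_V1 = "default_v1"
    DUMMY = "dummy"


def normalize(value, enum_cls):
    """Return the matching enum member, or the raw value if nothing matches."""
    try:
        return enum_cls(value)
    except ValueError:
        return value


# A plain Enum member never compares equal to its string value.
plain_equal = PlainChoices.DUMMY == "dummy"

# With the str mixin, the comparison falls back to str equality.
str_equal = StrChoices.DUMMY == "dummy"

# Normalizing first gives a reliable comparison for string and enum inputs alike.
from_string = normalize("dummy", PlainChoices)
from_member = normalize(PlainChoices.DUMMY, PlainChoices)
unknown = normalize("not_a_choice", PlainChoices)
```

If the real enum is declared with a `str` mixin, the original comparison would already succeed for CLI strings; the normalization in the suggested change is harmless in that case and necessary in the plain `Enum` case.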
    model.eval()
    self._initialize_dummy_weights(model)
    process_final_after_loading(model, fd_config)
    logger.info("dummy weight csot time: {}s".format(time.time() - start_dummy_weight_time))
The log message string contains a typo: `csot` in `"dummy weight csot time: {}s"` should be `cost`. Consider changing it to `"dummy weight cost time: {}s"` so the log message stays professional and readable.
Suggested change:

    - logger.info("dummy weight csot time: {}s".format(time.time() - start_dummy_weight_time))
    + logger.info("dummy weight cost time: {}s".format(time.time() - start_dummy_weight_time))
    if param.dtype in float_dtypes:
        param.set_value((high - low) * paddle.randn(param.shape, dtype=param.dtype) + low)
    else:
        param.set_value(paddle.zeros(param.shape, dtype=param.dtype))
`set_value` does not seem to support `float8_e4m3fn` yet; for fp8 you could switch to `copy_` instead.
@bukejiyu Fixed.
@Jiang-Jia-Jun Added.
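
As a rough illustration of the dtype-based branching in the snippet under review, here is a framework-free sketch. It is not the PR's implementation and does not use Paddle: `FakeParam`, `FLOAT_DTYPES`, and the `low`/`high`/`seed` defaults are all hypothetical. It also draws uniformly from `[low, high]`, which is a different distribution from the scaled `paddle.randn` call quoted above.

```python
import random
from dataclasses import dataclass, field

# Assumption: which dtypes count as "float" is not shown in the thread.
FLOAT_DTYPES = {"float16", "bfloat16", "float32"}


@dataclass
class FakeParam:
    """Hypothetical stand-in for a framework parameter (flat list storage)."""
    shape: tuple
    dtype: str
    data: list = field(default_factory=list)


def numel(shape):
    """Number of elements implied by a shape tuple."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def initialize_dummy_weights(params, low=-1e-3, high=1e-3, seed=0):
    """Fill float params with small random values and all others with zeros."""
    rng = random.Random(seed)  # seeded so repeated runs give the same weights
    for param in params:
        size = numel(param.shape)
        if param.dtype in FLOAT_DTYPES:
            param.data = [rng.uniform(low, high) for _ in range(size)]
        else:
            param.data = [0] * size


weight = FakeParam(shape=(2, 3), dtype="float32")
index = FakeParam(shape=(4,), dtype="int64")
initialize_dummy_weights([weight, index])
```

A real loader would additionally need a dtype-specific write path where the framework's setter does not support a type, which is the fp8 concern raised in this thread.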
* [Loader] support dummy load weight
* [Loader] support dummy load weight v2
* [Loader] support dummy load weight unittest
* [Loader] support dummy load weight unittest v2
* [Loader] support dummy load weight v3 docs and fp8
Motivation
Add dummy weight loading functionality.
Modifications
Add a new `DummyModelLoader`.
Usage or Command
Tested with Qwen3-VL-30B-A3B-Instruct: overall service startup time dropped from 111s to 16s.
    python -m fastdeploy.entrypoints.openai.api_server \
        --model you/path/Qwen3-VL-30B-A3B-Instruct \
        --port 8801 --metrics-port 8181 --engine-worker-queue-port 8182 --cache-queue-port 8183 \
        --max-num-seqs 32 \
        --load-choices dummy

Accuracy Tests
result
Checklist
- PR title tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- If the PR targets the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.