[Optimize] optimize mask_quant & swiglu by fxyfxy777 · Pull Request #6222 · PaddlePaddle/FastDeploy

fxyfxy777 · 2026-01-26T07:05:54Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

from fastdeploy.model_executor.ops.gpu import group_swiglu_with_masked
from fastdeploy.model_executor.ops.gpu import masked_per_token_quant
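The two operators above are fused into a single operator, fused_mask_swiglu_fp8_quant.
fp16 support is dropped; no call site currently appears to need it.
int64 inputs are no longer supported, likewise because nothing requires them.
The ue8m0 scenario is supported.
Accuracy:
bd7b915
This commit tests that the fused operator is bit-for-bit identical to the two pre-fusion operators.
The mask_per_token_quant operator is removed. The mask_swiglu operator is still called from other files (custom_ops/gpu_ops/moe/moe_ffn.cu, custom_ops/gpu_ops/moe/moe_expert_ffn_wint2.cu), so it is kept for now.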
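
For readers unfamiliar with the two steps being fused, here is a minimal NumPy sketch of a masked SwiGLU followed by per-token, per-block FP8-style scaling. It is not the CUDA kernel from this PR and does not reproduce its signature. The function name, the tensor layout ([group_num, group_size, 2 * hidden] with the gate in the first half), the scale formula (block absmax / 448, the largest finite E4M3 value), the 1e-4 absmax floor, and the UE8M0 rule (round the scale up to a power of two) are all assumptions made for illustration. NumPy has no FP8 dtype, so the sketch returns the scaled values as float32 without rounding them to FP8.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite float8 E4M3 magnitude


def masked_swiglu_quant_reference(x, token_nums, block_size=128, use_ue8m0=False):
    """Hypothetical reference, not the FastDeploy op.

    x:          [group_num, group_size, 2 * hidden], gate half first (assumed)
    token_nums: [group_num], number of valid leading rows in each group
    Returns (scaled values as float32, per-block scales).
    """
    group_num, group_size, two_hidden = x.shape
    hidden = two_hidden // 2
    assert hidden % block_size == 0
    num_blocks = hidden // block_size

    q = np.zeros((group_num, group_size, hidden), dtype=np.float32)
    scales = np.zeros((group_num, group_size, num_blocks), dtype=np.float32)

    for g in range(group_num):
        n = int(token_nums[g])
        if n == 0:
            continue  # masked-out group: no work, outputs stay zero
        rows = x[g, :n].astype(np.float32)
        gate, up = rows[:, :hidden], rows[:, hidden:]
        act = gate / (1.0 + np.exp(-gate)) * up  # SiLU(gate) * up

        # One scale per token per block of `block_size` channels.
        blocks = act.reshape(n, num_blocks, block_size)
        amax = np.maximum(np.abs(blocks).max(axis=-1), 1e-4)  # assumed floor
        scale = amax / FP8_E4M3_MAX
        if use_ue8m0:
            # Round each scale up to the nearest power of two (assumed rule).
            m, e = np.frexp(scale)  # scale = m * 2**e with m in [0.5, 1)
            e = np.where(m == 0.5, e - 1, e)
            scale = np.ldexp(np.ones_like(scale), e)
        scaled = np.clip(blocks / scale[..., None], -FP8_E4M3_MAX, FP8_E4M3_MAX)

        q[g, :n] = scaled.reshape(n, hidden)
        scales[g, :n] = scale
    return q, scales


# Small usage example; the PR's benchmark uses much larger shapes.
rng = np.random.default_rng(0)
x_demo = rng.standard_normal((2, 4, 2 * 256)).astype(np.float32)
q_demo, s_demo = masked_swiglu_quant_reference(x_demo, np.array([3, 0]))
```

Only the first token_nums[g] rows of each group are touched, so a group with few valid tokens costs little work. Fusing the two steps also avoids writing the intermediate activation to memory and reading it back for quantization, which is presumably where the reported speedup comes from.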

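Performance: test configuration:
self.group_num = 10
self.group_size = 2048
self.hidden_dim = 7168
self.block_size = 128
Each rank holds 10 experts, and the number of valid tokens is in the range 0-512.
H cards, speedup from the replacement: about 1.6x

B cards, speedup from the replacement: about 2x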

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-01-26T07:06:00Z

Thanks for your contribution!

codecov-commenter · 2026-01-26T16:00:51Z

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@6c685c9). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
..._executor/layers/moe/fused_moe_deepgemm_backend.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #6222   +/-   ##
==========================================
  Coverage           ?   67.00%           
==========================================
  Files              ?      385           
  Lines              ?    51283           
  Branches           ?     7998           
==========================================
  Hits               ?    34362           
  Misses             ?    14430           
  Partials           ?     2491

Flag	Coverage Δ
GPU	`67.00% <0.00%> (?)`

fxyfxy777 · 2026-01-29T02:09:22Z

/re-run all-failed

fxyfxy777 · 2026-01-29T08:06:02Z

/re-run all-failed

fxyfxy777 · 2026-01-29T09:44:12Z

/re-run all-failed

K11OntheBoat

LGTM

yongqiangma

LGTM

fxyfxy777 · 2026-02-02T01:59:13Z

/re-run all-failed

This reverts commit 2ada119.

* optimize mask_quant op speed up 1.5 * fix calculate sequence * add fused * rm log * push kernel code * add ut * accuracy ok * add ue8m0 * add ut * add merge develop * rm ut of mask_per_token_quant * Revert "[Optimize] optimize mask_quant & swiglu (#6222)" This reverts commit 2ada119. * add block_size * pre-commit

* optimize mask_quant op speed up 1.5 * fix calculate sequence * add fused * rm log * push kernel code * add ut * accuracy ok * add ue8m0 * add ut * add merge develop * rm ut of mask_per_token_quant * Revert "[Optimize] optimize mask_quant & swiglu (PaddlePaddle#6222)" This reverts commit a33739d. * add block_size * pre-commit

optimize mask_quant op speed up 1.5

7960e4d

fxyfxy777 temporarily deployed to Metax_ci January 26, 2026 07:05 — with GitHub Actions Inactive

fix calculate sequence

a84a603

fxyfxy777 temporarily deployed to Metax_ci January 26, 2026 07:30 — with GitHub Actions Inactive

add fused

e549f97

fxyfxy777 had a problem deploying to Metax_ci January 26, 2026 11:43 — with GitHub Actions Error

rm log

629c15c

fxyfxy777 temporarily deployed to Metax_ci January 26, 2026 11:51 — with GitHub Actions Inactive

fxyfxy777 changed the title ~~optimize mask_quant op speed up 1.5~~ [Optimize] optimize mask_quant & swiglu Jan 26, 2026

push kernel code

1311c5b

fxyfxy777 had a problem deploying to Metax_ci January 26, 2026 12:31 — with GitHub Actions Error

add ut

eac6652

fxyfxy777 temporarily deployed to Metax_ci January 26, 2026 12:43 — with GitHub Actions Inactive

fxyfxy777 added 2 commits January 28, 2026 11:37

Merge remote-tracking branch 'origin' into mask_quant

858ca8e

accuracy ok

83756e1

fxyfxy777 had a problem deploying to Metax_ci January 28, 2026 11:25 — with GitHub Actions Failure

fxyfxy777 had a problem deploying to Metax_ci January 29, 2026 02:09 — with GitHub Actions Failure

add ue8m0

985c066

fxyfxy777 had a problem deploying to Metax_ci January 29, 2026 07:08 — with GitHub Actions Failure

fxyfxy777 added 2 commits January 29, 2026 15:29

add ut

bd7b915

Merge remote-tracking branch 'origin' into mask_quant

6f4cf9f

fxyfxy777 had a problem deploying to Metax_ci January 29, 2026 07:31 — with GitHub Actions Failure

fxyfxy777 had a problem deploying to Metax_ci January 29, 2026 08:06 — with GitHub Actions Failure

fxyfxy777 had a problem deploying to Metax_ci January 29, 2026 09:44 — with GitHub Actions Failure

K11OntheBoat previously approved these changes Jan 29, 2026

Comment thread custom_ops/gpu_ops/cpp_extensions.cc

qingqing01 previously approved these changes Jan 30, 2026

fxyfxy777 added 2 commits January 30, 2026 11:31

Merge branch 'develop' of https://github.com/PaddlePaddle/FastDeploy into mask_quant

03d57ba

add merge develop

1dc458d

fxyfxy777 dismissed stale reviews from qingqing01 and K11OntheBoat via 1dc458d January 30, 2026 05:35

fxyfxy777 temporarily deployed to Metax_ci January 30, 2026 05:35 — with GitHub Actions Inactive

qingqing01 approved these changes Jan 30, 2026

yongqiangma previously approved these changes Jan 31, 2026

qingqing01 previously approved these changes Feb 2, 2026

rm ut of mask_per_token_quant

a9846ea

fxyfxy777 dismissed stale reviews from qingqing01 and yongqiangma via a9846ea February 2, 2026 02:53

fxyfxy777 temporarily deployed to Metax_ci February 2, 2026 02:53 — with GitHub Actions Inactive

yongqiangma approved these changes Feb 2, 2026

EmmonsCurse approved these changes Feb 2, 2026

EmmonsCurse added the skip-ci: coverage label Feb 2, 2026

K11OntheBoat merged commit 2ada119 into PaddlePaddle:develop Feb 2, 2026
31 of 36 checks passed

fxyfxy777 added a commit to fxyfxy777/FastDeploy that referenced this pull request Feb 2, 2026

Revert "[Optimize] optimize mask_quant & swiglu (PaddlePaddle#6222)"

ea230e1

This reverts commit 2ada119.

K11OntheBoat merged 14 commits into PaddlePaddle:develop from fxyfxy777:mask_quant

6 participants
