[Kernel] fp4 marlin kernel #17687

jinzhen-lin · 2025-05-06T02:06:47Z

This PR adds NVFP4 support to the Marlin kernel, for both dense and MoE layers.

Beyond standard FP4 support, it fuses the floating-point operations in dequantization with the subsequent zero-point subtraction and scaling steps, which reduces the computation the kernel performs. This currently gives significant speedups for FP4/FP8 and a modest speedup for AWQ-INT4.
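The description does not spell out which operations are fused, so the following is only a minimal sketch of one plausible instance of the idea, not the PR's CUDA code. It assumes FP4 means the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) and that the 4 bits are dropped directly into an FP16 bit pattern. Because FP16 uses exponent bias 15 and E2M1 uses bias 1, the reinterpreted value is too small by a factor of 2^14. Instead of correcting that with its own multiply, the factor can be folded into the scale once, ahead of time. All function names here are hypothetical.

```python
import struct

# The eight non-negative magnitudes representable in FP4 E2M1.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# FP16 exponent bias (15) minus E2M1 exponent bias (1) gives 2**14.
EXPONENT_BIAS_FIX = 2.0 ** 14


def decode_e2m1_reference(nibble: int) -> float:
    """Table-lookup decode of a 4-bit E2M1 value (reference only)."""
    sign = -1.0 if nibble & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[nibble & 0x7]


def fp4_bits_as_fp16(nibble: int) -> float:
    """Place the FP4 bits into an FP16 pattern and reinterpret it.

    Sign goes to bit 15; the 2 exponent bits and 1 mantissa bit go to
    bits 11..9. The result equals the true value divided by 2**14.
    """
    sign = (nibble & 0x8) << 12
    exp_mant = (nibble & 0x7) << 9
    return struct.unpack("<e", struct.pack("<H", sign | exp_mant))[0]


def dequant_unfused(nibble: int, scale: float) -> float:
    """Two multiplies per element: bias correction, then scaling."""
    return (fp4_bits_as_fp16(nibble) * EXPONENT_BIAS_FIX) * scale


def fuse_scale(scale: float) -> float:
    """Fold the bias correction into the scale (done once per group)."""
    return scale * EXPONENT_BIAS_FIX


def dequant_fused(nibble: int, fused_scale: float) -> float:
    """One multiply per element, using the pre-fused scale."""
    return fp4_bits_as_fp16(nibble) * fused_scale


if __name__ == "__main__":
    fused = fuse_scale(0.25)
    for nibble in range(16):
        print(nibble, dequant_fused(nibble, fused))
```

In this sketch the saving is one multiply per element, paid for by one extra multiply per scale. Whether the PR's kernels fuse exactly this step is an assumption.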

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

github-actions · 2025-05-06T02:06:56Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

csrc/quantization/gptq_marlin/dequant.h

csrc/quantization/gptq_marlin/gptq_marlin.cu

csrc/quantization/gptq_marlin/marlin_template.h

tests/kernels/quantization/test_marlin_gemm.py

mgoin · 2025-05-06T11:02:22Z

FYI @tms, the wheel size grows by only about 1.3 MB (317.64 MB to 318.93 MB):

Wheel dist/vllm-0.8.5.dev473+g6eae34533-cp38-abi3-linux_x86_64.whl is within the allowed size (317.64 MB).
vs
Wheel dist/vllm-0.8.5.dev483+g8392d7381-cp38-abi3-linux_x86_64.whl is within the allowed size (318.93 MB).

vllm/model_executor/layers/quantization/utils/marlin_utils.py

pavanimajety · 2025-05-07T16:51:20Z

csrc/quantization/gptq_marlin/dequant.h


 template <>
-__device__ inline void dequant<nv_bfloat162, vllm::kU4.id()>(
+__device__ inline void dequant<nv_bfloat162, vllm::kU4.id(), true>(


Nice! We can also start using these functions for the fp4 scaled_mm tests!

mgoin · 2025-05-08T23:11:45Z

The fused Marlin MoE test is failing: https://buildkite.com/vllm/ci/builds/19608/steps?jid=0196b131-b028-4fad-95bd-bba7cdaf133d

mgoin

Excellent work here! I need to run another smoke test since the scales change to fp8, but I think this is all good to go

fp4 marlin kernel

dacdf8c

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

jinzhen-lin requested review from tlrmchlsmth, WoosukKwon, mgoin and robertgshaw2-redhat as code owners May 6, 2025 02:06

jinzhen-lin added 9 commits May 6, 2025 10:19

fix

e2c0ad3

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

0d5368b

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

8d51e32

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix format

c879e99

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

bb547a6

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

4dddda5

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

kFE2M1fn -> kFE2M1f

9aac76a

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

Merge remote-tracking branch 'origin/main' into fp4-marlin

28d7f84

fix

8392d73

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

mgoin reviewed May 6, 2025

vllm/model_executor/layers/quantization/utils/marlin_utils.py

jinzhen-lin added 5 commits May 6, 2025 20:48

fix

5050d4b

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix comment

02576a9

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

49978ad

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

af12b22

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

fa0d098

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

pavanimajety reviewed May 7, 2025

mgoin added the kernel and ready labels May 8, 2025

jinzhen-lin added 3 commits May 9, 2025 11:37

update

e6265a6

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix for fp8

fe3ea6e

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

e6047e5

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

jinzhen-lin added 16 commits May 9, 2025 11:56

fix

810c95a

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

0f07183

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

7eb3f9b

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix test

ed1db37

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

168fb3e

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

25531eb

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fp4 moe marlin

ed95abb

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

d7b2ac7

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

f09273b

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

add comment

a82fcbf

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

7177a72

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

4a6ac2a

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

remove unused cuda kernel

e6144ee

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix

45910c1

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

Merge remote-tracking branch 'origin/main' into fp4-marlin

027f6a3

Merge remote-tracking branch 'origin/main' into fp4-marlin

520149d

mgoin approved these changes May 9, 2025

jinzhen-lin added 5 commits May 10, 2025 10:10

fix test

7e0dbe8

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

Merge remote-tracking branch 'origin/main' into fp4-marlin

9efad4e

fp4 moe support

18df7ec

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

fix moe support

a21442d

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

Merge remote-tracking branch 'origin/main' into fp4-marlin

660eb61

mgoin approved these changes May 10, 2025

vllm-bot merged commit d74e5f3 into vllm-project:main May 11, 2025
87 of 90 checks passed

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

[Kernel] fp4 marlin kernel (vllm-project#17687)

83257f6

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>

mgoin mentioned this pull request May 12, 2025

Use NVFP4 Marlin for CompressedTensorsW4A16Fp4 #18000

Merged

heheda12345 mentioned this pull request May 14, 2025

[v1] Support multiple KV cache groups in GPU model runner #17945

Merged

mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025

[Kernel] fp4 marlin kernel (vllm-project#17687)

794655f

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025

[Kernel] fp4 marlin kernel (vllm-project#17687)

a9d8f6a

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>

minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025

[Kernel] fp4 marlin kernel (vllm-project#17687)

2e2da01

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Signed-off-by: minpeter <kali2005611@gmail.com>
