[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe #2
Purpose
vllm-project#19667 changed the workspace creation from `torch.zeros` to `torch.empty`. Because `torch.empty` returns uninitialized memory, this causes correctness issues for models using `cutlass_moe`, e.g. Maverick in our test case. This PR fixes the correctness issue by explicitly filling the cache space with zeros in `cutlass_moe`.
lm_eval, unit tests
Test Result
lm_eval results:
local-chat-completions (model=meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8,base_url=http://127.0.0.1:8081/v1/chat/completions,num_concurrent=32), gen_kwargs: (None), limit: 200.0, num_fewshot: 5, batch_size: 1
Unit test stability verified: without `c1.fill_(0)`, a one-liner run of the unit test fails stably; with `c1.fill_(0)`, the same one-liner passes stably.
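A hypothetical stability check along those lines (the test path and iteration count are assumptions, not from this PR):

```python
# Hypothetical stability check: repeat the cutlass MoE unit test and stop
# at the first failure. The test path below is an assumption, not from the PR.
import subprocess

for _ in range(20):
    subprocess.run(
        ["pytest", "-q", "tests/kernels/moe/test_cutlass_moe.py"],
        check=True,  # raises CalledProcessError on the first failing run
    )
```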