[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe #2
Purpose
vllm-project#19667 changed the workspace creation from `torch.zeros` to `torch.empty`. Because `torch.empty` returns uninitialized memory, this causes correctness issues for models using `cutlass_moe`, e.g. Maverick in our test case. This PR fixes the correctness issue by explicitly filling the cache space with zeros in `cutlass_moe`.
lm_eval, unit tests
Test Result
lm_eval results:
local-chat-completions (model=meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8,base_url=http://127.0.0.1:8081/v1/chat/completions,num_concurrent=32), gen_kwargs: (None), limit: 200.0, num_fewshot: 5, batch_size: 1
Unit test stability verified: without `c1.fill_(0)`, a one-liner run of the unit test fails stably; with `c1.fill_(0)`, the same one-liner passes stably.
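A hypothetical stability check along those lines (the test path and iteration count are assumptions, not from this PR):

```python
# Hypothetical stability check: repeat the cutlass MoE unit test and stop
# at the first failure. The test path below is an assumption, not from the PR.
import subprocess

for _ in range(20):
    subprocess.run(
        ["pytest", "-q", "tests/kernels/moe/test_cutlass_moe.py"],
        check=True,  # raises CalledProcessError on the first failing run
    )
```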