[Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 #24331

cyang49 · 2025-09-05T16:13:40Z

Purpose

This PR cleans up naming inconsistencies in the Mamba2 metadata between V0 and V1 and simplifies how the metadata is used. It affects models that use `mamba_mixer2` or `plamo2`.
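For illustration, here is a minimal, hypothetical sketch of prefill-side Mamba2 metadata built from per-request token counts. Only the field name `has_initial_states_p` comes from this PR's review discussion. `Mamba2MetadataSketch`, `build_metadata`, the decode-first batch ordering, and the rule "a prefill request has an initial state iff it already has computed tokens" are assumptions. This is not vLLM's actual implementation.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for the Mamba2 metadata; the real class
# has more fields and is built differently.
@dataclass(frozen=True)
class Mamba2MetadataSketch:
    num_prefills: int
    num_prefill_tokens: int
    # One flag per prefill request ("_p" = prefill): True if the request
    # resumes from a previously computed SSM state.
    has_initial_states_p: list[bool]


def build_metadata(num_computed_tokens: list[int],
                   query_lens: list[int],
                   num_decodes: int) -> Mamba2MetadataSketch:
    """Build metadata from per-request counts, assuming (hypothetically)
    that decode requests come first in the batch and prefills follow."""
    prefill_computed = num_computed_tokens[num_decodes:]
    prefill_query = query_lens[num_decodes:]
    return Mamba2MetadataSketch(
        num_prefills=len(prefill_computed),
        num_prefill_tokens=sum(prefill_query),
        # A prefill chunk has an initial state iff earlier tokens were already processed.
        has_initial_states_p=[n > 0 for n in prefill_computed],
    )


meta = build_metadata([10, 0, 64], [1, 32, 16], num_decodes=1)
print(meta)
```

Keeping a single set of field names for both engine versions means a layer such as `mamba_mixer2` can read one structure instead of branching on V0 vs V1.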

Test Plan

Run end-to-end `lm_eval` gsm8k evaluations on ibm-granite/granite-4.0-tiny-preview and pfnet/plamo-2.1-2b-cpt with both V0 and V1.
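The test matrix of two models by V0/V1 can be regenerated with a small helper. This sketch only builds the command strings, using exactly the `lm_eval` flags from the commands in the Test Result section. The helper name `lm_eval_command` is hypothetical. Running the commands requires `lm_eval` and vLLM to be installed.

```python
import itertools
import shlex

MODELS = ["ibm-granite/granite-4.0-tiny-preview", "pfnet/plamo-2.1-2b-cpt"]
ENGINES = {"V0": "0", "V1": "1"}


def lm_eval_command(model: str, use_v1: str) -> str:
    """Return the gsm8k lm_eval command line used for this PR's tests."""
    model_args = ",".join([
        f"pretrained={model}",
        "tensor_parallel_size=1",
        "dtype=auto",
        "gpu_memory_utilization=0.95",
        "enable_prefix_caching=False",
        "max_model_len=8192",
    ])
    args = ["lm_eval", "--model", "vllm", "--model_args", model_args,
            "--batch_size", "auto", "--trust_remote_code",
            "--cache_requests", "true", "--tasks", "gsm8k"]
    # The engine version is selected through the VLLM_USE_V1 environment variable.
    return f"VLLM_USE_V1={use_v1} " + shlex.join(args)


for model, (name, flag) in itertools.product(MODELS, ENGINES.items()):
    print(f"# {model} {name}")
    print(lm_eval_command(model, flag))
```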

Test Result

No runtime errors
Accuracy is maintained
Both models ran successfully, and no significant accuracy differences were observed.

granite-4-tiny-preview V1 (main `006e7a3`)

```bash
VLLM_USE_V1=1 lm_eval --model vllm  --model_args pretrained=ibm-granite/granite-4.0-tiny-preview,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.95,enable_prefix_caching=False,max_model_len=8192 --batch_size auto --trust_remote_code  --cache_requests true --tasks gsm8k
```

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.6035|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.5754|±  |0.0136|

granite-4-tiny-preview V1 (this PR)

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.6088|±  |0.0134|
|     |       |strict-match    |     5|exact_match|↑  |0.5861|±  |0.0136|

granite-4-tiny-preview V0 (this PR)

```bash
VLLM_USE_V1=0 lm_eval --model vllm  --model_args pretrained=ibm-granite/granite-4.0-tiny-preview,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.95,enable_prefix_caching=False,max_model_len=8192 --batch_size auto --trust_remote_code  --cache_requests true --tasks gsm8k
```

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.602|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.583|±  |0.0136|

Plamo2 V1 (main `006e7a3`)

```bash
VLLM_USE_V1=1 lm_eval --model vllm  --model_args pretrained=pfnet/plamo-2.1-2b-cpt,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.95,enable_prefix_caching=False,max_model_len=8192 --batch_size auto --trust_remote_code  --cache_requests true --tasks gsm8k
```

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.5898|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.5823|±  |0.0136|

Plamo2 V1 (this PR)

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.5898|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.5823|±  |0.0136|

Plamo2 V0 (this PR)

```bash
VLLM_USE_V1=0 lm_eval --model vllm  --model_args pretrained=pfnet/plamo-2.1-2b-cpt,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.95,enable_prefix_caching=False,max_model_len=8192 --batch_size auto --trust_remote_code  --cache_requests true --tasks gsm8k
```

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.5982|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.5951|±  |0.0135|
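One way to check the "no significant accuracy differences" claim is to express each difference in units of its combined standard error. This sketch uses the values and stderrs from the tables above. It treats each pair of runs as independent, which is a simplification: both runs score the same gsm8k questions, so a paired test would be more precise. The names `RESULTS` and `diff_in_stderr` are illustrative.

```python
import math

# (value, stderr) for gsm8k exact_match, copied from the result tables.
RESULTS = {
    ("granite-4-tiny-preview", "V1 main"): {
        "flexible-extract": (0.6035, 0.0135), "strict-match": (0.5754, 0.0136)},
    ("granite-4-tiny-preview", "V1 PR"): {
        "flexible-extract": (0.6088, 0.0134), "strict-match": (0.5861, 0.0136)},
    ("granite-4-tiny-preview", "V0 PR"): {
        "flexible-extract": (0.602, 0.0135), "strict-match": (0.583, 0.0136)},
    ("Plamo2", "V1 main"): {
        "flexible-extract": (0.5898, 0.0135), "strict-match": (0.5823, 0.0136)},
    ("Plamo2", "V1 PR"): {
        "flexible-extract": (0.5898, 0.0135), "strict-match": (0.5823, 0.0136)},
    ("Plamo2", "V0 PR"): {
        "flexible-extract": (0.5982, 0.0135), "strict-match": (0.5951, 0.0135)},
}


def diff_in_stderr(model: str, baseline: str, candidate: str) -> dict[str, float]:
    """Return (candidate - baseline) / combined stderr for each filter,
    assuming the two runs are independent."""
    base, cand = RESULTS[(model, baseline)], RESULTS[(model, candidate)]
    out = {}
    for name, (v_base, se_base) in base.items():
        v_cand, se_cand = cand[name]
        out[name] = (v_cand - v_base) / math.hypot(se_base, se_cand)
    return out


for model in ("granite-4-tiny-preview", "Plamo2"):
    for candidate in ("V1 PR", "V0 PR"):
        scores = diff_in_stderr(model, "V1 main", candidate)
        print(model, candidate, {k: round(v, 2) for k, v in scores.items()})
```

With these numbers every difference is below one combined standard error. The largest is about 0.67, for Plamo2 V0 strict-match. This is consistent with the conclusion that accuracy is maintained.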


gemini-code-assist

Code Review

This pull request refactors the Mamba2 metadata handling to unify the logic for V0 and V1 APIs. The changes primarily involve renaming metadata fields to be consistent across both versions, which simplifies the code in `mamba_mixer2` and `plamo2` by removing duplicated logic. The change to how `has_initial_states_p` is calculated is also a good improvement for correctness and clarity. I've found one critical issue where attribute assignments were not updated after renaming, which could lead to a runtime error.

vllm/model_executor/layers/mamba/mamba2_metadata.py

…code Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>

Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>

tdoublep

LGTM

…vllm-project#24331) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>

…vllm-project#24331) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>


cyang49 requested a review from tdoublep as a code owner September 5, 2025 16:13

gemini-code-assist bot reviewed Sep 5, 2025


vllm/model_executor/layers/mamba/mamba2_metadata.py (outdated, resolved)

cyang49 added 2 commits September 8, 2025 14:01

Clean up mamba2 metadata calculation to simplify and consolidate the …

5b3532b

…code Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>

remove unused arg and code

1c9fc08

Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>

cyang49 force-pushed the pr_mamba2_metadata_cleanup branch from ed9781f to 1c9fc08 September 8, 2025 18:01

tdoublep approved these changes Sep 16, 2025


tdoublep enabled auto-merge (squash) September 16, 2025 13:04

github-actions bot added the `ready` label ("ONLY add when PR is ready to merge/full CI is needed") Sep 16, 2025

tdoublep mentioned this pull request Sep 16, 2025

[Kernel] Chunk-aligned mamba2 #24683 (merged)

tdoublep merged commit 73cfb3c into vllm-project:main Sep 16, 2025
51 checks passed

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 (…

85d6d66

…vllm-project#24331) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>

cyang49 deleted the pr_mamba2_metadata_cleanup branch September 30, 2025 14:09

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 (…

8931648

…vllm-project#24331) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>


