Contributor

@cyang49 cyang49 commented Sep 5, 2025

Purpose

This PR cleans up naming inconsistencies in the mamba2 metadata between V0 and V1. It affects models that use mamba_mixer2 or plamo2.

Test Plan

Run e2e lm_eval on ibm-granite/granite-4.0-tiny-preview and pfnet/plamo-2.1-2b-cpt for both V0 and V1
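The test matrix (two models, V0 and V1) can be sketched as a small script that builds the commands. This is only an illustrative helper: the flags and values are copied from the commands shown under Test Result, while the names `MODELS`, `build_command`, and `eval_matrix` are hypothetical and not part of vLLM or lm_eval.

```python
import shlex

# Models named in the test plan.
MODELS = [
    "ibm-granite/granite-4.0-tiny-preview",
    "pfnet/plamo-2.1-2b-cpt",
]


def build_command(model: str, use_v1: bool) -> str:
    """Build one lm_eval invocation (hypothetical helper, flags from this PR)."""
    model_args = ",".join([
        f"pretrained={model}",
        "tensor_parallel_size=1",
        "dtype=auto",
        "gpu_memory_utilization=0.95",
        "enable_prefix_caching=False",
        "max_model_len=8192",
    ])
    argv = [
        "lm_eval", "--model", "vllm",
        "--model_args", model_args,
        "--batch_size", "auto",
        "--trust_remote_code",
        "--cache_requests", "true",
        "--tasks", "gsm8k",
    ]
    # VLLM_USE_V1 selects the engine version, as in the commands in this PR.
    return f"VLLM_USE_V1={int(use_v1)} " + shlex.join(argv)


def eval_matrix() -> list[str]:
    """Return one command per (model, engine version) pair."""
    return [build_command(m, v1) for m in MODELS for v1 in (True, False)]


if __name__ == "__main__":
    for cmd in eval_matrix():
        print(cmd)
```

The script only prints the commands; running them still requires a GPU environment with vLLM and lm_eval installed.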

Test Result

  • No runtime errors
  • Accuracy is maintained
    All runs passed, and no significant accuracy differences were observed.

granite-4-tiny-preview V1 (main 006e7a3)

VLLM_USE_V1=1 lm_eval --model vllm  --model_args pretrained=ibm-granite/granite-4.0-tiny-preview,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.95,enable_prefix_caching=False,max_model_len=8192 --batch_size auto --trust_remote_code  --cache_requests true --tasks gsm8k
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.6035|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.5754|±  |0.0136|

granite-4-tiny-preview V1 (this PR)

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.6088|±  |0.0134|
|     |       |strict-match    |     5|exact_match|↑  |0.5861|±  |0.0136|

granite-4-tiny-preview V0 (this PR)

VLLM_USE_V1=0 lm_eval --model vllm  --model_args pretrained=ibm-granite/granite-4.0-tiny-preview,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.95,enable_prefix_caching=False,max_model_len=8192 --batch_size auto --trust_remote_code  --cache_requests true --tasks gsm8k
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.602|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.583|±  |0.0136|

Plamo2 V1 (main 006e7a3)

VLLM_USE_V1=1 lm_eval --model vllm  --model_args pretrained=pfnet/plamo-2.1-2b-cpt,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.95,enable_prefix_caching=False,max_model_len=8192 --batch_size auto --trust_remote_code  --cache_requests true --tasks gsm8k
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.5898|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.5823|±  |0.0136|

Plamo2 V1 (this PR)

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.5898|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.5823|±  |0.0136|

Plamo2 V0 (this PR)

VLLM_USE_V1=0 lm_eval --model vllm  --model_args pretrained=pfnet/plamo-2.1-2b-cpt,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.95,enable_prefix_caching=False,max_model_len=8192 --batch_size auto --trust_remote_code  --cache_requests true --tasks gsm8k
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.5982|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.5951|±  |0.0135|
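One rough way to check the "no significant accuracy differences" claim is to compare each run against the main-branch V1 baseline using the reported standard errors. The sketch below uses only the values from the tables above. The function names and the 2-sigma threshold are assumptions made for illustration, and the runs are treated as independent samples even though they share the same gsm8k test set, so this is a coarse screen rather than a proper paired test.

```python
import math

# (value, stderr) pairs for gsm8k exact_match, copied from the tables above.
# Keys: (model, filter) -> {run label: (value, stderr)}
RESULTS = {
    ("granite", "flexible-extract"): {
        "main_v1": (0.6035, 0.0135), "pr_v1": (0.6088, 0.0134), "pr_v0": (0.602, 0.0135),
    },
    ("granite", "strict-match"): {
        "main_v1": (0.5754, 0.0136), "pr_v1": (0.5861, 0.0136), "pr_v0": (0.583, 0.0136),
    },
    ("plamo2", "flexible-extract"): {
        "main_v1": (0.5898, 0.0135), "pr_v1": (0.5898, 0.0135), "pr_v0": (0.5982, 0.0135),
    },
    ("plamo2", "strict-match"): {
        "main_v1": (0.5823, 0.0136), "pr_v1": (0.5823, 0.0136), "pr_v0": (0.5951, 0.0135),
    },
}


def z_score(baseline, candidate):
    """Difference in accuracy divided by the combined standard error."""
    (v_base, se_base), (v_cand, se_cand) = baseline, candidate
    return (v_cand - v_base) / math.sqrt(se_base**2 + se_cand**2)


def within_noise(baseline, candidate, threshold=2.0):
    """True if the difference is below the (assumed) 2-sigma threshold."""
    return abs(z_score(baseline, candidate)) < threshold


if __name__ == "__main__":
    for (model, flt), runs in RESULTS.items():
        for label in ("pr_v1", "pr_v0"):
            z = z_score(runs["main_v1"], runs[label])
            ok = within_noise(runs["main_v1"], runs[label])
            print(f"{model:8s} {flt:16s} {label}: z={z:+.2f} within_noise={ok}")
```

Under these assumptions every comparison stays well inside the threshold, which is consistent with the conclusion stated in the PR description.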


@cyang49 cyang49 requested a review from tdoublep as a code owner September 5, 2025 16:13
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the Mamba2 metadata handling to unify the logic for V0 and V1 APIs. The changes primarily involve renaming metadata fields to be consistent across both versions, which simplifies the code in mamba_mixer2 and plamo2 by removing duplicated logic. The change to how has_initial_states_p is calculated is also a good improvement for correctness and clarity. I've found one critical issue where attribute assignments were not updated after renaming, which could lead to a runtime error.

…code

Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
@cyang49 cyang49 force-pushed the pr_mamba2_metadata_cleanup branch from ed9781f to 1c9fc08 Compare September 8, 2025 18:01
Member

@tdoublep tdoublep left a comment


LGTM

@tdoublep tdoublep enabled auto-merge (squash) September 16, 2025 13:04
@github-actions github-actions bot added the ready label Sep 16, 2025
@tdoublep tdoublep mentioned this pull request Sep 16, 2025
9 tasks
@tdoublep tdoublep merged commit 73cfb3c into vllm-project:main Sep 16, 2025
51 checks passed
@cyang49 cyang49 deleted the pr_mamba2_metadata_cleanup branch September 30, 2025 14:09