
Conversation

@MatthewBonanni (Contributor) commented Sep 12, 2025

Purpose

Adds the recently added FlashInfer MLA backend to the attention selector test. This test would have caught the bug fixed by #24692.
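Roughly, the change extends the parametrized backend matrix in the selector test. A minimal sketch of that shape (backend names and block sizes here are illustrative, not the exact diff):

    import pytest

    # Illustrative sketch only; the real matrix lives in
    # tests/kernels/attention/test_attention_selector.py.
    MLA_BACKENDS = ["TRITON_MLA", "FLASHMLA", "FLASHINFER_MLA"]

    @pytest.mark.parametrize("name", MLA_BACKENDS)
    @pytest.mark.parametrize("block_size", [16, 32, 64])
    def test_env_mla(name, block_size):
        # Each case would set VLLM_ATTENTION_BACKEND=name and assert
        # that the selector resolves to the expected backend class.
        ...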

Test Plan

pytest tests/kernels/attention/test_attention_selector.py

Test Result

Passes.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft in the Google Doc.

Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
@gemini-code-assist bot left a comment

Code Review

This pull request adds test coverage for the new FLASHINFER_MLA attention backend. The changes correctly add the new backend to the test matrix and include specific test logic for its constraints. However, I found a critical issue where the test code calls get_attn_backend with an incorrect argument type, which could lead to runtime errors. A fix is suggested.
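The bot's summary doesn't quote the exact call, but the general pitfall it describes looks like the following. This is a stand-in stub, not vLLM's real get_attn_backend signature, purely to illustrate an argument-type mismatch:

    import torch

    # Stand-in for the real selector; vLLM's actual signature differs.
    def get_attn_backend_stub(head_size: int, dtype: torch.dtype) -> str:
        if not isinstance(dtype, torch.dtype):
            raise TypeError(
                f"dtype must be a torch.dtype, got {type(dtype).__name__}")
        return "FLASHINFER_MLA"

    get_attn_backend_stub(576, torch.float16)  # OK
    # get_attn_backend_stub(576, "float16")    # the kind of call that fails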

@LucasWilkinson (Collaborator) left a comment

LGTM! Thanks!

Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
        # FlashInfer MLA only supported on V1 engine
        pytest.skip("FlashInfer MLA only supported on V1 engine")
    elif block_size not in [32, 64]:
        pytest.skip("FlashInfer MLA only supports block size 32 or 64")

Member

QQ: Is there a reason we don't have the attention backend define what block size it supports?

@MatthewBonanni (Contributor, Author)

It definitely should -- I can make that change in a separate PR to standardize the logic, now that we have 3 different backends with a restriction like this. That would be cleaner than the current approach, where it's hardcoded in a couple of different places.
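A sketch of what that standardization could look like (hypothetical names, not current vLLM code): each backend declares its supported block sizes, and the selector validates against that single source of truth instead of hardcoded checks.

    from typing import ClassVar, Optional

    # Hypothetical sketch, not current vLLM code.
    class AttentionBackendBase:
        # None means any block size is accepted.
        supported_block_sizes: ClassVar[Optional[list[int]]] = None

        @classmethod
        def validate_block_size(cls, block_size: int) -> None:
            sizes = cls.supported_block_sizes
            if sizes is not None and block_size not in sizes:
                raise ValueError(
                    f"{cls.__name__} supports block sizes {sizes}, "
                    f"got {block_size}")

    class FlashInferMLABackend(AttentionBackendBase):
        supported_block_sizes = [32, 64]

    FlashInferMLABackend.validate_block_size(64)    # passes
    # FlashInferMLABackend.validate_block_size(16)  # would raise ValueError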

@mgoin added the ready label (ONLY add when PR is ready to merge / full CI is needed) Sep 12, 2025
@mgoin enabled auto-merge (squash) September 12, 2025 21:46
@mgoin merged commit 5fe643f into vllm-project:main Sep 12, 2025
33 checks passed
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
bbartels pushed a commit to bbartels/vllm that referenced this pull request Sep 15, 2025
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

ready, v1