-
-
Notifications
You must be signed in to change notification settings - Fork 11k
[Model] Mamba2 varlen and metadata refactor #21467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the Mamba2 kernels and their callers to natively support variable-length inputs, removing the explicit batch dimension from tensors and relying on cu_seqlens for sequence boundaries. This is a significant and beneficial simplification that should improve performance by avoiding padding. The changes are systematic and consistent across all modified files. I've found one potential compatibility issue with older Triton versions that should be addressed. Otherwise, the refactoring looks solid.
7392a5a to
9af8a86
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please make sure this won't beak other mamba models? It looks like it would break plamo2.py and possibly others
Is there a test I can run? Or it's just grepping the keywords in the repo |
|
This pull request has merge conflicts that must be resolved before it can be |
4ab2868 to
e655021
Compare
e655021 to
2efde66
Compare
|
@cyang49 I will review these changes carefully early next week. After going through these kernels a lot the last days, I now understand how useful this refactor is. Right now, the kernels support many cases that we never use in vLLM and it makes them very difficult to read + maintain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a significant improvement to the maintainability of the code. I have some small suggestions but would like to merge this one asap before other things. Please let me know if you need any help resolving the conflicts.
6db94a4 to
b258403
Compare
fb0dfa6 to
2150a11
Compare
|
@tomeras91 could you have a look at the interface change to |
|
This pull request has merge conflicts that must be resolved before it can be |
888b502 to
547bb46
Compare
b4f198c to
56bf99d
Compare
- mamba2 varlen refactor - refactoring and reduce redundant query_start_loc_p computation in v1 - conv1d metadata refactoring - use int64 strides except for the least significant dim - fix query_start_loc_p affected by metadata refactor Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
merging Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
56bf99d to
9b8a761
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thank you for the effort to improve the code and your patience with the multiple rebases.
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.Purpose
(seqlen, hiddendim)with metadatacu_seqlenfor determining number of tokens of individual requests in the batch, instead of(batch, seqlen, hiddendim)where some of the requests need to be padded.Test Plan
tests/kernels/mambain CI/CDTest Result
Commands
V1
V0
Main (316b1bf)
V1
V0
This PR
V1
V0
Plamo2 lm_eval
(Optional) Documentation Update