[MODEL] Add support for Zamba2 models #13185
Force-pushed from d385611 to 43fff82
Force-pushed from 6223b86 to dc786fd
Force-pushed from 9593333 to 0a83814
Force-pushed from 66b1112 to 2b7397c
Seems like the current failures in the checks are due to cv2 imports in transformers v4.49.0. This is a known issue: #13905. Other than that, things work.
@tlrmchlsmth other than the external issue with the latest released transformers (the cv2 import in 4.49.0, though I see it's fixed in their dev branch), do you have other suggestions for this PR?
The PR looks great to me, thanks for the contribution -- I'll accept once the transformers 4.49 issue is resolved
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Force-pushed from 4e3b75d to 71cf8f3
Thanks, can you also update the list of supported models in the docs with this model?
Yep, all done. By the way, I see a bunch of tests failed, but upon further inspection they seem unrelated to this PR.
Indeed, they are unrelated -- merging
Signed-off-by: Yury Tokpanov <yury@zyphra.com> Signed-off-by: Quentin Anthony <qganthony@yahoo.com> Co-authored-by: Quentin Anthony <qganthony@yahoo.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
This PR adds support for Zamba2 models (#9382), a series of Mamba2-transformer hybrid models with shared attention blocks and LoRAs applied to the shared MLP and attention blocks, depending on the model. The 1.2B and 7B models use RoPE in their attention blocks.
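To illustrate the hybrid layout described above, here is a minimal, dependency-free sketch of how a Zamba2-style stack might interleave Mamba2 blocks with a single shared attention block, giving each invocation site of the shared block its own LoRA adapter index. The function name, the `attn_period` parameter, and the exact interleaving rule are illustrative assumptions, not the actual vLLM implementation.

```python
def build_layer_plan(num_layers: int, attn_period: int = 6):
    """Sketch of a hybrid layer plan (illustrative, not the real config).

    Every layer is a Mamba2 block; every `attn_period`-th layer is
    additionally preceded by the shared attention block, and each such
    invocation site gets its own LoRA adapter index so the shared
    weights can be cheaply specialized per site.
    """
    plan = []
    lora_site = 0  # counts invocation sites of the shared block
    for i in range(num_layers):
        uses_shared_attn = (i % attn_period == 0)
        entry = {"layer": i, "mamba2": True, "shared_attn": uses_shared_attn}
        if uses_shared_attn:
            entry["lora_index"] = lora_site  # per-site LoRA adapter
            lora_site += 1
        plan.append(entry)
    return plan
```

For example, with 12 layers and a period of 6, layers 0 and 6 invoke the shared attention block with LoRA indices 0 and 1, while all other layers are plain Mamba2 blocks.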
This PR is fully compatible with the Zamba2 integration in the HuggingFace Transformers library, which was recently merged into its main branch.
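Since the 1.2B and 7B models apply RoPE in their attention blocks, a minimal sketch of the rotary position embedding itself may be useful. This is the standard RoPE pairwise rotation written in pure Python; it is a generic illustration of the technique, not code from this PR.

```python
import math

def rope_rotate(x, pos, theta_base=10000.0):
    """Apply a rotary position embedding to one head vector.

    x: flat list of floats with even length (one attention head).
    pos: integer token position.
    Adjacent pairs (x[i], x[i+1]) are rotated by a position-dependent
    angle; rotation preserves each pair's norm, so attention scores
    depend only on relative positions.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta_base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out
```

At position 0 every angle is zero, so the vector passes through unchanged; at later positions the pairs are rotated but the vector's norm is preserved.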
Unit tests pass now.
We would like to acknowledge the authors of the Bamba PR and the Mamba2 PR (@fabianlim and @tlrmchlsmth, respectively) for adding Mamba2 support to vLLM and for the productive discussions!
cc: @Quentin-Anthony @BerenMillidge @pglorio