[MODEL] Add support for Zamba2 models #13185

Merged: 10 commits into vllm-project:main on Mar 18, 2025

yury-tokpanov
Contributor

@yury-tokpanov yury-tokpanov commented Feb 13, 2025

This PR adds support for Zamba2 models (#9382), a series of Mamba2-transformer hybrid models with shared attention blocks. Depending on the model, LoRAs are applied to the shared MLP and attention blocks. The 1.2B and 7B models use RoPE in their attention blocks.
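
To make the "shared block plus LoRA" idea concrete, here is a minimal sketch in plain Python. It is not the Zamba2 or vLLM implementation: the class name `SharedLinearWithLoRA`, the toy sizes, and the choice to adapt a single linear projection are all assumptions for illustration. It only shows one weight matrix being stored once and reused at several call sites, each with its own low-rank correction.

```python
# Illustrative sketch only (hypothetical names and sizes, not vLLM code):
# one shared weight matrix reused at several call sites, each call site
# adding its own low-rank (LoRA) correction B @ (A @ x).

def matvec(m, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SharedLinearWithLoRA:
    def __init__(self, weight, adapters):
        self.weight = weight      # shared d_out x d_in matrix, stored once
        self.adapters = adapters  # one (A, B) pair per call site

    def forward(self, x, site):
        a, b = self.adapters[site]         # A: r x d_in, B: d_out x r
        base = matvec(self.weight, x)      # shared projection
        delta = matvec(b, matvec(a, x))    # site-specific low-rank update
        return [p + q for p, q in zip(base, delta)]


weight = [[1.0, 0.0], [0.0, 1.0]]          # toy 2 x 2 shared weight
adapters = [
    ([[1.0, 1.0]], [[0.5], [0.0]]),        # call site 0, rank 1
    ([[1.0, -1.0]], [[0.0], [2.0]]),       # call site 1, rank 1
]
layer = SharedLinearWithLoRA(weight, adapters)
out0 = layer.forward([1.0, 2.0], site=0)
out1 = layer.forward([1.0, 2.0], site=1)
```

The same input produces different outputs at the two call sites even though `weight` is stored only once, which is the property the LoRAs are meant to provide.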

This PR is fully compatible with the Zamba2 integration in the HuggingFace transformers library, which was recently merged into its main branch.

  • We are able to reproduce evaluation results using the evaluation harness with the vLLM evaluator.
  • We also inspected logits and intermediate layer outputs, comparing them with our reference implementation. We find good agreement between the two (within numerical precision).
  • Chunked prefill appears to be working.
  • TP is supported.
  • PP is not supported, and we believe it wouldn't make much sense to use our models with PP, since it would remove the memory advantage of shared attention layers.
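
One way to read the PP point above is as simple bookkeeping: if layers are split into contiguous pipeline stages on separate devices, every stage that contains a layer invoking the shared block needs its own copy of that block's weights. The sketch below counts those copies. The function name, the contiguous split, and the 12-layer layout are assumptions for illustration, not a description of how vLLM partitions Zamba2.

```python
# Hypothetical accounting sketch, not vLLM code: count how many pipeline
# stages would need their own copy of a shared block when the layers are
# split into contiguous stages.

def stages_holding_shared_block(layer_uses_shared, num_stages):
    """Return the number of stages containing at least one layer that
    invokes the shared block; each such stage needs a full copy."""
    num_layers = len(layer_uses_shared)
    per_stage = -(-num_layers // num_stages)  # ceiling division
    copies = 0
    for start in range(0, num_layers, per_stage):
        if any(layer_uses_shared[start:start + per_stage]):
            copies += 1
    return copies


# Assumed layout: 12 layers, with the shared block invoked at every 3rd layer.
layout = [i % 3 == 2 for i in range(12)]
single = stages_holding_shared_block(layout, num_stages=1)
four = stages_holding_shared_block(layout, num_stages=4)
```

Under these assumptions a single device holds one copy of the shared block, while a four-stage pipeline holds four, so the saving from sharing is lost.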

Unit tests pass now.

We would like to acknowledge the authors of the Bamba PR and the Mamba2 PR (@fabianlim and @tlrmchlsmth respectively) for adding Mamba2 support to vLLM and for the productive discussions!

cc: @Quentin-Anthony @BerenMillidge @pglorio


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@yury-tokpanov
Contributor Author

yury-tokpanov commented Mar 7, 2025

It seems the current check failures are due to cv2 imports in transformers v4.49.0. This is a known issue: #13905

Other than that, things work.

@yury-tokpanov
Contributor Author

@tlrmchlsmth other than the external issue with the latest released transformers (the cv2 import in 4.49.0, which I see is fixed in their dev branch), do you have any other suggestions for this PR?

Collaborator

@tlrmchlsmth tlrmchlsmth left a comment

The PR looks great to me, thanks for the contribution. I'll accept once the transformers 4.49 issue is resolved.

Signed-off-by: Yury Tokpanov <yury@zyphra.com>
yury-tokpanov and others added 7 commits March 18, 2025 05:49
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
@yury-tokpanov yury-tokpanov force-pushed the zamba2 branch 2 times, most recently from 4e3b75d to 71cf8f3 Compare March 18, 2025 06:17
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
@DarkLight1337
Member

Thanks, can you also update the list of supported models in the docs with this model?

Signed-off-by: Yury Tokpanov <yury@zyphra.com>
@mergify mergify bot added the documentation label (Improvements or additions to documentation) Mar 18, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 18, 2025 07:24
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 18, 2025
@yury-tokpanov
Contributor Author

> Thanks, can you also update the list of supported models in the docs with this model?

Yep, all done.

Btw, I see a bunch of tests failed, but upon further inspection it seems they're unrelated to the PR.

@DarkLight1337
Member

Indeed they are unrelated - merging

@vllm-bot vllm-bot merged commit 452e8fd into vllm-project:main Mar 18, 2025
37 of 42 checks passed
@DarkLight1337 DarkLight1337 added this to the v0.8.0 milestone Mar 18, 2025
simon-mo pushed a commit that referenced this pull request Mar 18, 2025
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
gmarinho2 pushed a commit to gmarinho2/vllm that referenced this pull request Apr 1, 2025
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Labels
ci/build, documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed)
6 participants