[Model] Add GraniteMoeHybrid 4.0 model #17497

Merged
merged 24 commits into from
May 6, 2025

Conversation

s3woz
Contributor

@s3woz s3woz commented Apr 30, 2025

The PR adds support for upcoming Granite4.0 models. It is a companion PR to huggingface/transformers#37658 for adding the same model to HF.

Note: Running the model in vLLM depends on having HF Transformers with Granite4.0 support installed, see https://huggingface.co/ibm-granite/granite-4.0-tiny-preview

Note: This reopens #17461 after fixing a bad rebase. The branch is now rebased on a newer main, signed off, and has its conflicts resolved.
@DarkLight1337 - As per suggestion, Sampler removed from the model.

The HF model is not available in main yet. Model tests are currently marked with 'skip'.

@tdoublep @bohnstingl

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@s3woz s3woz force-pushed the granitemoehybrid_clean branch from 55bc3e7 to 3569fe4 Compare April 30, 2025 21:09
@s3woz s3woz marked this pull request as ready for review April 30, 2025 22:56
Collaborator

@tlrmchlsmth tlrmchlsmth left a comment

First pass through looks good and implementation looks clean!

When will the model be available for testing?

Collaborator

Can ibm-research/granite-4.0-tiny-test be added to test_hybrid.py? It would be good to get a TP test in place

Contributor

@alex-jw-brooks alex-jw-brooks Apr 30, 2025

Hey @tlrmchlsmth, I'm working on the transformers side of this PR - ibm-research/granite-4.0-tiny-test won't be the name of the actual model, it's just a placeholder since the name of the model hasn't been decided quite yet 😅

I agree that it would be nice to add a hybrid model test though

(CC @DarkLight1337 since I had just sent a note to him about this PR a little while ago also)

Contributor

I added the model to the test_hybrid.py, but since the name is not final and therefore the link to HF is not established yet, the tests are marked as skipped for now. Is that okay?

Contributor

Thanks @bohnstingl, sounds good! Were you able to run the tests on the upcoming model? I saw some TP failures in my environment, but not just for this model, so something may be off with my machines.

Contributor

@alex-jw-brooks alex-jw-brooks left a comment

Thanks for opening this! Some thoughts:


self.quant_config = vllm_config.quant_config

super().__init__()

Contributor

Is there a reason the super().__init__() is called in the middle here?

Contributor

No. The call to super().__init__() is now the first statement.

Contributor

Cool thanks! I think you may have forgotten to push changes for this file
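
For context on why the position of super().__init__() matters, here is a minimal sketch in plain Python. The Registry class is a hypothetical stand-in, loosely modeled on a base class that tracks child components in a dict created by its own __init__; it is not the actual torch.nn.Module implementation, and none of these names come from the PR.

```python
class Registry:
    """Toy base class: child components are tracked in a dict set up by __init__."""

    def __init__(self) -> None:
        # Bypass our own __setattr__ so the bookkeeping dict can be created.
        object.__setattr__(self, "_children", {})

    def __setattr__(self, name, value) -> None:
        if isinstance(value, Registry):
            children = self.__dict__.get("_children")
            if children is None:
                # The base initializer has not run yet, so there is nowhere
                # to register the child.
                raise AttributeError(
                    "cannot assign child before Registry.__init__() call")
            children[name] = value
        object.__setattr__(self, name, value)


class Leaf(Registry):
    pass


class InitFirst(Registry):
    def __init__(self) -> None:
        super().__init__()  # bookkeeping exists before any attribute is set
        self.leaf = Leaf()


class InitLate(Registry):
    def __init__(self) -> None:
        self.leaf = Leaf()  # raises: the registry dict does not exist yet
        super().__init__()


ok = InitFirst()
print(sorted(ok._children))

try:
    InitLate()
except AttributeError as exc:
    print("late init failed:", exc)
```

Plain (non-component) attributes would still be assignable before the base initializer in this toy, which is one reason a misplaced call can go unnoticed until a later refactor.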

    hidden_states = hidden_states * self.embedding_multiplier
    residual = None
else:
    assert intermediate_tensors is not None

Contributor

Can you remove the asserts and raise errors instead?

Contributor

The asserts are replaced with RuntimeErrors.
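
A minimal sketch of the assert-to-exception change discussed above. The helper name, its parameters, and the error messages are hypothetical and only illustrate the pattern; the actual model code is not reproduced here. The motivation is that assert statements are removed when Python runs with -O, while an explicit raise always executes.

```python
from typing import Optional


def pick_hidden_states(
    is_first_rank: bool,
    inputs_embeds: Optional[list],
    intermediate_tensors: Optional[dict],
) -> list:
    """Hypothetical helper: choose the input for this pipeline stage."""
    if is_first_rank:
        if inputs_embeds is None:
            raise RuntimeError("inputs_embeds is required on the first rank")
        return inputs_embeds
    # `assert intermediate_tensors is not None` would be stripped under
    # `python -O`; an explicit check is always evaluated.
    if intermediate_tensors is None:
        raise RuntimeError("intermediate_tensors is required on later ranks")
    return intermediate_tensors["hidden_states"]


print(pick_hidden_states(False, None, {"hidden_states": [2.0]}))
```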


self.position_embedding_type = config.position_embedding_type
if self.position_embedding_type == "rope":
    self.rotary_emb = get_rope(

Contributor

@alex-jw-brooks alex-jw-brooks May 1, 2025

Can this be changed to set self.rotary_emb to None if it's not rope? This will avoid potential attribute errors and we can do

    if self.rotary_emb is not None:
        query, key = self.rotary_emb(positions, query, key)

in forward() instead

Contributor

I think we don't need to store the position_embedding_type attribute explicitly. Now we set self.rotary_emb directly based on config.position_embedding_type and to None if there is no rope being used.
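
The resulting pattern can be sketched as follows. ToyAttention and toy_rope are hypothetical stand-ins that operate on plain floats rather than tensors, and toy_rope is not a real rotary embedding; the sketch only shows the attribute always existing and being None when rope is not used.

```python
from typing import Callable, Optional, Tuple

RopeFn = Callable[[int, float, float], Tuple[float, float]]


def toy_rope(position: int, query: float, key: float) -> Tuple[float, float]:
    """Placeholder transform; a real rotary embedding rotates vectors."""
    return query + position, key + position


class ToyAttention:
    def __init__(self, position_embedding_type: str) -> None:
        # The attribute is always defined, so forward() never hits an
        # AttributeError; None means "no rotary embedding".
        self.rotary_emb: Optional[RopeFn] = (
            toy_rope if position_embedding_type == "rope" else None
        )

    def forward(self, position: int, query: float, key: float):
        if self.rotary_emb is not None:
            query, key = self.rotary_emb(position, query, key)
        return query, key


print(ToyAttention("rope").forward(2, 1.0, 1.0))
print(ToyAttention("nope").forward(2, 1.0, 1.0))
```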

else:
    assert intermediate_tensors is not None
    hidden_states = intermediate_tensors["hidden_states"]
    residual = intermediate_tensors["residual"]

Contributor

This will be overwritten right below?

Contributor

residual is no longer unconditionally initialized to None.

@DarkLight1337
Member

Can you move the test based on #17459?

mergify bot commented May 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @s3woz.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 1, 2025
@bohnstingl
Contributor

@DarkLight1337, I moved the test_granitemoehybrid.py to tests/models/language/generation/test_granitemoehybrid.py. Is that okay?

@DarkLight1337
Member

Yes sounds good.

    max_tokens: int,
    num_logprobs: int,
) -> None:
    if model == "ibm-research/granite-4.0-tiny-test":

Contributor

I think for now, we can just comment it out in the hybrid model list instead of skipping like this for these tests

Contributor

I reverted the changes of test_hybrid.py and just added the models as a comment to the HYBRID_MODELS

)


@pytest.mark.parametrize("model", SSM_MODELS[0:1] + HYBRID_MODELS[0:2])

Contributor

Is this supposed to be SSM_MODELS[0:1] + HYBRID_MODELS[0:2] rather than [SSM_MODELS[0], HYBRID_MODELS[0]]?

Contributor

This was just an approach to include the new model in the test, because with [SSM_MODELS[0], HYBRID_MODELS[0]], it would not be tested, right? I reverted those changes though for the moment.

Contributor

Yup! I think that is on purpose (there is a comment at the top of the file), although I'm not sure about the reason. I assume it's because they take a while to run.

Member

Yeah it is to reduce CI cost
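
To make the difference between the two parametrizations concrete, here is a small sketch. The list contents are hypothetical placeholders, not the real entries of SSM_MODELS or HYBRID_MODELS in the test file.

```python
# Hypothetical placeholder lists; the real test file has its own entries.
SSM_MODELS = ["ssm-a", "ssm-b"]
HYBRID_MODELS = ["hybrid-a", "hybrid-b", "hybrid-c"]

# Slices keep list semantics, so a range of entries can be concatenated.
sliced = SSM_MODELS[0:1] + HYBRID_MODELS[0:2]

# Indexing picks exactly one model from each list.
indexed = [SSM_MODELS[0], HYBRID_MODELS[0]]

print(sliced)   # three test cases
print(indexed)  # two test cases, so less CI time
```

With the indexed form, a model added at position 1 of the hybrid list is not covered by that particular test, which matches the CI-cost trade-off mentioned above.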

@@ -0,0 +1,346 @@
# SPDX-License-Identifier: Apache-2.0

Contributor

This is the same as tests/models/decoder_only/language/test_hybrid.py with the new model added right? I think this PR has both files

Contributor

I apologize, I forgot to commit the deletion of the old file. This should be fixed now.

Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
@bohnstingl bohnstingl force-pushed the granitemoehybrid_clean branch from 590412a to 0dbb1b9 Compare May 2, 2025 05:48
@mergify mergify bot removed the needs-rebase label May 2, 2025
@bohnstingl
Contributor

@alex-jw-brooks, I apologize, I forgot to push the changes to vllm/model_executor/models/granitemoehybrid.py. I also rebased onto the latest main so that the new test structure is correct. What do you think?

@s3woz s3woz force-pushed the granitemoehybrid_clean branch 2 times, most recently from bbff670 to 0d4f3cd Compare May 2, 2025 07:57
s3woz and others added 11 commits May 2, 2025 08:42
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 5, 2025 07:55
auto-merge was automatically disabled May 5, 2025 07:55

Head branch was pushed to by a user without write access

@s3woz s3woz force-pushed the granitemoehybrid_clean branch from f7de40b to 495fe33 Compare May 5, 2025 07:55
@s3woz
Contributor Author

s3woz commented May 5, 2025

@DarkLight1337, thank you for fixing the spacing.
I've also fixed the model URLs, now that the preview version is uploaded.
The HF code is still not in main, so the tests remain skipped.
BTW, does vLLM CI always pull HF main? I.e., is it fine to enable the tests immediately after the HF PR is merged into main, or do we need to wait for an HF release?

@DarkLight1337
Member

The CI only uses the HF transformers version defined in the requirements file

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 5, 2025 08:08
… in CI

Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
auto-merge was automatically disabled May 5, 2025 09:32

Head branch was pushed to by a user without write access

…rs in HF Transformers

Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
@DarkLight1337
Member

Maybe need to also add min_transformers_version

…in HF Transformers

Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
@s3woz
Contributor Author

s3woz commented May 5, 2025

> Maybe need to also add min_transformers_version

I don't know which exact version will include it. To be on the safe side, I've commented it out for now.

Update: Apparently I cannot comment it out, because the tests then fail. As suggested, I've used min_transformers_version and set it to the next minor version, 4.52.0, similar to some other models in the registry.
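
As a rough illustration of what a minimum-version gate does, here is a self-contained sketch. ExampleInfo, release_tuple, and is_runnable are hypothetical names and do not reproduce the vLLM test registry; only the field name min_transformers_version and the value 4.52.0 come from this thread. The version parsing is deliberately simplified (it treats a pre-release such as 4.52.0.dev0 the same as 4.52.0), so real code would more likely rely on a dedicated version-parsing library.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the first three numeric components, e.g. '4.52.0.dev0' -> (4, 52, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad '4.52' to (4, 52, 0)
    return tuple(parts[:3])


@dataclass
class ExampleInfo:
    """Hypothetical registry entry for a model used in tests."""

    model_id: str
    min_transformers_version: Optional[str] = None

    def is_runnable(self, installed: str) -> bool:
        # No minimum recorded means the model is assumed to always be testable.
        if self.min_transformers_version is None:
            return True
        return release_tuple(installed) >= release_tuple(
            self.min_transformers_version)


info = ExampleInfo("ibm-granite/granite-4.0-tiny-preview", "4.52.0")
for installed in ("4.51.3", "4.52.0"):
    action = "run" if info.is_runnable(installed) else "skip"
    print(f"transformers {installed}: {action}")
```

A gate like this lets the registry entry stay in place while the test is skipped until the pinned dependency catches up, which fits the CI behavior described earlier in the thread.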

s3woz added 2 commits May 5, 2025 12:31
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
@s3woz s3woz force-pushed the granitemoehybrid_clean branch from 711dcd4 to ca47978 Compare May 5, 2025 17:00
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
@s3woz s3woz force-pushed the granitemoehybrid_clean branch from ca47978 to 95ede00 Compare May 5, 2025 20:46
@s3woz
Contributor Author

s3woz commented May 5, 2025

Two CI tests failed, both unrelated to this PR:

buildkite/ci/pr/v1-test — Failed (exit status 1)
 v1/engine/test_engine_core_client.py::test_startup_failure
 ...
 Start compiling function <code object forward at 0xda82150, file "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 345>
[2025-05-05T18:30:39Z] +++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++

and

buildkite/ci/pr/2-node-tests-4-gpus-in-total — Failed (exit status 1)
 docker run -d --gpus '"device=2,3"' ...
 nvidia-container-cli: device error: 3: unknown device: unknown.

I've force-pushed to retrigger the CI checks and see whether this persists. (I'm not sure whether this is the right way to retrigger, or whether there is a way to retrigger selected tests only.)

@DarkLight1337 DarkLight1337 merged commit 999328b into vllm-project:main May 6, 2025
52 checks passed
robertgshaw2-redhat added a commit to neuralmagic/vllm that referenced this pull request May 6, 2025
* [Model] Add GraniteMoeHybrid 4.0 model (vllm-project#17497)

Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>

* [easy] Fix logspam on PiecewiseBackend errors (vllm-project#17138)

Signed-off-by: rzou <zou3519@gmail.com>

* [Bugfix] Fixed prompt length for random dataset (vllm-project#17408)

Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>

* [Doc] Update notes for H2O-VL and Gemma3 (vllm-project#17219)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Misc] Fix ScalarType float4 naming  (vllm-project#17690)

Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

* Fix `dockerfilegraph` pre-commit hook (vllm-project#17698)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix] Fix triton import with local TritonPlaceholder (vllm-project#17446)

Signed-off-by: Mengqing Cao <cmq0113@163.com>

* [V1] Enable TPU V1 backend by default (vllm-project#17673)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [V1][PP] Support PP for MultiprocExecutor (vllm-project#14219)

Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: jiang.li <jiang1.li@intel.com>

* [v1] AttentionMetadata for each layer (vllm-project#17394)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* [Feat] Add deprecated=True to CLI args (vllm-project#17426)

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

* [Docs] Use gh-file to add links to tool_calling.md (vllm-project#17709)

Signed-off-by: windsonsea <haifeng.yao@daocloud.io>

* [v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (vllm-project#17479)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* [doc] Add RAG Integration example (vllm-project#17692)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Bugfix] Fix modality limits in vision language example (vllm-project#17721)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* Make right sidebar more readable in "Supported Models" (vllm-project#17723)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [TPU] Increase block size and reset block shapes (vllm-project#16458)

* [Misc] Add Next Edit Prediction (NEP) datasets support in `benchmark_serving.py` (vllm-project#16839)

Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>

* [Bugfix] Fix for the condition to accept empty encoder inputs for mllama (vllm-project#17732)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* [Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (vllm-project#16828)

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

---------

Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: jiang.li <jiang1.li@intel.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Signed-off-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Stan Wozniak <77159600+s3woz@users.noreply.github.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
Co-authored-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Michael Yao <haifeng.yao@daocloud.io>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: Jevin Jiang <jevin0change@gmail.com>
Co-authored-by: d.transposed <damian.bogunowicz@gmail.com>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: minpeter <kali2005611@gmail.com>
Labels: documentation, ready
5 participants