
[Model] Add GraniteMoeHybrid 4.0 model #17497


Merged
24 commits merged on May 6, 2025
Commits
24bf59b
Added test to functionally verify match between HF and vLLM
bohnstingl Apr 16, 2025
b7e89a0
GraniteMoeHybrid model
s3woz Apr 16, 2025
e0136b3
Removed MLA and added RoPE
bohnstingl Apr 21, 2025
4cfef42
Updated basic examples
bohnstingl Apr 21, 2025
859e473
TensorParallel and cleanup
s3woz Apr 24, 2025
bcb5e77
Fixing previous commit
s3woz Apr 24, 2025
18bb63d
Cleanup
s3woz Apr 30, 2025
7cb6a81
Cleanup
s3woz Apr 30, 2025
b69ca16
Removing sampler. Fixing URLs
s3woz Apr 30, 2025
6f8b1f4
Fixing pre-hook errors.
s3woz Apr 30, 2025
c3b6460
ruff reformatting pre-commit fix
s3woz Apr 30, 2025
171532c
Skip tests until HF models become available
s3woz Apr 30, 2025
586247a
Pre-commit hook fix
s3woz Apr 30, 2025
b298dc1
Integrated code review; Moved tests; Added model to test_hybrid.py
bohnstingl May 1, 2025
0937f79
Added missing files
bohnstingl May 2, 2025
8ff0691
Added to supported_models.md and fixed URLs
s3woz May 5, 2025
495fe33
Update docs/source/models/supported_models.md
s3woz May 5, 2025
e114f1b
Commenting out failing HF registry test as code is not yet in HF used…
s3woz May 5, 2025
ca1d999
Temporarily marking model as is_available_online=False until it appea…
s3woz May 5, 2025
dc04497
Temporarily commenting out registration test until the model appears …
s3woz May 5, 2025
a636074
Fixing pre-commit error
s3woz May 5, 2025
8c68720
Marking model with next minor min_transformers_version
s3woz May 5, 2025
6c8e664
Update tests/models/language/generation/test_hybrid.py
s3woz May 5, 2025
95ede00
Clarifying comments
s3woz May 5, 2025
5 changes: 5 additions & 0 deletions docs/source/models/supported_models.md
@@ -385,6 +385,11 @@ See [this page](#generative-models) for more information on how to use generative models.
* `ibm-granite/granite-3.0-1b-a400m-base`, `ibm-granite/granite-3.0-3b-a800m-instruct`, `ibm/PowerMoE-3b`, etc.
* ✅︎
* ✅︎
- * `GraniteMoeHybridForCausalLM`
* Granite 4.0 MoE Hybrid
* `ibm-granite/granite-4.0-tiny-preview`, etc.
* ✅︎
* ✅︎
- * `GraniteMoeSharedForCausalLM`
* Granite MoE Shared
* `ibm-research/moe-7b-1b-active-shared-experts` (test model)
41 changes: 41 additions & 0 deletions tests/models/language/generation/test_granitemoehybrid.py
@@ -0,0 +1,41 @@
# SPDX-License-Identifier: Apache-2.0

import pytest

from ...utils import check_logprobs_close

# Path of the checkpoints
MODELS = [
"ibm-granite/granite-4.0-tiny-preview",
]


@pytest.mark.skip(
reason="Granite 4.0 is not yet available in huggingface transformers")
@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("dtype", ["float16", "bfloat16"])
@pytest.mark.parametrize("max_tokens", [64])
@pytest.mark.parametrize("num_logprobs", [5])
def test_model_equivalence_to_hf_greedy(
hf_runner,
vllm_runner,
example_prompts,
model: str,
dtype: str,
max_tokens: int,
num_logprobs: int,
):
with vllm_runner(model, dtype=dtype) as vllm_model:
vllm_outputs = vllm_model.generate_greedy_logprobs(
example_prompts, max_tokens, num_logprobs)

with hf_runner(model, dtype=dtype) as hf_model:
hf_outputs = hf_model.generate_greedy_logprobs_limit(
example_prompts, max_tokens, num_logprobs)

check_logprobs_close(
outputs_0_lst=hf_outputs,
outputs_1_lst=vllm_outputs,
name_0="hf",
name_1="vllm",
)
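The test above compares greedy top-k logprobs from the HF and vLLM runs via `check_logprobs_close`. As a rough illustration of the kind of comparison involved — this is a simplified sketch, not vLLM's actual implementation; `logprobs_close` and the `(chosen_token, top_k_logprobs)` data shape are invented for the example:

```python
def logprobs_close(outputs_0, outputs_1):
    """Loose per-position match: if the chosen tokens differ, each
    model's choice must at least appear in the other's top-k set,
    tolerating near-ties between candidates."""
    for (tok0, top0), (tok1, top1) in zip(outputs_0, outputs_1):
        # Each element: (chosen_token_id, {token_id: logprob, ...})
        if tok0 != tok1:
            if tok0 not in top1 or tok1 not in top0:
                return False
    return True


# Two runs that agree except at position 2, where the choices are
# near-ties present in both top-k sets.
run_a = [(5, {5: -0.1, 9: -2.3}), (7, {7: -0.2, 1: -1.9}),
         (3, {3: -0.6, 4: -0.7})]
run_b = [(5, {5: -0.1, 9: -2.2}), (7, {7: -0.3, 1: -2.0}),
         (4, {4: -0.5, 3: -0.8})]
print(logprobs_close(run_a, run_b))  # prints True
```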
3 changes: 3 additions & 0 deletions tests/models/language/generation/test_hybrid.py
@@ -23,6 +23,9 @@

HYBRID_MODELS = [
"ai21labs/Jamba-tiny-dev",
# NOTE: ibm-granite/granite-4.0-tiny-preview is currently skipped as
# it is not yet available in huggingface transformers
# "ibm-granite/granite-4.0-tiny-preview",
# NOTE: Running Plamo2 in transformers implementation requires to install
# causal-conv1d package, which is not listed as a test dependency as it's
# not compatible with pip-compile.
2 changes: 2 additions & 0 deletions tests/models/registry.py
@@ -163,6 +163,8 @@ def check_available_online(
{"1b": "EleutherAI/pythia-1.4b"}),
"GraniteForCausalLM": _HfExamplesInfo("ibm/PowerLM-3b"),
"GraniteMoeForCausalLM": _HfExamplesInfo("ibm/PowerMoE-3b"),
"GraniteMoeHybridForCausalLM": _HfExamplesInfo("ibm-granite/granite-4.0-tiny-preview", # noqa: E501
min_transformers_version="4.52.0"), # noqa: E501
"GraniteMoeSharedForCausalLM": _HfExamplesInfo("ibm-research/moe-7b-1b-active-shared-experts"), # noqa: E501
"Grok1ModelForCausalLM": _HfExamplesInfo("hpcai-tech/grok-1",
trust_remote_code=True),
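The `min_transformers_version="4.52.0"` marker above defers the registry example until a new enough `transformers` release is installed. A minimal sketch of that style of dotted-version gate — the helper `meets_min_version` is illustrative, not vLLM's code:

```python
def meets_min_version(installed: str, minimum: str) -> bool:
    """Compare dotted release versions numerically (e.g. '4.52.0'),
    so '4.10.0' correctly sorts above '4.9.0'."""
    def parts(v: str):
        return tuple(int(p) for p in v.split("."))
    return parts(installed) >= parts(minimum)


print(meets_min_version("4.51.3", "4.52.0"))  # prints False: model stays skipped
print(meets_min_version("4.52.0", "4.52.0"))  # prints True: model can be tested
```

A simple tuple comparison suffices here because release versions are plain dotted integers; pre-release suffixes (e.g. `4.52.0.dev0`) would need a full PEP 440 parser.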