support added for HF SmolLM3-3B #1715

Open
AshutoshSinghIntel wants to merge 9 commits into huggingface:main from AshutoshSinghIntel:support-SmolLM3-3B
Conversation

@AshutoshSinghIntel AshutoshSinghIntel commented May 4, 2026

What does this PR do?

OpenVINO export:

optimum-cli export openvino -m HuggingFaceTB/SmolLM3-3B ./SmolLM3-3B --task text-generation-with-past

Inference Script:

import argparse
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_id = "HuggingFaceTB/SmolLM3-3B"

def main():
    parser = argparse.ArgumentParser(description="SmolLM3-3B inference with OpenVINO")
    parser.add_argument("--model", type=str, default=model_id, help="Path to exported OV model or HF model ID")
    parser.add_argument("--max-new-tokens", type=int, default=100)
    parser.add_argument("--device", type=str, default="CPU")
    args = parser.parse_args()

    model = OVModelForCausalLM.from_pretrained(args.model, device=args.device)
    tokenizer = AutoTokenizer.from_pretrained(args.model)

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ]

    inputs = tokenizer.apply_chat_template(messages, tokenize=True, return_dict=True, return_tensors="pt", add_generation_prompt=True)
    output = model.generate(**inputs, max_new_tokens=args.max_new_tokens)
    print(tokenizer.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))

if __name__ == "__main__":
    main()

Fixes # CVS-183437

Before submitting

  • [N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@AshutoshSinghIntel AshutoshSinghIntel marked this pull request as ready for review May 4, 2026 14:50
@rkazants rkazants requested review from Copilot, echarlaix and popovaan May 5, 2026 05:03
Collaborator

@rkazants rkazants left a comment

Please provide a proper PR description with code snippets for export and inference; see #1688 for reference.

The rest looks good to me.

@rkazants rkazants requested a review from regisss May 5, 2026 05:04
Contributor

Copilot AI left a comment

Pull request overview

Adds OpenVINO export support for the HuggingFace Transformers smollm3 architecture (SmolLM3-3B), wiring it into the OpenVINO TasksManager configs, test matrices, and the supported-models documentation.

Changes:

  • Register a new SmolLM3OpenVINOConfig (Llama-based) for multiple tasks with a minimum transformers version of 4.53.0.
  • Extend OpenVINO exporter and GenAI/decoder/CLI test coverage to include smollm3, and add a tiny internal test model id.
  • Update OpenVINO supported-architectures documentation to list SmolLM3, and remove smollm3 from the “ONNX supported but untested” warning set.
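The minimum-version requirement in the first bullet can be pictured with a small, self-contained sketch. It is not the optimum-intel implementation: `MIN_TRANSFORMERS_VERSION`, `parse_version`, and `is_supported` are hypothetical names introduced here for illustration, and only the `4.53.0` value comes from the summary above.

```python
import re

# Hypothetical registry: model type -> minimum transformers version.
# Only the 4.53.0 value is taken from the PR summary; the names are illustrative.
MIN_TRANSFORMERS_VERSION = {"smollm3": (4, 53, 0)}


def parse_version(version: str) -> tuple:
    """Turn '4.53.0' or '4.55.0.dev0' into a (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '4.53'
    return tuple(parts)


def is_supported(model_type: str, installed_version: str) -> bool:
    """Return True if the model type is registered and the version gate passes."""
    minimum = MIN_TRANSFORMERS_VERSION.get(model_type)
    if minimum is None:
        return False
    return parse_version(installed_version) >= minimum
```

Under these assumptions, `is_supported("smollm3", "4.52.4")` is false while `is_supported("smollm3", "4.53.0")` is true, which mirrors why the test matrices only include smollm3 for transformers>=4.53.0.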

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| optimum/exporters/openvino/model_configs.py | Registers smollm3 in the OpenVINO TasksManager with an OpenVINO config class and version gate. |
| optimum/exporters/openvino/utils.py | Removes smollm3 from ONNX_SUPPORTED_ARCHITECTURES so it’s no longer treated as “untested / export at your own risk”. |
| tests/openvino/utils_tests.py | Adds an internal tiny test model mapping for smollm3. |
| tests/openvino/test_genai.py | Includes smollm3 in the GenAI LLM pipeline supported-architecture matrix for transformers>=4.53.0. |
| tests/openvino/test_exporters_cli.py | Adds CLI export test coverage and expected tokenizer artifact counts for smollm3. |
| tests/openvino/test_export.py | Adds smollm3 to the export integration test architecture mapping. |
| tests/openvino/test_decoder.py | Adds smollm3 to the decoder integration supported-architecture matrix for transformers>=4.53.0. |
| docs/source/openvino/models.mdx | Documents SmolLM3 as a supported architecture. |

@regisss
Contributor

regisss commented May 5, 2026

LGTM.
I agree with @rkazants' comment, let's stay consistent with previous PRs please.

@AshutoshSinghIntel
Author

Thanks @rkazants and @regisss, I have updated the PR description.

@AshutoshSinghIntel
Author

@IlyasMoutawwakil Could you please give me access to https://huggingface.co/optimum-intel-internal-testing so that I can upload tiny-random-smollm3? I ran the tests locally with it and they pass.

@rkazants
Collaborator

rkazants commented May 5, 2026

@popovaan, please take a look:

  1. check that the real model works
  2. ask for WWB metrics

@AshutoshSinghIntel
Author

AshutoshSinghIntel commented May 7, 2026

The WWB similarity score is 1.0 (using CPU, fp16, and the default number of samples, which was 27).

Similarity evaluation:  96%|#########6| 26/27 [00:05<00:00,  5.61it/s]
Similarity evaluation: 100%|##########| 27/27 [00:05<00:00,  5.67it/s]
Similarity evaluation: 100%|##########| 27/27 [00:05<00:00,  4.74it/s]
INFO:whowhatbench.wwb:Metrics for model: SmolLM3-3B
INFO:whowhatbench.wwb:   similarity
0         1.0
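To make the reported score easier to interpret, here is a minimal sketch of how a similarity score of this kind could be aggregated. It assumes the metric is the mean cosine similarity between embeddings of the reference outputs and the exported-model outputs; that is an assumption about the metric, not something stated in this thread, and the code below is not WWB's implementation. The function names and the toy vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_similarity(reference_embeddings, candidate_embeddings):
    """Average the per-sample cosine similarity over all prompt pairs."""
    scores = [
        cosine_similarity(ref, cand)
        for ref, cand in zip(reference_embeddings, candidate_embeddings)
    ]
    return sum(scores) / len(scores)


# Toy embeddings standing in for sentence embeddings of generated answers.
reference = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
identical = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
print(round(mean_similarity(reference, identical), 6))
```

Under this reading, a score of 1.0 would mean the exported model's answers are embedded identically (or nearly so) to the reference answers on every sampled prompt.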

@AshutoshSinghIntel
Author

Below is the script for creating the tiny model (I do not have access to publish it yet):

from transformers import (
    AutoTokenizer,
    SmolLM3Config,
    SmolLM3ForCausalLM,
)

def create_tiny_random_smollm3():

    config = SmolLM3Config(
        vocab_size=128256,
        hidden_size=32,
        intermediate_size=64,
        num_hidden_layers=2,
        num_attention_heads=4,
        num_key_value_heads=2,
        max_position_embeddings=256,
        hidden_act="silu",
        rms_norm_eps=1e-6,
        tie_word_embeddings=True,
        use_cache=True,
        attention_bias=False,
        mlp_bias=False,
        use_sliding_window=False,
        pad_token_id=128004,
        bos_token_id=128000,
        eos_token_id=128012,
    )

    model = SmolLM3ForCausalLM(config)
    print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")

    # Use SmolLM3-3B tokenizer
    tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

    output_dir = "./tiny-random-smollm3"
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
    print(f"Saved to {output_dir}")

if __name__ == "__main__":
    create_tiny_random_smollm3()
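As a sanity check on the "Model parameters" line that the script prints, the count can be estimated by hand from the config values. The sketch below assumes a Llama-style decoder layout (q/k/v/o projections, a gated MLP with three projections, two RMSNorm weights per layer, no biases, and an output head tied to the embeddings). That layout is an assumption here, so the real count may differ if SmolLM3 adds parameters this estimate does not cover.

```python
# Values copied from the SmolLM3Config call in the script above.
vocab_size = 128256
hidden_size = 32
intermediate_size = 64
num_hidden_layers = 2
num_attention_heads = 4
num_key_value_heads = 2

head_dim = hidden_size // num_attention_heads  # 32 // 4 = 8

# Assumption: Llama-style attention without biases (attention_bias=False).
attention = (
    hidden_size * num_attention_heads * head_dim        # q_proj
    + 2 * hidden_size * num_key_value_heads * head_dim  # k_proj and v_proj
    + num_attention_heads * head_dim * hidden_size      # o_proj
)
mlp = 3 * hidden_size * intermediate_size  # gate_proj, up_proj, down_proj
norms = 2 * hidden_size                    # two RMSNorm weights per layer
per_layer = attention + mlp + norms

embeddings = vocab_size * hidden_size      # counted once because weights are tied
final_norm = hidden_size

estimated_total = embeddings + num_hidden_layers * per_layer + final_norm
print(f"Estimated parameters: {estimated_total:,}")  # Estimated parameters: 4,122,784
```

Almost all of the estimate comes from the embedding table (128256 × 32 = 4,104,192), which is why a tiny test model with the full SmolLM3 vocabulary still has roughly four million parameters.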

Comment thread optimum/exporters/openvino/utils.py
Comment thread optimum/exporters/openvino/model_configs.py
@popovaan
Collaborator

popovaan commented May 7, 2026

Please add quantization tests.

@popovaan
Collaborator

popovaan commented May 7, 2026

There was a warning during conversion of the model "The OpenVINO export of smollm3 models is not officially supported by optimum-intel, export at your own risks.", was it fixed?

@AshutoshSinghIntel
Author

> There was a warning during conversion of the model "The OpenVINO export of smollm3 models is not officially supported by optimum-intel, export at your own risks.", was it fixed?

With the dedicated config added by this PR, the warning does not appear.
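The behaviour described here can be sketched as a two-tier lookup: architectures with a dedicated OpenVINO config export silently, while architectures that are only on a fallback list trigger the warning. This is a simplified illustration, not the actual optimum-intel code path; `check_export_support` and both set arguments are hypothetical names, and only the warning text is quoted from the thread.

```python
import warnings


def check_export_support(model_type, dedicated_configs, fallback_architectures):
    """Return 'supported' or 'untested', warning in the untested case."""
    if model_type in dedicated_configs:
        return "supported"
    if model_type in fallback_architectures:
        warnings.warn(
            f"The OpenVINO export of {model_type} models is not officially "
            "supported by optimum-intel, export at your own risks."
        )
        return "untested"
    raise ValueError(f"Unsupported architecture: {model_type}")


# Before the PR (illustrative): smollm3 is only on the fallback list.
with warnings.catch_warnings(record=True) as caught_before:
    warnings.simplefilter("always")
    status_before = check_export_support("smollm3", {"llama"}, {"smollm3"})

# After the PR (illustrative): smollm3 has a dedicated config and leaves the fallback list.
with warnings.catch_warnings(record=True) as caught_after:
    warnings.simplefilter("always")
    status_after = check_export_support("smollm3", {"llama", "smollm3"}, set())

print(status_before, len(caught_before))  # untested 1
print(status_after, len(caught_after))    # supported 0
```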

@AshutoshSinghIntel
Author

> Please add quantization tests.

Added, kindly check.

@popovaan
Collaborator

@rkazants @echarlaix @regisss please review this PR.

Contributor

@regisss regisss left a comment

LGTM

@AshutoshSinghIntel
Author

Hi @regisss and @rkazants, could you please re-review? I updated the code so that the expected int8 count can differ by task for a single model.

For example, for SmolLM3-3B:
text-generation-with-past: 30
feature-extraction: 30
text-classification: 32
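One way to express per-task expectations is a lookup that accepts either a single integer (the same count for every task) or a task-keyed mapping. The sketch below is a hypothetical illustration of that idea, not the test code in this PR: `EXPECTED_INT8_COUNTS` and `expected_int8_count` are made-up names, the `llama` entry is a placeholder, and only the three smollm3 numbers come from the comment above.

```python
# Hypothetical expectation table. An int means "same count for every task";
# a dict means "count depends on the export task".
EXPECTED_INT8_COUNTS = {
    "llama": 16,  # placeholder value for illustration only
    "smollm3": {
        "text-generation-with-past": 30,
        "feature-extraction": 30,
        "text-classification": 32,
    },
}


def expected_int8_count(model_type: str, task: str) -> int:
    """Resolve the expected int8 count for a model type and export task."""
    expectation = EXPECTED_INT8_COUNTS[model_type]
    if isinstance(expectation, int):
        return expectation
    return expectation[task]


for task in ("text-generation-with-past", "feature-extraction", "text-classification"):
    print(task, expected_int8_count("smollm3", task))
```

Keeping the integer form valid means existing single-value entries do not need to change when one architecture starts requiring task-specific counts.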
