Distribute and complete onnxruntime tests (decoder models) #2278


Merged
27 commits merged from distribute-tests into main on May 28, 2025

Conversation

@IlyasMoutawwakil (Member) commented May 23, 2025

What does this PR do?

Decoder models are probably the most important research topic right now; transformers itself decided to break many of its own rules to keep up with the fast-moving field of LLMs. This PR should make it easier to maintain and add new models by reducing the number of special cases we handle, or at least by making the inference logic more straightforward.

This PR:

  • Enables IO binding testing on CPU.
  • Enables and fixes more models that were never tested for inference (phi, olmo, olmo2).
  • Enables exhaustive testing of decoder models, including comparing past key/value (pkv) tensors across forward passes.
  • Enables using encoder-decoder models as decoders, which was not supported before (bart, marian, blenderbot, ...).
  • Enables models that support the pkv cache to skip it when configured to, mirroring the transformers API (based on generation_config.use_cache); see the sketch below.
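
A minimal sketch of what the last bullet enables, assuming the public optimum.onnxruntime API; the model ID here is illustrative and not taken from the PR:

    # Hypothetical usage: whether the exported decoder uses its past
    # key/value (pkv) cache now follows generation_config.use_cache,
    # mirroring the transformers API.
    from transformers import AutoTokenizer
    from optimum.onnxruntime import ORTModelForCausalLM

    model_id = "gpt2"  # illustrative; any supported decoder model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

    # Assumption: this is the switch the bullet refers to; disabling it
    # makes generation recompute attention over the full sequence each step.
    model.generation_config.use_cache = False

    inputs = tokenizer("Hello,", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=8)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))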

Before, if you ran all tests in parallel, some would interfere with each other and only pass in sequential mode. Now you can run all 250 tests on a multi-core machine (a DGX, for example) in 2 minutes.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil IlyasMoutawwakil changed the title Distribute and complete onnxruntime tests Distribute and complete onnxruntime tests (decoder models) May 23, 2025
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR expands and stabilizes ONNX Runtime testing for decoder models by fixing model mappings, adding new models, updating exporter logic, and including decoder tests in CI.

  • Updates testing utilities: fixes model name mappings, adds new models (olmo, olmo2, phi), and refactors the setup flow
  • Extends NormalizedConfigManager and ONNX exporter configs to support new architectures and bumps minimum Transformers versions
  • Enhances exporter task filtering by Transformers version and updates CI matrix to include test_decoder.py

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Summary per file:

tests/onnxruntime/testing_utils.py: Refactor _setup, adjust model IDs, remove unsupported case skips
optimum/utils/normalized_config.py: Add mappings for olmo, olmo2, phi; update bloom args
optimum/exporters/tasks.py: Filter supported model types by Transformers version
optimum/exporters/onnx/utils.py: Bump version check threshold for position IDs requirement
optimum/exporters/onnx/model_configs.py: Update MIN_TRANSFORMERS_VERSION for PhiOnnxConfig & BloomOnnxConfig, remove legacy override
optimum/exporters/onnx/base.py: Simplify decoder merge logic using constants
.github/workflows/test_onnxruntime.yml: Add test_decoder.py to CI test matrix
Comments suppressed due to low confidence (3)

optimum/utils/normalized_config.py:278

  • The mapping for "phi3small" was removed, which will break configuration support for phi3small models. Consider re-adding this entry or verifying that dropping phi3small support is intentional.
"phi3small": NormalizedTextConfigWithGQA,

tests/onnxruntime/testing_utils.py:176

  • The previous skipTest logic for unsupported export cases was removed, so tests may now error instead of being skipped. Please reintroduce a mechanism to skip unsupported configurations.
if model_args.get("use_cache", False):

.github/workflows/test_onnxruntime.yml:31

  • The YAML array for test_file uses bracket notation with commas on separate lines, which may not be valid. Consider using a standard YAML list (one - item per line) to ensure CI parses it correctly.
[
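
For reference, the block-style list suggested above would look roughly like this (a hypothetical fragment, not the actual workflow contents; note that flow-style sequences spanning multiple lines are in fact valid YAML, so both forms should parse):

    # Hypothetical matrix fragment; real entries in test_onnxruntime.yml differ.
    strategy:
      matrix:
        test_file:
          - test_decoder.py
          - test_modeling.py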

@IlyasMoutawwakil (Member, Author) commented May 25, 2025

Added qwen3 model support as proof that inference integration for new models should, in most cases, "just work". This contrasts with #2252, where logits and generations didn't match (even though export and inference ran without failure).
@Abdennacer-Badaoui
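
A minimal sketch of the kind of parity check implied above, assuming the public transformers and optimum.onnxruntime APIs; the checkpoint name and tolerances are illustrative assumptions, not taken from the PR:

    # Hypothetical parity check: compare logits from the PyTorch model and
    # the exported ONNX model on the same input.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from optimum.onnxruntime import ORTModelForCausalLM

    model_id = "Qwen/Qwen3-0.6B"  # illustrative qwen3 checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    pt_model = AutoModelForCausalLM.from_pretrained(model_id).eval()
    ort_model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

    inputs = tokenizer("The quick brown fox", return_tensors="pt")
    with torch.no_grad():
        pt_logits = pt_model(**inputs).logits
    ort_logits = ort_model(**inputs).logits

    # Loose tolerances: small floating-point drift between backends is expected.
    torch.testing.assert_close(ort_logits, pt_logits, atol=1e-4, rtol=1e-4)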

@echarlaix (Collaborator) left a comment

Looks great @IlyasMoutawwakil 🔥 thanks!

Comment on lines 1500 to 1507
    and is_transformers_version(
        ">=",
        str(
            TasksManager.get_exporter_config_constructor(
                exporter, task=task, model_type=model_type
            ).func.MIN_TRANSFORMERS_VERSION
        ),
    )
Collaborator:

I'm thinking it might be easier to raise an error before export when the transformers version is not compatible, and to keep all supported architectures listed, so that users know the architecture export is supported but that transformers needs to be upgraded. What do you think @IlyasMoutawwakil?

f"{config.MIN_TRANSFORMERS_VERSION}, got: {transformers.__version__}"

Member Author:

In fact, I only added this here to be able to use it in test_find_untested_architectures.
I can move the version checks there and keep this method as is.

Member Author:

Done! I simply remove the unsupported models (unsupported because of the transformers version) using CONFIG_MAPPING_NAMES:

        # Models this transformers version knows about.
        supported_transformers_models = set(CONFIG_MAPPING_NAMES.keys())
        # Models the ONNX exporter supports for this task.
        supported_export_models = set(TasksManager.get_supported_model_type_for_task(task=self.TASK, exporter="onnx"))
        # Keep only exportable models that this transformers version supports.
        supported_export_models = supported_export_models & supported_transformers_models
        untested_models = supported_export_models - tested_models

Member Author:

For raising a version error during export, I think that's already the case.

self.normalized_config = NormalizedConfigManager.get_normalized_config_class(config.model_type)(config)
self.key_value_input_names = [key for key in self.input_names if (".key" in key) or (".value" in key)]
self.key_value_output_names = [key for key in self.output_names if (".key" in key) or (".value" in key)]
self.can_use_cache = len(self.key_value_input_names) > 0 and len(self.key_value_output_names) > 0
Collaborator:

Question: is there any case where len(self.key_value_input_names) > 0 and len(self.key_value_output_names) == 0? (Before, we only used self.key_value_input_names to determine self.use_cache, so I'm wondering.)

Member Author:

In the case of CausalLMs I don't think so; even legacy merged decoders have both.

@IlyasMoutawwakil (Member, Author):

@echarlaix all good to merge!

@echarlaix (Collaborator):

great, thanks a lot @IlyasMoutawwakil

@echarlaix merged commit 85376e3 into main on May 28, 2025
34 checks passed
@echarlaix deleted the distribute-tests branch on May 28, 2025 at 08:51