Distribute and complete onnxruntime tests (decoder models) #2278


Merged
27 commits merged from distribute-tests into main on May 28, 2025

Conversation

@IlyasMoutawwakil (Member) commented May 23, 2025

What does this PR do?

Decoder models are probably the most important research topic right now; transformers itself decided to break many of its own rules to keep up with the fast-moving field of LLMs. This PR should make it easier to maintain and add new models by reducing the number of special cases we handle, or at least by making the inference logic more straightforward.

This PR:

  • Enables IO binding testing on CPU.
  • Enables and fixes more models that were never tested for inference (phi, olmo, olmo2).
  • Enables exhaustive testing of decoder models, including comparing past key/value (pkv) tensors across forward passes.
  • Enables using encoder-decoder models as decoders, which was not supported before (bart, marian, blenderbot, ...).
  • Enables models that support the pkv cache to skip it when configured to, mirroring the transformers API (based on generation_config.use_cache); see the sketch below.
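
A minimal sketch of what the last bullet enables, assuming the public optimum.onnxruntime API; the model ID here is illustrative and not taken from the PR:

    # Hypothetical usage: whether the exported decoder uses its past
    # key/value (pkv) cache now follows generation_config.use_cache,
    # mirroring the transformers API.
    from transformers import AutoTokenizer
    from optimum.onnxruntime import ORTModelForCausalLM

    model_id = "gpt2"  # illustrative; any supported decoder model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

    # Assumption: this is the switch the bullet refers to; disabling it
    # makes generation recompute attention over the full sequence each step.
    model.generation_config.use_cache = False

    inputs = tokenizer("Hello,", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=8)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))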

Before, if you ran all tests in parallel, some would interfere with each other and only pass in sequential mode. Now you can run all 250 tests on a multi-core machine (a DGX, for example) in 2 minutes.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil IlyasMoutawwakil changed the title Distribute and complete onnxruntime tests Distribute and complete onnxruntime tests (decoder models) May 23, 2025
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR expands and stabilizes ONNX Runtime testing for decoder models by fixing model mappings, adding new models, updating exporter logic, and including decoder tests in CI.

  • Updates testing utilities: fixes model name mappings, adds new models (olmo, olmo2, phi), and refactors the setup flow
  • Extends NormalizedConfigManager and ONNX exporter configs to support new architectures and bumps minimum Transformers versions
  • Enhances exporter task filtering by Transformers version and updates CI matrix to include test_decoder.py

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Summary per file:

tests/onnxruntime/testing_utils.py: Refactor _setup, adjust model IDs, remove unsupported case skips
optimum/utils/normalized_config.py: Add mappings for olmo, olmo2, phi; update bloom args
optimum/exporters/tasks.py: Filter supported model types by Transformers version
optimum/exporters/onnx/utils.py: Bump version check threshold for position IDs requirement
optimum/exporters/onnx/model_configs.py: Update MIN_TRANSFORMERS_VERSION for PhiOnnxConfig & BloomOnnxConfig, remove legacy override
optimum/exporters/onnx/base.py: Simplify decoder merge logic using constants
.github/workflows/test_onnxruntime.yml: Add test_decoder.py to CI test matrix
Comments suppressed due to low confidence (3)

optimum/utils/normalized_config.py:278

  • The mapping for "phi3small" was removed, which will break configuration support for phi3small models. Consider re-adding this entry or verifying that dropping phi3small support is intentional.
"phi3small": NormalizedTextConfigWithGQA,

tests/onnxruntime/testing_utils.py:176

  • The previous skipTest logic for unsupported export cases was removed, so tests may now error instead of being skipped. Please reintroduce a mechanism to skip unsupported configurations.
if model_args.get("use_cache", False):

.github/workflows/test_onnxruntime.yml:31

  • The YAML array for test_file uses bracket notation with commas on separate lines, which may not be valid. Consider using a standard YAML list (one - item per line) to ensure CI parses it correctly.
[
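
For reference, the block-style list suggested above would look roughly like this (a hypothetical fragment, not the actual workflow contents; note that flow-style sequences spanning multiple lines are in fact valid YAML, so both forms should parse):

    # Hypothetical matrix fragment; real entries in test_onnxruntime.yml differ.
    strategy:
      matrix:
        test_file:
          - test_decoder.py
          - test_modeling.py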

@IlyasMoutawwakil (Member, Author) commented May 25, 2025

Added qwen3 model support as proof that inference integration for new models should, in most cases, "just work". This contrasts with #2252, where logits and generations didn't match (even though export and inference ran without failure).
@Abdennacer-Badaoui
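
A minimal sketch of the kind of parity check implied above, assuming the public transformers and optimum.onnxruntime APIs; the checkpoint name and tolerances are illustrative assumptions, not taken from the PR:

    # Hypothetical parity check: compare logits from the PyTorch model and
    # the exported ONNX model on the same input.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from optimum.onnxruntime import ORTModelForCausalLM

    model_id = "Qwen/Qwen3-0.6B"  # illustrative qwen3 checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    pt_model = AutoModelForCausalLM.from_pretrained(model_id).eval()
    ort_model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

    inputs = tokenizer("The quick brown fox", return_tensors="pt")
    with torch.no_grad():
        pt_logits = pt_model(**inputs).logits
    ort_logits = ort_model(**inputs).logits

    # Loose tolerances: small floating-point drift between backends is expected.
    torch.testing.assert_close(ort_logits, pt_logits, atol=1e-4, rtol=1e-4)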

@echarlaix (Collaborator) left a comment

Looks great @IlyasMoutawwakil 🔥 thanks!

Comment on lines 1500 to 1507
    and is_transformers_version(
        ">=",
        str(
            TasksManager.get_exporter_config_constructor(
                exporter, task=task, model_type=model_type
            ).func.MIN_TRANSFORMERS_VERSION
        ),
    )
Collaborator:

I'm thinking it might be easier to raise an error before export when the transformers version is not compatible, and to keep all supported architectures listed, so that users know the architecture export is supported but that transformers needs to be upgraded. What do you think @IlyasMoutawwakil?

f"{config.MIN_TRANSFORMERS_VERSION}, got: {transformers.__version__}"

Member Author:

In fact, I only added this here to be able to use it in test_find_untested_architectures.
I can move the version checks there and keep this method as is.

Member Author:

Done! I simply remove the unsupported models (unsupported because of the transformers version) using CONFIG_MAPPING_NAMES:

        # Models this transformers version knows about.
        supported_transformers_models = set(CONFIG_MAPPING_NAMES.keys())
        # Models the ONNX exporter supports for this task.
        supported_export_models = set(TasksManager.get_supported_model_type_for_task(task=self.TASK, exporter="onnx"))
        # Keep only exportable models that this transformers version supports.
        supported_export_models = supported_export_models & supported_transformers_models
        untested_models = supported_export_models - tested_models

Member Author:

For raising a version error during export, I think that's already the case.

self.normalized_config = NormalizedConfigManager.get_normalized_config_class(config.model_type)(config)
self.key_value_input_names = [key for key in self.input_names if (".key" in key) or (".value" in key)]
self.key_value_output_names = [key for key in self.output_names if (".key" in key) or (".value" in key)]
self.can_use_cache = len(self.key_value_input_names) > 0 and len(self.key_value_output_names) > 0
Collaborator:

Question: is there any case where len(self.key_value_input_names) > 0 and len(self.key_value_output_names) == 0? (Before, we only used self.key_value_input_names to determine self.use_cache, so I'm wondering.)

Member Author:

In the case of CausalLMs I don't think so; even legacy merged decoders have both.

@IlyasMoutawwakil (Member, Author):

@echarlaix all good to merge!

@echarlaix (Collaborator):

great, thanks a lot @IlyasMoutawwakil

@echarlaix merged commit 85376e3 into main on May 28, 2025
34 checks passed
@echarlaix deleted the distribute-tests branch on May 28, 2025 at 08:51