Fix gpt-oss model export #1861
Conversation
Pull Request Overview
This PR updates the QMoE (Quantized Mixture of Experts) quantization logic to distinguish between block-wise and tensor-level quantization based on whether the int4_block_size parameter is explicitly specified by the user. The key change is making block-wise quantization opt-in rather than automatic.
- Switches QMoE quantization from automatic block-size detection to explicit opt-in behavior
- Defaults to tensor-level quantization (using TensorRT-LLM) when `int4_block_size` is not specified
- Conditionally includes the `block_size` attribute in the QMoE operator based on the quantization method used (see the sketch below)
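A minimal sketch of the opt-in decision described above. The names `make_qmoe_weights`, `extra_options`, `int4_block_size`, and `moe_attrs` come from this PR; the scale computation itself is a simplified stand-in, not the builder's actual quantization code.

```python
import numpy as np

def make_qmoe_weights_sketch(weights: np.ndarray, extra_options: dict, moe_attrs: dict) -> np.ndarray:
    """Choose between block-wise and tensor-level int4 scales (illustrative only)."""
    if "int4_block_size" in extra_options:
        # Opt-in path: one scale per block of `block_size` values.
        # Assumes the tensor size is a multiple of block_size for brevity.
        block_size = int(extra_options["int4_block_size"])
        blocks = weights.reshape(-1, block_size)
        scales = np.abs(blocks).max(axis=1) / 7.0    # int4 range is [-8, 7]
        moe_attrs["block_size"] = block_size          # attribute recorded only on this path
    else:
        # Default path: tensor-level quantization (TensorRT-LLM style),
        # one scale for the whole tensor and no block_size attribute.
        scales = np.array([np.abs(weights).max() / 7.0])
    return scales
```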
Force-pushed from 989dcd5 to d888824
Hi @apsonawane @kunal-vaishnavi, not sure if it's within the scope of this PR, but after building from this branch and exporting with DML int4, I tried to run the model and it fails.
Please merge main to resolve conflicts.
Force-pushed from d888824 to f6f9ff3
@LorenRd sorry for the late reply. I updated the exception since dml does not support block-wise quant, earlier we were checking for cpu specifically so this PR should not affect dml export. Were you able to run it earlier? |
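For context, a rough sketch of the kind of execution-provider check described here, assuming block-wise QMoE quantization is only available on CUDA; the condition and error message are illustrative, not the PR's literal code.

```python
def check_qmoe_block_quant_support(execution_provider: str, extra_options: dict) -> None:
    # Assumption: block-wise int4 QMoE quantization is treated as CUDA-only here.
    if "int4_block_size" in extra_options and execution_provider != "cuda":
        raise NotImplementedError(
            f"Block-wise QMoE quantization (int4_block_size) is not supported "
            f"on the '{execution_provider}' execution provider."
        )
```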
Force-pushed from 9fbd482 to 2620d09
This pull request updates the logic for handling the `block_size` attribute in QMoE (Quantized Mixture of Experts) model building and quantization. The changes ensure that block-wise quantization is only used when explicitly specified, defaulting to tensor-level quantization otherwise. The most important changes are:

**Quantization logic updates:**

* In `make_qmoe_weights`, block-wise quantization is now only used if `int4_block_size` is explicitly present in `extra_options`; otherwise, tensor-level quantization is used by default. The `block_size` attribute in `moe_attrs` is set accordingly.

**Operator construction improvements:**

* In `make_qmoe_op`, the `block_size` attribute is only included in the operator's attributes if it was explicitly set in `moe_attrs`, preventing unnecessary or default values from being passed.
* The direct passing of `block_size` as a parameter to `make_node` is removed; it is now only included via `extra_kwargs` when appropriate (see the sketch below).
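As a companion sketch, the conditional attribute forwarding in `make_qmoe_op` could look roughly like this, using `onnx.helper.make_node` (which turns keyword arguments into node attributes). The function name and `moe_attrs` come from the PR; the operator inputs and outputs here are placeholders.

```python
from onnx import helper

def make_qmoe_op_sketch(name: str, inputs: list[str], outputs: list[str], moe_attrs: dict):
    # Forward block_size as a node attribute only when block-wise quantization
    # stored it in moe_attrs; otherwise the attribute is omitted entirely
    # rather than passed with a default value.
    extra_kwargs = {}
    if "block_size" in moe_attrs:
        extra_kwargs["block_size"] = moe_attrs["block_size"]
    return helper.make_node(
        "QMoE",                      # com.microsoft contrib op
        inputs=inputs,
        outputs=outputs,
        name=name,
        domain="com.microsoft",
        **extra_kwargs,
    )
```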