Add GPT-OSS quant support #887

yiliu30 · 2025-10-13T02:56:59Z

.

Signed-off-by: yiliu30 <yi4.liu@intel.com>

Copilot

Pull Request Overview

This PR adds GPT-OSS quantization support to the auto_round library. The implementation includes a new MoE (Mixture of Experts) converter for GPT-OSS models along with comprehensive test coverage.

* Creates specialized handling for GPT-OSS models by converting fused expert operations to individual expert modules for quantization.
* Refactors the MoE converter architecture to support multiple model types through a dispatch table.
* Adds comprehensive test coverage for GPT-OSS quantization with MXFP4 and MXFP8 schemes.
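As a rough illustration of the dispatch-table idea mentioned above, a minimal sketch might look like the following. The names MOE_CONVERTERS, convert_llama4, convert_gpt_oss and convert_moe_if_needed are hypothetical and are not taken from the PR; plain dicts stand in for real model objects.

```python
from typing import Callable, Dict

# Hypothetical converter stubs; real converters would rewrite model modules.
def convert_llama4(model: dict) -> dict:
    return {**model, "converted_by": "llama4"}

def convert_gpt_oss(model: dict) -> dict:
    return {**model, "converted_by": "gpt_oss"}

# Dispatch table: model type -> converter function (assumed layout).
MOE_CONVERTERS: Dict[str, Callable[[dict], dict]] = {
    "llama4": convert_llama4,
    "gpt_oss": convert_gpt_oss,
}

def convert_moe_if_needed(model: dict) -> dict:
    """Apply the registered converter for this model type, if there is one."""
    converter = MOE_CONVERTERS.get(model.get("model_type"))
    return converter(model) if converter else model

print(convert_moe_if_needed({"model_type": "gpt_oss"}))
print(convert_moe_if_needed({"model_type": "opt"}))
```

With this shape, supporting another model type only requires adding one entry to the table; the calling code stays unchanged.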

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

File	Description
test/test_cpu/test_gpt_oss.py	New test file with fixtures and parametrized tests for GPT-OSS quantization
auto_round/utils.py	Added debug logging for non-quantized layers
auto_round/special_model_handler.py	Refactored MoE converter to use dispatch table and added GPT-OSS support
auto_round/modelling/llama4.py	Extracted Llama4 MoE converter to dedicated module
auto_round/modelling/gpt_oss.py	New GPT-OSS MoE converter with specialized expert handling
auto_round/modelling/__init__.py	New package initialization file

auto_round/modelling/gpt_oss.py

Copilot · 2025-10-14T02:03:45Z

+            _update_parameter(mlp.gate_proj, "weight", original.experts.gate_up_proj[i, :, ::2].T)
+            _update_parameter(mlp.up_proj, "weight", original.experts.gate_up_proj[i, :, 1::2].T)


The magic numbers ::2 and 1::2 for tensor slicing should be replaced with named constants like GATE_STRIDE = 2 and GATE_OFFSET = 0, UP_OFFSET = 1 to improve code readability and maintainability.
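To make the slicing concrete: the diff suggests that the gate and up columns are interleaved along the last axis of gate_up_proj, so ::2 picks the gate columns and 1::2 picks the up columns, and each half is then transposed. The sketch below applies the named-constant idea to a single expert, using nested Python lists in place of tensors. The interleaved layout, the constant names and the helper split_gate_up are assumptions for illustration only.

```python
# Hypothetical constants replacing the bare ::2 and 1::2 slices.
GATE_UP_STRIDE = 2  # gate and up columns alternate
GATE_OFFSET = 0     # gate columns sit at even indices
UP_OFFSET = 1       # up columns sit at odd indices

def transpose(matrix):
    """Transpose a list-of-lists matrix (stand-in for tensor .T)."""
    return [list(col) for col in zip(*matrix)]

def split_gate_up(fused_rows):
    """Split one expert's fused [hidden, 2 * intermediate] weight into
    transposed gate and up weights, assuming interleaved columns."""
    gate = [row[GATE_OFFSET::GATE_UP_STRIDE] for row in fused_rows]
    up = [row[UP_OFFSET::GATE_UP_STRIDE] for row in fused_rows]
    return transpose(gate), transpose(up)

# One expert with hidden=2 and intermediate=2; columns are g0, u0, g1, u1.
fused = [[10, 11, 20, 21],
         [30, 31, 40, 41]]
gate_w, up_w = split_gate_up(fused)
print(gate_w)  # [[10, 30], [20, 40]]
print(up_w)    # [[11, 31], [21, 41]]
```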

wenhuach21 · 2025-10-14T02:14:26Z

auto_round/special_model_handler.py

 SPECIAL_SHARED_CACHE_KEYS["MiniMaxText01ForCausalLM"] = ("slope_rate",)

-CONVERT_EXPERT_TO_LINEAR_MODELS = ["llama4"]
+CONVERT_EXPERT_TO_LINEAR_MODELS = ["llama4", "gpt_oss"]


It would be better not to split this into too many fine-grained categories. A single flag such as model_need_to_convert, or a similar name, should be sufficient, since some models may require conversion even if they don't have expert layers. We can provide a converter function for each model that needs one, regardless of which parts need to be converted.

yiliu30 · 2025-10-15

Yeah, I agree, the replacement code could be organized better. Once we support more model replacements, we can refactor that part as needed. For now, how about leaving it as is, since we have some higher-priority tasks to focus on?

wenhuach21 · 2025-10-15

I don't think it will take much effort to change. You could also finish the higher-priority tasks first.

yiliu30 · 2025-10-15

Opened an issue to track this: #899
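One way the suggestion in this thread could be realized is sketched below: a single registry of per-model converter functions, with the "needs conversion" check derived from the registry instead of a separate list of model names. The names register_converter, convert_if_needed and _convert_gpt_oss are hypothetical (only model_need_to_convert comes from the comment above), and dicts again stand in for models; the actual refactor is tracked in #899 and may look quite different.

```python
from typing import Callable, Dict

Converter = Callable[[dict], dict]
_CONVERTERS: Dict[str, Converter] = {}

def register_converter(model_type: str) -> Callable[[Converter], Converter]:
    """Decorator that registers a converter for one model type."""
    def decorator(fn: Converter) -> Converter:
        _CONVERTERS[model_type] = fn
        return fn
    return decorator

def model_need_to_convert(model_type: str) -> bool:
    """A model needs conversion exactly when a converter is registered for it."""
    return model_type in _CONVERTERS

def convert_if_needed(model_type: str, model: dict) -> dict:
    if not model_need_to_convert(model_type):
        return model
    return _CONVERTERS[model_type](model)

@register_converter("gpt_oss")
def _convert_gpt_oss(model: dict) -> dict:
    # Placeholder body: a real converter would replace the fused experts.
    return {**model, "experts": "unfused"}

print(model_need_to_convert("gpt_oss"))
print(convert_if_needed("gpt_oss", {"experts": "fused"}))
```

Because the check only asks whether a converter exists, it also covers models that need conversion for reasons unrelated to expert layers.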

* Fix rtn tuning_device issue (#893) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* fix vlm gguf ut (#895) Signed-off-by: n1ck-guo <heng.guo@intel.com>
* update alg_ext.abi3.so with python compatible version (#894)
* move ste from quant to round for nvfp4 (#889) Signed-off-by: He, Xin3 <xin3.he@intel.com>
* Add GPT-OSS quant support (#887)
* better help printing information (#883)
  * better help printing information Signed-off-by: n1ck-guo <heng.guo@intel.com>
* speedup quant and evaluation, fix recompile issue (#897)
  * rewrite the implementation for ease-of-maintain Signed-off-by: He, Xin3 <xin3.he@intel.com>
  * fix bug Signed-off-by: He, Xin3 <xin3.he@intel.com>
  * fix quant performance Signed-off-by: He, Xin3 <xin3.he@intel.com>
  * Update auto_round/compressors/base.py
  --------- Signed-off-by: He, Xin3 <xin3.he@intel.com>
* fix nvfp act quantization bug (#891)
  * fix nvfp act quantization bug Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * add cuda ut for moe nvfp quantize Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * add cpu UT, refine cuda UT Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * fix ut typo Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * fix cpu ut Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * enhance experts amax match, refine UT Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  --------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* support automatic mixed bits assignment (#851)
* try to fix gguf issue (#886)
* remove numba from requirments (#905) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Extend mxfp loading dtypes (#907)
* block dataset logger info (#908) Signed-off-by: n1ck-guo <heng.guo@intel.com>
* fix torch compile issue in AutoScheme (#909)
* Revert "Extend mxfp loading dtypes (#907)" (#915) This reverts commit 0c2619c.
* support disable_opt_rtn in auto-scheme (#913)
* fix llama 4 ut (#896)
  * fix ut of llama 4 Signed-off-by: n1ck-guo <heng.guo@intel.com>
* add numba for cpu lib (#919) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Loosen the packing restrictions for mxfp&nvfp (#911)
  * Loosen the packing restrictions for mxfp&nvfp, enable Qwen1.5-MoE-A2.7B quantize Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * fix UT Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * refine mxfp&nvfp layer checker Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * fix pylint Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  --------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Extend mxfp loading dtypes (#916) Signed-off-by: root <root@clx5673.ra.intel.com> Signed-off-by: yiliu30 <yi4.liu@intel.com> Co-authored-by: root <root@clx5673.ra.intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix act config exporting for mixed schemes (#903)
  * fp8 exporting bugfix Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * fix act related config saving Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * add ut for act_config check Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * refine extra_config saving, add UTs Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * fix ut typo Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * fix ut typo Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * fixtypo Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * fix CI Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * fix scan issue Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * fix scan issue Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * rm global variable Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * rerun ut Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * refine ut Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  --------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* optimize rtn for int woq (#924)
* fix bug of gguf and support for LiquidAI/LFM2-1.2B (#927) Signed-off-by: n1ck-guo <heng.guo@intel.com>
* remove numpy<2.0 limitation (#921)
* enable regex quantization config saving for mixed bits (#825)
  * enable dynamic quantization config saving Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * fixtypo Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * rebase code, refine config saving Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * refine ut Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * fix UT Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * enable hf loading for regex, add UTs Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * refine export, enhance gptq UT Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  --------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix Flux tuning issue (#936) Signed-off-by: Mengni Wang <mengni.wang@intel.com>
* gguf support for inclusionAI/Ling-flash-2.0 (#940)
* remove low_cpu_mem (#934)
* Add compatibility test (#918)
* Add commit hash to version (#941) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* gguf weight type align with original, output.weight, token_embed (#900)
* support attention mask in user's dataset (#930)
* Add diffusion README (#923)
* update readme (#949)
* refactor utils file (#943)
  * refact utils Signed-off-by: n1ck-guo <heng.guo@intel.com>
* update readme for sglang support (#953)
  * update readme for sglang support Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * refine doc Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
  * Update README.md
  --------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>
* update gguf and support for CompressedLinear (#950)
* Reduce AutoSchem VRAM usage by up to 10X (#944)
* add self attribution and fix avg_bits error (#956)
  * add self attribution and fix avg_bits error
  --------- Signed-off-by: He, Xin3 <xin3.he@intel.com> Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>
* add logo (#960)
* refine AutoScheme readme/code (#958)
* update readme (#962)
* fix critic disable_opt_rtn regression (#963)
* [1/N] Initial vllm-ext evaluation support (MXFP4 MOE) (#935) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* fix bug of imatrix contains 0 (#955)
* fix rtn bug (#966)
* enhance flux doc (#967)
* clean code (#968)
* support for model scope (#957)
  * support for model scope Signed-off-by: n1ck-guo <heng.guo@intel.com>
* merge main branch to alg_ext (#970)
* fix cuda CI backend issue, fixtypo (#974)
* disable compile packing by default (#975) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* enhance auto device map and support XPU (#961)
  * enhance auto device map and support XPU
  --------- Signed-off-by: He, Xin3 <xin3.he@intel.com>
* refine readme (#978)
* cli support for positional arguments model (#979) Signed-off-by: n1ck-guo <heng.guo@intel.com>
* update bits (#986) Signed-off-by: He, Xin3 <xin3.he@intel.com>
* fix guff scheme and device_map bug (#969)
* add support for Magistral-Small (#980)
* support model_dtype and fix bug of scheme contains quotes, mllm eval (#985)

---------
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: root <root@clx5673.ra.intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: Tang Kaihui <kaihui.tang@intel.com>
Co-authored-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: Xin He <xin3.he@intel.com>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: Weiwei <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>
Co-authored-by: root <root@clx5673.ra.intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

yiliu30 added 10 commits October 10, 2025 03:52

add gpt oss

e255abb

Signed-off-by: yiliu30 <yi4.liu@intel.com>

refine code

4340b35

Signed-off-by: yiliu30 <yi4.liu@intel.com>

refator llama4

1882733

Signed-off-by: yiliu30 <yi4.liu@intel.com>

clean

eb55c54

Signed-off-by: yiliu30 <yi4.liu@intel.com>

fix

a4bd97f

Signed-off-by: yiliu30 <yi4.liu@intel.com>

refine code

2b9c015

Signed-off-by: yiliu30 <yi4.liu@intel.com>

add ut

30a560e

Signed-off-by: yiliu30 <yi4.liu@intel.com>

fix ut

6707c34

Signed-off-by: yiliu30 <yi4.liu@intel.com>

fix

03272f3

Signed-off-by: yiliu30 <yi4.liu@intel.com>

Merge branch 'main' into gpt-oss

d25336c

yiliu30 added the WIP label Oct 13, 2025

yiliu30 added 2 commits October 13, 2025 13:19

Merge branch 'main' into gpt-oss

6e27b7c

fix

595ebfb

Signed-off-by: yiliu30 <yi4.liu@intel.com>

yiliu30 requested review from Copilot, mengniwang95 and wenhuach21 and removed request for mengniwang95 and wenhuach21 October 14, 2025 02:02

yiliu30 removed the WIP label Oct 14, 2025

yiliu30 requested a review from mengniwang95 October 14, 2025 02:02

Copilot AI reviewed Oct 14, 2025

Merge branch 'gpt-oss' of https://github.com/intel/auto-round into gp…

9a55217

…t-oss

wenhuach21 reviewed Oct 14, 2025

yiliu30 marked this pull request as draft October 14, 2025 02:31

yiliu30 marked this pull request as ready for review October 14, 2025 03:00

mengniwang95 approved these changes Oct 14, 2025

yiliu30 requested a review from wenhuach21 October 15, 2025 03:40

yiliu30 mentioned this pull request Oct 15, 2025

Refactor modelling replacement code #899

Open

wenhuach21 merged commit 081c92a into main Oct 15, 2025
14 checks passed

wenhuach21 deleted the gpt-oss branch October 15, 2025 04:57
