
Conversation

@WeiweiZhang1
Contributor

No description provided.

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
@wenhuach21 wenhuach21 changed the title enable dynamic quantization config saving enable regex quantization config saving Sep 16, 2025
@wenhuach21 wenhuach21 changed the title enable regex quantization config saving enable regex quantization config saving for mixed bits Sep 16, 2025
WeiweiZhang1 and others added 2 commits September 16, 2025 16:02
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
@wenhuach21
Contributor

wenhuach21 commented Sep 16, 2025

TODO:
1. Support inference for the AutoRound format in Transformers (see the sketch below)
2. Validate inference for the AutoRound format in vLLM/SGLang
3. Add UTs
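For reference, a minimal sketch of the workflow this PR targets: quantizing with a regex-keyed layer_config for mixed bits and saving in the AutoRound format. The parameter and method names (layer_config, quantize, save_quantized) follow this discussion and may differ slightly from the released API, and the model name is just a small placeholder.

```python
# Minimal sketch only -- names are assumptions based on this PR discussion,
# not a verbatim copy of the released interface.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound  # assumed top-level entry point

model_name = "facebook/opt-125m"  # hypothetical small model for a quick check
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Regex-keyed per-layer overrides: default layers stay 4-bit,
# layers matching the pattern are kept at 8-bit.
layer_config = {
    r".*down_proj.*": {"bits": 8, "group_size": 128},
}

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, layer_config=layer_config)
autoround.quantize()
# The regex entries above are what this PR serializes into the saved quantization config.
autoround.save_quantized("./opt-125m-mixed-bits", format="auto_round")
```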

@WeiweiZhang1 WeiweiZhang1 removed the WIP label Oct 16, 2025
@wenhuach21
Contributor

Have you validated the GPTQ format? Even without regex in the layer_config, the configuration should still be set under the dynamic key in some way.

@WeiweiZhang1
Contributor Author

Have you validated the GPTQ format? Even without regex in the layer_config, the configuration should still be set under the dynamic key in some way.

Sure. I also added UTs covering inference with the auto_gptq format and inference after converting it to the AutoRound format.
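For context, a hypothetical illustration of how per-layer overrides can be laid out under the dynamic key of a GPTQ-style quantization_config. The regex-key/override-dict shape below is an assumption based on this discussion, not a verbatim dump of the exported config.

```python
# Hypothetical excerpt of config.json -> quantization_config for a GPTQ-format export.
# Keys under "dynamic" are regexes matching layer names; values override the defaults.
quantization_config = {
    "quant_method": "gptq",
    "bits": 4,
    "group_size": 128,
    "dynamic": {
        # layers matching this pattern use 8 bit instead of the default 4 bit
        r".*mlp\.down_proj": {"bits": 8, "group_size": 128},
    },
}
```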

@wenhuach21
Contributor

Could you also check whether this issue has been resolved by this PR?
#902

@wenhuach21
Contributor

Qwen3-VL-30B-A3B-Instruct or a smaller model is fine. Transformers is enough; there is no need to verify on vLLM, as vLLM should not be able to support our converted model.
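As a reference for that verification, a Transformers-only smoke test (no vLLM), assuming the converted checkpoint sits at a local path; the path and prompt are placeholders, and for the VL model the corresponding multimodal Auto class would be used instead of AutoModelForCausalLM.

```python
# Transformers-only verification sketch; the checkpoint path is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "./converted-autoround-model"  # hypothetical converted checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto", torch_dtype="auto")

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```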

@WeiweiZhang1 WeiweiZhang1 merged commit 873114a into main Oct 23, 2025
14 checks passed
@WeiweiZhang1 WeiweiZhang1 deleted the enable_dynamic_quantization_config_saving branch October 23, 2025 01:45
chensuyue added a commit that referenced this pull request Nov 11, 2025
* Fix rtn tuning_device issue (#893)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* fix vlm gguf ut (#895)

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* update alg_ext.abi3.so with python compatible version (#894)

* move ste from quant to round for nvfp4 (#889)

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* Add GPT-OSS quant support (#887)

* better help printing information (#883)

* better help printing information

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* speedup quant and evaluation, fix recompile issue (#897)

* rewrite the implementation for ease-of-maintain

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* fix bug

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* fix quant performance

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* Update auto_round/compressors/base.py

---------

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* fix nvfp act quantization bug (#891)

* fix nvfp act quantization bug

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* add cuda ut for moe nvfp quantize

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* add cpu UT, refine cuda UT

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ut typo

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix cpu ut

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* enhance experts amax match, refine UT

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* support automatic mixed bits assignment (#851)

* try to fix gguf issue (#886)

* remove numba from requirments (#905)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Extend mxfp loading dtypes (#907)

* block dataset logger info (#908)

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* fix torch compile issue in AutoScheme (#909)

* Revert "Extend mxfp loading dtypes (#907)" (#915)

This reverts commit 0c2619c.

* support disable_opt_rtn in auto-scheme (#913)

* fix llama 4 ut (#896)

* fix ut of llama 4

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* add numba for cpu lib (#919)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Loosen the packing restrictions for mxfp&nvfp (#911)

* Loosen the packing restrictions for mxfp&nvfp, enable Qwen1.5-MoE-A2.7B quantize

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix UT

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine mxfp&nvfp layer checker

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix pylint

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Extend mxfp loading dtypes (#916)

Signed-off-by: root <root@clx5673.ra.intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: root <root@clx5673.ra.intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix act config exporting for mixed schemes (#903)

* fp8 exporting bugfix

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix act related config saving

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add ut for act_config check

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine extra_config saving, add UTs

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix ut typo

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix ut typo

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fixtypo

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix CI

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix scan issue

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix scan issue

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* rm global variable

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rerun ut

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine ut

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* optimize rtn for int woq (#924)

* fix bug of gguf and support for LiquidAI/LFM2-1.2B (#927)

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* remove numpy<2.0 limitation (#921)

* enable regex quantization config saving for mixed bits (#825)

* enable dynamic quantization config saving

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixtypo

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rebase code, refine config saving

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine ut

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix UT

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable hf loading for regex, add UTs

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine export, enhance gptq UT

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix Flux tuning issue (#936)

Signed-off-by: Mengni Wang <mengni.wang@intel.com>

* gguf support for inclusionAI/Ling-flash-2.0 (#940)

* remove low_cpu_mem (#934)

* Add compatibility test (#918)

* Add commit hash to version (#941)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* gguf weight type align with original, output.weight, token_embed (#900)

* support attention mask in user's dataset (#930)

* Add diffusion README (#923)

* update readme (#949)

* refactor utils file (#943)

* refact utils

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* update readme for sglang support (#953)

* update readme for sglang support

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* refine doc

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* Update README.md

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>

* update gguf and support for CompressedLinear (#950)

* Reduce AutoSchem VRAM usage by up to 10X (#944)

* add self attribution and fix avg_bits error (#956)

* add self attribution and fix avg_bits error
---------

Signed-off-by: He, Xin3 <xin3.he@intel.com>
Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>

* add logo (#960)

* refine AutoScheme readme/code (#958)

* update readme (#962)

* fix critic disable_opt_rtn regression (#963)

* [1/N] Initial vllm-ext evaluation support (MXFP4 MOE) (#935)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* fix bug of imatrix contains 0 (#955)

* fix rtn bug (#966)

* enhance flux doc (#967)

* clean code (#968)

* support for model scope  (#957)

* support for model scope

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* merge main branch to alg_ext (#970)

* fix cuda CI backend issue, fixtypo (#974)

* disable compile packing by default (#975)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* enhance auto device map and support XPU  (#961)

* enhance auto device map and support XPU
---------

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* refine readme (#978)

* cli support for positional arguments model (#979)

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* update bits (#986)

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* fix guff scheme and device_map bug (#969)

* add support for Magistral-Small (#980)

* support model_dtype and fix bug of scheme contains quotes, mllm eval (#985)

---------

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: root <root@clx5673.ra.intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: Tang Kaihui <kaihui.tang@intel.com>
Co-authored-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: Xin He <xin3.he@intel.com>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: Weiwei <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>
Co-authored-by: root <root@clx5673.ra.intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>