Reduce AutoScheme VRAM usage by up to 10X #944

wenhuach21 · 2025-10-24T09:12:06Z

move AutoScheme class to autoscheme folder
refine important hyperparameters in homepage

for more information, see https://pre-commit.ci

wenhuach21 · 2025-10-27T07:06:22Z

Not ready yet, just trying the unit tests (UT).

auto_round/utils_bk/device.py

Copilot

Pull Request Overview

This pull request reduces AutoScheme VRAM usage by up to 10X by optimizing memory management during quantization scheme generation. The changes introduce a `low_gpu_mem_usage` flag that enables memory-efficient processing at the cost of additional run time, and refactor device management code into dedicated utility functions.
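The memory-versus-time trade-off behind such a flag can be illustrated with a toy model. This is only a sketch and not the PR's implementation: `peak_resident_bytes` and the block sizes are hypothetical, and the model assumes that the low-memory path keeps a single block on the GPU at a time while the other path keeps every block resident.

```python
def peak_resident_bytes(block_sizes, low_gpu_mem_usage=True):
    """Toy estimate of peak GPU residency for a list of block sizes in bytes.

    Assumption (not taken from the PR): with low_gpu_mem_usage the blocks
    are moved to the GPU one at a time, otherwise all blocks stay resident.
    """
    if not block_sizes:
        return 0
    if low_gpu_mem_usage:
        # Only the largest single block is ever resident.
        return max(block_sizes)
    # Every block is resident at once.
    return sum(block_sizes)


# Four equally sized hypothetical blocks of 1 GiB each.
blocks = [1 << 30] * 4
ratio = peak_resident_bytes(blocks, False) / peak_resident_bytes(blocks, True)
print(ratio)  # 4.0
```

The ratio printed here is a property of the toy input, not a measurement; the actual savings reported by the PR depend on the model and configuration.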

Key Changes:

- Added `low_gpu_mem_usage` parameter to AutoScheme configuration with default value True
- Refactored device mapping and memory management logic from BaseCompressor to separate utility functions
- Moved AdamCompressor class to its own module for better code organization
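As a rough illustration of the new configuration flag, the sketch below models it with a small dataclass. `AutoSchemeConfigSketch` is a hypothetical stand-in, not the real `AutoScheme` class from auto_round; only the `True` default of `low_gpu_mem_usage` comes from the review summary, and the `batch_size` default is a placeholder assumption.

```python
from dataclasses import dataclass


@dataclass
class AutoSchemeConfigSketch:
    """Hypothetical stand-in, not the real auto_round AutoScheme class."""

    # Placeholder default; the review summary does not state the real value.
    batch_size: int = 8
    # The review summary states that this flag defaults to True.
    low_gpu_mem_usage: bool = True


default_cfg = AutoSchemeConfigSketch()
# Opting out trades higher VRAM use for shorter run time, per the overview.
fast_cfg = AutoSchemeConfigSketch(low_gpu_mem_usage=False)
print(default_cfg.low_gpu_mem_usage, fast_cfg.low_gpu_mem_usage)  # True False
```

Defaulting the flag to the memory-saving path means users on small GPUs get the safer behavior without changing their code, while users with spare VRAM can opt out explicitly.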

Reviewed Changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| auto_round/schemes.py | Added `batch_size` and `low_gpu_mem_usage` parameters to AutoScheme dataclass |
| auto_round/utils/device.py | Added extensive device management utilities including `get_major_device`, `set_auto_device_map_for_block_with_tuning`, and `set_non_auto_device_map` |
| auto_round/compressors/base.py | Refactored device mapping logic to use new utility functions and updated to support `low_gpu_mem_usage` |
| auto_round/compressors/adam.py | Created new file extracting AdamCompressor class from base.py |
| auto_round/wrapper.py | Updated to conditionally initialize weight min/max only when round tuning is enabled |
| docs/step_by_step.md | Updated documentation with new hyperparameters and VRAM cost tables |
| test/test_cuda/test_auto_scheme.py | Added new test cases for multi-card scenarios and non-low GPU memory usage |
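The wrapper change summarized above (allocating weight min/max state only when round tuning is enabled) follows a common pattern that can be sketched as below. All names here (`LayerWrapperSketch`, `enable_round_tuning`, `min_scale`, `max_scale`) are hypothetical and chosen for illustration; the actual attributes in `auto_round/wrapper.py` may differ, and plain lists stand in for tensors.

```python
class LayerWrapperSketch:
    """Hypothetical wrapper; names are illustrative, not from auto_round."""

    def __init__(self, num_groups, enable_round_tuning=True):
        self.enable_round_tuning = enable_round_tuning
        if enable_round_tuning:
            # Allocate one tunable min/max factor per weight group.
            self.min_scale = [1.0] * num_groups
            self.max_scale = [1.0] * num_groups
        else:
            # Skip the allocation entirely when nothing will be tuned.
            self.min_scale = None
            self.max_scale = None

    def tunable_state_count(self):
        """Number of per-group values held for tuning (0 when disabled)."""
        if not self.enable_round_tuning:
            return 0
        return len(self.min_scale) + len(self.max_scale)


tuned = LayerWrapperSketch(num_groups=4)
untuned = LayerWrapperSketch(num_groups=4, enable_round_tuning=False)
print(tuned.tunable_state_count(), untuned.tunable_state_count())  # 8 0
```

Skipping the allocation avoids holding per-layer state that would never be read, which matters when many layers are wrapped at once.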

Comments suppressed due to low confidence (1)

docs/step_by_step.md:1

Inconsistent formatting: the colon (`:`) after parameter types has been removed in lines 309-313 but is retained in lines 315 and 317. For consistency, either add colons after all parameter types or remove them from all entries.

Step-by-Step

auto_round/utils/device.py

auto_round/compressors/base.py

auto_round/utils/device.py

auto_round/wrapper.py

auto_round/utils/device.py

docs/step_by_step.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

wenhuach21 · 2025-10-29T01:57:25Z

Merging first; I will refine the code in a follow-up PR.

* Fix rtn tuning_device issue (#893) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* fix vlm gguf ut (#895) Signed-off-by: n1ck-guo <heng.guo@intel.com>
* update alg_ext.abi3.so with python compatible version (#894)
* move ste from quant to round for nvfp4 (#889) Signed-off-by: He, Xin3 <xin3.he@intel.com>
* Add GPT-OSS quant support (#887)
* better help printing information (#883)
* better help printing information Signed-off-by: n1ck-guo <heng.guo@intel.com>
* speedup quant and evaluation, fix recompile issue (#897)
* rewrite the implementation for ease-of-maintain Signed-off-by: He, Xin3 <xin3.he@intel.com>
* fix bug Signed-off-by: He, Xin3 <xin3.he@intel.com>
* fix quant performance Signed-off-by: He, Xin3 <xin3.he@intel.com>
* Update auto_round/compressors/base.py --------- Signed-off-by: He, Xin3 <xin3.he@intel.com>
* fix nvfp act quantization bug (#891)
* fix nvfp act quantization bug Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* add cuda ut for moe nvfp quantize Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* add cpu UT, refine cuda UT Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix ut typo Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix cpu ut Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* enhance experts amax match, refine UT Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* support automatic mixed bits assignment (#851)
* try to fix gguf issue (#886)
* remove numba from requirments (#905) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Extend mxfp loading dtypes (#907)
* block dataset logger info (#908) Signed-off-by: n1ck-guo <heng.guo@intel.com>
* fix torch compile issue in AutoScheme (#909)
* Revert "Extend mxfp loading dtypes (#907)" (#915) This reverts commit 0c2619c.
* support disable_opt_rtn in auto-scheme (#913)
* fix llama 4 ut (#896)
* fix ut of llama 4 Signed-off-by: n1ck-guo <heng.guo@intel.com>
* add numba for cpu lib (#919) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Loosen the packing restrictions for mxfp&nvfp (#911)
* Loosen the packing restrictions for mxfp&nvfp, enable Qwen1.5-MoE-A2.7B quantize Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix UT Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* refine mxfp&nvfp layer checker Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix pylint Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Extend mxfp loading dtypes (#916) Signed-off-by: root <root@clx5673.ra.intel.com> Signed-off-by: yiliu30 <yi4.liu@intel.com> Co-authored-by: root <root@clx5673.ra.intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix act config exporting for mixed schemes (#903)
* fp8 exporting bugfix Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix act related config saving Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add ut for act_config check Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* refine extra_config saving, add UTs Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix ut typo Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix ut typo Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fixtypo Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix CI Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix scan issue Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix scan issue Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* rm global variable Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* rerun ut Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* refine ut Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* optimize rtn for int woq (#924)
* fix bug of gguf and support for LiquidAI/LFM2-1.2B (#927) Signed-off-by: n1ck-guo <heng.guo@intel.com>
* remove numpy<2.0 limitation (#921)
* enable regex quantization config saving for mixed bits (#825)
* enable dynamic quantization config saving Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fixtypo Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* rebase code, refine config saving Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* refine ut Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix UT Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* enable hf loading for regex, add UTs Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* refine export, enhance gptq UT Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix Flux tuning issue (#936) Signed-off-by: Mengni Wang <mengni.wang@intel.com>
* gguf support for inclusionAI/Ling-flash-2.0 (#940)
* remove low_cpu_mem (#934)
* Add compatibility test (#918)
* Add commit hash to version (#941) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* gguf weight type align with original, output.weight, token_embed (#900)
* support attention mask in user's dataset (#930)
* Add diffusion README (#923)
* update readme (#949)
* refactor utils file (#943)
* refact utils Signed-off-by: n1ck-guo <heng.guo@intel.com>
* update readme for sglang support (#953)
* update readme for sglang support Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine doc Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* Update README.md --------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>
* update gguf and support for CompressedLinear (#950)
* Reduce AutoSchem VRAM usage by up to 10X (#944)
* add self attribution and fix avg_bits error (#956)
* add self attribution and fix avg_bits error --------- Signed-off-by: He, Xin3 <xin3.he@intel.com> Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>
* add logo (#960)
* refine AutoScheme readme/code (#958)
* update readme (#962)
* fix critic disable_opt_rtn regression (#963)
* [1/N] Initial vllm-ext evaluation support (MXFP4 MOE) (#935) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* fix bug of imatrix contains 0 (#955)
* fix rtn bug (#966)
* enhance flux doc (#967)
* clean code (#968)
* support for model scope (#957)
* support for model scope Signed-off-by: n1ck-guo <heng.guo@intel.com>
* merge main branch to alg_ext (#970)
* fix cuda CI backend issue, fixtypo (#974)
* disable compile packing by default (#975) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* enhance auto device map and support XPU (#961)
* enhance auto device map and support XPU --------- Signed-off-by: He, Xin3 <xin3.he@intel.com>
* refine readme (#978)
* cli support for positional arguments model (#979) Signed-off-by: n1ck-guo <heng.guo@intel.com>
* update bits (#986) Signed-off-by: He, Xin3 <xin3.he@intel.com>
* fix guff scheme and device_map bug (#969)
* add support for Magistral-Small (#980)
* support model_dtype and fix bug of scheme contains quotes, mllm eval (#985)

---------

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: root <root@clx5673.ra.intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: Tang Kaihui <kaihui.tang@intel.com>
Co-authored-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: Xin He <xin3.he@intel.com>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: Weiwei <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>
Co-authored-by: root <root@clx5673.ra.intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

update

aede5db

wenhuach21 marked this pull request as draft October 24, 2025 09:12

pre-commit-ci bot and others added 3 commits October 24, 2025 09:14

[pre-commit.ci] auto fixes from pre-commit.com hooks

4b5630f

for more information, see https://pre-commit.ci

Merge branch 'main' into opt_auto_scheme

9c26501

refine device_map code

272e9ea

wenhuach21 marked this pull request as ready for review October 27, 2025 07:05

pre-commit-ci bot and others added 12 commits October 27, 2025 07:07

[pre-commit.ci] auto fixes from pre-commit.com hooks

2e46f39

for more information, see https://pre-commit.ci

refine device_map code

d26bc74

refine device_map code

88cc0e0

[pre-commit.ci] auto fixes from pre-commit.com hooks

04697c5

for more information, see https://pre-commit.ci

fix adam issue

3d73d47

[pre-commit.ci] auto fixes from pre-commit.com hooks

3d9dbe8

for more information, see https://pre-commit.ci

tiny change

b261a21

update

b6dd6fe

[pre-commit.ci] auto fixes from pre-commit.com hooks

06beee3

for more information, see https://pre-commit.ci

try to fix preci

0a2bd4b

try to fix preci

b908a72

[pre-commit.ci] auto fixes from pre-commit.com hooks

7cfc72b

for more information, see https://pre-commit.ci

wenhuach21 changed the title ~~Reduce AutoSchem VRAM usage by 20X~~ [WIP]Reduce AutoSchem VRAM usage by 20X Oct 27, 2025

wenhuach21 commented Oct 27, 2025

auto_round/utils_bk/device.py (outdated, resolved)

wenhuach21 and others added 9 commits October 27, 2025 20:00

trigger ut

8dbd3b6

Merge branch 'main' into opt_auto_scheme

ba4713a

[pre-commit.ci] auto fixes from pre-commit.com hooks

7c17f29

for more information, see https://pre-commit.ci

fix merge issue

c6396a9

merge utils

abf94a4

[pre-commit.ci] auto fixes from pre-commit.com hooks

7015847

for more information, see https://pre-commit.ci

fix import issues

9bdedae

Merge branch 'opt_auto_scheme' of https://github.com/intel/auto-round …

38c5b4b

…into opt_auto_scheme

[pre-commit.ci] auto fixes from pre-commit.com hooks

55fcd51

for more information, see https://pre-commit.ci

pre-commit-ci bot and others added 6 commits October 28, 2025 07:42

[pre-commit.ci] auto fixes from pre-commit.com hooks

5057b5e

for more information, see https://pre-commit.ci

fix issues

dcb7ff2

fix bug

c91bbae

fix bug

376b116

[pre-commit.ci] auto fixes from pre-commit.com hooks

8e0bfa4

for more information, see https://pre-commit.ci

update

c5223a7

wenhuach21 changed the title ~~[WIP]Reduce AutoSchem VRAM usage by 20X~~ Reduce AutoSchem VRAM usage by up to 10X Oct 28, 2025

update

3ab18ff

wenhuach21 requested review from WeiweiZhang1, Copilot, n1ck-guo and yiliu30 and removed request for WeiweiZhang1 October 28, 2025 13:21

wenhuach21 added 2 commits October 28, 2025 21:22

clean code

9e67c15

Merge branch 'main' into opt_auto_scheme

82fee81

mengniwang95 approved these changes Oct 28, 2025

Copilot AI reviewed Oct 28, 2025

wenhuach21 mentioned this pull request Oct 28, 2025

Optimize RAM usage of AutoScheme #912

Open

wenhuach21 and others added 6 commits October 28, 2025 21:27

Update auto_round/utils/device.py

3fe8d08

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update auto_round/utils/device.py

cb560f3

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update docs/step_by_step.md

8987254

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

c3c0823

for more information, see https://pre-commit.ci

fix line too long issue

bde5f20

fix ut

035c046

wenhuach21 merged commit 90c2fb4 into main Oct 29, 2025
23 checks passed

wenhuach21 deleted the opt_auto_scheme branch October 29, 2025 01:57

xin3he mentioned this pull request Nov 4, 2025

The main branch is slower than expected #983

Closed

3 participants
