Conversation

@wenhuach21
Contributor

@wenhuach21 wenhuach21 commented Oct 24, 2025

  • Move the AutoScheme class to the autoscheme folder
  • Refine the important hyperparameters section on the homepage

@wenhuach21 wenhuach21 marked this pull request as draft October 24, 2025 09:12
@wenhuach21 wenhuach21 marked this pull request as ready for review October 27, 2025 07:05
@wenhuach21
Contributor Author

Not ready yet, just trying the UTs.

@wenhuach21 wenhuach21 changed the title Reduce AutoScheme VRAM usage by 20X [WIP] Reduce AutoScheme VRAM usage by 20X Oct 27, 2025
@wenhuach21 wenhuach21 changed the title [WIP] Reduce AutoScheme VRAM usage by 20X Reduce AutoScheme VRAM usage by up to 10X Oct 28, 2025
@wenhuach21 wenhuach21 requested review from WeiweiZhang1, Copilot, n1ck-guo and yiliu30 and removed request for WeiweiZhang1 October 28, 2025 13:21
Contributor

Copilot AI left a comment


Pull Request Overview

This pull request reduces AutoScheme VRAM usage by up to 10X by optimizing memory management during quantization scheme generation. The changes introduce a low_gpu_mem_usage flag that enables memory-efficient processing at the cost of additional runtime, and they refactor device-management code into dedicated utility functions.

Key Changes:

  • Added a low_gpu_mem_usage parameter to the AutoScheme configuration with default value True (see the usage sketch after this list)
  • Refactored device mapping and memory management logic from BaseCompressor to separate utility functions
  • Moved AdamCompressor class to its own module for better code organization
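For context, a minimal usage sketch follows. Only low_gpu_mem_usage and batch_size are confirmed by this PR's file summary; the import path and the other fields (avg_bits, options) are assumptions for illustration.

```python
# Hypothetical sketch: low_gpu_mem_usage and batch_size come from this PR;
# the import path and the remaining fields are assumed for illustration.
from auto_round import AutoScheme

scheme = AutoScheme(
    avg_bits=4.0,                # assumed field: target average bit-width
    options=("W4A16", "W8A16"),  # assumed field: candidate quantization schemes
    low_gpu_mem_usage=True,      # new in this PR, default True: trades time for VRAM
    batch_size=4,                # new in this PR: smaller values reduce peak memory
)
```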

Reviewed Changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 7 comments.

Summary per file:

  • auto_round/schemes.py: Added batch_size and low_gpu_mem_usage parameters to the AutoScheme dataclass
  • auto_round/utils/device.py: Added device-management utilities, including get_major_device, set_auto_device_map_for_block_with_tuning, and set_non_auto_device_map (see the sketch after this list)
  • auto_round/compressors/base.py: Refactored device-mapping logic to use the new utility functions and added support for low_gpu_mem_usage
  • auto_round/compressors/adam.py: New file extracting the AdamCompressor class from base.py
  • auto_round/wrapper.py: Updated to initialize weight min/max only when round tuning is enabled
  • docs/step_by_step.md: Updated documentation with the new hyperparameters and VRAM cost tables
  • test/test_cuda/test_auto_scheme.py: Added test cases for multi-card scenarios and for running without low GPU memory usage
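To make the refactor concrete, here is a minimal sketch of the block-wise pattern a low_gpu_mem_usage path typically follows. This is an illustration only: the loop, tune_block, and process_blocks_low_mem are hypothetical, and only the utility names listed above come from the PR.

```python
import torch
import torch.nn as nn

def tune_block(block: nn.Module) -> None:
    """Placeholder for the per-block scheme search / tuning step."""

def process_blocks_low_mem(blocks: list[nn.Module], device: str = "cuda") -> None:
    """Hypothetical low-VRAM loop: keep the model on the CPU and move a
    single block to the accelerator at a time, trading transfer time for
    peak memory."""
    for block in blocks:
        block.to(device)              # only this block occupies GPU memory
        tune_block(block)
        block.to("cpu")               # offload again before the next block
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return freed VRAM to the allocator
```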
Comments suppressed due to low confidence (1)

docs/step_by_step.md:1

  • Inconsistent formatting: the colon (:) after parameter types was removed in lines 309-313 but retained in lines 315 and 317. For consistency, either add colons after all parameter types or remove them from all entries.
Step-by-Step


wenhuach21 and others added 6 commits October 28, 2025 21:27
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@wenhuach21
Contributor Author

Merging first; will refine the code in a follow-up PR.

@wenhuach21 wenhuach21 merged commit 90c2fb4 into main Oct 29, 2025
23 checks passed
@wenhuach21 wenhuach21 deleted the opt_auto_scheme branch October 29, 2025 01:57
chensuyue added a commit that referenced this pull request Nov 11, 2025
* Fix rtn tuning_device issue (#893)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* fix vlm gguf ut (#895)

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* update alg_ext.abi3.so with python compatible version (#894)

* move ste from quant to round for nvfp4 (#889)

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* Add GPT-OSS quant support (#887)

* better help printing information (#883)

* better help printing information

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* speedup quant and evaluation, fix recompile issue (#897)

* rewrite the implementation for ease-of-maintain

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* fix bug

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* fix quant performance

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* Update auto_round/compressors/base.py

---------

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* fix nvfp act quantization bug (#891)

* fix nvfp act quantization bug

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* add cuda ut for moe nvfp quantize

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* add cpu UT, refine cuda UT

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ut typo

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix cpu ut

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* enhance experts amax match, refine UT

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* support automatic mixed bits assignment (#851)

* try to fix gguf issue (#886)

* remove numba from requirements (#905)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Extend mxfp loading dtypes (#907)

* block dataset logger info (#908)

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* fix torch compile issue in AutoScheme (#909)

* Revert "Extend mxfp loading dtypes (#907)" (#915)

This reverts commit 0c2619c.

* support disable_opt_rtn in auto-scheme (#913)

* fix llama 4 ut (#896)

* fix ut of llama 4

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* add numba for cpu lib (#919)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Loosen the packing restrictions for mxfp&nvfp (#911)

* Loosen the packing restrictions for mxfp&nvfp, enable Qwen1.5-MoE-A2.7B quantize

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix UT

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine mxfp&nvfp layer checker

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix pylint

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Extend mxfp loading dtypes (#916)

Signed-off-by: root <root@clx5673.ra.intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: root <root@clx5673.ra.intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix act config exporting for mixed schemes (#903)

* fp8 exporting bugfix

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix act related config saving

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add ut for act_config check

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine extra_config saving, add UTs

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix ut typo

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix ut typo

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fixtypo

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix CI

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix scan issue

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix scan issue

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* rm global variable

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rerun ut

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine ut

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* optimize rtn for int woq (#924)

* fix bug of gguf and support for LiquidAI/LFM2-1.2B (#927)

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* remove numpy<2.0 limitation (#921)

* enable regex quantization config saving for mixed bits (#825)

* enable dynamic quantization config saving

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixtypo

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rebase code, refine config saving

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine ut

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix UT

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable hf loading for regex, add UTs

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine export, enhance gptq UT

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix Flux tuning issue (#936)

Signed-off-by: Mengni Wang <mengni.wang@intel.com>

* gguf support for inclusionAI/Ling-flash-2.0 (#940)

* remove low_cpu_mem (#934)

* Add compatibility test (#918)

* Add commit hash to version (#941)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* gguf weight type align with original, output.weight, token_embed (#900)

* support attention mask in user's dataset (#930)

* Add diffusion README (#923)

* update readme (#949)

* refactor utils file (#943)

* refactor utils

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* update readme for sglang support (#953)

* update readme for sglang support

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* refine doc

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* Update README.md

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>

* update gguf and support for CompressedLinear (#950)

* Reduce AutoScheme VRAM usage by up to 10X (#944)

* add self attribution and fix avg_bits error (#956)

* add self attribution and fix avg_bits error
---------

Signed-off-by: He, Xin3 <xin3.he@intel.com>
Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>

* add logo (#960)

* refine AutoScheme readme/code (#958)

* update readme (#962)

* fix critical disable_opt_rtn regression (#963)

* [1/N] Initial vllm-ext evaluation support (MXFP4 MOE) (#935)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* fix bug of imatrix contains 0 (#955)

* fix rtn bug (#966)

* enhance flux doc (#967)

* clean code (#968)

* support for ModelScope (#957)

* support for ModelScope

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* merge main branch to alg_ext (#970)

* fix cuda CI backend issue, fixtypo (#974)

* disable compile packing by default (#975)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* enhance auto device map and support XPU  (#961)

* enhance auto device map and support XPU
---------

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* refine readme (#978)

* cli support for positional arguments model (#979)

Signed-off-by: n1ck-guo <heng.guo@intel.com>

* update bits (#986)

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* fix gguf scheme and device_map bug (#969)

* add support for Magistral-Small (#980)

* support model_dtype and fix bug of scheme contains quotes, mllm eval (#985)

---------

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: root <root@clx5673.ra.intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: Tang Kaihui <kaihui.tang@intel.com>
Co-authored-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: Xin He <xin3.he@intel.com>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: Weiwei <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>
Co-authored-by: root <root@clx5673.ra.intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>