add support for scheme FP8_STATIC to export llm_compressor format #816
Conversation
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Pull Request Overview
This PR adds support for the FP8_STATIC quantization scheme when exporting models in the llm_compressor format. The change enables static FP8 quantization of both weights and activations, with the configuration required for compressed-tensors compatibility.
Key Changes
- Adds FP8_STATIC scheme detection and format conversion to llm_compressor
- Implements static FP8 quantization export with compressed-tensors configuration
- Consolidates common save functionality across export modules
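The "static" part of FP8_STATIC means each tensor's scale is computed once (offline, from calibration data) rather than per batch at runtime. A minimal plain-Python sketch of that per-tensor scaling idea follows; the function names and the clamp-only rounding shortcut are illustrative, not code from this PR:

```python
# Hypothetical sketch of per-tensor static FP8 (E4M3) scaling.
# Real implementations use torch.float8_e4m3fn; here we only model the scale.
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def static_fp8_scale(values):
    """Pre-compute a static per-tensor scale mapping |max| onto the FP8 range."""
    amax = max(abs(v) for v in values)
    return max(amax, 1e-12) / FP8_E4M3_MAX  # guard against all-zero tensors

def fake_quantize(values):
    """Quantize-dequantize: scale down, clamp to the FP8 range, scale back up.
    (True FP8 rounding to the E4M3 grid is omitted for brevity.)"""
    scale = static_fp8_scale(values)
    out = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) * scale for v in values]
    return out, scale

out, scale = fake_quantize([448.0, -224.0])
print(scale)  # 1.0 for this input, since |max| already equals FP8_E4M3_MAX
```

At inference time the stored scale is reused as-is, which is what makes the scheme "static" as opposed to dynamic activation quantization.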
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| auto_round/utils.py | Modified is_static_wfp8afp8 to accept string parameters for format detection |
| auto_round/export/utils.py | Added shared save function to reduce code duplication across export modules |
| auto_round/export/export_to_llmcompressor/export_to_static_fp.py | New module implementing FP8_STATIC export with compressed-tensors configuration |
| auto_round/export/export_to_llmcompressor/export.py | Added FP8_STATIC support to the main export dispatcher |
| auto_round/autoround.py | Added FP8_STATIC format detection and validation logic |
| auto_round/export/export_to_awq/export.py | Refactored to use shared save function |
| auto_round/export/export_to_autoround/export_to_fp8.py | Renamed class and refactored to use shared save function |
| auto_round/export/export_to_autoround/export.py | Refactored to use shared save function |
| auto_round/export/export_to_autogptq/export.py | Refactored to use shared save function |
| test/test_cpu/test_llmcompressor.py | Added test case for FP8_STATIC export validation |
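For context on the "compressed-tensors configuration" the table mentions: exports in this format typically embed a `quantization_config` block in the model's `config.json`. The sketch below shows an assumed shape of such a block; the field names follow the general compressed-tensors convention but are not copied from this PR:

```python
# Illustrative (assumed) quantization_config for a static FP8 export.
# Keys and values are a sketch, not the PR's actual output.
quantization_config = {
    "quant_method": "compressed-tensors",
    "format": "float-quantized",
    "config_groups": {
        "group_0": {
            "targets": ["Linear"],
            "weights": {
                "num_bits": 8,
                "type": "float",
                "strategy": "tensor",   # one static scale per tensor
                "symmetric": True,
                "dynamic": False,
            },
            "input_activations": {
                "num_bits": 8,
                "type": "float",
                "strategy": "tensor",
                "symmetric": True,
                "dynamic": False,       # static: scales calibrated offline
            },
        }
    },
}
print(quantization_config["format"])
```

Consumers such as vLLM read this block to decide how to dequantize the checkpoint at load time.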
for more information, see https://pre-commit.ci
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Support it in the AutoRound format as well, and add nvfp4/fp8_static support on the vLLM side later.
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
This reverts commit 038ff1d.