add mxfp8 qat #2299
Conversation
Please add UTs to https://github.com/intel/neural-compressor/tree/master/test/3x/torch
return module

def replace_with_quant_linear(model, quant_cfg=None):
Where is this function used?
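For context only: a helper with this signature commonly walks the module tree and swaps `nn.Linear` layers for quantized replacements. The sketch below is a hypothetical illustration, not the PR's implementation; the `QuantLinear` placeholder class and its `from_float` constructor are assumed names.

```python
# Hypothetical sketch of a linear-replacement helper; not the PR's implementation.
import torch.nn as nn


class QuantLinear(nn.Linear):
    """Placeholder standing in for the PR's quantized linear layer."""

    @classmethod
    def from_float(cls, linear: nn.Linear, quant_cfg=None) -> "QuantLinear":
        new = cls(linear.in_features, linear.out_features, bias=linear.bias is not None)
        new.load_state_dict(linear.state_dict())  # reuse the original weights
        return new


def replace_with_quant_linear(model: nn.Module, quant_cfg=None) -> nn.Module:
    """Recursively replace nn.Linear children with quantized equivalents."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, QuantLinear.from_float(child, quant_cfg))
        else:
            replace_with_quant_linear(child, quant_cfg)
    return model
```

Called once on the full model before training, a helper like this leaves non-linear modules untouched while routing every linear matmul through the quantized layer.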
Pull Request Overview
This PR adds MxFP8 Quantization-Aware Training (QAT) support to Neural Compressor, enabling training with an MxFP8 forward pass and a BF16 backward pass, along with model export for vLLM deployment.
Key changes include:
- Implementation of MxFP8 QAT infrastructure with tensor quantizers and quantized linear layers (see the sketch after this list)
- Export functionality to convert quantized models to llm-compressor format for vLLM deployment
- Integration with AutoRound library for quantization functions
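To make the forward/backward split concrete, here is a minimal, simplified sketch of fake-quantized QAT in PyTorch: the forward pass quantizes values with a per-block power-of-two scale (a rough stand-in for MxFP8's 32-element blocks with a shared E8M0 scale and E4M3 elements), while the backward pass uses a straight-through estimator so gradients stay in BF16/FP32. This is illustrative only and does not reproduce the PR's tensor_quantizer implementation.

```python
# Simplified MXFP8-style fake quantization with a straight-through backward pass.
# Illustrative only; assumes the tensor's element count is a multiple of the block size.
import torch


class MXFP8FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, block_size=32):
        orig_shape = x.shape
        blocks = x.reshape(-1, block_size)
        # One shared power-of-two scale per block (stand-in for an E8M0 scale),
        # chosen so the block maximum lands near the top of the E4M3 range.
        amax = blocks.abs().amax(dim=-1, keepdim=True).clamp(min=2**-126)
        scale = torch.exp2(torch.floor(torch.log2(amax)) - 8)
        # Quantize to FP8 E4M3 (clamped to its representable range), then dequantize.
        q = (blocks / scale).clamp(-448.0, 448.0).to(torch.float8_e4m3fn)
        return (q.to(blocks.dtype) * scale).reshape(orig_shape)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the BF16 gradient passes through unchanged.
        return grad_output, None


def qat_linear(x, weight, bias=None):
    """Linear whose forward sees fake-quantized activations and weights."""
    return torch.nn.functional.linear(
        MXFP8FakeQuant.apply(x), MXFP8FakeQuant.apply(weight), bias
    )
```

Only the forward values are degraded to MxFP8 precision; master weights, gradients, and optimizer state remain full precision, which is what makes this quantization-aware training rather than post-training quantization.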
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
| --- | --- |
| neural_compressor/torch/export/export_hf.py | Adds HuggingFace model export functionality for quantized models |
| neural_compressor/torch/algorithms/qat/tensor_quantizer.py | Implements core tensor quantization module with MxFP8 support |
| neural_compressor/torch/algorithms/qat/quant_utils.py | Provides utility functions for quantization configuration and model conversion |
| neural_compressor/torch/algorithms/qat/quant_linear.py | Implements quantized linear layer with weight/input/output quantization |
| neural_compressor/torch/algorithms/qat/__init__.py | Package initialization file for QAT module |
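Taken together, these files suggest a convert, fine-tune, export workflow. The sketch below shows that flow end to end; `convert_to_qat` and `export_hf_model` are placeholder names for the conversion and export entry points, not the PR's confirmed API, and the bodies here are trivial stubs.

```python
# Hypothetical end-to-end flow; convert_to_qat / export_hf_model are placeholder names.
import os
import torch
import torch.nn as nn


def convert_to_qat(model: nn.Module, quant_cfg: dict) -> nn.Module:
    # Placeholder: the real conversion (quant_utils) would swap Linear layers
    # for quantized linears configured by quant_cfg.
    return model


def export_hf_model(model: nn.Module, save_dir: str) -> None:
    # Placeholder: the real export (export_hf) would write an llm-compressor-format
    # checkpoint plus quantization config consumable by vLLM.
    os.makedirs(save_dir, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(save_dir, "pytorch_model.bin"))


model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))
quant_cfg = {"weight": "mxfp8", "input": "mxfp8"}

model = convert_to_qat(model, quant_cfg)   # 1. prepare the model for QAT
# 2. fine-tune `model` as usual: MxFP8 fake-quant forward, BF16 backward
export_hf_model(model, "model-mxfp8-qat")  # 3. export for vLLM serving
```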
for more information, see https://pre-commit.ci
Please add a simple end-to-end unit test to demonstrate the usage. Other submodules look good to me.
Type of Change
add mxfp8 QAT code
Description