
Conversation

@lkk12014402 (Contributor) commented Sep 25, 2025

Type of Change

add mxfp8 QAT code

Description

  1. Support mxfp8 forward and bf16 backward for QAT (a sketch of the idea follows this list).
  2. Export the quantized Hugging Face model to the llm-compressor format, so it can be deployed with vLLM using mxfp8.
  3. Add a simple QAT example for testing, for now.
  4. Leverage some functions from AutoRound.
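
The forward/backward split in item 1 is essentially fake quantization with a straight-through estimator: tensors are rounded to mxfp8 on the way forward, while gradients flow back in plain bf16. A minimal sketch of that idea (the block size, scale selection, and all names below are illustrative assumptions, not this PR's implementation):

```python
import torch

BLOCK_SIZE = 32  # MX formats share one power-of-two scale per block of 32 elements

def mxfp8_fake_quant(x: torch.Tensor) -> torch.Tensor:
    """Round x to MXFP8 (FP8 E4M3 elements, shared power-of-two block scale) and dequantize."""
    orig_shape, orig_dtype = x.shape, x.dtype
    x = x.reshape(-1, BLOCK_SIZE).float()  # assumes numel % 32 == 0; padding omitted for brevity
    amax = x.abs().amax(dim=-1, keepdim=True).clamp_min(1e-12)
    # Power-of-two scale so the block max lands near the top of the E4M3 range (max 448).
    scale = torch.exp2(torch.floor(torch.log2(amax)) - 8)
    x_q = (x / scale).clamp(-448.0, 448.0).to(torch.float8_e4m3fn)  # per-element rounding
    return (x_q.float() * scale).reshape(orig_shape).to(orig_dtype)

class MXFP8FakeQuant(torch.autograd.Function):
    """Straight-through estimator: mxfp8 rounding in forward, identity in backward."""

    @staticmethod
    def forward(ctx, x):
        return mxfp8_fake_quant(x)  # mxfp8-rounded values flow forward

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out             # gradient passes through unchanged in bf16
```

In a quantized linear layer, `MXFP8FakeQuant.apply` would typically be applied to the weight (and optionally the input) before the usual `F.linear` call, so only the rounding affects the loss while optimizer state stays in bf16.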

@yiliu30 (Contributor) left a comment

return module


def replace_with_quant_linear(model, quant_cfg=None):
Where is this function used?
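
For context, a helper with this signature typically walks the module tree and swaps every nn.Linear for a quantized wrapper. A minimal sketch of that pattern (the QuantLinear placeholder below is illustrative, not the layer added in this PR):

```python
import torch.nn as nn

class QuantLinear(nn.Linear):
    """Placeholder wrapper; the real layer would fake-quantize weights/inputs in forward."""

    @classmethod
    def from_float(cls, linear: nn.Linear, quant_cfg=None):
        new = cls(linear.in_features, linear.out_features, bias=linear.bias is not None)
        new.load_state_dict(linear.state_dict())  # copy weight (and bias) from the fp module
        new.quant_cfg = quant_cfg
        return new

def replace_with_quant_linear(model: nn.Module, quant_cfg=None) -> nn.Module:
    """Recursively replace nn.Linear submodules with QuantLinear."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear) and not isinstance(child, QuantLinear):
            setattr(model, name, QuantLinear.from_float(child, quant_cfg))
        else:
            replace_with_quant_linear(child, quant_cfg)  # recurse into child containers
    return model
```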


@yiliu30 requested a review from Copilot September 25, 2025 07:48
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR adds MxFP8 Quantization-Aware Training (QAT) support to Neural Compressor, enabling training with an mxfp8 forward pass and a bf16 backward pass, along with model export capabilities for vLLM deployment.

Key changes include:

  • Implementation of MxFP8 QAT infrastructure with tensor quantizers and quantized linear layers
  • Export functionality to convert quantized models to the llm-compressor format for vLLM deployment
  • Integration with the AutoRound library for quantization functions

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| neural_compressor/torch/export/export_hf.py | Adds HuggingFace model export functionality for quantized models |
| neural_compressor/torch/algorithms/qat/tensor_quantizer.py | Implements the core tensor quantization module with MxFP8 support |
| neural_compressor/torch/algorithms/qat/quant_utils.py | Provides utility functions for quantization configuration and model conversion |
| neural_compressor/torch/algorithms/qat/quant_linear.py | Implements a quantized linear layer with weight/input/output quantization |
| neural_compressor/torch/algorithms/qat/__init__.py | Package initialization file for the QAT module |


@yiliu30 (Contributor) left a comment

Please add a simple end-to-end unit test to demonstrate the usage. Other submodules look good to me.
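
A minimal end-to-end check along those lines might look like the sketch below; the import path is a guess at where the helper lives, and only the `replace_with_quant_linear` signature comes from the diff above.

```python
import torch
import torch.nn as nn

def test_mxfp8_qat_forward_backward():
    # Import path is an assumption; only the helper's signature appears in this PR's diff.
    from neural_compressor.torch.algorithms.qat.quant_utils import replace_with_quant_linear

    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).bfloat16()
    qmodel = replace_with_quant_linear(model, quant_cfg=None)

    x = torch.randn(4, 64, dtype=torch.bfloat16)
    out = qmodel(x)                # mxfp8 fake-quant forward
    out.float().sum().backward()   # plain bf16 backward (straight-through)

    assert out.shape == (4, 8)
    assert all(p.grad is not None for p in qmodel.parameters() if p.requires_grad)
```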

@chensuyue added this to the 3.6 milestone Sep 26, 2025