
Conversation

@lkk12014402 (Contributor) commented Sep 25, 2025

Type of Change

add mxfp8 QAT code

Description

  1. Support mxfp8 forward and bf16 backward for QAT (a sketch of the idea follows this list).
  2. Export the quantized Hugging Face model to the llm-compressor format, so it can be deployed with vLLM using mxfp8.
  3. Add a simple QAT example for testing, for now.
  4. Leverage some functions from AutoRound.
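
The forward/backward split in item 1 is essentially fake quantization with a straight-through estimator: tensors are rounded to mxfp8 on the way forward, while gradients flow back in plain bf16. A minimal sketch of that idea (the block size, scale selection, and all names below are illustrative assumptions, not this PR's implementation):

```python
import torch

BLOCK_SIZE = 32  # MX formats share one power-of-two scale per block of 32 elements

def mxfp8_fake_quant(x: torch.Tensor) -> torch.Tensor:
    """Round x to MXFP8 (FP8 E4M3 elements, shared power-of-two block scale) and dequantize."""
    orig_shape, orig_dtype = x.shape, x.dtype
    x = x.reshape(-1, BLOCK_SIZE).float()  # assumes numel % 32 == 0; padding omitted for brevity
    amax = x.abs().amax(dim=-1, keepdim=True).clamp_min(1e-12)
    # Power-of-two scale so the block max lands near the top of the E4M3 range (max 448).
    scale = torch.exp2(torch.floor(torch.log2(amax)) - 8)
    x_q = (x / scale).clamp(-448.0, 448.0).to(torch.float8_e4m3fn)  # per-element rounding
    return (x_q.float() * scale).reshape(orig_shape).to(orig_dtype)

class MXFP8FakeQuant(torch.autograd.Function):
    """Straight-through estimator: mxfp8 rounding in forward, identity in backward."""

    @staticmethod
    def forward(ctx, x):
        return mxfp8_fake_quant(x)  # mxfp8-rounded values flow forward

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out             # gradient passes through unchanged in bf16
```

In a quantized linear layer, `MXFP8FakeQuant.apply` would typically be applied to the weight (and optionally the input) before the usual `F.linear` call, so only the rounding affects the loss while optimizer state stays in bf16.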

@yiliu30 (Contributor) left a comment

return module


def replace_with_quant_linear(model, quant_cfg=None):
Where is this function used?
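
For context, a helper with this signature typically walks the module tree and swaps every nn.Linear for a quantized wrapper. A minimal sketch of that pattern (the QuantLinear placeholder below is illustrative, not the layer added in this PR):

```python
import torch.nn as nn

class QuantLinear(nn.Linear):
    """Placeholder wrapper; the real layer would fake-quantize weights/inputs in forward."""

    @classmethod
    def from_float(cls, linear: nn.Linear, quant_cfg=None):
        new = cls(linear.in_features, linear.out_features, bias=linear.bias is not None)
        new.load_state_dict(linear.state_dict())  # copy weight (and bias) from the fp module
        new.quant_cfg = quant_cfg
        return new

def replace_with_quant_linear(model: nn.Module, quant_cfg=None) -> nn.Module:
    """Recursively replace nn.Linear submodules with QuantLinear."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear) and not isinstance(child, QuantLinear):
            setattr(model, name, QuantLinear.from_float(child, quant_cfg))
        else:
            replace_with_quant_linear(child, quant_cfg)  # recurse into child containers
    return model
```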


@yiliu30 requested a review from Copilot September 25, 2025 07:48
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR adds MxFP8 Quantization-Aware Training (QAT) support to Neural Compressor, enabling training with an mxfp8 forward pass and a bf16 backward pass, along with model export capabilities for vLLM deployment.

Key changes include:

  • Implementation of MxFP8 QAT infrastructure with tensor quantizers and quantized linear layers
  • Export functionality to convert quantized models to the llm-compressor format for vLLM deployment
  • Integration with the AutoRound library for quantization functions

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| neural_compressor/torch/export/export_hf.py | Adds HuggingFace model export functionality for quantized models |
| neural_compressor/torch/algorithms/qat/tensor_quantizer.py | Implements the core tensor quantization module with MxFP8 support |
| neural_compressor/torch/algorithms/qat/quant_utils.py | Provides utility functions for quantization configuration and model conversion |
| neural_compressor/torch/algorithms/qat/quant_linear.py | Implements a quantized linear layer with weight/input/output quantization |
| neural_compressor/torch/algorithms/qat/__init__.py | Package initialization file for the QAT module |


@yiliu30 (Contributor) left a comment

Please add a simple end-to-end unit test to demonstrate the usage. Other submodules look good to me.
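
A minimal end-to-end check along those lines might look like the sketch below; the import path is a guess at where the helper lives, and only the `replace_with_quant_linear` signature comes from the diff above.

```python
import torch
import torch.nn as nn

def test_mxfp8_qat_forward_backward():
    # Import path is an assumption; only the helper's signature appears in this PR's diff.
    from neural_compressor.torch.algorithms.qat.quant_utils import replace_with_quant_linear

    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).bfloat16()
    qmodel = replace_with_quant_linear(model, quant_cfg=None)

    x = torch.randn(4, 64, dtype=torch.bfloat16)
    out = qmodel(x)                # mxfp8 fake-quant forward
    out.float().sum().backward()   # plain bf16 backward (straight-through)

    assert out.shape == (4, 8)
    assert all(p.grad is not None for p in qmodel.parameters() if p.requires_grad)
```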

@chensuyue added this to the 3.6 milestone Sep 26, 2025