Conversation

@yiliu30 yiliu30 commented Sep 18, 2025

  • Support loading quantized MXFP4/MXFP8 models via the `transformers.from_pretrained` API.
  • Other minor fixes.

Signed-off-by: yiliu30 <yi4.liu@intel.com>
@yiliu30 yiliu30 requested a review from Copilot September 18, 2025 11:14
Copilot AI left a comment


Pull Request Overview

This PR adds support for MXFP4 and MXFP8 quantization formats, including new quantization modules, backend configurations, and test coverage. The implementation introduces specialized quantized linear layers for MXFP4/8 formats with proper loading and inference support.

Key Changes:

  • Adds new MXFP4QuantLinear and MXFP8QuantLinear modules with specialized quantization/dequantization logic
  • Updates backend infrastructure to support MXFP4/8 schemes with proper device compatibility checks
  • Implements FP4 utility functions for packing/unpacking operations
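To make the FP4 packing/unpacking idea concrete, here is a minimal sketch of how such utilities typically work (hypothetical names, not the PR's actual `fp4_utils` API): two 4-bit E2M1 codes are stored per uint8, and each code decodes to a sign bit plus one of eight magnitudes.

```python
# Hypothetical sketch of FP4 (E2M1) pack/unpack, analogous in spirit to the
# utilities in auto_round/data_type/fp4_utils.py (the exact API may differ).

# Value table for the 8 non-negative E2M1 magnitudes; bit 3 of each
# 4-bit code is the sign bit.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def pack_fp4(codes):
    """Pack a list of 4-bit FP4 codes, two per uint8 (low nibble first)."""
    assert len(codes) % 2 == 0, "pad to an even length before packing"
    return [(codes[i] & 0xF) | ((codes[i + 1] & 0xF) << 4)
            for i in range(0, len(codes), 2)]

def unpack_fp4(packed):
    """Unpack uint8 bytes into 4-bit codes and decode them to floats."""
    codes = []
    for b in packed:
        codes.append(b & 0xF)
        codes.append((b >> 4) & 0xF)
    # Decode: low 3 bits index the magnitude table, bit 3 is the sign.
    return [(-1.0 if c & 0x8 else 1.0) * E2M1_VALUES[c & 0x7] for c in codes]
```

Packing halves the storage of the quantized weights, which is why the loader needs the matching unpack step before dequantization.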

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Reviewed files:

  • auto_round/experimental/qmodules/mx.py — New quantized linear modules for MXFP4/8 with weight initialization and dequantization
  • auto_round/experimental/qmodules/__init__.py — Exports the new MXFP quantization modules
  • auto_round/data_type/fp4_utils.py — FP4 unpacking utilities for converting uint8 to fp4 values
  • auto_round/inference/backend.py — Backend configuration and scheme checking for MXFP4/8 support
  • auto_round/inference/convert_model.py — Model conversion logic updates for MXFP format handling
  • auto_round/schemes.py — Adds the act_sym parameter to the MXFP4/8 quantization schemes
  • auto_round/export/export_to_autoround/qlinear_fp.py — Updates weight buffer initialization for MXFP quantization
  • auto_round/export/export_to_autoround/export.py — Renames the FP8_STATIC format enum and updates references
  • auto_round/autoround.py — Updates format references for consistency
  • test/test_cuda/test_mx_quant.py — End-to-end test for MXFP4/8 quantization and inference
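For context on the dequantization logic in mx.py: MX formats (per the OCP Microscaling spec that MXFP4/MXFP8 follow) group elements into blocks of 32, each block sharing one power-of-two E8M0 scale. A minimal sketch with hypothetical names, not the module's actual API:

```python
# Minimal sketch of MX-style block dequantization. Each block of 32 elements
# shares one E8M0 scale, stored as a biased exponent: scale = 2 ** (E - 127).

BLOCK_SIZE = 32   # block size defined by the OCP Microscaling spec
E8M0_BIAS = 127   # standard E8M0 exponent bias

def dequantize_mx(elements, block_exponents, bias=E8M0_BIAS):
    """Scale each 32-element block by its shared power-of-two scale."""
    out = []
    for i, exp in enumerate(block_exponents):
        scale = 2.0 ** (exp - bias)
        block = elements[i * BLOCK_SIZE : (i + 1) * BLOCK_SIZE]
        out.extend(scale * v for v in block)
    return out
```

Because the scale is a pure power of two, dequantization is an exponent shift rather than a full multiply, which keeps the kernels cheap.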


@yiliu30 yiliu30 marked this pull request as ready for review September 18, 2025 11:18
@yiliu30 yiliu30 requested a review from n1ck-guo September 19, 2025 03:12
@yiliu30 yiliu30 requested a review from wenhuach21 September 22, 2025 01:16
@wenhuach21

@n1ck-guo @WeiweiZhang1 please have a review

@wenhuach21 wenhuach21 self-requested a review September 22, 2025 02:23
@yiliu30 yiliu30 merged commit f44d1f7 into main Sep 22, 2025
8 checks passed
@yiliu30 yiliu30 deleted the mx-loading branch September 22, 2025 07:02