MXFP4 and MXFP8 loading support #832
Conversation
Pull Request Overview
This PR adds support for MXFP4 and MXFP8 quantization formats, including new quantization modules, backend configurations, and test coverage. The implementation introduces specialized quantized linear layers for MXFP4/8 formats with proper loading and inference support.
Key Changes:
- Adds new MXFP4QuantLinear and MXFP8QuantLinear modules with specialized quantization/dequantization logic
- Updates backend infrastructure to support MXFP4/8 schemes with proper device compatibility checks
- Implements FP4 utility functions for packing/unpacking operations
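The heart of the MXFP4 path is the pack/unpack and dequantization logic referenced in the list above. Below is a minimal sketch, assuming the standard E2M1 FP4 code table, two 4-bit codes packed per uint8, and 32-element blocks sharing a biased E8M0 scale; the function names, tensor layouts, and nibble ordering are illustrative and may not match the actual implementations in auto_round/data_type/fp4_utils.py and auto_round/experimental/qmodules/mx.py.

```python
import torch

# Hypothetical E2M1 (FP4) code-to-value lookup table; the real table in
# fp4_utils.py may use a different ordering.
FP4_E2M1_VALUES = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)

def unpack_fp4_from_uint8(packed: torch.Tensor) -> torch.Tensor:
    """Split each uint8 into two 4-bit codes and map them to FP4 values."""
    low = packed & 0x0F            # first code of each pair (assumed order)
    high = (packed >> 4) & 0x0F    # second code of each pair
    codes = torch.stack((low, high), dim=-1).flatten(start_dim=-2)
    return FP4_E2M1_VALUES.to(packed.device)[codes.long()]

def dequant_mxfp4(packed_weight: torch.Tensor, e8m0_scale: torch.Tensor,
                  block_size: int = 32) -> torch.Tensor:
    """Dequantize MXFP4: each block of 32 FP4 values shares one E8M0 scale."""
    w = unpack_fp4_from_uint8(packed_weight)           # [out_features, in_features]
    w = w.reshape(w.shape[0], -1, block_size)          # [out_features, n_blocks, 32]
    scale = torch.exp2(e8m0_scale.float() - 127.0)     # biased E8M0 -> power of two
    return (w * scale.unsqueeze(-1)).reshape(w.shape[0], -1)
```

MXFP8 follows the same block-scaling idea but stores each element as an FP8 value directly, so only the shared-scale step applies there.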
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Summary per file:
| File | Description |
|---|---|
| auto_round/experimental/qmodules/mx.py | New quantized linear modules for MXFP4/8 with weight initialization and dequantization |
| auto_round/experimental/qmodules/__init__.py | Exports the new MXFP quantization modules |
| auto_round/data_type/fp4_utils.py | FP4 unpacking utilities for converting uint8 to fp4 values |
| auto_round/inference/backend.py | Backend configuration and scheme checking for MXFP4/8 support |
| auto_round/inference/convert_model.py | Model conversion logic updates for MXFP format handling |
| auto_round/schemes.py | Adds act_sym parameter to MXFP4/8 quantization schemes |
| auto_round/export/export_to_autoround/qlinear_fp.py | Updates weight buffer initialization for MXFP quantization |
| auto_round/export/export_to_autoround/export.py | Renames FP8_STATIC format enum and updates references |
| auto_round/autoround.py | Updates format references for consistency |
| test/test_cuda/test_mx_quant.py | End-to-end test for MXFP4/8 quantization and inference |
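For context, here is a rough end-to-end usage sketch along the lines of test/test_cuda/test_mx_quant.py: quantize a small model with the new MXFP4 scheme and reload the exported checkpoint so the new quantized linear modules handle inference. The model name, the `scheme=` argument, the `"auto_round"` format string, and the `quantize_and_save` call are assumptions and may differ from the library's actual API.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder small model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Quantize with the MXFP4 scheme added by this PR and export in the
# auto_round format so the new MXFP4QuantLinear modules are used on load.
autoround = AutoRound(model, tokenizer, scheme="MXFP4", iters=0)
autoround.quantize_and_save("./opt-125m-mxfp4", format="auto_round")

# Reloading the exported checkpoint should route the quantized layers
# through MXFP4QuantLinear during inference.
quantized = AutoModelForCausalLM.from_pretrained(
    "./opt-125m-mxfp4", torch_dtype="auto", device_map="auto"
)
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(quantized.device)
print(tokenizer.decode(quantized.generate(**inputs, max_new_tokens=16)[0]))
```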
@n1ck-guo @WeiweiZhang1 please review.