MXFP4 and MXFP8 loading support #832
Conversation
Pull Request Overview
This PR adds support for MXFP4 and MXFP8 quantization formats, including new quantization modules, backend configurations, and test coverage. The implementation introduces specialized quantized linear layers for MXFP4/8 formats with proper loading and inference support.
Key Changes:
- Adds new MXFP4QuantLinear and MXFP8QuantLinear modules with specialized quantization/dequantization logic
- Updates backend infrastructure to support MXFP4/8 schemes with proper device compatibility checks
- Implements FP4 utility functions for packing/unpacking operations
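The heart of the MXFP4 path is the pack/unpack and dequantization logic referenced in the list above. Below is a minimal sketch, assuming the standard E2M1 FP4 code table, two 4-bit codes packed per uint8, and 32-element blocks sharing a biased E8M0 scale; the function names, tensor layouts, and nibble ordering are illustrative and may not match the actual implementations in auto_round/data_type/fp4_utils.py and auto_round/experimental/qmodules/mx.py.

```python
import torch

# Hypothetical E2M1 (FP4) code-to-value lookup table; the real table in
# fp4_utils.py may use a different ordering.
FP4_E2M1_VALUES = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)

def unpack_fp4_from_uint8(packed: torch.Tensor) -> torch.Tensor:
    """Split each uint8 into two 4-bit codes and map them to FP4 values."""
    low = packed & 0x0F            # first code of each pair (assumed order)
    high = (packed >> 4) & 0x0F    # second code of each pair
    codes = torch.stack((low, high), dim=-1).flatten(start_dim=-2)
    return FP4_E2M1_VALUES.to(packed.device)[codes.long()]

def dequant_mxfp4(packed_weight: torch.Tensor, e8m0_scale: torch.Tensor,
                  block_size: int = 32) -> torch.Tensor:
    """Dequantize MXFP4: each block of 32 FP4 values shares one E8M0 scale."""
    w = unpack_fp4_from_uint8(packed_weight)           # [out_features, in_features]
    w = w.reshape(w.shape[0], -1, block_size)          # [out_features, n_blocks, 32]
    scale = torch.exp2(e8m0_scale.float() - 127.0)     # biased E8M0 -> power of two
    return (w * scale.unsqueeze(-1)).reshape(w.shape[0], -1)
```

MXFP8 follows the same block-scaling idea but stores each element as an FP8 value directly, so only the shared-scale step applies there.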
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Summary per file:
| File | Description |
|---|---|
| auto_round/experimental/qmodules/mx.py | New quantized linear modules for MXFP4/8 with weight initialization and dequantization |
| auto_round/experimental/qmodules/__init__.py | Exports the new MXFP quantization modules |
| auto_round/data_type/fp4_utils.py | FP4 unpacking utilities for converting uint8 to fp4 values |
| auto_round/inference/backend.py | Backend configuration and scheme checking for MXFP4/8 support |
| auto_round/inference/convert_model.py | Model conversion logic updates for MXFP format handling |
| auto_round/schemes.py | Adds act_sym parameter to MXFP4/8 quantization schemes |
| auto_round/export/export_to_autoround/qlinear_fp.py | Updates weight buffer initialization for MXFP quantization |
| auto_round/export/export_to_autoround/export.py | Renames FP8_STATIC format enum and updates references |
| auto_round/autoround.py | Updates format references for consistency |
| test/test_cuda/test_mx_quant.py | End-to-end test for MXFP4/8 quantization and inference |
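For context, here is a rough end-to-end usage sketch along the lines of test/test_cuda/test_mx_quant.py: quantize a small model with the new MXFP4 scheme and reload the exported checkpoint so the new quantized linear modules handle inference. The model name, the `scheme=` argument, the `"auto_round"` format string, and the `quantize_and_save` call are assumptions and may differ from the library's actual API.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder small model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Quantize with the MXFP4 scheme added by this PR and export in the
# auto_round format so the new MXFP4QuantLinear modules are used on load.
autoround = AutoRound(model, tokenizer, scheme="MXFP4", iters=0)
autoround.quantize_and_save("./opt-125m-mxfp4", format="auto_round")

# Reloading the exported checkpoint should route the quantized layers
# through MXFP4QuantLinear during inference.
quantized = AutoModelForCausalLM.from_pretrained(
    "./opt-125m-mxfp4", torch_dtype="auto", device_map="auto"
)
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(quantized.device)
print(tokenizer.decode(quantized.generate(**inputs, max_new_tokens=16)[0]))
```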
@n1ck-guo @WeiweiZhang1 please review.