
Conversation

@wenhuach21 (Contributor)

No description provided.

@wenhuach21 wenhuach21 marked this pull request as draft November 13, 2025 08:02
@wenhuach21 wenhuach21 marked this pull request as ready for review November 13, 2025 13:16
@wenhuach21 wenhuach21 changed the title from "fix imatrix pad issue" to "fix imatrix pad issue and optimize vram for gguf" on Nov 13, 2025
@wenhuach21 wenhuach21 requested a review from Copilot November 13, 2025 13:38
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR fixes an imatrix padding issue and optimizes VRAM usage for GGUF export. The changes introduce memory management improvements and add support for chunked quantization processing to handle large tensors more efficiently.

Key changes:

  • Fixed imatrix padding by ensuring the imatrix is padded to a multiple of the group size with an appropriate fill value (1e-5)
  • Added chunked processing for large tensors in quantization search to reduce memory usage
  • Consolidated memory cleanup utilities and added momentum parameter support
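The padding fix in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's actual code; `pad_imatrix` is a hypothetical helper name, and the assumption is that the small positive fill value (1e-5) replaces a zero fill so padded groups do not produce degenerate importance weights (the real change lives in auto_round/data_type/int.py).

```python
import numpy as np

def pad_imatrix(imatrix: np.ndarray, group_size: int, fill_value: float = 1e-5) -> np.ndarray:
    """Pad the last axis of an importance matrix to a multiple of group_size.

    Hypothetical sketch: a small positive fill value (rather than zero)
    keeps padded groups from contributing degenerate weights downstream.
    """
    remainder = imatrix.shape[-1] % group_size
    if remainder == 0:
        return imatrix
    pad_len = group_size - remainder
    pad = np.full(imatrix.shape[:-1] + (pad_len,), fill_value, dtype=imatrix.dtype)
    return np.concatenate([imatrix, pad], axis=-1)
```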

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:

  • auto_round/export/export_to_gguf/packing.py: Added memory clearing and imatrix device handling in the exception path
  • auto_round/export/export_to_awq/utils.py: Removed a duplicate clear_memory function
  • auto_round/data_type/utils.py: Added a configurable padding-value parameter
  • auto_round/data_type/int.py: Fixed imatrix padding with an appropriate fill value
  • auto_round/data_type/gguf.py: Added chunked processing for large tensors and refactored the quantization search
  • auto_round/compressors/base.py: Added momentum parameter support
  • auto_round/main.py: Added a momentum command-line argument
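The chunked processing noted for auto_round/data_type/gguf.py can be illustrated generically. This is a hedged sketch under stated assumptions, not the PR's implementation: `chunked_apply` and `process_chunk` are hypothetical names standing in for whatever per-chunk computation the quantization search performs.

```python
import numpy as np

def chunked_apply(tensor: np.ndarray, process_chunk, chunk_rows: int = 1024) -> np.ndarray:
    """Apply process_chunk to row slices of a large 2-D tensor.

    Hypothetical sketch: working on one slice at a time bounds peak memory,
    since intermediate buffers are allocated per chunk rather than for the
    full tensor at once.
    """
    results = []
    for start in range(0, tensor.shape[0], chunk_rows):
        results.append(process_chunk(tensor[start:start + chunk_rows]))
    return np.concatenate(results, axis=0)
```

On GPU tensors the same loop structure lets each chunk's temporaries be freed before the next chunk is processed, which is presumably the kind of VRAM saving the PR title refers to.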


@wenhuach21 wenhuach21 changed the title from "fix imatrix pad issue and optimize vram for gguf" to "optimize vram for gguf and add momentum" on Nov 14, 2025
@wenhuach21 wenhuach21 merged commit 58b3d90 into main Nov 14, 2025
16 of 22 checks passed
@wenhuach21 wenhuach21 deleted the fix_imatrix branch November 14, 2025 05:23

3 participants