Optimize VRAM for GGUF and add momentum #1031
Pull Request Overview
This PR fixes an imatrix padding issue and reduces VRAM usage during GGUF export. It adds explicit memory cleanup and processes large tensors in chunks during the quantization search, which lowers peak memory usage.
Key changes:
- Fixed imatrix padding: the imatrix is now padded to a multiple of the group size using a small fill value (1e-5)
- Added chunked processing of large tensors in the quantization search to reduce peak memory usage
- Consolidated memory-cleanup utilities and added support for a momentum parameter
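The padding fix can be illustrated with a minimal pure-Python sketch. It assumes a flat sequence of imatrix values. The helper names `pad_to_group_size` and `split_groups` are hypothetical, and the actual auto_round utility operates on torch tensors with a signature that may differ. One plausible reason for padding with a small positive value such as 1e-5 rather than 0 is that zero importance weights in a padded group could produce degenerate (for example, zero-denominator) weighted error terms during the scale search. This rationale is an assumption, not something the PR states.

```python
from typing import List, Sequence


def pad_to_group_size(values: Sequence[float], group_size: int, pad_value: float = 1e-5) -> List[float]:
    """Pad `values` so its length is a multiple of `group_size`.

    Hypothetical helper for illustration only; the default pad_value of 1e-5
    mirrors the fill value mentioned in the PR overview.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    padded = list(values)
    remainder = len(padded) % group_size
    if remainder:
        # Append just enough fill values to complete the last group.
        padded.extend([pad_value] * (group_size - remainder))
    return padded


def split_groups(values: Sequence[float], group_size: int) -> List[List[float]]:
    """Split an already padded sequence into consecutive groups of `group_size`."""
    return [list(values[i:i + group_size]) for i in range(0, len(values), group_size)]


if __name__ == "__main__":
    imatrix = [0.7, 1.2, 0.4, 2.5, 0.9]
    groups = split_groups(pad_to_group_size(imatrix, group_size=4), group_size=4)
    print(groups)
```

Because the padding is applied before grouping, every group has the same size and no group consists entirely of zero weights.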
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| auto_round/export/export_to_gguf/packing.py | Added memory clearing and imatrix device handling in the exception path |
| auto_round/export/export_to_awq/utils.py | Removed a duplicate `clear_memory` function |
| auto_round/data_type/utils.py | Added a configurable padding-value parameter |
| auto_round/data_type/int.py | Fixed imatrix padding to use an appropriate fill value |
| auto_round/data_type/gguf.py | Added chunked processing for large tensors and refactored the quantization search |
| auto_round/compressors/base.py | Added support for a momentum parameter |
| auto_round/main.py | Added a momentum command-line argument |
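The chunked-processing idea can be sketched in pure Python under stated assumptions. Rows of a weight matrix are handed to a per-row scale search a few rows at a time, so only one chunk's intermediates need to exist at once. `process_in_chunks`, `search_scales`, the candidate factors, and `maxq` are all hypothetical. This is a generic symmetric-scale grid search, not the actual search in `auto_round/data_type/gguf.py`. In a torch implementation, the memory saving comes from the per-chunk intermediate tensors being freed between iterations.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def process_in_chunks(rows: Sequence[Row], fn: Callable[[Sequence[Row]], List[float]], chunk_size: int) -> List[float]:
    """Apply `fn` to consecutive slices of `rows` and concatenate the results.

    Hypothetical helper: peak intermediate memory scales with chunk_size
    instead of with the full number of rows.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    results: List[float] = []
    for start in range(0, len(rows), chunk_size):
        results.extend(fn(rows[start:start + chunk_size]))
    return results


def search_scales(chunk: Sequence[Row], maxq: int = 7, candidates=(0.8, 0.9, 1.0)) -> List[float]:
    """Per-row grid search for a symmetric scale minimizing squared rounding error (illustrative)."""
    scales: List[float] = []
    for row in chunk:
        amax = max((abs(x) for x in row), default=0.0)
        if amax == 0.0:
            scales.append(1.0)  # an all-zero row: any scale works, so pick 1.0
            continue
        best_scale, best_err = 0.0, float("inf")
        for factor in candidates:
            scale = amax * factor / maxq
            # Quantize to the integer range [-maxq, maxq], dequantize, and measure the error.
            err = sum((x - max(-maxq, min(maxq, round(x / scale))) * scale) ** 2 for x in row)
            if err < best_err:
                best_scale, best_err = scale, err
        scales.append(best_scale)
    return scales


if __name__ == "__main__":
    weights = [[0.5, -1.0, 0.25], [2.0, 1.5, -0.5], [0.0, 0.0, 0.0], [3.0, -3.0, 1.0], [0.1, 0.2, 0.3]]
    print(process_in_chunks(weights, search_scales, chunk_size=2))
```

Because each row's search is independent, the chunked result is identical to running the search on all rows at once. Chunking trades a little loop overhead for a bounded memory footprint.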
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
No description provided.