Convert AutoAWQ checkpoints to compressed-tensors #531
Summary
Originally submitted as llm-compressor#2112.
This PR introduces a new script to convert AutoAWQ checkpoints into a compressed-tensors-compatible format under `modifiers/awq`. Resolves llm-compressor#2087.

Usage
Via CLI:
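A minimal sketch of the invocation; the script name `convert_autoawq.py` and the `--model-id`/`--output-dir` flags are assumptions for illustration, not the PR's actual interface:

```bash
# Hypothetical invocation; the real script and flags live under modifiers/awq in this PR.
python convert_autoawq.py \
  --model-id AMead10/Llama-3.2-3B-Instruct-AWQ \
  --output-dir ./llama-3.2-3b-compressed-tensors
```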
Via Python:
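A minimal sketch, assuming the conversion is exposed as a callable; the function name `convert_autoawq_checkpoint` and its parameters are illustrative, not the PR's actual API:

```python
# Hypothetical API; the real entry point is the new script under modifiers/awq.
from llmcompressor.modifiers.awq import convert_autoawq_checkpoint  # assumed name

convert_autoawq_checkpoint(
    model_id="AMead10/Llama-3.2-3B-Instruct-AWQ",    # source AutoAWQ checkpoint
    output_dir="./llama-3.2-3b-compressed-tensors",  # converted output
)
```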
Known Issues

Asymmetric Support in compressed-tensors
- AutoAWQ with the `GEMM` version only supports asymmetric quantization [1]; an `AssertionError` will be raised despite setting `zero_point=False`.
- Asymmetric decompression in `PackedQuantizationCompressor` is a WIP [2]; the symmetric/asymmetric distinction is sketched below.
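For context, a minimal sketch of the two dequantization conventions at play. Shapes and the uint4 midpoint are illustrative; the real packed layouts in AutoAWQ and compressed-tensors differ from this.

```python
import torch

# Asymmetric int4 dequantization, the convention AutoAWQ's GEMM path uses:
# each weight group carries both a scale and an integer zero point.
def dequant_asymmetric(q, scale, zero_point):
    return (q.float() - zero_point.float()) * scale

# Symmetric dequantization has no zero point (implicitly zero), which is
# the path compressed-tensors' packed compressor fully supports today.
def dequant_symmetric(q, scale):
    return q.float() * scale

q = torch.randint(0, 16, (4, 128))   # unsigned 4-bit values (shown unpacked)
scale = torch.rand(4, 1)
zero_point = torch.full((4, 1), 8)   # illustrative midpoint zero for uint4
w = dequant_asymmetric(q, scale, zero_point)
```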
Test Plan
- Compared logits between the original AutoAWQ model and the converted `compressed-tensors` model. The outputs do not pass `torch.testing.assert_close`, potentially due to the GEMM kernel's internal precision? (A sketch of this check follows the list.)
- Compared generations between `AutoAWQForCausalLM` and vLLM.
- Verified decompression in `compressed-tensors` based on [3].
- Converted the following checkpoints:
  - ruikangliu/DeepSeek-R1-Distill-Qwen-1.5B-quantized.awq-autoawq-w4g128
  - AMead10/Llama-3.2-3B-Instruct-AWQ
  - fbaldassarri/mistralai_Mistral-7B-Instruct-v0.3-autoawq-int4-gs128-asym
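A sketch of the logit comparison described in the first item, assuming both checkpoints load through `transformers`; the paths and tolerances are illustrative, since the defaults reportedly fail:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative paths: the source AutoAWQ checkpoint and its converted copy.
awq_path = "AMead10/Llama-3.2-3B-Instruct-AWQ"
ct_path = "./llama-3.2-3b-compressed-tensors"

tokenizer = AutoTokenizer.from_pretrained(awq_path)
inputs = tokenizer("The quick brown fox", return_tensors="pt")

with torch.no_grad():
    logits_awq = AutoModelForCausalLM.from_pretrained(awq_path).eval()(**inputs).logits
    logits_ct = AutoModelForCausalLM.from_pretrained(ct_path).eval()(**inputs).logits

# Default tolerances fail, possibly due to the GEMM kernel's internal
# precision, so a looser tolerance is assumed here.
torch.testing.assert_close(logits_ct, logits_awq, rtol=1e-2, atol=1e-2)
```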
Future Work
- Support the `packed-quantized` format once asymmetric decompression is finalized.
- Replace `AutoModelForCausalLM` with a more generalized autoclass.

Footnotes
1. awq/modules/linear/gemm.py#L187
2. llm-compressor#1704
3. fix qparams decompression #514
4. Revert "fix qparams decompression (#514)" #527