
Conversation

@mutichung

Summary

Originally opened as llm-compressor#2112.

This PR introduces a new script to convert AutoAWQ checkpoints into a compressed-tensors-compatible format under modifiers/awq. Resolves llm-compressor#2087.

Usage

  • Via CLI:

    python -m compressed_tensors.converters.autoawq \
      --model-name-or-path /path/to/model \
      --output-dir /path/to/compressed/model \
      --quantization-format naive-quantized
  • Via Python:

    from compressed_tensors.converters.autoawq import load_and_convert_from_autoawq
    
    awq_model_path = "/path/to/model"  # can also be a model ID on the Hugging Face Hub
    model = load_and_convert_from_autoawq(awq_model_path)
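
After conversion, the returned model can presumably be persisted with the usual transformers API; the snippet below is a hypothetical follow-up (the CLI path with --output-dir is the documented way to write the converted checkpoint):

    # Hypothetical: assumes the converted model behaves like a standard
    # transformers PreTrainedModel, so the usual save API applies.
    model.save_pretrained("/path/to/compressed/model")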

Known Issue

Asymmetric Support in compressed-tensors

  • AutoAWQ's GEMM version only supports asymmetric quantization [1] (a toy sketch of the scheme follows this list).
    • An AssertionError is raised even when zero_point=False is set.
  • Support for zero-point decompression in PackedQuantizationCompressor is a work in progress [2].
  • 2025/12/15 update: zero-point decompression was merged in [3] but reverted shortly afterwards [4].
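
A minimal sketch of the asymmetric (zero-point) scheme these checkpoints use, in plain PyTorch; everything below is illustrative and not a compressed-tensors API:

    import torch

    # Toy 4-bit asymmetric quantization of a single group of weights.
    bits = 4
    w = torch.randn(128)

    qmin, qmax = 0, 2**bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = torch.round(-w.min() / scale).clamp(qmin, qmax)

    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    # Decompression must subtract the zero point before rescaling; this is the
    # step PackedQuantizationCompressor still needs to support.
    w_hat = (q - zero_point) * scale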

Test Plan

  • Created tests comparing output logits between AutoAWQ and compressed-tensors (see the sketch after this list).
    • The logits do not satisfy torch.testing.assert_close at default tolerances, possibly due to the GEMM kernel's internal precision.
  • Ran benchmarks comparing AutoAWQForCausalLM and vLLM.
    • Using a compressed-tensors build based on [3].
  • Created tests comparing benchmark results between AutoAWQ and compressed-tensors checkpoints.
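
A rough sketch of the logits comparison; it assumes a CUDA device, that the AutoAWQ wrapper exposes its underlying transformers model as .model, and that the tolerances are arbitrary placeholders:

    import torch
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer
    from compressed_tensors.converters.autoawq import load_and_convert_from_autoawq

    path = "/path/to/awq/model"
    device = "cuda"  # the AWQ GEMM kernels require a GPU

    tokenizer = AutoTokenizer.from_pretrained(path)
    inputs = tokenizer("The quick brown fox", return_tensors="pt").to(device)

    ref = AutoAWQForCausalLM.from_quantized(path, fuse_layers=False)
    converted = load_and_convert_from_autoawq(path).to(device)

    with torch.no_grad():
        ref_logits = ref.model(**inputs).logits   # .model: the wrapped transformers model
        conv_logits = converted(**inputs).logits

    # Strict assert_close fails, so loosen the tolerances for a sanity check.
    torch.testing.assert_close(ref_logits, conv_logits, atol=1e-1, rtol=1e-2)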

ruikangliu/DeepSeek-R1-Distill-Qwen-1.5B-quantized.awq-autoawq-w4g128

Format            Inference Backend   ARC-Easy   ARC-Challenge
AutoAWQ           hf                  0.6435     0.3584
naive-quantized   hf                  0.6431     0.3584
packed-quantized  hf                  0.6431     0.3584
packed-quantized  vllm                0.6427     0.3592

AMead10/Llama-3.2-3B-Instruct-AWQ

Format            Inference Backend   ARC-Easy   ARC-Challenge
AutoAWQ           hf                  0.7976     0.5017
naive-quantized   hf                  0.7971     0.5026
packed-quantized  hf                  0.7971     0.5026
packed-quantized  vllm                0.7976     0.5043

fbaldassarri/mistralai_Mistral-7B-Instruct-v0.3-autoawq-int4-gs128-asym

Format            Inference Backend   ARC-Easy   ARC-Challenge
AutoAWQ           hf                  0.8641     0.6280
naive-quantized   hf                  0.8645     0.6280
packed-quantized  hf                  0.8645     0.6280
packed-quantized  vllm                0.8649     0.6280

Future Work

  • Support other AutoAWQ versions, e.g., GEMV.
  • Set default quantization format to packed-quantized once asymmetric decompression is finalized.
  • Replace AutoModelForCausalLM with a more generalized autoclass.

Footnotes

  1. awq/modules/linear/gemm.py#L187

  2. llm-compressor#1704

  3. fix qparams decompression #514

  4. Revert "fix qparams decompression (#514)" #527

Signed-off-by: GitHub <noreply@github.com>
@mutichung force-pushed the feature/convert-autoawq branch from b5efafa to 8590695 on December 19, 2025 at 02:35.
Comment on lines +110 to +113
# Unpack the qweight and qzeros tensors
iweight, izeros = unpack_awq(qweight, qzeros, bits)
# Reverse the order of the iweight and izeros tensors
iweight, izeros = reverse_awq_order(iweight, izeros, bits)

@mutichung (Author) commented on Dec 19, 2025:

@brian-dellabetta Since auto-round is not a dependency of compressed-tensors, should we keep a copy of the utilities for unpacking and dequantization instead of importing from auto-round?

@brian-dellabetta (Collaborator) replied:

@mutichung yes, I think that's a good idea. compressed-tensors is our library for storing, loading, and converting compressed model checkpoint formats, so this seems like a reasonable set of helpers to add. It's basically the same reason we want to move it out of llm-compressor, which is scoped to just the code and flows needed to run model compression.
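
For context, a rough sketch of the kind of unpacking helper under discussion; it is illustrative only (the real unpack_awq also has to undo AWQ's interleaved packing order, which reverse_awq_order handles):

    import torch

    def unpack_int32_to_nibbles(packed: torch.Tensor, bits: int = 4) -> torch.Tensor:
        """Extract the 32 // bits low-to-high bit fields packed into each int32."""
        pack_factor = 32 // bits
        shifts = torch.arange(0, 32, bits, dtype=torch.int32, device=packed.device)
        # (rows, cols) -> (rows, cols, pack_factor): shift each packed word by
        # 0, bits, 2*bits, ... and mask off everything above the low `bits` bits.
        fields = (packed.unsqueeze(-1) >> shifts) & ((1 << bits) - 1)
        return fields.reshape(packed.shape[0], packed.shape[1] * pack_factor)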

@brian-dellabetta (Collaborator)

Thanks @mutichung for moving this over! I replied to your message; your suggestion to move the logic here and avoid a compressed-tensors dependency on auto-round makes sense to me.


Development

Successfully merging this pull request may close these issues.

[Feature Request][Help Wanted] Convert AutoAWQ checkpoints to compressed-tensors
