Extreme CPU memory usage (>1 TB) with Qwen3-32B and AWQ #1539

Closed
@44670

Description

Describe the bug

When quantizing a Qwen3-32B model with AWQ, the process requires more than 1.2 TB of CPU memory during the step: "_calibrate | INFO - Running AWQModifier calibration with 250 samples..."

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
55135 root 20 0 1205.8g 1.1t 1.1g R 99.7 53.7 9:35.79 python3
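
The RES figure above was read from top. Below is a minimal sketch for logging the same number from inside the process while calibration runs (assuming psutil is available; the helper is illustrative and not part of the original script):

import os
import threading
import time

import psutil

def log_rss(interval_s: float = 30.0) -> None:
    # Print this process's resident set size in GiB at a fixed interval.
    proc = psutil.Process(os.getpid())
    while True:
        rss_gib = proc.memory_info().rss / (1024 ** 3)
        print(f"[mem] RSS = {rss_gib:.1f} GiB")
        time.sleep(interval_s)

# Start this thread before calling oneshot() so the spike during AWQ calibration is captured.
threading.Thread(target=log_rss, daemon=True).start()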

Expected behavior

Quantization should complete with CPU memory usage on the same order as the model itself, not more than 1 TB of RAM.

Environment
Include all relevant environment information:

  1. OS: Ubuntu 22.04
  2. Python version: 3.10.12
  3. LLM Compressor version or commit hash: 0.5.1
  4. ML framework version(s): not specified
  5. Other Python package versions (vLLM, compressed-tensors, numpy, ONNX, etc.): not specified
  6. Other relevant environment information (hardware, CUDA version, etc.): not specified

To Reproduce
Exact steps to reproduce the behavior:

My code:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier
from datasets import load_dataset
from llmcompressor.modifiers.quantization import QuantizationModifier
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationScheme,
    QuantizationStrategy,
    QuantizationType,
)

# Select model and load it.
MODEL_ID = "/path/to/qwen3-32b/model"


# Select calibration dataset - using custom dataset
DATASET_FILE = "calib.jsonl" 


model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Select number of samples. 256 samples is a good place to start.
# Increasing the number of samples can improve accuracy.
NUM_CALIBRATION_SAMPLES = 250
MAX_SEQUENCE_LENGTH = 1200

# Function to format the data consistently
def format_data(example):
    # Convert the conversation format to text
    if example.get('messages'):
        # Format conversation messages into text
        text_parts = []
        for msg in example['messages']:
            role = msg['role']
            content = msg['content'].strip()
            text_parts.append(f"<|role|>{role}<|says|>{content}<|end|>")
        text = '\n'.join(text_parts)
    elif example.get('ctx'):
        # Context + generation format
        text = example['ctx'] + example['gen']
    elif example.get('txt'):
        # Simple text format
        text = example['txt']
    else:
        raise ValueError(f"Unknown format in example: {example}")
    print("TEXT:", text)
    return {"text": text}
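
# For reference, a hypothetical 'messages'-style record such as
#   {"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}
# is turned by format_data into a single string:
#   "<|role|>user<|says|>hi<|end|>\n<|role|>assistant<|says|>hello<|end|>"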

# Load dataset using datasets library
ds = load_dataset('json', data_files=DATASET_FILE, split='train')

# Take only the number of samples we need and shuffle
ds = ds.shuffle(seed=42).select(range(min(NUM_CALIBRATION_SAMPLES, len(ds))))

# Apply formatting function
ds = ds.map(format_data)

# Configure the quantization algorithm to run.
# NOTE: weights are quantized asymmetrically (symmetric=False) in both modifiers below.
recipe = [
    AWQModifier(bits=4, symmetric=False),
    QuantizationModifier(
        ignore=["lm_head", "norm", "embed_tokens"],
        config_groups={
            "group_0": QuantizationScheme(
                targets=["Linear"],
                weights=QuantizationArgs(
                    num_bits=4,
                    type=QuantizationType.INT,
                    dynamic=False,
                    symmetric=False,
                    strategy=QuantizationStrategy.GROUP,
                    group_size=128,
                ),
            )
        },
    ),
]


SAVE_DIR = MODEL_ID + "-awq2"

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    output_dir=SAVE_DIR
)

tokenizer.save_pretrained(SAVE_DIR)


print('Done!')
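
As a rough sanity check (the lines below are illustrative and not part of the original script), the bfloat16 weights of a 32B-parameter model account for only about 65 GB, so the >1.2 TB resident memory builds up during the AWQModifier calibration step rather than from the model weights themselves:

# With the model already loaded by the script above (~32B params * 2 bytes in bfloat16):
total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"Model weights: {total_bytes / 1e9:.1f} GB")  # roughly 65 GB expected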

Errors
If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.

Additional context
Add any other context about the problem here. Also include any relevant files.

Metadata

Labels: bug (Something isn't working)