0seba/quantization_results

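| Quantization Method | MMLU-Pro Acc | MMLU-Pro Maj | MMLU-Pro # (11801 @ 1) | MMLU-Redux Acc | MMLU-Redux Maj | MMLU-Redux # (8862 @ 3) | MMLU-Redux-V2.0 Acc | MMLU-Redux-V2.0 Maj | MMLU-Redux-V2.0 # (16944 @ 3) | GPQA Acc | GPQA Maj | GPQA # (2240 @ 5) | GPQA-Diamond Acc | GPQA-Diamond Maj | GPQA-Diamond # (990 @ 5) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Instruct-2507 [Reported] | 69.6 | | | 84.2 | | | | | | 62.0 | | | | | |
| Qwen3-4B-Thinking-2507 [Reported] | 74.0 | | | 86.1 | | | | | | 65.8 | | | | | |
| Qwen3-4B-Instruct-2507 | 71.84@1 | | 11733 | 81.23@3 | 82.77@3 | 8862 | 83.19@3 | 83.85@3 | 16925 | | | | | | |
| Qwen3-4B-Thinking-2507 | 72.51@1 | | 11800 | | | | 83.93@3 | 85.22@3 | 16943 | 58.84@5 | 62.28@5 | 2240 | 63.43@5 | 67.68@5 | 923 |
| Qwen3-Instruct-2507 Gemlite-4Bits-per-channel* | 70.93@1 | | 11801 | 81.27@3 | 81.89@3 | 8862 | 83.36@3 | 83.85@3 | | | | | 59.6@5 | 61.62@5 | 990 |
| GPTQ-4Bits-per-channel | 65.32@1 | | 11801 | 77.48@3 | 78.94@3 | 8856 | 79.99@3 | 81.06@3 | 16942 | | | | 45.56@5 | 46.46@5 | 988 |
| AutoRound-Best | 67.09@1 | | 11801 | 78.64@3 | 79.55@3 | 8862 | 81.02@3 | 82.24@3 | 16944 | | | | 50.2@5 | 52.02@5 | 990 |
| AutoRound-RTN | 61.44@1 | | 11801 | 75.93@3 | 77.62@3 | 8862 | 78.77@3 | 80.076@3 | 16944 | 43.71@5 | 47.1@5 | 2240 | 34.95@5 | 43.94@5 | 990 |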

\* Not quantized; some sglang config was missing.
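
The column headers suggest that each question is sampled k times ("@ 3", "@ 5") and scored two ways. The sketch below shows one plausible reading, assuming "Acc" is the mean accuracy over all individual samples and "Maj" is the accuracy of the majority-voted answer per question. The source does not spell out either definition, and the function name `score` and the toy data are hypothetical.

```python
from collections import Counter


def score(samples, gold):
    """Return (per-sample accuracy, majority-vote accuracy) in percent.

    samples: one list of k predicted answer letters per question
    gold:    the correct answer letter for each question
    """
    correct_samples = 0
    total_samples = 0
    correct_votes = 0
    for preds, answer in zip(samples, gold):
        # "Acc" (assumed): every individual sample counts on its own.
        correct_samples += sum(p == answer for p in preds)
        total_samples += len(preds)
        # "Maj" (assumed): only the most frequent answer counts.
        # Ties go to the answer that appeared first.
        voted, _ = Counter(preds).most_common(1)[0]
        correct_votes += voted == answer
    acc = 100 * correct_samples / total_samples
    maj = 100 * correct_votes / len(gold)
    return acc, maj


# Toy data: 3 questions, 3 samples each.
samples = [["A", "A", "B"], ["C", "D", "D"], ["B", "C", "A"]]
gold = ["A", "C", "B"]
acc, maj = score(samples, gold)
print(f"Acc {acc:.2f}  Maj {maj:.2f}")
```

Under this reading, Maj can exceed Acc when wrong samples are scattered across different options, which is consistent with the Maj columns being higher than the Acc columns in the table.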

| Quantization Method | MMLU-Pro Level 1 (5307) | MMLU-Pro Level 2 (663) | MMLU-Pro Level 3 (1645) | MMLU-Pro Level 4 (941) | MMLU-Pro Level 5 (3476) |
|---|---|---|---|---|---|
| Qwen3-4B-Instruct-2507 | 100% | 100% | 100% | 0% | 23.42% |
| Qwen3-4B-Thinking-2507 | 100% | 100% | 100% | 100% | 0% |
| Qwen3-Instruct-2507 Gemlite-4Bits-per-channel* | 100% | 100% | 72.4% | 41.45% | 23.56% |
| GPTQ-4Bits-per-channel | 100% | 100% | 37.63% | 39.63% | 21.52% |
| AutoRound-Best | 100% | 100% | 50.03% | 37.94% | 22.07% |
| AutoRound-RTN | 100% | 0.0% | 54.47% | 33.9% | 20.97% |

- AutoRound-Best: `auto-round --model Qwen/Qwen3-4B-Instruct-2507 --bits 4 --iters 1000 --output_dir Qwen3-4B-Instruct-2507-AutoRound-Best --format auto_gptq --group_size -1 --nsamples 512 --scale_dtype bf16 --model_dtype bf16 --enable_torch_compile --data_type int --seqlen 4096`
- AutoRound-RTN: `auto-round --model Qwen/Qwen3-4B-Instruct-2507 --bits 4 --iters 0 --output_dir Qwen3-4B-Instruct-2507-AutoRound-RTN --format auto_gptq --group_size -1`
- GPTQ:
```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "Qwen/Qwen3-4B-Instruct-2507"
quant_path = "Qwen3-4B-Instruct-2507-GPTQ"

# Calibration data: the first 1024 documents of one C4 shard.
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

# 4-bit weights; group_size=-1 means per-channel quantization.
quant_config = QuantizeConfig(bits=4, group_size=-1)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=32)

# Uncomment to write the quantized checkpoint to disk.
# model.save(quant_path)
```
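
For intuition about the AutoRound-RTN baseline (`--iters 0`, `--group_size -1`), here is a minimal pure-Python sketch of 4-bit round-to-nearest quantization with one scale per output channel. It assumes a symmetric signed integer grid; the exact scheme auto-round applies depends on its flags and defaults and may differ. The function names `rtn_per_channel` and `dequantize` and the toy weights are hypothetical.

```python
def rtn_per_channel(weight, bits=4):
    """Quantize each row (output channel) of `weight` with its own scale.

    Assumes a symmetric signed grid: codes lie in [-8, 7] for 4 bits.
    Returns (codes, scales), both lists with one entry per row.
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for 4 bits
    qmin = -qmax - 1             # -8 for 4 bits
    codes, scales = [], []
    for row in weight:
        # One scale for the whole row; fall back to 1.0 for all-zero rows.
        scale = max(abs(w) for w in row) / qmax or 1.0
        # Round to the nearest grid point, then clamp to the integer range.
        codes.append([max(qmin, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return codes, scales


def dequantize(codes, scales):
    """Map integer codes back to floating-point weights."""
    return [[q * s for q in row] for row, s in zip(codes, scales)]


# Toy 2x5 weight matrix: the two rows have very different magnitudes,
# so each one gets its own scale.
weight = [
    [-1.0, -0.5, 0.0, 0.5, 2.0],
    [0.01, -0.02, 0.03, -0.04, 0.05],
]
codes, scales = rtn_per_channel(weight)
restored = dequantize(codes, scales)
```

With no clipping, the reconstruction error of each weight is at most half of its row's scale. Calibration-based methods such as GPTQ, or AutoRound with `--iters` above 0, aim to reduce the resulting output error beyond what plain rounding achieves.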
