
[BUG] Rotation parameter not saved and not used in inference? #2354

@12345txy

[Bug]: Rotation parameter not saved and not used in inference?

In other words, how can I use rotation and still get correct inference results?

Describe the bug

The rotation parameter (for SpinQuant/QuaRot preprocessing) is used during quantization, but:

  1. It is NOT saved to quantize_config.json when saving the model
  2. It is NOT used in quantized layer's forward method during inference
  3. This causes incorrect inference results (extremely high perplexity, e.g., PPL: 48564640.0)
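A short derivation of why point 3 would be expected, assuming rotate_model() folds an orthogonal matrix Q (for example a normalized Hadamard matrix) into each affected layer's weights along the input dimension, as in QuaRot-style preprocessing. Here y = xWᵀ follows the torch.nn.Linear convention (W has shape out × in):

```latex
\begin{aligned}
y &= x W^{\top}, \qquad \tilde{W} = W Q, \qquad Q Q^{\top} = I \\
(xQ)\,\tilde{W}^{\top} &= x Q Q^{\top} W^{\top} = x W^{\top} = y \\
x\,\tilde{W}^{\top} &= x Q^{\top} W^{\top} \neq y \quad \text{in general}
\end{aligned}
```

If the saved model keeps the rotated weights but nothing rotates the activations at inference time, each affected layer computes the last line instead of the second, and the error compounds layer by layer, which would be consistent with the very large perplexity reported.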

GPU Info

NVIDIA GeForce RTX 5090
CUDA Version: 13.0
Driver Version: 580.76.05

Software Info

  • OS: Linux 5.15.0-78-generic
  • Python: 3.12.3
  • gptqmodel: 5.6.12
  • torch: 2.9.0
  • transformers: 4.57.3
  • accelerate: 1.12.0
  • triton: 3.5.0

quantize_config.json

```json
{
  "bits": 2,
  "group_size": 128,
  "desc_act": false,
  "sym": true,
  "quant_method": "gptq",
  "checkpoint_format": "gptq",
  "meta": {
    "gptaq": true,
    "gptaq_alpha": 0.25,
    "act_group_aware": true
  }
}
```

Note: The rotation field is missing, even though rotation='hadamard' was used during quantization.
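One way to confirm the missing field on a saved checkpoint is to read quantize_config.json directly. The helper below is a hypothetical sketch (saved_rotation is not a gptqmodel function); since this issue does not say where the key is expected to live, it checks both the top level and the "meta" object, which is an assumption:

```python
import json
import tempfile
from pathlib import Path


def saved_rotation(config_path):
    """Return the rotation recorded in a quantize_config.json, or None.

    Hypothetical helper: looks for a "rotation" key at the top level and,
    as a fallback, inside "meta". Where gptqmodel would store it is assumed.
    """
    cfg = json.loads(Path(config_path).read_text(encoding="utf-8"))
    if cfg.get("rotation"):
        return cfg["rotation"]
    return (cfg.get("meta") or {}).get("rotation")


if __name__ == "__main__":
    # The config from this issue, written to a temporary directory.
    issue_config = {
        "bits": 2, "group_size": 128, "desc_act": False, "sym": True,
        "quant_method": "gptq", "checkpoint_format": "gptq",
        "meta": {"gptaq": True, "gptaq_alpha": 0.25, "act_group_aware": True},
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "quantize_config.json"
        path.write_text(json.dumps(issue_config), encoding="utf-8")
        print(saved_rotation(path))  # None: rotation was not persisted
```

On a real checkpoint, pointing saved_rotation at the quantize_config.json inside quant_path and getting None back reproduces point 1.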

To Reproduce

  1. Quantize a model with rotation:
```python
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(..., rotation='hadamard')
model = GPTQModel.from_pretrained(model_path, quantize_config=quant_config)
model.quantize(calibration_data)
model.save(quant_path)
```
  2. Load for inference:
```python
model = GPTQModel.from_quantized(quant_path)
# Even manually setting rotation doesn't help:
# model.quantize_config.rotation = 'hadamard'
# because the quantized layer's forward() doesn't check this parameter.
```
  3. Result: incorrect inference (PPL: 48564640.0 instead of normal values).

Expected behavior

  1. The rotation parameter should be saved to quantize_config.json.
  2. The quantized layer's forward method should check and use the rotation parameter.
  3. If rotation is set, it seems reasonable to apply the inverse rotation during inference to restore correct outputs.
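A minimal pure-Python sketch of points 2 and 3, assuming the weights were stored as W·Q with Q the normalized Hadamard matrix. RotationAwareForward is a hypothetical name, not a gptqmodel class, and linear() merely stands in for a quantized kernel such as TritonV2QuantLinear.forward. The point is only that forward checks the configured rotation and applies the matching transform to the activations; the fast Walsh-Hadamard transform computes x·Q without materializing Q:

```python
import math


def fwht(vec):
    """Normalized fast Walsh-Hadamard transform: vec @ H / sqrt(n).

    H is the Sylvester-ordered Hadamard matrix; len(vec) must be a power of 2.
    """
    v = list(vec)
    n = len(v)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in v]


def linear(rows, weight):
    """Stand-in for a quantized layer's matmul: y = x @ W^T (W is out x in)."""
    return [[sum(a * b for a, b in zip(x, w_row)) for w_row in weight] for x in rows]


class RotationAwareForward:
    """Hypothetical wrapper: checks the configured rotation and rotates the
    input activations before calling the wrapped forward function."""

    def __init__(self, forward_fn, rotation=None):
        self.forward_fn = forward_fn
        self.rotation = rotation

    def __call__(self, rows):
        if self.rotation == "hadamard":
            rows = [fwht(x) for x in rows]
        elif self.rotation is not None:
            raise ValueError(f"unsupported rotation: {self.rotation!r}")
        return self.forward_fn(rows)


# Weights rotated along the input dim at quantization time: each row becomes row @ Q.
weight = [[0.5, -1.0, 2.0, 0.25], [1.5, 0.0, -0.75, 1.0]]
rotated_weight = [fwht(row) for row in weight]
x = [[1.0, 2.0, -1.0, 0.5]]

reference = linear(x, weight)  # the original, unrotated layer
# Rotation not recorded: the rotated weights see unrotated activations.
broken = RotationAwareForward(lambda r: linear(r, rotated_weight))(x)
# Rotation recorded and honored: (x Q)(W Q)^T = x W^T.
fixed = RotationAwareForward(lambda r: linear(r, rotated_weight), rotation="hadamard")(x)
```

Because this Q is symmetric and orthogonal, applying Q to the activations is the same as applying the inverse of the rotation folded into the weights. Fusing that rotation into the preceding layer instead of computing it online would be an alternative design; which approach gptqmodel intends is not stated here.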

Additional context

  • Quantization works correctly (rotation is applied via rotate_model() in base.py:586-610)
  • The issue is in inference: quantized layers (e.g., TritonV2QuantLinear.forward) don't apply the inverse rotation
  • Manually setting rotation after loading doesn't help because forward() doesn't check it
  • Is this an incomplete implementation of the rotation feature?

Labels: bug (Something isn't working)