[BUG] Rotation parameter not saved and not used in inference? #2354

# [Bug]: Rotation parameter not saved and not used in inference?
### In other words, how can I use rotation to get correct inference results?

## Describe the bug

The `rotation` parameter (for SpinQuant/QuaRot-style preprocessing) is applied during quantization, but:
1. It is NOT saved to `quantize_config.json` when saving the model
2. It is NOT used in quantized layer's `forward` method during inference
3. This causes incorrect inference results (extremely high perplexity, e.g., PPL: 48564640.0)
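For context, a rotation can normally be applied before quantization without changing the model's output because of computational invariance. For an orthogonal matrix $H$ (for example a normalized Hadamard matrix), a linear layer's weight can absorb $H^\top$ as long as the input is multiplied by $H$. The derivation below is a simplification that assumes the rotation sits on the input side of a single linear layer; it is not necessarily how GPTQModel's `rotate_model()` places its rotations.

```latex
\begin{aligned}
H^\top H &= H H^\top = I \\
y &= W x = W (H^\top H)\, x = (W H^\top)(H x) = W' (H x), \qquad W' := W H^\top \\
W' x &= W H^\top x \neq W x \quad \text{(in general)}
\end{aligned}
```

So if the checkpoint stores a (quantized) $W'$ but inference feeds the un-rotated input $x$, every such layer computes $W' x$ instead of $W x$, and the errors compound through the network.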

## GPU Info

```
NVIDIA GeForce RTX 5090
CUDA Version: 13.0
Driver Version: 580.76.05
```

## Software Info

- OS: Linux 5.15.0-78-generic
- Python: 3.12.3
- gptqmodel: 5.6.12
- torch: 2.9.0
- transformers: 4.57.3
- accelerate: 1.12.0
- triton: 3.5.0

## quantize_config.json

```json
{
  "bits": 2,
  "group_size": 128,
  "desc_act": false,
  "sym": true,
  "quant_method": "gptq",
  "checkpoint_format": "gptq",
  "meta": {
    "gptaq": true,
    "gptaq_alpha": 0.25,
    "act_group_aware": true
  }
}
```

Note: The `rotation` field is missing, even though `rotation='hadamard'` was used during quantization.
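To confirm what was actually persisted, one can read the saved `quantize_config.json` and look for a `rotation` key. The sketch below uses a hypothetical helper, `saved_rotation` (not a GPTQModel function). It writes the config shown above to a temporary directory only so that it runs standalone; in practice you would point it at `quant_path`.

```python
import json
import tempfile
from pathlib import Path


def saved_rotation(quant_dir):
    """Return the `rotation` value stored in quantize_config.json, or None if absent.

    Hypothetical diagnostic helper; not part of the GPTQModel API.
    """
    config = json.loads((Path(quant_dir) / "quantize_config.json").read_text())
    # Look at the top level first, then (defensively) inside the "meta" section.
    return config.get("rotation", config.get("meta", {}).get("rotation"))


if __name__ == "__main__":
    # Recreate the saved config from this report in a temporary directory.
    reported_config = {
        "bits": 2,
        "group_size": 128,
        "desc_act": False,
        "sym": True,
        "quant_method": "gptq",
        "checkpoint_format": "gptq",
        "meta": {"gptaq": True, "gptaq_alpha": 0.25, "act_group_aware": True},
    }
    with tempfile.TemporaryDirectory() as quant_dir:
        (Path(quant_dir) / "quantize_config.json").write_text(json.dumps(reported_config))
        print(saved_rotation(quant_dir))  # prints None: rotation was not persisted
```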

## To Reproduce

1. Quantize a model with rotation:
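```python
from gptqmodel import GPTQModel, QuantizeConfig

# model_path, quant_path and calibration_data are defined elsewhere; other settings elided
quant_config = QuantizeConfig(..., rotation='hadamard')
model = GPTQModel.from_pretrained(model_path, quantize_config=quant_config)
model.quantize(calibration_data)
model.save(quant_path)
```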

2. Load for inference:
```python
from gptqmodel import GPTQModel

model = GPTQModel.from_quantized(quant_path)
# Setting rotation manually after loading does not help either,
# because the quantized layer's forward() never reads this parameter:
# model.quantize_config.rotation = 'hadamard'
```

3. Result: Incorrect inference (PPL: 48564640.0 instead of normal values)
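To put the reported number in perspective: perplexity is the exponential of the mean per-token negative log-likelihood, and a model that assigns probability $1/V$ to every token of a vocabulary of size $V$ scores exactly $V$. The sketch below converts the reported perplexity back to nats per token and compares it with that uniform-guess baseline. The vocabulary size of 150,000 is an illustrative assumption, not the vocabulary of the model in this report.

```python
import math


def mean_nll_from_ppl(ppl):
    """Mean negative log-likelihood (nats/token) implied by a perplexity value."""
    return math.log(ppl)


def uniform_guess_ppl(vocab_size):
    """Perplexity of a model that assigns probability 1/V to every token."""
    # exp(-log(1/V)) == V
    return math.exp(-math.log(1.0 / vocab_size))


reported_ppl = 48564640.0
print(f"{mean_nll_from_ppl(reported_ppl):.2f} nats/token")  # 17.70 nats/token

# Illustrative vocabulary size (assumption): 150,000 tokens.
baseline = uniform_guess_ppl(150_000)
print(reported_ppl > baseline)  # True: worse than guessing uniformly
```

A perplexity above $V$ means the geometric-mean probability given to the correct token is below $1/V$, i.e. the model is confidently wrong rather than merely uncertain, which is consistent with activations and weights no longer matching.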

## Expected behavior

1. The `rotation` parameter should be saved to `quantize_config.json`.
2. The quantized layer's `forward` method should check and use the `rotation` parameter.
3. If `rotation` is set, it seems reasonable to apply the inverse rotation during inference so that the outputs are correct.
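The expected behavior can be illustrated with a toy, pure-Python layer. `RotationAwareLinear` below is hypothetical (it is not `TritonV2QuantLinear` or any GPTQModel class), quantization is omitted, and it assumes the rotation was folded into the weight on the input side as $W' = W H^\top$ with a normalized Hadamard matrix $H$. Under that assumption the layer must multiply its input by $H$ (equivalently, undo the $H^\top$ absorbed into the weight) whenever a `rotation` setting is present; whether this is called the rotation or its inverse depends on which side it was folded into.

```python
import math


def hadamard(n):
    """Normalized Sylvester-Hadamard matrix of size n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        # Sylvester construction: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


class RotationAwareLinear:
    """Hypothetical layer whose stored weight has absorbed H^T (W' = W @ H^T)."""

    def __init__(self, weight, rotation=None):
        self.rotation = rotation  # e.g. "hadamard", mirroring the config field
        in_features = len(weight[0])
        self.h = hadamard(in_features) if rotation == "hadamard" else None
        # Fold the rotation into the weight, as a quantizer would before quantizing.
        self.weight = matmul(weight, transpose(self.h)) if self.h is not None else weight

    def forward(self, x, apply_rotation=True):
        # The step this report says is missing: rotate the input to match W'.
        if self.h is not None and apply_rotation:
            x = matvec(self.h, x)
        return matvec(self.weight, x)


W = [[1.0, 2.0, 0.0, -1.0], [0.5, -0.5, 3.0, 1.0]]
x = [1.0, -2.0, 0.5, 4.0]
layer = RotationAwareLinear(W, rotation="hadamard")
print(matvec(W, x))                            # reference output: [-7.0, 7.0]
print(layer.forward(x))                        # matches the reference
print(layer.forward(x, apply_rotation=False))  # [-2.0, -4.0]: rotation ignored
```

Keeping the rotation decision inside `forward` (driven by a persisted config value) is what makes the checkpoint self-describing; if the value is not saved, the loader has no way to know the input must be rotated.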

## Additional context

- Quantization works correctly (the rotation is applied via `rotate_model()` in `base.py:586-610`).
- The problem is at inference time: quantized layers (e.g. `TritonV2QuantLinear.forward`) don't apply the inverse rotation.
- Setting `rotation` manually after loading doesn't help, because `forward()` never checks it.
- Is this an incomplete implementation of the rotation feature?


