Dear LLMC team,
I've been trying to run mixed-precision PTQ with RTN. I suspect there's a bug, as the non-default bit settings in `mix_bits` appear to be ignored.
My understanding of the code:
- In method `get_act_qparams()` of `rtn.py`, the values of `qmax`/`qmin`/`scales`/`zeros` are determined using the default quantizer's bit precision.
- These values are registered as `buf_act_<xxx>` buffers for all modules/layers.
- At inference time, in method `a_qdq()` of `rtn.py`, even though the `aquantizer` object of each layer is configured correctly, it blindly loads the registered quantization parameters `qmin`/`qmax`/`scales`/`zeros` from the buffers and uses them, instead of values matching the layer's configured bit width (see the sketch below).
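
To make the mismatch concrete, here is a minimal self-contained PyTorch sketch. The helpers `compute_act_qparams` and `fake_quant` and the asymmetric min/max scheme are just my illustration, not the actual `rtn.py` code; the point is only that qparams computed once at the default bit width (say 4 bits) are not the ones a layer set to 8 bits in `mix_bits` should be using.

```python
import torch


def compute_act_qparams(x: torch.Tensor, n_bits: int):
    # Illustrative asymmetric per-tensor qparams (not the llmc implementation).
    qmin, qmax = 0, 2 ** n_bits - 1
    xmin, xmax = x.min(), x.max()
    scale = (xmax - xmin).clamp(min=1e-8) / (qmax - qmin)
    zero = torch.round(-xmin / scale) + qmin
    return qmin, qmax, scale, zero


def fake_quant(x: torch.Tensor, qmin, qmax, scale, zero):
    # Quantize-dequantize with the given parameters.
    q = torch.clamp(torch.round(x / scale) + zero, qmin, qmax)
    return (q - zero) * scale


torch.manual_seed(0)
x = torch.randn(4, 16)

# Analogous to the buf_act_<xxx> buffers: computed once with the default
# quantizer's bit width (assumed 4 bits here) and reused for every layer.
buffered = compute_act_qparams(x, n_bits=4)

# What a layer configured for 8 bits via mix_bits should actually use.
recomputed = compute_act_qparams(x, n_bits=8)

print("error with buffered 4-bit qparams:  ",
      (x - fake_quant(x, *buffered)).abs().mean().item())
print("error with recomputed 8-bit qparams:",
      (x - fake_quant(x, *recomputed)).abs().mean().item())
```

If my reading is right, I'd expect either `get_act_qparams()` to register per-layer buffers computed with each layer's own `aquantizer`, or `a_qdq()` to recompute/override the qparams for layers covered by `mix_bits`.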
What do you think?
Thanks in advance!