
BUG: Mixed-precision configuration not working with STATIC quantization #163

Open
@sasha-hailo

Description

Dear LLMC team,
I've been trying to run mixed-precision post-training quantization (PTQ) with RTN.
I suspect there is a bug, because the non-default settings in mix_bits are ignored.

My understanding of the code:

  • In the get_act_qparams() method of rtn.py, the values of qmax / qmin / scales / zeros are computed using the default quantizer's bit precision.
  • These values are registered as buf_act_<xxx> buffers for all modules / layers.
  • At inference time, in the a_qdq() method of rtn.py, although the aquantizer object of each layer is configured with the correct precision, it blindly loads the registered qmin / qmax / scales / zeros from the buffers and uses them, instead of the values matching its own bit width (see the sketch below).
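
To make the suspected flow concrete, here is a minimal, self-contained sketch of the pattern described above. This is a paraphrase, not LLMC's actual code: `ToyQuantizer`, `default_quantizer`, and the simplified signatures are illustrative assumptions, while `get_act_qparams`, `a_qdq`, and the `buf_act_<xxx>` buffers only mirror the names used in rtn.py.

```python
import torch
import torch.nn as nn

# Sketch of the suspected flow (paraphrased, NOT LLMC's actual code).
# ToyQuantizer and the simplified signatures are illustrative assumptions;
# get_act_qparams / a_qdq / buf_act_* mirror the names used in rtn.py.

class ToyQuantizer:
    def __init__(self, bits: int):
        self.bits = bits
        self.qmin, self.qmax = 0, 2 ** bits - 1

    def get_qparams(self, x: torch.Tensor):
        # Asymmetric min/max calibration for the configured bit width.
        scale = (x.max() - x.min()).clamp(min=1e-8) / (self.qmax - self.qmin)
        zero = self.qmin - torch.round(x.min() / scale)
        return self.qmin, self.qmax, scale, zero


def get_act_qparams(module: nn.Module, calib_act: torch.Tensor,
                    default_quantizer: ToyQuantizer) -> None:
    # Calibration: qparams are computed with the DEFAULT bit width only and
    # registered as buf_act_* buffers on every layer, including layers whose
    # mix_bits entry requests a different precision.
    qmin, qmax, scale, zero = default_quantizer.get_qparams(calib_act)
    module.register_buffer("buf_act_qmin", torch.tensor(float(qmin)))
    module.register_buffer("buf_act_qmax", torch.tensor(float(qmax)))
    module.register_buffer("buf_act_scale", scale)
    module.register_buffer("buf_act_zero", zero)


def a_qdq(x: torch.Tensor, module: nn.Module,
          aquantizer: ToyQuantizer) -> torch.Tensor:
    # Inference: even though aquantizer reflects the per-layer mix_bits
    # setting, the buffered qparams (computed at the default precision) are
    # loaded and used, so aquantizer.bits never affects the result.
    scale, zero = module.buf_act_scale, module.buf_act_zero
    qmin, qmax = module.buf_act_qmin.item(), module.buf_act_qmax.item()
    q = torch.clamp(torch.round(x / scale) + zero, qmin, qmax)
    return (q - zero) * scale
```

In this sketch, running a_qdq on a layer whose aquantizer has a non-default bit width still produces the default-precision result, which matches the behavior I'm observing.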

What do you think?
Thanks in advance!
