Description
Although fake_quantizer implements the forward method for all three kinds of granularity, only per_group can be used for the linear layer weight in QAT. This is because when FakeQuantizedLinear's __init__
is called, it first tries to get the group size from the weight config:
ao/torchao/quantization/qat/linear.py
Lines 84 to 94 in 2c901b3
And if the weight config uses any granularity other than per_group, an exception is raised:
ao/torchao/quantization/qat/api.py
Lines 216 to 226 in 2c901b3
An easy fix is to check the granularity type before getting the group size:
# initialize weight fake quantizer
if weight_config is not None:
    if isinstance(weight_config.granularity, PerGroup):
        group_size = weight_config.group_size
        if group_size is not None and in_features % group_size != 0:
            raise ValueError(
                "in_features (%s) %% group_size (%s) must be == 0"
                % (in_features, group_size)
            )
    self.weight_fake_quantizer = FakeQuantizer(weight_config)
else:
    self.weight_fake_quantizer = None
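To illustrate the failure mode and the guarded fix outside of torchao, here is a minimal self-contained sketch. The classes below (PerGroup, PerAxis, the config with a group_size property, and the init helper) are illustrative stand-ins modeled on the behavior described above, not the actual torchao implementation:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the torchao granularity types
@dataclass
class PerGroup:
    group_size: int

@dataclass
class PerAxis:
    axis: int

class WeightConfig:
    """Mimics a config whose group_size accessor raises for non-per-group
    granularity, as the issue describes for ao/torchao/quantization/qat/api.py."""

    def __init__(self, granularity):
        self.granularity = granularity

    @property
    def group_size(self):
        if not isinstance(self.granularity, PerGroup):
            raise ValueError(
                "group_size is undefined for granularity %s" % (self.granularity,)
            )
        return self.granularity.group_size

def init_weight_fake_quantizer(weight_config, in_features):
    """Guarded initialization: only query group_size when granularity is PerGroup."""
    if weight_config is None:
        return None
    if isinstance(weight_config.granularity, PerGroup):
        group_size = weight_config.group_size
        if group_size is not None and in_features % group_size != 0:
            raise ValueError(
                "in_features (%s) %% group_size (%s) must be == 0"
                % (in_features, group_size)
            )
    # placeholder for FakeQuantizer(weight_config)
    return ("FakeQuantizer", weight_config)

# With the guard, a per-axis weight config no longer crashes __init__:
quantizer = init_weight_fake_quantizer(WeightConfig(PerAxis(axis=0)), 128)
print(quantizer is not None)  # True
```

Without the isinstance guard, the unconditional group_size access would raise for the PerAxis config before the fake quantizer could be constructed.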