Summary
The observers in Torchao, such as AffineQuantizedMinMaxObserver, support different quantization granularities by keeping track of block sizes. For example, if the granularity is PerTensor and the input shape is (16, 3, 224, 224), the observer records block sizes = (16, 3, 224, 224).
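To make the mechanism concrete, here is an illustrative sketch (not torchao's actual helper; the function name block_size_for is made up for illustration) of how a block size can be derived from the granularity and the input shape:

import torch

def block_size_for(shape, granularity):
    # Illustrative only: for PerTensor granularity a single block spans the
    # whole tensor, so the block size equals the input shape.
    if granularity == "per_tensor":
        return tuple(shape)
    # A per-axis granularity (one block per slice along dim 0) shown for contrast.
    if granularity == "per_axis_0":
        return (1,) + tuple(shape[1:])
    raise ValueError(f"unsupported granularity: {granularity}")

x = torch.randn(16, 3, 224, 224)
print(block_size_for(x.shape, "per_tensor"))  # (16, 3, 224, 224)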
However, the block sizes become wrong when inputs with different shapes are passed in. For example, if another input with shape = (16, 3, 56, 56) arrives, the block sizes are updated to (16, 3, 56, 56), which is incorrect for inputs with shape = (16, 3, 224, 224).
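The failure mode can be demonstrated without torchao at all. Below is a minimal, self-contained sketch (the class ToyMinMaxObserver is hypothetical, not torchao code) of an observer that, like the behavior described above, re-derives its block size from the most recent input and therefore forgets the shape it saw first:

import torch

class ToyMinMaxObserver(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.block_size = None
        self.min_val = None
        self.max_val = None

    def forward(self, x):
        # PerTensor granularity: block size is the full input shape.
        # Overwriting it on every call is the problem described above.
        self.block_size = tuple(x.shape)
        cur_min, cur_max = x.amin(), x.amax()
        self.min_val = cur_min if self.min_val is None else torch.minimum(self.min_val, cur_min)
        self.max_val = cur_max if self.max_val is None else torch.maximum(self.max_val, cur_max)
        return x

obs = ToyMinMaxObserver()
obs(torch.randn(16, 3, 224, 224))
print(obs.block_size)  # (16, 3, 224, 224)
obs(torch.randn(16, 3, 56, 56))
print(obs.block_size)  # (16, 3, 56, 56) -- now wrong for the 224x224 inputs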
How to reproduce
The code to reproduce is similar to the reproducer here: #2094. One needs to bypass the #2094 issue by manually changing the source code, and also run the converted_model after convert_pt2e:
converted_model = convert_pt2e(prepared_model)
move_exported_model_to_eval(converted_model)
print("[info] converted_model =\n", converted_model)
converted_model(*example_inputs) # add this line; running the converted model triggers the failure
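For context, a hedged sketch of how the surrounding flow might look. It assumes prepared_model and example_inputs (with shape (16, 3, 224, 224)) come from the #2094 reproducer, that the model takes a single image tensor, and that the PT2E entry points are imported from torch.ao.quantization; these details are assumptions, not taken from the original reproducer:

import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e
from torch.ao.quantization import move_exported_model_to_eval

# Calibrate with two different spatial sizes; per the summary above, the
# observers end up recording block sizes for the last shape only.
prepared_model(torch.randn(16, 3, 224, 224))  # assumed single-tensor input
prepared_model(torch.randn(16, 3, 56, 56))

converted_model = convert_pt2e(prepared_model)
move_exported_model_to_eval(converted_model)
print("[info] converted_model =\n", converted_model)

# Running the converted model with the original (16, 3, 224, 224) inputs now
# exercises the mismatched block sizes.
converted_model(*example_inputs)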