[quantize] fix bug of annotate for output of add op #15529
Conversation
The output of the add op is an activation, so it should be annotated with QAnnotateKind.ACTIVATION. Otherwise, when quantizing ResNet, the graph casts int32 directly to int8 without requantization.
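For context, here is a simplified sketch of the annotate rewrite involved, in the style of python/tvm/relay/quantize/_annotate.py. This is not the exact diff of this PR; the helper names follow TVM's internal quantize module and may differ across versions, and the registration decorator is omitted because the stock rewrite for add is already registered:

# Sketch only: trimmed to the case relevant to the bug.
from tvm.relay.quantize import QAnnotateKind
from tvm.relay.quantize._annotate import QAnnotateExpr, _forward_op, _get_expr_kind

def add_rewrite_sketch(ref_call, new_args, ctx):
    """Annotate rewrite for add, reduced to the branch this PR touches."""
    lhs_expr, lhs_kind = _get_expr_kind(new_args[0])
    rhs_expr, rhs_kind = _get_expr_kind(new_args[1])
    if lhs_kind is None and rhs_kind is None:
        return None  # neither operand is quantized; leave the call untouched
    expr = _forward_op(ref_call, [lhs_expr, rhs_expr])
    # The output of add is an activation: returning ACTIVATION (instead of
    # INPUT) makes the realize pass insert a requantize step
    # (fixed_point_multiply + clip) rather than casting int32 straight to
    # int8, which overflows silently.
    return QAnnotateExpr(expr, QAnnotateKind.ACTIVATION)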
@MingkangW ,
Can you also add a small test case?
Sure, I will do it.
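A minimal test could look something like the sketch below (not necessarily the test added in this PR; the graph shape, qconfig settings, and the assertion on the printed IR are assumptions):

# Hypothetical test: quantize a conv2d + residual-style add and check that
# the add output is requantized (fixed_point_multiply) before the int8 cast.
import numpy as np
import tvm
from tvm import relay

def test_add_output_annotated_as_activation():
    x = relay.var("x", shape=(1, 16, 7, 7), dtype="float32")
    w = relay.const(np.random.uniform(-1, 1, (16, 16, 3, 3)).astype("float32"))
    conv = relay.nn.conv2d(x, w, padding=(1, 1), channels=16, kernel_size=(3, 3))
    out = relay.nn.relu(relay.add(conv, conv))  # residual add, as in resnet
    mod = tvm.IRModule.from_expr(relay.Function([x], out))
    with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
        qmod = relay.quantize.quantize(mod)
    # Before the fix the int32 add output was cast straight to int8; after
    # the fix a fixed_point_multiply requantize appears in the realized IR.
    assert "fixed_point_multiply" in qmod.astext()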
@cbalint13
@MingkangW ,
LGTM, thank you!
PTAL @cbalint13 @tqchen
cc: @masahi
The output of the add op is an activation, so it should be annotated with QAnnotateKind.ACTIVATION. Otherwise, when quantizing ResNet, the graph casts int32 directly to int8 without requantization, producing an overflow error that is not reported when running inference on the model. The resnet18_v1 IR before and after the fix is shown below.

Before the fix:
%703 = add(%701, %702) /* ty=Tensor[(1, 512, 7, 7), int32] */;
%704 = nn.relu(%703) /* ty=Tensor[(1, 512, 7, 7), int32] */;
%705 = cast(%704, dtype="int8") /* ty=Tensor[(1, 512, 7, 7), int8] */;
%706 = annotation.stop_fusion(%705) /* ty=Tensor[(1, 512, 7, 7), int8] */;
After the fix:
%443 = add(%441, %442) /* ty=Tensor[(1, 512, 7, 7), int32] */;
%444 = nn.relu(%443) /* ty=Tensor[(1, 512, 7, 7), int32] */;
%445 = cast(%444, dtype="int64") /* ty=Tensor[(1, 512, 7, 7), int64] */;
%446 = fixed_point_multiply(%445, multiplier=1439683968, shift=-2) /* ty=Tensor[(1, 512, 7, 7), int64] */;
%447 = clip(%446, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 512, 7, 7), int64] */;
%448 = cast(%447, dtype="int32") /* ty=Tensor[(1, 512, 7, 7), int32] */;
%449 = cast(%448, dtype="int8") /* ty=Tensor[(1, 512, 7, 7), int8] */;
%450 = annotation.stop_fusion(%449) /* ty=Tensor[(1, 512, 7, 7), int8] */;
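For reference, IR like the above can be reproduced with something along these lines (a sketch; the gluon model-zoo import, input shape, and global_scale value are assumptions, not taken from this PR):

# Hypothetical reproduction: quantize resnet18_v1 from the MXNet gluon model
# zoo and inspect how the residual add is lowered.
import tvm
from tvm import relay
from mxnet.gluon.model_zoo import vision

model = vision.resnet18_v1(pretrained=True)
mod, params = relay.frontend.from_mxnet(model, shape={"data": (1, 3, 224, 224)})
with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
    qmod = relay.quantize.quantize(mod, params)
# Look for fixed_point_multiply + clip before the cast to int8.
print(qmod.astext(show_meta_data=False))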