[Topi & Relay] Add quantization support for the vision transform model in GPU #7814
Thanks to the reviewers. We will keep adding results for other ViT models and contributing more quantization calibration algorithms.
Thanks! @huochaitiantang
This PR looks great! I have only a few comments.
    if not os.path.exists(logfile):
        os.system("wget https://github.com/TheGreatCold/tvm-vit/raw/master/{}".format(logfile))
    if not os.path.exists(onnx_path):
        os.system("wget https://github.com/TheGreatCold/tvm-vit/raw/master/{}".format(onnx_path))
For a unit test, I don't think it is a good idea to depend on an external resource: a network problem or a change to the tvm-vit repo may break the UT. At the very least, pinning a git commit instead of a branch would be better, e.g. https://github.com/TheGreatCold/tvm-vit/raw/d2aa1e60eef42e2fdedbd1e13aa85ac5faf0a7fc/vit_B32_224.onnx
I'm not sure if there's any better solution for this. @tqchen Do you have any suggestion?
Thanks for your review. We updated the download code based on your suggestion. In addition, since wget is not available on every platform, we now use the urllib library instead.
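For readers following along, the change described above could look roughly like the sketch below. It is not the code merged in the PR: the helper names `pinned_url` and `fetch_if_missing` are hypothetical, and only the repository URL and commit hash are taken from the review comment.

```python
import os
import urllib.request

# Repository and commit hash as quoted in the review comment above.
BASE = "https://github.com/TheGreatCold/tvm-vit/raw"
COMMIT = "d2aa1e60eef42e2fdedbd1e13aa85ac5faf0a7fc"


def pinned_url(filename, commit=COMMIT):
    """Build a raw-file URL pinned to a commit instead of a moving branch."""
    return "{}/{}/{}".format(BASE, commit, filename)


def fetch_if_missing(url, path, fetch=urllib.request.urlretrieve):
    """Download `url` to `path` unless the file already exists.

    `fetch` is injectable (hypothetical parameter) so the logic can be
    exercised without network access.
    """
    if not os.path.exists(path):
        fetch(url, path)
    return path
```

A call such as `fetch_if_missing(pinned_url("vit_B32_224.onnx"), "vit_B32_224.onnx")` then behaves the same on any platform with a Python standard library, and the test keeps working even if the branch head of the tvm-vit repo changes.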
Thanks! @huochaitiantang @XHPlus. Have fun with TVM. Looking forward to seeing more contributions from you!
…l in GPU (apache#7814)
* Add cuda batch matmul int8 support for quantized vit model
* Fix for combine parallel pass with dense and batch_matmul
* Reformat based on lint
* Add plevel & update the file download method
We submit this PR to add quantization support for the vision transformer (ViT) model on GPU. The main changes are as follows:
1. In the ViT model, the time-consuming operators are batch_matmul, so we first add the compute and schedule of batch_matmul_int8.cuda in tvm.topi.cuda.
2. To support the quantization of batch_matmul, we then add batch_matmul_rewrite and BatchMatmulRealize in tvm.relay.quantize.
3. KL-divergence calibration could not preserve the accuracy of the ViT model well, so we add the _percentile_scale function.
For the vit-B32-224 model, the performance is as follows:
Top-1 accuracy on the ImageNet validation set
Latency on a GTX 1660 GPU
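To make items 1 and 3 of the description more concrete, here is a small NumPy sketch of the idea: a percentile-based scale, symmetric int8 quantization, and a batch matmul that accumulates in int32. It is only an illustration and not the TVM implementation. The function names, the 99.99 default percentile, and the (B, M, K) x (B, N, K) operand layout are assumptions made for this example.

```python
import numpy as np


def percentile_scale(x, percentile=99.99):
    """Pick a scale from a percentile of |x| (assumed default: 99.99).

    Unlike taking the maximum, this ignores a small fraction of outliers.
    """
    return float(np.percentile(np.abs(x), percentile))


def quantize_int8(x, scale):
    """Symmetric quantization: map [-scale, scale] onto [-127, 127]."""
    q = np.round(np.asarray(x, dtype=np.float64) / scale * 127.0)
    return np.clip(q, -127, 127).astype(np.int8)


def batch_matmul_int8(a, b):
    """Reference batch matmul with int32 accumulation.

    Assumed layout: a is (B, M, K), b is (B, N, K); the result is (B, M, N).
    Inputs are widened to int32 first so products do not overflow int8.
    """
    return np.einsum("bmk,bnk->bmn", a.astype(np.int32), b.astype(np.int32))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 4, 8)).astype("float32")
    y = rng.standard_normal((2, 3, 8)).astype("float32")
    sx, sy = percentile_scale(x), percentile_scale(y)
    acc = batch_matmul_int8(quantize_int8(x, sx), quantize_int8(y, sy))
    # Dequantize the int32 accumulator back to floating point.
    approx = acc * (sx / 127.0) * (sy / 127.0)
    print(np.abs(approx - np.einsum("bmk,bnk->bmn", x, y)).max())
```

The int32 accumulator matters because a single product of two int8 values can already reach 127 * 127, which is far outside the int8 range.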
Thanks for your review! @jcf94 @tqchen