This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Closed
Description
I used the code from PR #13715 and got a large performance decrease after quantizing my model. I tested on Windows 10 with CUDA 10 and cuDNN 7 on a Titan X (Pascal), using the pre-release pip build of mxnet-cu100.
Although issue #10897 claims that INT8 quantization can save GPU memory, I got almost 2x more VRAM usage with quantization.
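For reference, a back-of-envelope sketch (plain Python, with a hypothetical layer size, not MXNet code) of the weight-memory saving INT8 quantization should give in theory, which makes the 2x VRAM increase surprising:

```python
# Why INT8 quantization is expected to *reduce* weight storage:
# float32 uses 4 bytes per parameter, int8 uses 1 byte per parameter.
# The parameter count below is a hypothetical example.

def weight_bytes(num_params, bytes_per_param):
    """Raw storage needed for a weight tensor, in bytes."""
    return num_params * bytes_per_param

num_params = 4_700_000  # hypothetical conv-net layer group

fp32_bytes = weight_bytes(num_params, 4)  # float32 weights
int8_bytes = weight_bytes(num_params, 1)  # int8 quantized weights

print(f"fp32: {fp32_bytes / 1e6:.1f} MB")
print(f"int8: {int8_bytes / 1e6:.1f} MB")
print(f"expected ratio: {fp32_bytes / int8_bytes:.0f}x smaller")
```

Actual runtime VRAM can differ from this raw-storage estimate (workspace buffers, dequantize/requantize scratch space), but a net 2x increase still looks wrong.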
Is it expected that INT8 quantization is this slow and uses more GPU memory?
I also assume that UINT8 quantization is not yet supported, since the quantized parameters are signed integers.
So, is there any plan to improve INT8 quantization in the near future?