This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Huge performance decrease by quantization #13720

Closed
@kice

Description

I used the code from PR #13715 and got a huge performance decrease after quantizing my model. I tested on Windows 10 with CUDA 10 and cuDNN 7 on a Titan X (Pascal), using the pre-release pip build mxnet-cu100.

Although issue #10897 claims that INT8 quantization can save GPU memory, I observed almost 2x more VRAM usage with quantization.

Is it expected that INT8 quantization is this slow and uses more memory on GPU?

I also assume that UINT8 quantization is not yet supported, since the UINT8-quantized parameters are stored as signed integers.
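For context, symmetric INT8 quantization maps floats into the signed range [-127, 127], while UINT8 would use [0, 255] with a zero point. A minimal pure-Python sketch of the symmetric INT8 case (illustrative only; the scale choice and rounding mode here are assumptions, not MXNet's actual implementation):

```python
def quantize_int8(values):
    """Symmetric INT8 quantization: scale so the max-magnitude
    value maps to 127, then round and clamp to [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [x * scale for x in q]

q, scale = quantize_int8([1.0, -0.5, 0.25])
print(q)  # signed 8-bit codes
```

Note that the quantized tensors themselves are 4x smaller than FP32, so any extra VRAM usage would have to come from kept FP32 copies, calibration buffers, or workspace allocations rather than the weights.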

So, is there any plan to improve INT8 quantization in the near future?
