- python3.x
- pytorch
- torchvision to load the datasets, perform image transforms
- pandas for logging to csv
- bokeh for training visualization
- scikit-learn for kmeans clustering
- mlflow for logging
To install requirements run:
pip install torch torchvision bokeh pandas scikit-learn mlflow
- NVIDIA GPU with CUDA support
- To run this code you need the validation set from the ILSVRC2012 dataset.
- Configure your dataset path by providing --data "PATH_TO_ILSVRC" or copy ILSVRC dir to ~/datasets/ILSVRC2012.
- To get the ILSVRC2012 data, you should register on their site for access: http://www.image-net.org/
To improve performance, GEMMLOWP quantization was implemented in CUDA, which requires compiling the kernels.
- Create a virtual environment for python3 and activate it:
virtualenv --system-site-packages -p python3 venv3
. ./venv3/bin/activate
- build kernels
cd kernels
./build_all.sh
Post-training quantization of ResNet-50
Note that accuracy results may vary by up to 0.5% due to data shuffling.
- Experiment W4A4 naive:
python inference/inference_sim.py -a resnet50 -b 512 -pcq_w -pcq_a -sh --qtype int4 -qw int4
- Prec@1 62.154 Prec@5 84.252
- Experiment W4A4 + ACIQ + Bit Alloc(A) + Bit Alloc(W) + Bias correction:
python inference/inference_sim.py -a resnet50 -b 512 -pcq_w -pcq_a -sh --qtype int4 -qw int4 -c laplace -baa -baw -bcw
- Prec@1 73.330 Prec@5 91.334
We solve eq. 6 numerically to find the optimal clipping value α for both the Laplace and Gaussian priors.
Numerical solution source code:
optimal_alpha.ipynb
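As an illustrative sketch of the numerical solution (not the notebook's exact code), assume the expected quantization MSE model for a Laplace(0, b) tensor clipped at α: clipping noise 2b²·e^(−α/b) plus rounding noise α²/(3·2^(2M)). A simple golden-section search over α then recovers the optimal clipping value:

```python
import math

def aciq_laplace_mse(alpha, b=1.0, num_bits=4):
    """Expected MSE for quantizing a Laplace(0, b) tensor clipped at alpha:
    clipping noise 2*b^2*exp(-alpha/b) + rounding noise alpha^2 / (3 * 4^M)."""
    return 2 * b**2 * math.exp(-alpha / b) + alpha**2 / (3 * 4**num_bits)

def optimal_alpha(b=1.0, num_bits=4, lo=1e-6, hi=40.0, iters=200):
    """Golden-section search for the alpha minimizing the (unimodal) MSE model."""
    gr = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        m1 = hi - gr * (hi - lo)
        m2 = lo + gr * (hi - lo)
        if aciq_laplace_mse(m1, b, num_bits) < aciq_laplace_mse(m2, b, num_bits):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2
```

For b = 1 this reproduces the clipping values reported for ACIQ, e.g. α* ≈ 2.83 at 2 bits and α* ≈ 5.03 at 4 bits.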
Given a quota on the total number of bits allowed to be written to memory, the optimal bit-width assignment M_i for channel i is the following:
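The paper's closed-form expression is not reproduced in this text; as an illustrative sketch (not the paper's exact formula), assume per-channel quantization MSE proportional to α_i²·2^(−2·M_i) and a total quota B over n channels. A Lagrangian argument then gives the continuous allocation M_i = B/n + log2(α_i) − mean_j log2(α_j):

```python
import math

def allocate_bits(alphas, total_bits):
    """Continuous bit allocation minimizing sum_i alpha_i^2 * 2^(-2*M_i)
    subject to sum_i M_i == total_bits (Lagrangian solution).
    In practice the result must be rounded/clamped to valid integer widths."""
    n = len(alphas)
    mean_log = sum(math.log2(a) for a in alphas) / n
    return [total_bits / n + math.log2(a) - mean_log for a in alphas]
```

Channels with a larger clipping range α_i (i.e. a larger dynamic range) receive proportionally more bits.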
We observe an inherent bias in the mean and the variance of the weight values following their quantization.
We calculate this bias using equation 12.
Then, we compensate for the bias in each channel of W as follows:
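As a sketch of this per-channel correction (hypothetical helper names; the actual implementation lives in the repo's quantization code), one can restore each output channel's original mean and standard deviation after quantization:

```python
import numpy as np

def quantize_per_channel(w, num_bits=4):
    """Naive symmetric uniform quantizer per output channel (axis 0)."""
    levels = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max(axis=(1, 2, 3), keepdims=True) / levels
    return np.round(w / scale) * scale

def bias_correct(w, w_q):
    """Rescale and shift w_q so its per-channel mean/std match those of w."""
    axes = (1, 2, 3)
    mu, mu_q = w.mean(axis=axes, keepdims=True), w_q.mean(axis=axes, keepdims=True)
    std, std_q = w.std(axis=axes, keepdims=True), w_q.std(axis=axes, keepdims=True)
    return (w_q - mu_q) * (std / (std_q + 1e-12)) + mu
```

After correction, each channel of the quantized weight tensor has the same first and second moments as the original channel.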
In order to quantize a tensor to M bits with optimal clipping, we use GEMMLOWP quantization with a small modification: we replace the dynamic range in the scale computation with 2*alpha, where alpha is the optimal clipping value.
Quantization code can be found here: int_quantizer.py
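A minimal sketch of the modification described above (assumed details; see int_quantizer.py for the actual implementation): GEMMLOWP-style affine quantization where the scale is derived from 2*alpha instead of the tensor's min/max dynamic range:

```python
import numpy as np

def quantize_with_clipping(x, alpha, num_bits=8):
    """Affine (GEMMLOWP-style) quantization of x to num_bits unsigned levels,
    using the fixed range [-alpha, alpha] instead of the tensor's min/max."""
    qmax = 2 ** num_bits - 1
    scale = 2 * alpha / qmax              # dynamic range replaced by 2*alpha
    x_clipped = np.clip(x, -alpha, alpha)
    q = np.round((x_clipped + alpha) / scale)  # zero point maps -alpha -> 0
    return q.astype(np.int64), scale

def dequantize(q, scale, alpha):
    return q * scale - alpha
```

Values beyond ±alpha are clipped before quantization, which is what makes the choice of alpha (the optimal clipping value from the numerical solution above) matter.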