[FEATURE] Restore Quantization API to MXNet #19587

Merged 40 commits on Dec 12, 2020.

Commits:
- 840bc2d Restore quantization files (bgawrych, Sep 16, 2020)
- 2f2adcb Adapt quantization.py - Add/Remove modules (bgawrych, Sep 16, 2020)
- a2fd342 Adapt part of quantization tests to new API (bgawrych, Sep 16, 2020)
- 6d6ff15 fuse fc+tanh (sfraczek, Sep 18, 2020)
- 889a0dd Replace Module API with SymbolBlock in quantize_model (bgawrych, Sep 21, 2020)
- 2937d5b enabled test_quantization_mkldnn.py (sfraczek, Sep 29, 2020)
- 2ed842c Revert "fuse fc+tanh" (bgawrych, Sep 30, 2020)
- 7694290 Enable tests from test_subgraph.py (bgawrych, Oct 1, 2020)
- 9364b30 Enable test_mobilenetv2_struct (bgawrych, Oct 2, 2020)
- 2da0f96 Refactor test_subgraph.py (bgawrych, Oct 5, 2020)
- b6c3740 Reorder of conditions (sfraczek, Oct 8, 2020)
- d8af341 Utilize optimize_for in quantization flow (bgawrych, Oct 7, 2020)
- 2e8aab7 remove duplicate imports (bgawrych, Oct 8, 2020)
- 8bdfb0b Add variable monitor callback (bgawrych, Oct 19, 2020)
- 03a568f fix sanity (bgawrych, Oct 19, 2020)
- 9c7483a wip (sfraczek, Oct 29, 2020)
- cf5376c Rebase to master - remove with_seed (bgawrych, Nov 3, 2020)
- 07b3942 Add numpy support for quantization (bgawrych, Nov 3, 2020)
- a8f80e9 enabled examples/quantization/imagenet_gen_qsym_mkldnn.py (sfraczek, Nov 5, 2020)
- e7428b2 Add test to check different way of data generation for hybridize (bgawrych, Nov 5, 2020)
- fb7bf99 Copy original network (bgawrych, Nov 6, 2020)
- f095f25 Change num_calib_examples to num_calib_batches (bgawrych, Nov 6, 2020)
- 4e3591e enabling imagenet_inference.py (sfraczek, Nov 9, 2020)
- c898b14 Add base class for collectors and feed custom with calib_layers (bgawrych, Nov 9, 2020)
- bda6463 Some doc fixes after discussion (grygielski, Nov 12, 2020)
- 0375801 anko review - change all quantize_net_v2 to quantize_net (bgawrych, Nov 13, 2020)
- 3e1ee58 Make -s argument required (bgawrych, Nov 13, 2020)
- 2db7c6d review fixes by mozga and anko (sfraczek, Nov 18, 2020)
- b080a24 Fix bugs (bgawrych, Nov 18, 2020)
- 0773950 Fix channel-wise quantization (bgawrych, Nov 23, 2020)
- 297eaac Fix documentation formatting (bgawrych, Nov 24, 2020)
- 3b6cd2b mozga: fix review (bgawrych, Nov 23, 2020)
- a62047b Fix lint (bgawrych, Nov 26, 2020)
- 3f661bd Refactor calibration for variables (bgawrych, Nov 26, 2020)
- 674f48f fix sanity (bgawrych, Nov 27, 2020)
- cbb8b2e fix clang tidy (bgawrych, Nov 27, 2020)
- 20fce07 Merge branch 'master' into quantization_20 (bgawrych, Nov 30, 2020)
- a9daf5d ciyong review fixes (bgawrych, Dec 4, 2020)
- 1e34d48 Add verified models (bgawrych, Dec 9, 2020)
- f24e2bc Fix review: Tao and Xinyu (bgawrych, Dec 10, 2020)

File: example/quantization/README.md (191 additions, 0 deletions)
<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements. See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership. The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License. You may obtain a copy of the License at -->
<!--- -->
<!--- http://www.apache.org/licenses/LICENSE-2.0 -->
<!--- -->
<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied. See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

# Model Quantization with Calibration Examples

This folder contains examples of quantizing an FP32 model to a (U)INT8 model with the Intel® oneAPI Deep Neural Network Library (oneDNN).

<h2 id="1">Model Quantization with Intel® oneDNN</h2>

Intel® oneDNN supports quantization with subgraph features on Intel® CPU Platform and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html).
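
The example scripts below drive this flow from the command line; underneath they use the Gluon quantization API restored by this PR. The snippet below is only a minimal sketch of that API, assuming a `quantize_net` call with the argument names used in these examples (`calib_mode`, `num_calib_batches`, etc.); exact parameter names and defaults may differ between MXNet versions.

```
# Minimal sketch (not the example scripts themselves): quantize a pretrained
# Gluon model for CPU/oneDNN inference. Argument names are assumptions based
# on this PR and may differ between MXNet versions.
import mxnet as mx
from mxnet.gluon.model_zoo import vision
from mxnet.contrib.quantization import quantize_net

net = vision.resnet50_v1(pretrained=True)

# A few batches shaped like the real input are enough for calibration;
# random data is used here only to keep the sketch self-contained.
calib_data = mx.gluon.data.DataLoader(
    mx.gluon.data.ArrayDataset(mx.nd.random.uniform(shape=(320, 3, 224, 224))),
    batch_size=32)

qnet = quantize_net(net,
                    quantized_dtype='auto',   # let each layer pick int8/uint8
                    calib_mode='naive',       # min/max thresholds from calib data
                    calib_data=calib_data,
                    num_calib_batches=10,
                    ctx=mx.cpu())

out = qnet(mx.nd.random.uniform(shape=(1, 3, 224, 224)))
```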

```
usage: python imagenet_gen_qsym_onednn.py [-h] [--model MODEL] [--epoch EPOCH]
[--no-pretrained] [--batch-size BATCH_SIZE]
[--calib-dataset CALIB_DATASET]
[--image-shape IMAGE_SHAPE]
[--data-nthreads DATA_NTHREADS]
[--num-calib-batches NUM_CALIB_BATCHES]
[--exclude-first-conv] [--shuffle-dataset]
[--calib-mode CALIB_MODE]
[--quantized-dtype {auto,int8,uint8}]
[--quiet]

Generate a calibrated quantized model from a FP32 model with Intel oneDNN support

optional arguments:
-h, --help show this help message and exit
--model MODEL model to be quantized. If no-pretrained is set, the model
must be provided in the `model` directory in the same path
as this python script, default is `resnet50_v1`
--epoch EPOCH number of epochs, default is `0`
--no-pretrained If enabled, will not download pretrained model from
MXNet or Gluon-CV modelzoo, default is `False`
--batch-size BATCH_SIZE
batch size to be used when calibrating model, default is `32`
--calib-dataset CALIB_DATASET
path of the calibration dataset, default is `data/val_256_q90.rec`
--image-shape IMAGE_SHAPE
number of channels, height and width of input image separated by comma,
default is `3,224,224`
--data-nthreads DATA_NTHREADS
number of threads for data loading, default is `0`
--num-calib-batches NUM_CALIB_BATCHES
number of batches for calibration, default is `10`
--exclude-first-conv exclude the first conv layer from quantization since the
input data may have negative values, which are not
supported at the moment
--shuffle-dataset shuffle the calibration dataset
--calib-mode CALIB_MODE
calibration mode used for generating calibration table
for the quantized symbol; supports 1. none: no
calibration will be used. The thresholds for
quantization will be calculated on the fly. This will
result in inference speed slowdown and loss of
accuracy in general. 2. naive: simply take min and max
values of layer outputs as thresholds for
quantization. In general, the inference accuracy
worsens with more examples used in calibration. It is
recommended to use `entropy` mode as it produces more
accurate inference results. 3. entropy: calculate KL
divergence of the FP32 output and quantized output for
optimal thresholds. This mode is expected to produce
the best inference accuracy of all three kinds of
quantized models if the calibration dataset is
representative enough of the inference dataset.
default is `entropy`
--quantized-dtype {auto,int8,uint8}
quantization destination data type for input data,
default is `auto`
--quiet suppress most of log
```
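
To make the `naive` calibration mode described above concrete, the toy sketch below (illustrative only, not the script's actual code path) takes the observed min/max of a layer's outputs over the calibration batches and derives a symmetric int8 scale from them:

```
# Illustrative only: 'naive' calibration keeps the min/max observed over the
# calibration batches as quantization thresholds for a layer's output.
import numpy as np

def naive_thresholds(layer_outputs):
    lo, hi = float('inf'), float('-inf')
    for out in layer_outputs:
        lo = min(lo, float(out.min()))
        hi = max(hi, float(out.max()))
    return lo, hi

outputs = [np.random.randn(32, 64) for _ in range(10)]   # stand-in calibration outputs
lo, hi = naive_thresholds(outputs)
scale = 127.0 / max(abs(lo), abs(hi))                     # symmetric int8 scale
quantized = np.clip(np.round(outputs[0] * scale), -127, 127).astype(np.int8)
```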

A new benchmark script `launch_inference_onednn.sh` has been designed to launch performance benchmarks for FP32 or INT8 image-classification models with Intel® oneDNN.
```
usage: bash ./launch_inference_onednn.sh -s symbol_file [-b batch_size] [-iter iteration] [-ins instance] [-c cores/instance] [-h]

arguments:
-h, --help show this help message and exit
-s, --symbol_file symbol file for benchmark, required
-b, --batch_size inference batch size
default: 64
-iter, --iteration inference iteration
default: 500
-ins, --instance launch multi-instance inference
default: one instance per socket
-c, --core number of cores per instance
default: physical cores divided evenly among instances

example: ResNet INT8 performance benchmark on c5.24xlarge (two sockets, 24 physical cores per socket).

bash ./launch_inference_onednn.sh -s ./model/resnet50_v1-quantized-5batches-naive-symbol.json

will launch two instances for throughput benchmark and each instance will use 24 physical cores.
```
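
For example, to run two instances with 24 cores each explicitly (the values below only illustrate the flags documented above):

```
bash ./launch_inference_onednn.sh -s ./model/resnet50_v1-quantized-5batches-naive-symbol.json -b 64 -iter 500 -ins 2 -c 24
```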

The following models have been tested on Linux systems. Accuracy was collected on an Intel Xeon Cascade Lake CPU. For CPUs with the Skylake or an earlier architecture, the accuracy may differ.

| Model | Source | Dataset | FP32 Accuracy (top-1/top-5)| INT8 Accuracy (top-1/top-5)|
|:---|:---|---|:---:|:---:|
| ResNet18-V1 | [MXNet ModelZoo](https://github.com/apache/incubator-mxnet/tree/master/python/mxnet/gluon/model_zoo) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |70.45%/89.55%|70.22%/89.38%|
| ResNet50-V1 | [MXNet ModelZoo](https://github.com/apache/incubator-mxnet/tree/master/python/mxnet/gluon/model_zoo) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |76.36%/93.49%|76.04%/93.30%|
| ResNet101-V1 | [MXNet ModelZoo](https://github.com/apache/incubator-mxnet/tree/master/python/mxnet/gluon/model_zoo) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |78.23%/93.99%|77.85%/93.69%|
| MobileNet v2 1.0 | [MXNet ModelZoo](https://github.com/apache/incubator-mxnet/tree/master/python/mxnet/gluon/model_zoo) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |71.72%/90.28%|71.22%/89.92%|
| VGG16 | [MXNet ModelZoo](https://github.com/apache/incubator-mxnet/tree/master/python/mxnet/gluon/model_zoo) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |72.83%/91.11%|72.81%/91.10%|
| VGG19 | [MXNet ModelZoo](https://github.com/apache/incubator-mxnet/tree/master/python/mxnet/gluon/model_zoo) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |73.67%/91.63%|73.67%/91.67%|
*Measured on validation ImageNet (ILSVRC2012) with batch-size=64, num-calib-batches=10 and calib-mode=entropy*

<h3>Pre-trained Model</h3>

The following command downloads a pre-trained model from the [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/) and converts it into the symbolic model that will then be quantized. The [validation dataset](http://data.mxnet.io/data/val_256_q90.rec) is available for testing the pre-trained models:

```
python imagenet_gen_qsym_onednn.py --model=resnet50_v1 --num-calib-batches=5 --calib-mode=naive
```

The model will automatically be fused and quantized, and then saved as quantized symbol and parameter files in the `./model` directory. Set `--model` to one of the verified models listed above to quantize it. The following commands launch inference.

```
# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/resnet50_v1-symbol.json --param-file=./model/resnet50_v1-0000.params --rgb-mean=0.485,0.456,0.406 --rgb-std=0.229,0.224,0.225 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/resnet50_v1-quantized-5batches-naive-symbol.json --param-file=./model/resnet50_v1-quantized-0000.params --rgb-mean=0.485,0.456,0.406 --rgb-std=0.229,0.224,0.225 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec

# Launch dummy data Inference
bash ./launch_inference_onednn.sh -s ./model/resnet50_v1-symbol.json
bash ./launch_inference_onednn.sh -s ./model/resnet50_v1-quantized-5batches-naive-symbol.json
```
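
The saved symbol and parameter files can also be loaded from Python for inference, for instance with `SymbolBlock.imports`. This is a minimal sketch; the input name `data` is an assumption and should match the model's actual input:

```
# Minimal sketch: load the quantized symbol/params produced above with Gluon.
import mxnet as mx

qnet = mx.gluon.SymbolBlock.imports(
    './model/resnet50_v1-quantized-5batches-naive-symbol.json',
    ['data'],                                    # assumed input name
    './model/resnet50_v1-quantized-0000.params',
    ctx=mx.cpu())

out = qnet(mx.nd.random.uniform(shape=(64, 3, 224, 224)))
print(out.shape)
```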

<h3 id='4'>Custom Model</h3>

This script also supports custom symbolic models. Quantization layer configs can easily be added to `imagenet_gen_qsym_onednn.py` as shown below:

```
if logger:
    frameinfo = getframeinfo(currentframe())
    logger.info(f'Please set proper RGB configs inside this script below {frameinfo.filename}:{frameinfo.lineno} for model {args.model}!')
# add rgb mean/std of your model.
rgb_mean = '0,0,0'
rgb_std = '0,0,0'
# add layer names that shouldn't be quantized.
if logger:
    frameinfo = getframeinfo(currentframe())
    logger.info(f'Please set proper excluded_sym_names inside this script below {frameinfo.filename}:{frameinfo.lineno} for model {args.model} if required!')
excluded_sym_names += []
if exclude_first_conv:
    excluded_sym_names += []
```
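
For instance, for a hypothetical custom image-classification model the placeholders might be filled in as follows (the mean/std values and layer names are purely illustrative):

```
# Illustrative values for a hypothetical custom model; replace with your own.
rgb_mean = '123.68,116.779,103.939'   # per-channel mean of the training data
rgb_std = '58.393,57.12,57.375'       # per-channel std of the training data
# layer names that should stay in FP32
excluded_sym_names += ['custom_fullyconnected0']
if exclude_first_conv:
    excluded_sym_names += ['custom_conv0']
```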

Some tips on quantization configs:

1. First, prepare the data, symbol file (custom-symbol.json) and parameter file (custom-0000.params) of the FP32 symbolic model.
2. Then, run the following command to verify that the FP32 symbolic model runs inference as expected.

```
# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/custom-symbol.json --param-file=./model/custom-0000.params --rgb-mean=* --rgb-std=* --num-skipped-batches=* --batch-size=* --num-inference-batches=* --dataset=./data/*
```

3. Add proper `rgb_mean`, `rgb_std` and `excluded_sym_names` in the `imagenet_gen_qsym_onednn.py` script.

4. Run the following command for quantization:

```
python imagenet_gen_qsym_onednn.py --model=custom --num-calib-batches=5 --calib-mode=naive
```

5. After quantization, the quantized symbol and parameter files will be saved in the `model/` directory.

6. Finally, INT8 inference can be run:

```
# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/resnet50_v1-quantized-10batches-entropy-symbol.json --param-file=./model/resnet50_v1-quantized-10batches-entropy-0000.params --benchmark

# Launch dummy data Inference
bash ./launch_inference_onednn.sh -s ./model/*.json
```