Multithreading scalability on Ernie INT8 with oneDNN and Resnet50 without MKLDNN on CPU #43215

@lidanqing-vv

Description

paddle-deepmd multithreading performance without MKLDNN is worse than that of other frameworks

  • Reproduce paddle-test multithreading
git clone https://github.com/lidanqing-intel/deepmd-kit.git
cd deepmd-kit
git checkout paddle-test
bash compile_paddle.sh
source .bashrc
bash compile_deepmd.sh
bash compile_lammps.sh
cd setting/lmp
# single thread, single mpi and multi threads, multi mpi
bash lmp_pp.sh
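The comment above notes that lmp_pp.sh covers single-thread/single-MPI and multi-thread/multi-MPI runs. A minimal dry-run sketch of such a sweep follows; the thread counts, rank counts, binary name `lmp`, and input deck `in.lammps` are assumptions for illustration, not taken from lmp_pp.sh, so the loop only prints each command instead of executing it.

```shell
# Hypothetical sketch of the thread/rank sweep lmp_pp.sh is assumed to perform.
# Dry run: each configuration's command is collected and printed, not executed,
# since the lmp binary and input deck only exist inside the deepmd-kit setup.
cmds=()
for ranks in 1 2 4; do
  for threads in 1 2 4; do
    cmds+=("OMP_NUM_THREADS=$threads mpirun -np $ranks lmp -in in.lammps")
  done
done
printf '%s\n' "${cmds[@]}"
```

Comparing the timings from such a sweep against the tf-test branch below is what exposes the scalability gap.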
  • Reproduce tf-test multithreading
git clone https://github.com/lidanqing-intel/deepmd-kit.git
cd deepmd-kit
git checkout tf-test
bash compile_tf.sh
source .bashrc
bash compile_deepmd.sh
bash compile_lammps.sh
cd setting/lmp_tf
bash lmp_tf.sh

PaddlePaddle ernie-3.0 INT8 with MKLDNN: try to improve multithreading scalability

Paddle: 0d719718b308587efcb6b3547f925582a8009176
model download: https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/ernie3.0_medium_inference_models.zip
After extracting the model archive, there will be four files, (float32.pdmodel, float32.pdiparams) and (int8.pdmodel, int8.pdiparams), which are the float32 model and the int8 quantized model.

git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
pip install -r requirements.txt
python setup.py install
cd model_zoo/ernie-3.0
  • Ernie-3.0 FP32 mkldnn, 1 thread on ICX is 65.45 QPS
    python infer.py --task_name tnews --model_path /home/guest/PaddleNLP/model_zoo/ernie-3.0/ernie-3.0/float32 --perf --device cpu --num_threads 1

  • Ernie-3.0 INT8 mkldnn, 1 thread on ICX is 153.77 QPS
    python infer.py --task_name tnews --model_path /home/guest/PaddleNLP/model_zoo/ernie-3.0/ernie-3.0/int8 --perf --device cpu --num_threads 1 --enable_quantize
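From the two single-thread numbers above (65.45 QPS for FP32, 153.77 QPS for INT8), the INT8 speedup can be computed directly; a small helper for this, plus a generic parallel-efficiency function for judging multithreading scalability once multi-thread QPS is measured (no multi-thread numbers are reported here, so none are assumed):

```python
def speedup(baseline_qps: float, optimized_qps: float) -> float:
    """Throughput ratio of an optimized run over a baseline run."""
    return optimized_qps / baseline_qps

def parallel_efficiency(qps_1_thread: float, qps_n_threads: float, n: int) -> float:
    """Fraction of ideal linear scaling achieved with n threads (1.0 = perfect)."""
    return qps_n_threads / (qps_1_thread * n)

# Single-thread numbers from the runs above: FP32 65.45 QPS, INT8 153.77 QPS.
int8_speedup = speedup(65.45, 153.77)
print(f"INT8 vs FP32 speedup at 1 thread: {int8_speedup:.2f}x")  # prints 2.35x
```

An efficiency well below 1.0 at higher --num_threads values is the scalability problem this issue is about.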
