Multithreading scalability on Ernie INT8 with oneDNN and Resnet50 without MKLDNN on CPU #43215

@lidanqing-vv

Description

paddle-deepmd multithreading performance without MKLDNN is worse than that of other frameworks

  • Reproduce paddle-test multithreading
git clone https://github.com/lidanqing-intel/deepmd-kit.git
cd deepmd-kit
git checkout paddle-test
bash compile_paddle.sh
source .bashrc
bash compile_deepmd.sh
bash compile_lammps.sh
cd setting/lmp
# single thread, single mpi and multi threads, multi mpi
bash lmp_pp.sh
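The comment above notes that lmp_pp.sh covers single-thread/single-MPI and multi-thread/multi-MPI runs. A minimal dry-run sketch of such a sweep follows; the thread counts, rank counts, binary name `lmp`, and input deck `in.lammps` are assumptions for illustration, not taken from lmp_pp.sh, so the loop only prints each command instead of executing it.

```shell
# Hypothetical sketch of the thread/rank sweep lmp_pp.sh is assumed to perform.
# Dry run: each configuration's command is collected and printed, not executed,
# since the lmp binary and input deck only exist inside the deepmd-kit setup.
cmds=()
for ranks in 1 2 4; do
  for threads in 1 2 4; do
    cmds+=("OMP_NUM_THREADS=$threads mpirun -np $ranks lmp -in in.lammps")
  done
done
printf '%s\n' "${cmds[@]}"
```

Comparing the timings from such a sweep against the tf-test branch below is what exposes the scalability gap.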
  • Reproduce tf-test multithreading
git clone https://github.com/lidanqing-intel/deepmd-kit.git
cd deepmd-kit
git checkout tf-test
bash compile_tf.sh
source .bashrc
bash compile_deepmd.sh
bash compile_lammps.sh
cd setting/lmp_tf
bash lmp_tf.sh

PaddlePaddle ernie-3.0 INT8 with MKLDNN: try to improve multithreading scalability

Paddle: 0d719718b308587efcb6b3547f925582a8009176
model download: https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/ernie3.0_medium_inference_models.zip
After extracting the model archive, there will be four files, (float32.pdmodel, float32.pdiparams) and (int8.pdmodel, int8.pdiparams), which are the float32 model and the int8 quantized model.

git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
pip install -r requirements.txt
python setup.py install
cd model_zoo/ernie-3.0
  • Ernie-3.0 FP32 mkldnn, 1 thread on ICX is 65.45 QPS
    python infer.py --task_name tnews --model_path /home/guest/PaddleNLP/model_zoo/ernie-3.0/ernie-3.0/float32 --perf --device cpu --num_threads 1

  • Ernie-3.0 INT8 mkldnn, 1 thread on ICX is 153.77 QPS
    python infer.py --task_name tnews --model_path /home/guest/PaddleNLP/model_zoo/ernie-3.0/ernie-3.0/int8 --perf --device cpu --num_threads 1 --enable_quantize
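From the two single-thread numbers above (65.45 QPS for FP32, 153.77 QPS for INT8), the INT8 speedup can be computed directly; a small helper for this, plus a generic parallel-efficiency function for judging multithreading scalability once multi-thread QPS is measured (no multi-thread numbers are reported here, so none are assumed):

```python
def speedup(baseline_qps: float, optimized_qps: float) -> float:
    """Throughput ratio of an optimized run over a baseline run."""
    return optimized_qps / baseline_qps

def parallel_efficiency(qps_1_thread: float, qps_n_threads: float, n: int) -> float:
    """Fraction of ideal linear scaling achieved with n threads (1.0 = perfect)."""
    return qps_n_threads / (qps_1_thread * n)

# Single-thread numbers from the runs above: FP32 65.45 QPS, INT8 153.77 QPS.
int8_speedup = speedup(65.45, 153.77)
print(f"INT8 vs FP32 speedup at 1 thread: {int8_speedup:.2f}x")  # prints 2.35x
```

An efficiency well below 1.0 at higher --num_threads values is the scalability problem this issue is about.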
