paddle-deepmd multithreading without MKLDNN is worse than other frameworks

The Deepmd multithreading issue can be reproduced with a simple demo + Paddle without LAMMPS, provided in deep_md_test.zip. The text below only compares TensorFlow and Paddle; to reproduce the deepmd + Paddle multithreading issue directly, you can skip it and go straight to Multithreading scalability on Ernie INT8 with oneDNN and Resnet50 without MKLDNN on CPU #43215 (comment)
Paddle Deepmd official repository:
https://github.com/X4Science/paddle-deepmd

To reproduce the multithreading issue more easily (paddle-deepmd multithreading scales worse than tf deepmd-kit):
https://github.com/lidanqing-intel/deepmd-kit/blob/paddle-test/README.md
Reproduction environment:
Paddle version: eca6638
Test machine: Intel(R) Xeon(R) Platinum 8352Y CPU @ 2.20GHz (ICX)
Performance result:
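When comparing the two frameworks, the relevant metric is how throughput scales with thread count. A minimal sketch of that computation (the QPS values below are illustrative placeholders, not measurements from this issue):

```python
# Scaling efficiency: fraction of the ideal linear n-thread speedup that the
# framework actually achieves. Values below are placeholders, not real data.
def scaling_efficiency(qps_1: float, qps_n: float, n: int) -> float:
    """qps_1: single-thread throughput; qps_n: throughput with n threads."""
    return qps_n / (n * qps_1)

# e.g. 1 thread -> 100 QPS, 8 threads -> 400 QPS gives 50% efficiency
print(scaling_efficiency(100.0, 400.0, 8))  # -> 0.5
```

An efficiency well below 1.0 at high thread counts is what "multithreading is worse than other frameworks" means in practice here.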

Reproduce paddle-deepmd multithreading:

```bash
git clone https://github.com/lidanqing-intel/deepmd-kit.git
cd deepmd-kit
git checkout paddle-test
bash compile_paddle.sh
source .bashrc
bash compile_deepmd.sh
bash compile_lammps.sh
cd setting/lmp
# single thread, single mpi and multi threads, multi mpi
bash lmp_pp.sh
```
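lmp_pp.sh drives the thread/MPI sweep. A rough sketch of how such a sweep is typically structured (assumption: the intra-op thread count is controlled via OMP_NUM_THREADS; the echo below is a stand-in for the actual mpirun/lmp invocation, whose exact form lives in the script):

```python
import os
import subprocess

# Sweep intra-op thread counts; the echo is a placeholder for the real
# "mpirun -np <ranks> lmp -in <input>" command that lmp_pp.sh runs.
for n in (1, 2, 4, 8):
    env = dict(os.environ, OMP_NUM_THREADS=str(n))
    result = subprocess.run(["echo", f"lmp run with OMP_NUM_THREADS={n}"],
                            env=env, capture_output=True, text=True)
    print(result.stdout.strip())
```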
Reproduce tf deepmd-kit multithreading:

```bash
git clone https://github.com/lidanqing-intel/deepmd-kit.git
cd deepmd-kit
git checkout tf-test
bash compile_tf.sh
source .bashrc
bash compile_deepmd.sh
bash compile_lammps.sh
cd setting/lmp_tf
bash lmp_tf.sh
```
PaddlePaddle Ernie-3.0 INT8 with MKLDNN: trying to improve multithreading scalability
Paddle commit: 0d719718b308587efcb6b3547f925582a8009176
Model download: https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/ernie3.0_medium_inference_models.zip
After extracting the archive there will be 4 files: (float32.pdmodel, float32.pdiparams) for the float32 model and (int8.pdmodel, int8.pdiparams) for the int8 quantized model.
```bash
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
pip install -r requirements.txt
python setup.py install
cd model_zoo/ernie-3.0
```
Ernie-3.0 FP32 mkldnn, 1 thread on ICX: 65.45 QPS

```bash
python infer.py --task_name tnews --model_path /home/guest/PaddleNLP/model_zoo/ernie-3.0/ernie-3.0/float32 --perf --device cpu --num_threads 1
```

Ernie-3.0 INT8 mkldnn, 1 thread on ICX: 153.77 QPS

```bash
python infer.py --task_name tnews --model_path /home/guest/PaddleNLP/model_zoo/ernie-3.0/ernie-3.0/int8 --perf --device cpu --num_threads 1 --enable_quantize
```
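From the two single-thread measurements above, the INT8 quantization speedup over FP32 on ICX works out to roughly 2.35x:

```python
# Single-thread QPS figures quoted above (Ernie-3.0 tnews on ICX, mkldnn on).
fp32_qps = 65.45
int8_qps = 153.77

speedup = int8_qps / fp32_qps
print(f"INT8 speedup over FP32: {speedup:.2f}x")  # -> 2.35x
```

The open question in this issue is whether that single-thread gain holds up as --num_threads increases, i.e. whether INT8 + oneDNN scales as well as FP32.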