Currently we are still slow on CPU (#1222).
There are several things we can do:
- Do a thorough profile on a chosen set of benchmarks to understand the bottlenecks (see the timing sketch after this list). Some candidates:
  - MNIST CNN model: since the MNIST model is very small, performance will suffer if system overhead is high, so it will expose potential bottlenecks in non-CNN operations.
  - CIFAR: this is much heavier than MNIST, so it will mostly show us the performance of the underlying libraries we are using, i.e. OpenBLAS/MKL + MShadow. I think the configuration of these libraries (e.g. the number of threads) has quite a lot of impact on overall performance.
- Integrate libraries like NNPACK (https://github.com/Maratyszcza/NNPACK) and MKLDNN (https://software.intel.com/en-us/articles/deep-neural-network-technical-preview-for-intel-math-kernel-library-intel-mkl); a baseline convolution timing is sketched below.
- Improve the operators not covered by NNPACK and MKLDNN. This would involve some code in MShadow and some in the mxnet operators; the per-operator microbenchmark at the end could help decide where to start.
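
As a starting point for the profiling item, a minimal timing harness along these lines could drive both benchmarks. This is a sketch only: it assumes a recent MXNet build with the Gluon API, the model is a stand-in for the real MNIST CNN script, and the thread counts are placeholder values to sweep over, not recommendations.

```python
import os
# Thread configuration has to happen before mxnet (and the OpenBLAS/MKL
# it links against) is loaded. These are values to sweep, not defaults.
os.environ.setdefault('OMP_NUM_THREADS', '4')
os.environ.setdefault('MKL_NUM_THREADS', '4')
os.environ.setdefault('MXNET_CPU_WORKER_NTHREADS', '4')

import time
import mxnet as mx
from mxnet.gluon import nn

ctx = mx.cpu()

# Stand-in for the MNIST CNN benchmark model; swap in the real network.
net = nn.HybridSequential()
net.add(nn.Conv2D(32, kernel_size=3, activation='relu'),
        nn.MaxPool2D(),
        nn.Flatten(),
        nn.Dense(10))
net.initialize(ctx=ctx)
net.hybridize()

x = mx.nd.random.uniform(shape=(64, 1, 28, 28), ctx=ctx)
net(x).wait_to_read()  # warm-up: parameter init and graph construction

start = time.time()
for _ in range(100):
    net(x)
mx.nd.waitall()        # execution is asynchronous; block before timing
elapsed = time.time() - start
print('%.1f images/sec' % (100 * 64 / elapsed))
```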
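
Since convolution dominates CNN runtime on CPU, it is the first thing NNPACK or MKLDNN would take over. A quick way to establish the current baseline they have to beat is to time the convolution operator in isolation (the shapes here are arbitrary):

```python
import time
import mxnet as mx

ctx = mx.cpu()
data = mx.nd.random.uniform(shape=(64, 64, 32, 32), ctx=ctx)
weight = mx.nd.random.uniform(shape=(128, 64, 3, 3), ctx=ctx)
bias = mx.nd.zeros((128,), ctx=ctx)

# Warm-up call so lazy initialization is not timed.
mx.nd.Convolution(data=data, weight=weight, bias=bias,
                  kernel=(3, 3), num_filter=128).wait_to_read()

start = time.time()
for _ in range(50):
    mx.nd.Convolution(data=data, weight=weight, bias=bias,
                      kernel=(3, 3), num_filter=128)
mx.nd.waitall()
print('conv: %.2f ms/call' % ((time.time() - start) / 50 * 1e3))
```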
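
And to decide which of the remaining operators are worth improving first, the same pattern can be applied per operator. The operator list and shapes below are illustrative, not a claim about where the time actually goes:

```python
import time
import mxnet as mx

ctx = mx.cpu()
x = mx.nd.random.uniform(shape=(64, 128, 16, 16), ctx=ctx)

# A few candidate operators; extend this table with whatever the profile flags.
ops = {
    'relu':    lambda: mx.nd.Activation(x, act_type='relu'),
    'pooling': lambda: mx.nd.Pooling(x, kernel=(2, 2), pool_type='max'),
    'softmax': lambda: mx.nd.softmax(x.reshape((64, -1))),
}

for name, fn in ops.items():
    fn().wait_to_read()  # warm-up
    start = time.time()
    for _ in range(100):
        fn()
    mx.nd.waitall()
    print('%-8s %.3f ms/call' % (name, (time.time() - start) / 100 * 1e3))
```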