This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MKL2017 implemented layers integration to improve IA perf. on mxnet #3581

Merged
28 commits merged into apache:master on Oct 21, 2016

Conversation

zhenlinluo
Contributor

Hi @piiswrong,

Please review the latest MKL2017 layer patches, covering conv, relu, lrn, batch-norm, concat, elementwise-sum, pooling and fc (8 layers in total). The code related to PR #3458 has been merged into this. In this version the MKL buffers live inside each layer, and inputs and outputs are converted at the layer boundary. Once your review is done, I will submit another PR with a much higher-performance patch that passes MKL buffers directly between layers.
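Concretely, the pattern inside each layer in this version is convert-in / compute / convert-out, so neighbouring operators always see plain tensors. Below is a rough sketch of that pattern; the helper names (get_converted_prv, conversion_needed) follow this PR's code, while the small MklBuffer class is only a self-contained stand-in for the real MKL memory helper:

```cpp
// Sketch of the per-layer buffer handling described above (not the
// actual PR code). MklBuffer stands in for the real MKL memory helper.
#include <cstring>
#include <vector>

struct MklBuffer {
  bool layout_differs = false;   // real code derives this from dnnLayout comparison
  std::vector<float> prv;        // MKL "private"-layout storage

  bool conversion_needed() const { return layout_differs; }

  // Convert plain (default-layout) data into the private layout and
  // return the pointer the MKL primitive should consume.
  float *get_converted_prv(float *plain, size_t n) {
    if (!conversion_needed()) return plain;
    prv.assign(plain, plain + n);  // real code runs an MKL conversion primitive
    return prv.data();
  }

  // Convert the private-layout result back into the default layout.
  void convert_from_prv(float *plain, size_t n) const {
    std::memcpy(plain, prv.data(), n * sizeof(float));
  }
};

// Forward skeleton: convert the input in, run the MKL primitive, and
// convert the output back before the next (possibly non-MKL) operator.
void forward(MklBuffer &bottom, MklBuffer &top,
             float *in, float *out, size_t n) {
  float *bottom_prv = bottom.get_converted_prv(in, n);
  (void)bottom_prv;  // ... execute the MKL DNN primitive here ...
  if (top.conversion_needed()) top.convert_from_prv(out, n);
}
```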

Please review as soon as you can.

Thanks

zhenlinluo and others added 25 commits October 19, 2016 23:28
Signed-off-by: Lingyan <lingyan.guo@intel.com>
Signed-off-by: Lingyan <lingyan.guo@intel.com>
Signed-off-by: Lingyan <lingyan.guo@intel.com>
Signed-off-by: Lingyan <lingyan.guo@intel.com>
Signed-off-by: Lingyan <lingyan.guo@intel.com>
Signed-off-by: Lingyan <lingyan.guo@intel.com>
Signed-off-by: Lingyan <lingyan.guo@intel.com>
Signed-off-by: Lingyan <lingyan.guo@intel.com>
Since mxnet updated to the new inception-bn model,
there is no need to add the padding patch for the old model

Signed-off-by: Lingyan <lingyan.guo@intel.com>
Signed-off-by: Lingyan <lingyan.guo@intel.com>
Signed-off-by: Lingyan <lingyan.guo@intel.com>
res_convolutionBwdBias[dnnResourceDiffDst] =
bwdb_top_diff->get_converted_prv(grad.dptr_, false);
if (bwdb_bias_diff->conversion_needed()) {
MKL_DLOG(INFO) << "MKLCONV: bwd bias diff needs converted ";
Contributor

don't show these unless debugging
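One possible way (just a sketch, not necessarily what was done here) is to compile these messages out unless a debug flag is set:

```cpp
// Sketch: keep the conversion messages available for debugging but
// compiled out of normal builds. MKL_CONVERSION_DEBUG is a made-up
// flag name; MKL_DLOG is the macro used in the snippet above.
#if defined(MKL_CONVERSION_DEBUG)
  #define MKL_DLOG(severity) LOG(severity)
#else
  #define MKL_DLOG(severity) while (false) LOG(severity)  // never executed
#endif
```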

Contributor Author

Done.

@piiswrong
Contributor

code looks good to me.
Please fix lint.
also remove debug print

@piiswrong
Contributor

piiswrong commented Oct 20, 2016

Please make sure the tests pass when USE_MKL2017=1:

nosetests --verbose tests/python/train
nosetests --verbose tests/python/unittests
nosetests --verbose tests/python/gpu/test_operator_gpu.py

Currently it fails for me with:

test_operator.test_convolution_grouping ... [00:42:07] /scratch/intel-mxnet/dmlc-core/include/dmlc/logging.h:235: [00:42:07] src/operator/mkl/mkl_memory.cc:51: Check failed: (status) == (E_SUCCESS) Failed creation convert_to_int with status -1 for buffer: fwd_filter_data   @ MKLConvolutionOp

[00:42:07] /scratch/intel-mxnet/dmlc-core/include/dmlc/logging.h:235: [00:42:07] src/engine/./threaded_engine.h:306: [00:42:07] src/operator/mkl/mkl_memory.cc:51: Check failed: (status) == (E_SUCCESS) Failed creation convert_to_int with status -1 for buffer: fwd_filter_data   @ MKLConvolutionOp

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
terminate called after throwing an instance of 'dmlc::Error'
  what():  [00:42:07] src/engine/./threaded_engine.h:306: [00:42:07] src/operator/mkl/mkl_memory.cc:51: Check failed: (status) == (E_SUCCESS) Failed creation convert_to_int with status -1 for buffer: fwd_filter_data   @ MKLConvolutionOp

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

@zhenlinluo
Contributor Author

Eric, thanks for the quick response. The debug messages and lint issues are cleaned up now. However, I cannot reproduce your failure on my side when running the nosetests, although there are two other unit-test failures to follow up on (below). I am using a clean CentOS box with only USE_MKL2017=1 enabled and USE_BLAS=atlas unchanged. Could you please share your environment and more details so I can debug further?

BTW, I will get back to you soon about the two failures below. In fact, if the deconv_grad tolerance is relaxed from 1e-6 to 1e-4, the test passes (see the sketch after the test output below).

FAIL: test_executor.test_dot

Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/zhenlin/Mxnet/upstream/intel-mxnet/PR/mxnet/tests/python/unittest/test_executor.py", line 102, in test_dot
sf = mx.symbol.dot)
File "/home/zhenlin/Mxnet/upstream/intel-mxnet/PR/mxnet/tests/python/unittest/test_executor.py", line 60, in check_bind_with_uniform
assert reldiff(rhs_grad.asnumpy(), rhs_grad2) < 1e-6
AssertionError

FAIL: test_operator.test_deconvolution

Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/zhenlin/Mxnet/upstream/intel-mxnet/PR/mxnet/tests/python/unittest/test_operator.py", line 694, in test_deconvolution
pad = (3,3)
File "/home/zhenlin/Mxnet/upstream/intel-mxnet/PR/mxnet/tests/python/unittest/test_operator.py", line 638, in check_deconvolution_gradient
assert reldiff(conv_args_grad[1].asnumpy(), deconv_args_grad[1].asnumpy()) < 1e-6
AssertionError


Ran 90 tests in 21.202s
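For reference, the relaxation mentioned above only touches the threshold in the gradient check. A rough sketch (this reldiff is a common relative-difference definition; the exact helper in mxnet's tests may differ slightly):

```python
import numpy as np

def reldiff(a, b):
    # Relative L1 difference between two arrays.
    diff = np.abs(a - b).sum()
    norm = np.abs(a).sum() + np.abs(b).sum()
    return 0.0 if norm == 0 else diff / norm

# check_deconvolution_gradient compares the conv and deconv weight
# gradients; loosening the tolerance from 1e-6 to 1e-4 lets the MKL
# build pass:
#   assert reldiff(conv_grad, deconv_grad) < 1e-4   # was 1e-6
```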

@@ -15,6 +20,19 @@ Operator* CreateOp<cpu>(ConvolutionParam param, int dtype,
std::vector<TShape> *out_shape,
Context ctx) {
Operator *op = NULL;
#if MXNET_USE_MKL2017 == 1
if ((param.dilate[0] == 1 && param.dilate[1] == 1)
&& param.kernel.ndim() == 2) {
Contributor

Does MKL convolution support grouping? If not, this should also check for it.

Contributor Author

Yes, MKL supports grouping. We use the groups convolution API in mkl_convolution-inl.h, for example:
dnnGroupsConvolutionCreateForward
dnnGroupsConvolutionCreateBackwardData

@piiswrong
Contributor

I'm using USE_CUDA=1, USE_CUDNN=1, USE_MKL2017=1

@zhenlinluo
Contributor Author

Can you try changing nothing except USE_MKL2017=1 in make/config.mk and see if the issue persists? BTW, what hardware are you running on?
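Roughly, the configuration I mean is just this in make/config.mk, with everything else left at its defaults:

```makefile
# make/config.mk (only the relevant lines)
USE_MKL2017 = 1      # enable the MKL 2017 DNN layers from this PR
USE_BLAS    = atlas  # BLAS left unchanged from what I use
# USE_CUDA and USE_CUDNN stay at their defaults (0) so only the MKL
# code path is exercised
```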

@piiswrong
Contributor

Tried with only MKL enabled; python tests/python/unittest/test_operator.py gives the same error:
MKL Build:20160413
[13:03:37] /scratch/intel-mxnet/dmlc-core/include/dmlc/logging.h:235: [13:03:37] src/operator/mkl/mkl_memory.cc:47: Check failed: (status) == (E_SUCCESS) Failed creation convert_to_int with status -1 for buffer: fwd_filter_data @ MKLConvolutionOp

[13:03:37] /scratch/intel-mxnet/dmlc-core/include/dmlc/logging.h:235: [13:03:37] src/engine/./threaded_engine.h:306: [13:03:37] src/operator/mkl/mkl_memory.cc:47: Check failed: (status) == (E_SUCCESS) Failed creation convert_to_int with status -1 for buffer: fwd_filter_data @ MKLConvolutionOp

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
terminate called after throwing an instance of 'dmlc::Error'
what(): [13:03:37] src/engine/./threaded_engine.h:306: [13:03:37] src/operator/mkl/mkl_memory.cc:47: Check failed: (status) == (E_SUCCESS) Failed creation convert_to_int with status -1 for buffer: fwd_filter_data @ MKLConvolutionOp

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
Aborted (core dumped)

@zhenlinluo
Contributor Author

I know what your issue is: an older MKL library, which does not support the MKL DNN API yet, is installed on your system. By default, if you installed anaconda2, it bundles an old MKL library. You need to (see the shell sketch below):

  1. remove the old MKL version
  2. when running make, check that the mklml package gets downloaded
  3. export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mklml_lnx_2017.0.0.20160801/lib
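Roughly, as shell commands (the library path is the one from step 3; adjust it if the mklml package lands somewhere else on your machine):

```sh
# 1. remove the older MKL bundled with anaconda, if present
conda remove mkl mkl-service

# 2. rebuild (with USE_MKL2017 = 1 in make/config.mk) and check that the
#    mklml package is downloaded during the build
make -j"$(nproc)"

# 3. point the dynamic loader at the downloaded MKLML runtime
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mklml_lnx_2017.0.0.20160801/lib
```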

Please let me know if it works.

@piiswrong
Contributor

Didn't work.
The other mkldnn layers work fine; only test_operator.test_convolution_grouping breaks.

@piiswrong
Contributor

could you show me your output from
nosetests --verbose tests/python/unittest/test_operator.py

@zhenlinluo
Contributor Author

zhenlinluo commented Oct 20, 2016

Did you try "conda remove mkl mkl-service"? The grouping feature is supported in MKL 20160801 or later. I know the MKL environment setup is annoying; for a quick test, you can remove anaconda2 entirely and try again. If you set LD_LIBRARY_PATH to the downloaded MKLML package, the reported version should be MKL Build 20160801. My log is below:

[root@epp-hpc-h4-18 mxnet]# nosetests --verbose tests/python/unittest/test_operator.py
test_operator.test_elementwise_sum ... ok
test_operator.test_concat ... ok
test_operator.test_slice_channel ... ok
test_operator.test_regression ... ok
test_operator.test_softmax ... ok
test_operator.test_python_op ... ok
test_operator.test_swapaxes ... ok
test_operator.test_scalarop ... ok
test_operator.test_scalar_pow ... ok
test_operator.test_symbol_pow ... ok
test_operator.test_pow_fn ... ok
test_operator.test_embedding ... ok
test_operator.test_binary_op_duplicate_input ... ok
test_operator.test_sign ... ok
test_operator.test_round_ceil_floor ... ok
test_operator.test_rsqrt_cos_sin ... ok
test_operator.test_maximum_minimum ... ok
test_operator.test_maximum_minimum_scalar ... ok
test_operator.test_abs ... ok
test_operator.test_deconvolution ... MKL Build:20160801
FAIL
test_operator.test_nearest_upsampling ... ok
test_operator.test_batchnorm_training ... ok
test_operator.test_convolution_grouping ... ok
test_operator.test_broadcast_binary_op ... ok
test_operator.test_run_convolution_dilated_impulse_response ... ok
test_operator.test_convolution_dilated_impulse_response ... ok
test_operator.test_reshape ... [14:47:37] src/operator/./reshape-inl.h:270: Using target_shape will be deprecated.
ok
test_operator.test_reduce ... ok
test_operator.test_broadcast ... ok
test_operator.test_transpose ... ok
test_operator.test_expand_dims ... ok
test_operator.test_crop ... ok
test_operator.test_slice_axis ... ok
test_operator.test_flip ... ok
test_operator.test_stn ... ok
test_operator.test_dot ... ok
test_operator.test_batch_dot ... ok
test_operator.test_correlation ... ok
test_operator.test_support_vector_machine_l1_svm ... ok
test_operator.test_support_vector_machine_l2_svm ... ok
test_operator.test_roipooling ... ok

FAIL: test_operator.test_deconvolution

Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/zhenlin/Mxnet/upstream/intel-mxnet/PR/mxnet/tests/python/unittest/test_operator.py", line 694, in test_deconvolution
pad = (3,3)
File "/home/zhenlin/Mxnet/upstream/intel-mxnet/PR/mxnet/tests/python/unittest/test_operator.py", line 638, in check_deconvolution_gradient
assert reldiff(conv_args_grad[1].asnumpy(), deconv_args_grad[1].asnumpy()) < 1e-6
AssertionError


Ran 41 tests in 7.571s

FAILED (failures=1)

@piiswrong
Contributor

OK, it works now.
This is going to be a frequent problem for users though: basically mkldnn is not compatible with anaconda, which a lot of users use.
Is it possible to set the linker order so that it works with anaconda?

@zhenlinluo
Contributor Author

Thanks for the suggestion. I did some debugging. This happens because, when anaconda is installed with MKL, numpy and the related Python modules load the MKL and iomp5 libraries bundled with anaconda, whose path is baked in via RPATH:
find library=libiomp5.so [0]; searching
search path=/root/anaconda2/lib/python2.7/site-packages/numpy/core/../../../.. (RPATH from file /root/anaconda2/lib/python2.7/site-packag es/numpy/core/multiarray.so)
trying file=/root/anaconda2/lib/python2.7/site-packages/numpy/core/../../../../libiomp5.so

Because of this, the old MKL version from anaconda is initialized and loaded first, so calls to the new API fail:
62803: calling init: /root/anaconda2/lib/python2.7/site-packages/numpy/core/../../../../libmkl_avx2.so
62803:
62803: python: error: symbol lookup error: undefined symbol: scalable_malloc (fatal)
MKL Build:20160413
[18:49:27] /home/zhenlin/Mxnet/upstream/intel-mxnet/intel-mxnet/dmlc-core/include/dmlc/logging.h:235: [18:49:27] src/operator/mkl/mkl_memory.cc:51: Check fai
led: (status) == (E_SUCCESS) Failed creation convert_to_int with status -1 for buffer: fwd_filter_data @ MKLConvolutionOp

[18:49:27] /home/zhenlin/Mxnet/upstream/intel-mxnet/intel-mxnet/dmlc-core/include/dmlc/logging.h:235: [18:49:27] src/engine/./threaded_engine.h:306: [18:49:27] src/operator/mkl/mkl_memory.cc:51: Check failed: (status) == (E_SUCCESS) Failed creation convert_to_int with status -1 for buffer: fwd_filter_data @ MKL
ConvolutionOp

So this is not an issue of mkldnn being incompatible with anaconda; it is the problem of having two different MKL versions installed on the same system. The quick way to resolve it is to remove the MKL inside anaconda until anaconda integrates the latest MKL2017. Please refer to https://www.continuum.io/blog/developer-blog/anaconda-25-release-now-mkl-optimizations

Since I believe anaconda will integrate the new MKL2017 soon, this issue should disappear on its own. I will document it in the README as a known issue, and I will keep looking for a better workaround.
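For anyone debugging the same thing: the loader trace above is what the glibc dynamic loader prints in its debug mode, which is a quick way to see which copy of libiomp5/libmkl actually gets resolved (the import line is just an example):

```sh
# Show which shared objects the dynamic loader picks up while mxnet is
# imported; an anaconda path for libiomp5/libmkl here is the symptom.
LD_DEBUG=libs python -c "import mxnet" 2>&1 | grep -E 'libiomp5|libmkl'
```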

But I think this should not block the PR merge, right? Please let me know.

@lingyanz

Hi all:
I built a pure CPU version with USE_MKL2017=0,
and I also hit the problem below with nosetests --verbose tests/python/unittests

FAIL: test_executor.test_dot

Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/lingyan/mkldnn_compare/mxnet_gcc/tests/python/unittest/test_executor.py", line 102, in test_dot
sf = mx.symbol.dot)
File "/home/lingyan/mkldnn_compare/mxnet_gcc/tests/python/unittest/test_executor.py", line 60, in check_bind_with_uniform
assert reldiff(rhs_grad.asnumpy(), rhs_grad2) < 1e-6
AssertionError


@piiswrong piiswrong merged commit 8e1e7f0 into apache:master Oct 21, 2016
@glingyan
Contributor

For the FAIL: test_executor.test_dot issue:
it looks like an atlas library issue; MKL does not have the problem.
This is likely because atlas produces NaN values for dot.
-USE_BLAS = atlas
+USE_BLAS = mkl

@piiswrong
Contributor

OK. Please fix the deconvolution test and we can merge.

