[MKLDNN] mkldnn RNN operator enhancement #17075
Conversation
- `add` operation support
- Rename AddTo
- Add MXNET_USE_MKLDNN_RNN env
- Add env var for switching to naive RNN impl and naive add/copy impl
@pengzhao-intel @TaoLv CI passed. Please take a review. Thanks.
Overall looks good
@@ -349,6 +349,10 @@ If ctypes is used, it must be `mxnet._ctypes.ndarray.NDArrayBase`.
  - Values: 0(false) or 1(true) ```(default=1)```
  - If this variable is set, MXNet will simplify the computation graph, eliminating duplicated operations on the same inputs.

* MXNET_USE_MKLDNN_RNN
  - Values: 0(false) or 1(true) ```(default=1)```
  - This variable controls whether to use the MKL-DNN backend in fused RNN operator for CPU context. There are two fusion implementations of RNN operator in MXNet. The MKL-DNN implementation has a better performance than the naive one, but the latter is more stable in the backward operation currently.
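For context, a minimal sketch of how this variable could be toggled from Python, assuming it is read from the process environment (setting it before importing MXNet keeps things safe). The layer sizes and shapes below are arbitrary illustration values, not taken from this PR:

```python
# Sketch: switching between the MKL-DNN and naive fused RNN implementations.
import os
os.environ['MXNET_USE_MKLDNN_RNN'] = '0'  # 0: naive fused RNN; 1 (default): MKL-DNN backend

import mxnet as mx
from mxnet.gluon import rnn

layer = rnn.LSTM(hidden_size=128, num_layers=2)
layer.initialize(ctx=mx.cpu())

# Default gluon RNN layout is TNC: (sequence length, batch size, input size).
x = mx.nd.random.uniform(shape=(35, 8, 64), ctx=mx.cpu())
out = layer(x)
print(out.shape)  # expected: (35, 8, 128)
```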
Do you mean the MKL-DNN fused kernel is not stable in the backward pass? Or is the MKL-DNN version not as flexible as the naive one due to some implementation limitation?
I think it is not stable in the backward pass. I trained the bucketing model (https://github.com/apache/incubator-mxnet/tree/master/example/rnn/bucketing) with the MKL-DNN RNN backward backend, and it produced a convergent optimization curve. However, it has not been tested in other training applications, so I provided an env variable for users to switch to the naive implementation.
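A rough sketch (not the PR's actual unit test) of the kind of backward-pass sanity check described above: run an LSTM under autograd on CPU and inspect the input gradient. Shapes are illustrative.

```python
import mxnet as mx
from mxnet import autograd
from mxnet.gluon import rnn

layer = rnn.LSTM(hidden_size=16, num_layers=1)
layer.initialize(ctx=mx.cpu())

x = mx.nd.random.uniform(shape=(10, 4, 8), ctx=mx.cpu())
x.attach_grad()
with autograd.record():
    loss = layer(x).sum()
loss.backward()

# Running this script once with MXNET_USE_MKLDNN_RNN=1 and once with =0 and
# comparing x.grad (e.g. via numpy.allclose) is one way to spot instabilities.
print(x.grad.shape)  # (10, 4, 8)
```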
OK, got it. I previously thought the results themselves were not stable :) A description like this should note that the implementation has only been verified with a limited set of test cases, not broader ones.
LGTM
Thanks for the improvements. Merging now.
* mkldnn rnn operator enhancement
  - `add` operation support
  - Rename AddTo
  - Add MXNET_USE_MKLDNN_RNN env
  - Add env var for switching to naive RNN impl and naive add/copy impl
* Re-run CI, op:test_reduce failed on Unix-CPU
* Rerun CI, Python2 CPU on Unix-CPU timeout
* [MKLDNN] mkldnn RNN operator enhancement (#17075)
  - `add` operation support
  - Rename AddTo
  - Add MXNET_USE_MKLDNN_RNN env
  - Add env var for switching to naive RNN impl and naive add/copy impl
  - Re-run CI, op:test_reduce failed on Unix-CPU
  - Rerun CI, Python2 CPU on Unix-CPU timeout
* MKL-DNN RNN backward path enhancement (#17183)
  - Flush memory before RNN backward primitive
  - Add gluon rnn unit test for gradients check
  - Cache reorder
  - Re-write rnn supporting check
  - Update OpSignature.AddSign to avoid potential hash collision for rnn-packed memory
  - Get the data type from mkldnn memory descriptor when setting grad handle
Description
Enhancements for the RNN operator.
Checklist
Changes
- Support the `AddTo` request for output and gradients (Fused RNN Operators have nonsupport of `add` grad_req with mkl-dnn #16578); see the sketch after this list
- Add `MXNET_USE_MKLDNN_RNN` to fall back to the naive fused operator
- Restore `atol` to its origin in the unit test of GRU (should be fixed by [mkldnn-v1.0] Use memcpy instead of set_handle with unidirectional 1-layer RNN #16663)

@ciyongch @pengzhao-intel @TaoLv
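A hedged sketch of exercising `grad_req='add'` with a fused RNN layer (cf. #16578). With `'add'`, gradients accumulate across backward calls instead of being overwritten, so they must be zeroed manually between parameter updates. The layer and shapes are arbitrary illustration values.

```python
import mxnet as mx
from mxnet import autograd
from mxnet.gluon import rnn

layer = rnn.LSTM(hidden_size=16, num_layers=1)
layer.initialize(ctx=mx.cpu())
for p in layer.collect_params().values():
    p.grad_req = 'add'  # accumulate gradients instead of overwriting them

x = mx.nd.random.uniform(shape=(10, 4, 8), ctx=mx.cpu())
for _ in range(2):  # two backward passes accumulate into the same grad arrays
    with autograd.record():
        loss = layer(x).sum()
    loss.backward()

# Gradients now hold the sum from both passes; clear them before the next step.
for p in layer.collect_params().values():
    p.zero_grad()
```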