[MKLDNN] mkldnn RNN operator enhancement #17075
Conversation
- `add` operation support
- Rename AddTo
- Add MXNET_USE_MKLDNN_RNN env
- Add env var for switching to naive RNN impl and naive add/copy impl
@pengzhao-intel @TaoLv CI passed. Please take a review. Thanks.
Overall looks good
@@ -349,6 +349,10 @@ If ctypes is used, it must be `mxnet._ctypes.ndarray.NDArrayBase`.
  - Values: 0(false) or 1(true) ```(default=1)```
  - If this variable is set, MXNet will simplify the computation graph, eliminating duplicated operations on the same inputs.

* MXNET_USE_MKLDNN_RNN
  - Values: 0(false) or 1(true) ```(default=1)```
  - This variable controls whether to use the MKL-DNN backend in fused RNN operator for CPU context. There are two fusion implementations of RNN operator in MXNet. The MKL-DNN implementation has a better performance than the naive one, but the latter is more stable in the backward operation currently.
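For context, a minimal sketch of how this variable could be toggled from Python, assuming it is read from the process environment (setting it before importing MXNet keeps things safe). The layer sizes and shapes below are arbitrary illustration values, not taken from this PR:

```python
# Sketch: switching between the MKL-DNN and naive fused RNN implementations.
import os
os.environ['MXNET_USE_MKLDNN_RNN'] = '0'  # 0: naive fused RNN; 1 (default): MKL-DNN backend

import mxnet as mx
from mxnet.gluon import rnn

layer = rnn.LSTM(hidden_size=128, num_layers=2)
layer.initialize(ctx=mx.cpu())

# Default gluon RNN layout is TNC: (sequence length, batch size, input size).
x = mx.nd.random.uniform(shape=(35, 8, 64), ctx=mx.cpu())
out = layer(x)
print(out.shape)  # expected: (35, 8, 128)
```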
Do you mean the MKL-DNN fused kernel is not stable in the backward pass? Or is the MKL-DNN version not as flexible as the naive one due to some implementation limitation?
I think it is not stable in the backward pass. I trained the bucketing model (https://github.com/apache/incubator-mxnet/tree/master/example/rnn/bucketing) with the MKL-DNN RNN backward backend, and it produced a convergent optimization curve. However, it has not been tested in other training applications, so I provided an env variable for users to switch to the naive implementation.
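A rough sketch (not the PR's actual unit test) of the kind of backward-pass sanity check described above: run an LSTM under autograd on CPU and inspect the input gradient. Shapes are illustrative.

```python
import mxnet as mx
from mxnet import autograd
from mxnet.gluon import rnn

layer = rnn.LSTM(hidden_size=16, num_layers=1)
layer.initialize(ctx=mx.cpu())

x = mx.nd.random.uniform(shape=(10, 4, 8), ctx=mx.cpu())
x.attach_grad()
with autograd.record():
    loss = layer(x).sum()
loss.backward()

# Running this script once with MXNET_USE_MKLDNN_RNN=1 and once with =0 and
# comparing x.grad (e.g. via numpy.allclose) is one way to spot instabilities.
print(x.grad.shape)  # (10, 4, 8)
```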
OK, got it. I previously thought the results themselves were not stable :) A description like this should note that the implementation has only been verified with a limited set of test cases, not broader ones.
LGTM
Thanks for the improvements. Merging now.
* mkldnn rnn operator enhancement
  - `add` operation support
  - Rename AddTo
  - Add MXNET_USE_MKLDNN_RNN env
  - Add env var for switching to naive RNN impl and naive add/copy impl
* Re-run CI, op:test_reduce failed on Unix-CPU
* Rerun CI, Python2 CPU on Unix-CPU timeout
* [MKLDNN] mkldnn RNN operator enhancement (#17075)
  - `add` operation support
  - Rename AddTo
  - Add MXNET_USE_MKLDNN_RNN env
  - Add env var for switching to naive RNN impl and naive add/copy impl
  - Re-run CI, op:test_reduce failed on Unix-CPU
  - Rerun CI, Python2 CPU on Unix-CPU timeout
* MKL-DNN RNN backward path enhancement (#17183)
  - Flush memory before RNN backward primitive
  - Add gluon rnn unit test for gradients check
  - Cache reorder
  - Re-write rnn supporting check
  - Update OpSignature.AddSign to avoid potential hash collision for rnn-packed memory
  - Get the data type from mkldnn memory descriptor when setting grad handle
Description
Enhancements for the RNN operator.
Checklist
Changes
- Support the `AddTo` request for output and gradients (Fused RNN Operators have nonsupport of `add` grad_req with mkl-dnn #16578); see the sketch after this list
- Add `MXNET_USE_MKLDNN_RNN` to fall back to the naive fused operator
- Restore `atol` to its origin in the unit test of GRU (should be fixed by [mkldnn-v1.0] Use memcpy instead of set_handle with unidirectional 1-layer RNN #16663)

@ciyongch @pengzhao-intel @TaoLv
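A hedged sketch of exercising `grad_req='add'` with a fused RNN layer (cf. #16578). With `'add'`, gradients accumulate across backward calls instead of being overwritten, so they must be zeroed manually between parameter updates. The layer and shapes are arbitrary illustration values.

```python
import mxnet as mx
from mxnet import autograd
from mxnet.gluon import rnn

layer = rnn.LSTM(hidden_size=16, num_layers=1)
layer.initialize(ctx=mx.cpu())
for p in layer.collect_params().values():
    p.grad_req = 'add'  # accumulate gradients instead of overwriting them

x = mx.nd.random.uniform(shape=(10, 4, 8), ctx=mx.cpu())
for _ in range(2):  # two backward passes accumulate into the same grad arrays
    with autograd.record():
        loss = layer(x).sum()
    loss.backward()

# Gradients now hold the sum from both passes; clear them before the next step.
for p in layer.collect_params().values():
    p.zero_grad()
```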