ping @zheng-da
@ashokei could you also help review this PR?
@@ -129,6 +129,8 @@ void LRNComputeExCPU(const nnvm::NodeAttrs &attrs,
  MKLDNN_OPCHECK_INIT(false, 1, inputs, outputs);
  MKLDNNLRNForward(ctx, param, inputs[0], req[0], outputs[0]);
  MKLDNN_OPCHECK_RUN(LRNCompute<cpu>, attrs, ctx, inputs, req, outputs);
  // Copy outputs[1] from opcheck reference as backward check needs it.
  MKLDNN_OPCHECK_COPY_RESULT(outputs, std::vector<size_t>{1});
do we have to do it for all operators?
LRN is a special operator: MKLDNN only generates outputs[0], but the default CPU backward computation requires outputs[1] as well, which eventually makes opcheck fail. After copying outputs[1], this problem is fixed.
Maybe you're wondering whether it's necessary for all operators. Firstly, we only found this issue on LRN. Secondly, it would make results differ before and after enabling opcheck because of accumulated accuracy loss. Thirdly, it would hurt opcheck performance a lot. So I prefer not to apply this to all operators.
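To make the intent concrete, here is a minimal sketch of what "copy the reference result back into the outputs" means. It is illustrative only and does not reproduce the actual MKLDNN_OPCHECK_COPY_RESULT macro; the Blob type and CopyOpcheckResult name are hypothetical stand-ins.

#include <cstddef>
#include <vector>

struct Blob { std::vector<float> data; };  // stand-in for NDArray

// Overwrite selected MKLDNN outputs with the reference (default CPU) results
// so a later backward check can find the extra output it depends on.
void CopyOpcheckResult(const std::vector<Blob> &reference_outputs,
                       std::vector<Blob> *outputs,
                       const std::vector<size_t> &indices) {
  for (size_t i : indices) {
    (*outputs)[i].data = reference_outputs[i].data;
  }
}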
thanks for explaining. i just didn't understand why we needed this.
src/operator/nn/mkldnn/mkldnn_act.cc (outdated)
fw_pdesc);
return bw_pdesc;
});
LOG(INFO) << "Unsupported data type for MKLDNN activation";
I think you should use LOG(FATAL) here.
Agreed.
it seems you changed the one above, but not this one.
mkldnn::eltwise_forward::primitive_desc fw_pdesc(fw_desc, cpu_engine);
mkldnn::eltwise_backward::desc bw_desc(alg, diff_md, data_md, 0.0);
mkldnn::eltwise_backward::primitive_desc bw_pdesc(bw_desc, cpu_engine,
                                                  fw_pdesc);
the code shouldn't run here, right? can we return an empty descriptor?
Yes, it should be unreachable code. As it's hard to create an empty descriptor (MKLDNN doesn't provide a function to do that), I think a fatal error is enough.
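For illustration, a minimal sketch of the pattern being agreed on here, assuming dmlc-style logging as used in MXNet; the function and enum names are hypothetical, not the actual mkldnn_act.cc code. Every supported type returns a descriptor, so the fall-through line should abort via LOG(FATAL) rather than merely print with LOG(INFO).

#include <dmlc/logging.h>

enum class ActDType { kFloat32, kFloat16 };

// Stand-in for building an MKLDNN backward primitive descriptor.
int GetBwdDescKind(ActDType dtype) {
  switch (dtype) {
    case ActDType::kFloat32: return 32;
    case ActDType::kFloat16: return 16;
  }
  // Unreachable for supported types; abort loudly on anything else.
  LOG(FATAL) << "Unsupported data type for MKLDNN activation";
  return -1;  // never reached; satisfies the compiler's return-path check
}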
}

void SetDataNewMem(const mkldnn::memory &out_grad, const mkldnn::memory &weight,
const mkldnn::memory &in_grad) {
indent
void SetWeightNewMem(const mkldnn::memory &data,
const mkldnn::memory &out_grad,
const mkldnn::memory &in_grad_weight) {
indent
@@ -133,13 +134,23 @@ void MKLDNNLRNFwd::_Init(const LRNParam &param,

void MKLDNNLRNFwd::SetDataHandle(const NDArray &in_data,
                                 const NDArray &out_data) {
why is in_data not used?
It's used in line 140:
this->SetDataHandle(in_data, out_data_mem);
@@ -82,6 +82,101 @@ inline static mkldnn::inner_product_backward_weights::primitive_desc GetIPBwdWei
}
}

class MKLDNNFullyConnectForward {
could you take the caching of FullyConnected forward out and submit another PR for it?
my concern is that this PR is for caching backward; mixing it with all the other backward operators might cause confusion later. thanks
OK. I will move FullyConnected forward out of this PR.
For the FullyConnected forward PR, see #11611.
@ZhennanQin sorry for the late reply. I don't see big problems in the PR.
Changes look OK to me, great performance improvement! Since these are primarily for perf optimization, I assume the existing unit tests should be OK for functionality checking.
@marcoabreu Tests are covered by 2 parts:
This PR is essentially refactoring work, which doesn't add any new feature or functionality, so the above tests are enough.
Excellent, thanks a lot!
@zheng-da @marcoabreu Code is refactored based on the comments:
If there are no more comments, could you please approve and merge this ASAP? Thanks.
Ping @zheng-da @anirudh2290
Thanks for the contribution @ZhennanQin!
This is reviewed. We agreed to merge this into 1.4 instead of 1.3.
@zheng-da The 1.3 branch is cut. I think it's time to merge the PR into master now. BTW, we will rebase the code soon :)
@pengzhao-intel yes. After you rebase successfully, I'll browse through the code again.
@zheng-da Code is rebased, please have a look. Thanks for the engagement.
src/operator/nn/mkldnn/mkldnn_act.cc (outdated)
*diff_dst_memory,
*diff_src_memory.second));
});
TmpMemMgr::Get()->Init(ctx.requested[activation::kTempSpace]);
why did you move it here?
Can't remember the details; maybe it was changed by mistake. I've changed it back.
everything else looks fine to me.
@azai91 @eric-haibin-lin could you also help review the PR?
@azai91 @eric-haibin-lin @anirudh2290 please review. This has been out for review for a long time...
* Enable primitive allocation cache for _backward_Activation. Change-Id: I545628ff68a54cb01b7fef323dc3de9bd47b1a19
* Enable primitive allocation cache for _backward_Deconvolution. Change-Id: I1e9bf1b9b44bae52068a9c564dff037851e896e5
* Enable primitive allocation cache for _backward_Pooling. Change-Id: Idbe94e21f1e2ddf711523767194b95beda19b120
* Enable primitive allocation cache for _backward_LRN. Change-Id: Iefe9f720de719ec2e2f5d24a006602425136711b
* Enable primitive allocation cache for _backward_BatchNorm. Change-Id: I9e52651bd830b8cb5d2f193076ef51606c9056f9
* Enable primitive allocation cache for _backward_Convolution. Change-Id: I0496fa2394ee036d05c58f3abc1d74af544c7bca
* Enable primitive allocation cache for _backward_Fully_Connected. Change-Id: I8347527ec1271b1518921a74e3581d7d84187429
* remove fc forward and fix indent problem
* remove fc forward and fix convolution indent problem
* Change log level to FATAL for unreachable code in mkldnn_act.cc
* remove fc forward and fix convolution indent problem
* remove useless hint in fc
* Merge branch 'master' into backward_op_cache
* Empty commit to retrigger the CI.
* Change LOG(INFO) to LOG(FATAL) for unreachable code in mkldnn_act.cc
* Fix build issue after code merge.
* Fix lint after merge
* Fix mkldnn act.
Hi all,
For MKLDNN, creating an operator primitive and its corresponding memory takes a lot of time; in many scenarios, an already-created primitive can be reused if it meets the requirements of the next computation. In this PR, we implemented the caching mechanism for most backward operators, just like the optimization already done for forward operators. Correctness tests and accuracy tests all PASS. Performance tests show that most models get a training speedup. Please review this, thanks.
@zheng-da, @marcoabreu, @azai91, @TaoLv, @pengzhao-intel
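As a rough illustration of the caching idea described above (a hedged sketch, not the actual MXNet code; BwdPrimitive, GetCachedBwdPrimitive, and the string key are hypothetical), the pattern is a thread-local map from an operator signature to an already-created primitive, so the expensive creation happens only on the first call for a given configuration. In the real code, the key would typically encode the operator parameters and tensor shapes/layouts.

#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for an MKLDNN backward primitive plus its memory objects.
struct BwdPrimitive {
  explicit BwdPrimitive(const std::string &key) : key(key) {}
  std::string key;
};

// Return a cached primitive for this signature, creating it on first use.
BwdPrimitive &GetCachedBwdPrimitive(const std::string &signature) {
  // One cache per thread: no locking needed, primitives stay thread-private.
  static thread_local std::unordered_map<std::string,
                                          std::unique_ptr<BwdPrimitive>> cache;
  auto it = cache.find(signature);
  if (it == cache.end()) {
    // Cache miss: pay the creation cost once, then reuse on later calls.
    it = cache.emplace(signature,
                       std::unique_ptr<BwdPrimitive>(
                           new BwdPrimitive(signature))).first;
  }
  return *it->second;
}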
This PR covers the commits below.
Enable primitive allocation cache for _backward_LRN.
Enable primitive allocation cache for _backward_Pooling.
Enable primitive allocation cache for _backward_Activation.
Enable primitive allocation cache for _backward_Deconvolution.
Enable primitive allocation cache for _backward_BatchNorm.
Enable primitive allocation cache for _backward_Fully_Connected.
Enable primitive allocation cache for _backward_Convolution.
Correctness test result:
unit_test - PASS
Train model with dummy data when MXNET_MKLDNN_DEBUG=1
Accuracy test result:
CIFAR-10 + ResNet50 (convergence): validation accuracy after 99 epochs (top1: 67.88%, top5: 96.26%)
CIFAR-10 + VGG16 (convergence): validation accuracy after 74 epochs (top1: 82.56%, top5: 98.52%)
Performance test result:
Skylake 8180, 2 sockets, training, BS=32 (img/s)
Total time consumption breakdown (ms)