
enable mkldnn_batch_norm #5049


Merged: 7 commits merged into PaddlePaddle:develop on Oct 26, 2017
Conversation

@tensor-tang (Contributor) commented Oct 24, 2017

  • enable MKLDNNMatrix copy from CpuMatrix, so that the MKLDNNMatrix mean and var can be copied from the moving mean and var.

  • add MKLDNNBatchNormLayer files

  • add unit test for mkldnn_batch_norm layer

  • add python interface for mkldnn_batch_norm type.
    BTW, I found a question: the comment says use_cudnn should depend on cudnn_version, but cudnn_version has never been used. Maybe a check like cudnn_version >= 4007 was forgotten, or it was removed? @luotao1

  • add unit tests for the mkldnn_batch_norm branch test and simple net test

@luotao1 (Contributor) left a comment

I hope the next PR can be the one that changes the CHECK macro.

} else {
movingMean->add(*mean_, movingAvgFraction_, 1.0 - movingAvgFraction_);
// here var is v^2
movingVar->add(*var_, movingAvgFraction_, 1.0 - movingAvgFraction_);
luotao1 (Contributor):

Why is the if/else needed here? From the if branch, mvMean also points to movingMean, so could line 118 be replaced by line 121 directly?

tensor-tang (Contributor, Author):

Yes, it can be removed. The reference implementation I followed contained GPU logic, so this was left over.
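
For context, here is a minimal standalone sketch of the moving-statistics update the snippet above performs, assuming Paddle's Matrix::add(b, p1, p2) computes this = p1 * this + p2 * b (the names below are illustrative, not the actual layer code):

#include <cstddef>
#include <vector>

// Exponential moving average of the batch statistics, mirroring
// movingMean->add(*mean_, f, 1 - f) above with f = movingAvgFraction_.
// Note that batchVar holds the variance v^2, not the stddev v.
void updateMovingStats(std::vector<float>& movingMean,
                       std::vector<float>& movingVar,
                       const std::vector<float>& batchMean,
                       const std::vector<float>& batchVar,
                       float f) {
  for (std::size_t i = 0; i < movingMean.size(); ++i) {
    movingMean[i] = f * movingMean[i] + (1.0f - f) * batchMean[i];
    movingVar[i] = f * movingVar[i] + (1.0f - f) * batchVar[i];
  }
}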

MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) {
// in training always calculate mean and var, so useGlobalStats must be false
// in test depends on useGlobalStats
luotao1 (Contributor):

Lines 145 and 146: these comments need refining; the wording is awkward.

tensor-tang (Contributor, Author):

ok

if (passType_ != PASS_TEST && useGlobalStats_ == true) {
LOG(WARNING) << "use_global_stats is invalid setting in training phase";
useGlobalStats_ = false;
}
luotao1 (Contributor):

Shouldn't this raise an error directly here, instead of changing useGlobalStats_ to false?
Also, != PASS_TEST could simply be == PASS_TRAIN; same below.

tensor-tang (Contributor, Author):

I still think overriding it is better: the CPU code also overrides it first, and when the pass is PASS_TRAIN, useGlobalStats_ stays false the whole time without raising an error.

As for == PASS_TRAIN, I'd rather keep the original, because there is also PASS_GC, and that pass is taken into account as well.
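
For context, a hypothetical sketch of why the author prefers != PASS_TEST over == PASS_TRAIN (the enum values below are assumptions based on this discussion, not the actual Paddle definition):

// Hypothetical sketch: any non-test pass (e.g. PASS_TRAIN or PASS_GC) must
// compute local batch statistics, so the check is written as
// passType_ != PASS_TEST rather than passType_ == PASS_TRAIN.
enum PassType { PASS_TRAIN, PASS_TEST, PASS_GC };

bool useLocalStats(PassType passType) {
  // Only PASS_TEST may rely on the global moving mean/var.
  return passType != PASS_TEST;
}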

void MKLDNNBatchNormLayer::forward(PassType passType) {
MKLDNNLayer::forward(passType);

// calculating and saving moving mean and variance
luotao1 (Contributor):

calculate and save moving mean and variance

tensor-tang (Contributor, Author):

ok, thx.

MKLDNNLayer::forward(passType);

// calculating and saving moving mean and variance
if (passType_ != PASS_TEST) {
luotao1 (Contributor):

Same as above: change it to passType_ == PASS_TRAIN.

tensor-tang (Contributor, Author):

Same as above.


// local mean and variance
MKLDNNMatrixPtr mean_; // output of mkldnn: m
MKLDNNMatrixPtr var_; // output of mkldnn: v^2
luotao1 (Contributor):

I don't understand these two comments: "output of mkldnn: m" and "output of mkldnn: v^2".

tensor-tang (Contributor, Author):

If they are obscure, they can also be removed; I'll add more explanatory comments earlier instead.
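
For reference, a minimal sketch of the distinction those two comments were meant to convey, assuming the MKL-DNN batch-norm primitive outputs the per-channel batch mean and variance (the function name and epsilon below are illustrative):

#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration: mean_ holds the per-channel batch mean m, and
// var_ holds the batch variance v^2 (not the stddev v), so any code that
// needs a standard deviation must take a square root.
std::vector<float> stddevFromVariance(const std::vector<float>& var,
                                      float epsilon = 1e-5f) {
  std::vector<float> stddev(var.size());
  for (std::size_t c = 0; c < var.size(); ++c) {
    stddev[c] = std::sqrt(var[c] + epsilon);  // var[c] is v^2, result is v
  }
  return stddev;
}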

cfg.inputDefs.back().isStatic = true;
LayerInputConfig* input = cfg.layerConfig.add_inputs();
// TODO(TJ): uncomment me when refine and support comparing all zeroes vector
// cfg.layerConfig.set_active_type("relu");
luotao1 (Contributor):

What case does the TODO on lines 237-238 handle?

tensor-tang (Contributor, Author):

Because I found that after adding relu, mkldnn happens to output an all-zero vector, and so does the CPU, but my current gtest does not handle this all-zero case. I was planning to refine the test in the next PR and fix this issue along the way.
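
For context, a hypothetical sketch of why an all-zero reference vector is a problem for the comparison (names and tolerances are illustrative, not the actual MKLDNNTester code):

#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration: a relative-error check divides by the reference
// magnitude, so an all-zero reference degenerates to 0/0. Falling back to an
// absolute tolerance near zero handles the all-zero case.
bool allClose(const std::vector<float>& dnn, const std::vector<float>& ref,
              float relTol = 1e-3f, float absTol = 1e-5f) {
  for (std::size_t i = 0; i < ref.size(); ++i) {
    float diff = std::fabs(dnn[i] - ref[i]);
    float denom = std::fabs(ref[i]);
    if (denom < absTol) {
      if (diff > absTol) return false;  // absolute check near zero
    } else {
      if (diff / denom > relTol) return false;  // relative check otherwise
    }
  }
  return true;
}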

// for PASS_TRAIN, use_global_stats always should be false, and batchsize != 1
VLOG(MKLDNN_TESTS) << "check train phase";
dnnConfig.layerConfig.set_use_global_stats(false);
refConfig.layerConfig.set_use_global_stats(false);
luotao1 (Contributor):

If an error is raised directly above, does use_global_stats still need to be set here?

tensor-tang (Contributor, Author):

As above, I think it's better not to raise an error.
And I think it's still better to set it here: I don't want to rely on the default value, and it's clearer to show explicitly which cases are being tested.

refConfig.layerConfig.set_use_global_stats(false);
MKLDNNTester tester;
tester.run(dnnConfig, refConfig, pm.bs, pm.ih, pm.iw, PASS_TRAIN);
// for PASS_TEST, check use_global_stats true and false, and batchsize 1
luotao1 (Contributor):

check->use

tensor-tang (Contributor, Author):

I originally wanted to use "test" here to indicate that the following cases would be tested, but since PASS_TEST appears just before, I used "check" to avoid repetition, meaning those cases are checked.

tmp = addto_layer(input=[c1, c2],
act=ReluActivation(),
bias_attr=False)

luotao1 (Contributor):

The whole conf could be simplified.

tensor-tang (Contributor, Author):

Yes. I plan to remove the test-compare part here, since parts of it duplicate the branch tests in test_MKLDNN, and the compare test lives under the trainer directory, which doesn't feel quite right either. The next PR will refine these tests uniformly.

@luotao1 (Contributor) commented Oct 25, 2017

> the comment says use_cudnn should depend on cudnn_version, but cudnn_version has never been used. Maybe a check like cudnn_version >= 4007 was forgotten, or it was removed?

This could be raised as a separate issue.

@tensor-tang (Contributor, Author) left a comment

OK, no problem.
Then I'll implement the CHECK macro first, and refine the unit tests after that.

@tensor-tang (Contributor, Author) commented Oct 25, 2017

@luotao1 travis-ci/pr failed, but it doesn't seem related to my changes:
https://travis-ci.org/PaddlePaddle/Paddle/jobs/292627376#L845

File "scipy/linalg/setup.py", line 19, in configuration
raise NotFoundError('no lapack/blas resources found')
numpy.distutils.system_info.NotFoundError: no lapack/blas resources found

and

Running setup.py install for scipy ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-gq75gQ/scipy/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-okUd9V-record/install-record.txt --single-version-externally-managed --compile:

@luotao1 (Contributor) commented Oct 26, 2017

I restarted the travis-ci, and it is successful now.

@luotao1 luotao1 merged commit b68f2d2 into PaddlePaddle:develop Oct 26, 2017
@tensor-tang tensor-tang deleted the mkldnn_bn branch October 26, 2017 04:01