
enable mkldnn_batch_norm #5049


Merged: 7 commits merged into PaddlePaddle:develop on Oct 26, 2017
Conversation

@tensor-tang (Contributor) commented Oct 24, 2017

  • enable MKLDNNMatrix copy from CpuMatrix, so that the MKLDNNMatrix mean and var can be copied from the moving mean and var.

  • add MKLDNNBatchNormLayer files

  • add unit test for mkldnn_batch_norm layer

  • add python interface for mkldnn_batch_norm type.
    BTW, I found a question: the comment says use_cudnn should depend on cudnn_version, but cudnn_version has never been used. Maybe a check like cudnn_version >= 4007 was forgotten, or it was removed? @luotao1

  • add unit tests for the mkldnn_batch_norm branch test and simple net test

@luotao1 (Contributor) left a comment

I hope the next PR can be the one that changes the CHECK macro.

} else {
movingMean->add(*mean_, movingAvgFraction_, 1.0 - movingAvgFraction_);
// here var is v^2
movingVar->add(*var_, movingAvgFraction_, 1.0 - movingAvgFraction_);
luotao1 (Contributor):

Why is the if/else needed here? From the if branch, mvMean also points to movingMean, so could line 118 be replaced by line 121 directly?

tensor-tang (Contributor, Author):

Yes, it can be removed. The reference implementation I followed contained GPU logic, so this was left over.
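
For context, here is a minimal standalone sketch of the moving-statistics update the snippet above performs, assuming Paddle's Matrix::add(b, p1, p2) computes this = p1 * this + p2 * b (the names below are illustrative, not the actual layer code):

#include <cstddef>
#include <vector>

// Exponential moving average of the batch statistics, mirroring
// movingMean->add(*mean_, f, 1 - f) above with f = movingAvgFraction_.
// Note that batchVar holds the variance v^2, not the stddev v.
void updateMovingStats(std::vector<float>& movingMean,
                       std::vector<float>& movingVar,
                       const std::vector<float>& batchMean,
                       const std::vector<float>& batchVar,
                       float f) {
  for (std::size_t i = 0; i < movingMean.size(); ++i) {
    movingMean[i] = f * movingMean[i] + (1.0f - f) * batchMean[i];
    movingVar[i] = f * movingVar[i] + (1.0f - f) * batchVar[i];
  }
}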

MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) {
// in training always calculate mean and var, so useGlobalStats must be false
// in test depends on useGlobalStats
luotao1 (Contributor):

Lines 145 and 146: these comments need refining; the wording is awkward.

tensor-tang (Contributor, Author):

ok

if (passType_ != PASS_TEST && useGlobalStats_ == true) {
LOG(WARNING) << "use_global_stats is invalid setting in training phase";
useGlobalStats_ = false;
}
luotao1 (Contributor):

Shouldn't this raise an error directly here, instead of changing useGlobalStats_ to false?
Also, != PASS_TEST could simply be == PASS_TRAIN; same below.

tensor-tang (Contributor, Author):

I still think overriding it is better: the CPU code also overrides it first, and when the pass is PASS_TRAIN, useGlobalStats_ stays false the whole time without raising an error.

As for == PASS_TRAIN, I'd rather keep the original, because there is also PASS_GC, and that pass is taken into account as well.
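
For context, a hypothetical sketch of why the author prefers != PASS_TEST over == PASS_TRAIN (the enum values below are assumptions based on this discussion, not the actual Paddle definition):

// Hypothetical sketch: any non-test pass (e.g. PASS_TRAIN or PASS_GC) must
// compute local batch statistics, so the check is written as
// passType_ != PASS_TEST rather than passType_ == PASS_TRAIN.
enum PassType { PASS_TRAIN, PASS_TEST, PASS_GC };

bool useLocalStats(PassType passType) {
  // Only PASS_TEST may rely on the global moving mean/var.
  return passType != PASS_TEST;
}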

void MKLDNNBatchNormLayer::forward(PassType passType) {
MKLDNNLayer::forward(passType);

// calculating and saving moving mean and variance
luotao1 (Contributor):

calculate and save moving mean and variance

tensor-tang (Contributor, Author):

ok, thx.

MKLDNNLayer::forward(passType);

// calculating and saving moving mean and variance
if (passType_ != PASS_TEST) {
luotao1 (Contributor):

Same as above: change it to passType_ == PASS_TRAIN.

tensor-tang (Contributor, Author):

Same as above.


// local mean and variance
MKLDNNMatrixPtr mean_; // output of mkldnn: m
MKLDNNMatrixPtr var_; // output of mkldnn: v^2
luotao1 (Contributor):

I don't understand these two comments: "output of mkldnn: m" and "output of mkldnn: v^2".

tensor-tang (Contributor, Author):

If they are obscure, they can also be removed; I'll add more explanatory comments earlier instead.
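
For reference, a minimal sketch of the distinction those two comments were meant to convey, assuming the MKL-DNN batch-norm primitive outputs the per-channel batch mean and variance (the function name and epsilon below are illustrative):

#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration: mean_ holds the per-channel batch mean m, and
// var_ holds the batch variance v^2 (not the stddev v), so any code that
// needs a standard deviation must take a square root.
std::vector<float> stddevFromVariance(const std::vector<float>& var,
                                      float epsilon = 1e-5f) {
  std::vector<float> stddev(var.size());
  for (std::size_t c = 0; c < var.size(); ++c) {
    stddev[c] = std::sqrt(var[c] + epsilon);  // var[c] is v^2, result is v
  }
  return stddev;
}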

cfg.inputDefs.back().isStatic = true;
LayerInputConfig* input = cfg.layerConfig.add_inputs();
// TODO(TJ): uncomment me when refine and support comparing all zeroes vector
// cfg.layerConfig.set_active_type("relu");
luotao1 (Contributor):

What case does the TODO on lines 237-238 handle?

tensor-tang (Contributor, Author):

Because I found that after adding relu, mkldnn happens to output an all-zero vector, and so does the CPU, but my current gtest does not handle this all-zero case. I was planning to refine the test in the next PR and fix this issue along the way.
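
For context, a hypothetical sketch of why an all-zero reference vector is a problem for the comparison (names and tolerances are illustrative, not the actual MKLDNNTester code):

#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration: a relative-error check divides by the reference
// magnitude, so an all-zero reference degenerates to 0/0. Falling back to an
// absolute tolerance near zero handles the all-zero case.
bool allClose(const std::vector<float>& dnn, const std::vector<float>& ref,
              float relTol = 1e-3f, float absTol = 1e-5f) {
  for (std::size_t i = 0; i < ref.size(); ++i) {
    float diff = std::fabs(dnn[i] - ref[i]);
    float denom = std::fabs(ref[i]);
    if (denom < absTol) {
      if (diff > absTol) return false;  // absolute check near zero
    } else {
      if (diff / denom > relTol) return false;  // relative check otherwise
    }
  }
  return true;
}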

// for PASS_TRAIN, use_global_stats always should be false, and batchsize != 1
VLOG(MKLDNN_TESTS) << "check train phase";
dnnConfig.layerConfig.set_use_global_stats(false);
refConfig.layerConfig.set_use_global_stats(false);
luotao1 (Contributor):

If an error is raised directly above, does use_global_stats still need to be set here?

tensor-tang (Contributor, Author):

As above, I think it's better not to raise an error.
And I think it's still better to set it here: I don't want to rely on the default value, and it's clearer to show explicitly which cases are being tested.

refConfig.layerConfig.set_use_global_stats(false);
MKLDNNTester tester;
tester.run(dnnConfig, refConfig, pm.bs, pm.ih, pm.iw, PASS_TRAIN);
// for PASS_TEST, check use_global_stats true and false, and batchsize 1
luotao1 (Contributor):

check->use

tensor-tang (Contributor, Author):

I originally wanted to use "test" here to indicate that the following cases would be tested, but since PASS_TEST appears just before, I used "check" to avoid repetition, meaning those cases are checked.

tmp = addto_layer(input=[c1, c2],
act=ReluActivation(),
bias_attr=False)

luotao1 (Contributor):

The whole conf could be simplified.

tensor-tang (Contributor, Author):

Yes. I plan to remove the test-compare part here, since parts of it duplicate the branch tests in test_MKLDNN, and the compare test lives under the trainer directory, which doesn't feel quite right either. The next PR will refine these tests uniformly.

@luotao1 (Contributor) commented Oct 25, 2017

> the comment says use_cudnn should depend on cudnn_version, but cudnn_version has never been used. Maybe a check like cudnn_version >= 4007 was forgotten, or it was removed?

This could be raised as a separate issue.

@tensor-tang (Contributor, Author) left a comment

OK, no problem.
Then I'll implement the CHECK macro first, and refine the unit tests after that.

@tensor-tang (Contributor, Author) commented Oct 25, 2017

@luotao1 travis-ci/pr failed, but it doesn't seem related to my changes:
https://travis-ci.org/PaddlePaddle/Paddle/jobs/292627376#L845

File "scipy/linalg/setup.py", line 19, in configuration
raise NotFoundError('no lapack/blas resources found')
numpy.distutils.system_info.NotFoundError: no lapack/blas resources found

and

Running setup.py install for scipy ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-gq75gQ/scipy/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-okUd9V-record/install-record.txt --single-version-externally-managed --compile:

@luotao1 (Contributor) commented Oct 26, 2017

I restarted the travis-ci, and it is successful now.

@luotao1 luotao1 merged commit b68f2d2 into PaddlePaddle:develop Oct 26, 2017
@tensor-tang tensor-tang deleted the mkldnn_bn branch October 26, 2017 04:01