Fix the bug of batch_norm and batch_norm_grad op. #38288

ZibinGuo · 2021-12-20T10:28:17Z

PR types

Bug fixes

PR changes

OPs

Describe

For the batch_norm op, fixed a bug where the outputs saved_mean and saved_variance were not initialized correctly in global_stats mode.

For the batch_norm_grad op, added the global_stats and inplace modes, and added the ability to choose separately whether d_x, d_bias, and d_scale are computed.

Add the "roi_align" and "roi_align_grad" op in xpu2 op list. test=kunlun
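As a reference for the global_stats mode of batch_norm_grad described above: when the mean and variance are fixed running statistics, they do not depend on the input, so the gradients reduce to simple per-channel expressions. The following is a minimal CPU sketch of that math only, not the XPU kernel from this PR. The function name `BatchNormGradGlobalStats` and the `[n][c]` layout are assumptions made for illustration. Passing `nullptr` for an output skips that gradient.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gradients of y = scale * (x - mean) / sqrt(var + eps) + bias when mean and
// var are fixed running statistics (global-stats mode). Data is laid out as
// [n][c] with c channels. Any of dx, dscale, dbias may be nullptr, in which
// case that gradient is skipped. (Hypothetical helper, for illustration.)
void BatchNormGradGlobalStats(const std::vector<float>& x,
                              const std::vector<float>& dy,
                              const std::vector<float>& scale,
                              const std::vector<float>& mean,
                              const std::vector<float>& var,
                              float eps, std::size_t n, std::size_t c,
                              std::vector<float>* dx,
                              std::vector<float>* dscale,
                              std::vector<float>* dbias) {
  if (dx) dx->assign(n * c, 0.0f);
  if (dscale) dscale->assign(c, 0.0f);
  if (dbias) dbias->assign(c, 0.0f);
  for (std::size_t j = 0; j < c; ++j) {
    const float inv_std = 1.0f / std::sqrt(var[j] + eps);
    for (std::size_t i = 0; i < n; ++i) {
      const std::size_t k = i * c + j;
      // mean and var are constants here, so dx has no reduction terms.
      if (dx) (*dx)[k] = dy[k] * scale[j] * inv_std;
      if (dscale) (*dscale)[j] += dy[k] * (x[k] - mean[j]) * inv_std;
      if (dbias) (*dbias)[j] += dy[k];
    }
  }
}
```

In training mode without global stats, d_x additionally contains terms from the batch mean and variance, which is why the two modes need separate handling.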

CLAassistant · 2021-12-20T10:28:21Z

All committers have signed the CLA.

CLAassistant · 2021-12-20T10:28:21Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Zibin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

ZibinGuo · 2021-12-20T10:28:44Z

Your email has been received and I will reply as soon as possible. Thank you!

tangzhiyi11 · 2021-12-20T10:59:08Z

paddle/fluid/operators/batch_norm_op_xpu.cc

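If dx/dscale/dbias are null, shouldn't these temporary variables be the ones passed in? They are not used in the API call below.

Right. If they are null, no gradient is propagated back for dx/dscale/dbias, and the pointers obtained from the outputs are null. However, the XPU API does not support computing dx/dscale/dbias separately: all three arguments must be passed and none of them may be a null pointer, so temporary variables have to be passed in; otherwise an error is raised.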

d_x_data_tensor.mutable_data(ctx.GetPlace()); => d_x_data = d_x_data_tensor.mutable_data(ctx.GetPlace());

In that case, the kernel could check for a null pointer and skip the subsequent computation. Computing it anyway just wastes time.

Done.
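The workaround discussed in this thread can be sketched as follows: when an optional gradient output is not requested, the caller substitutes scratch storage so that a backward API that rejects null pointers still receives valid addresses. This is a minimal sketch under stated assumptions. `StrictBackward` is a hypothetical stand-in for such an API, not the real XPU call, and `BackwardWithOptionalOutputs` is an illustrative wrapper name.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a device API that rejects null output pointers.
void StrictBackward(const float* dy, std::size_t n, float* dx, float* dbias) {
  if (dx == nullptr || dbias == nullptr) {
    throw std::invalid_argument("output pointers must not be null");
  }
  *dbias = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    dx[i] = dy[i];
    *dbias += dy[i];
  }
}

// Caller-side workaround: substitute scratch storage for any output the
// caller did not request. Assumes dy is non-empty.
void BackwardWithOptionalOutputs(const std::vector<float>& dy, float* dx,
                                 float* dbias) {
  std::vector<float> dx_scratch;
  float dbias_scratch = 0.0f;
  if (dx == nullptr) {
    dx_scratch.resize(dy.size());
    // The pointer must actually be reassigned to the scratch buffer;
    // allocating the buffer without storing its address leaves dx null.
    dx = dx_scratch.data();
  }
  if (dbias == nullptr) dbias = &dbias_scratch;
  StrictBackward(dy.data(), dy.size(), dx, dbias);
}
```

The cost noted by the reviewers remains: the unrequested gradients are still computed into scratch memory and then discarded.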

tangzhiyi11 · 2021-12-20T10:59:32Z

paddle/fluid/operators/batch_norm_op_xpu.cc

Please remove the redundant comments.

Done.

tangzhiyi11 · 2021-12-20T11:02:59Z

paddle/fluid/operators/batch_norm_op_xpu.cc

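mean/scale/bias should all be of type float.

This is allocating memory for the mean_out and variance_out output variables.

It would be best to use float directly, or to add a wrapper layer. fp16 support will be needed later, and when T == fp16, using T here would be wrong.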

Fixed.
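One way to read the "wrapper layer" suggestion above is a small type trait that maps the data type T to the type used for statistics, so that mean, variance, scale, and bias stay in float even when T is fp16. The sketch below is an assumption about what such a wrapper could look like, not the code merged in this PR. `Half`, `StatType`, and `MakeStats` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical placeholder for a 16-bit floating-point data type.
struct Half {
  unsigned short bits;
};

// Trait mapping a data type T to the type used for per-channel statistics.
// By default statistics are kept in float, including when T is Half.
template <typename T>
struct StatType {
  using type = float;
};

// double data keeps double statistics.
template <>
struct StatType<double> {
  using type = double;
};

// Allocate a zero-initialized per-channel statistics buffer for data type T.
template <typename T>
std::vector<typename StatType<T>::type> MakeStats(std::size_t channels) {
  return std::vector<typename StatType<T>::type>(channels);
}
```

With such a trait, output allocations for mean_out and variance_out would be written against `StatType<T>::type` rather than `T`, so adding fp16 later would not change their element type.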

taixiurong · 2021-12-20T12:03:36Z

paddle/fluid/operators/batch_norm_op_xpu.cc

For locally defined Tensor variables like this, consider their lifetime. Using RAII_GUARD.alloc_l3_or_gm to allocate a block of memory is recommended.

Fixed.
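The lifetime concern raised above is the usual argument for RAII-managed scratch memory: the allocation is tied to a guard object and released when the guard goes out of scope, instead of depending on how long a local Tensor lives. The sketch below illustrates only that general pattern in host C++. `ScratchGuard` and `AllocFloat` are hypothetical names and do not reproduce the behavior or signature of RAII_GUARD.alloc_l3_or_gm.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical scope-bound scratch allocator: every block handed out stays
// valid until the guard is destroyed, then all blocks are freed together.
class ScratchGuard {
 public:
  // Returns a zero-initialized buffer of `count` floats owned by the guard.
  float* AllocFloat(std::size_t count) {
    blocks_.push_back(std::make_unique<float[]>(count));
    return blocks_.back().get();
  }

  std::size_t NumBlocks() const { return blocks_.size(); }

 private:
  std::vector<std::unique_ptr<float[]>> blocks_;
};
```

A kernel would typically create the guard at the top of its compute function and request all temporary buffers from it, so that every buffer outlives the API calls that use it and none is leaked on early return.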

taixiurong · 2021-12-20T12:08:03Z

paddle/fluid/operators/batch_norm_op_xpu.cc

Could is_inplace and use_global_stats be passed to the API as parameters, with the functionality implemented in a kernel on the API side?

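The internals of the batch_norm operator are fairly complex, so the change could take a long time. It would also alter the existing interface substantially and might affect other callers of this API; making the change would probably require implementing a separate batch_norm_v2 op. The plan is to work around the problem with the current approach for now and optimize for performance later.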

taixiurong · 2021-12-20T12:09:27Z

paddle/fluid/operators/batch_norm_op_xpu.cc

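Right. If they are null, no gradient is propagated back for dx/dscale/dbias, and the pointers obtained from the outputs are null. However, the XPU API does not support computing dx/dscale/dbias separately: all three arguments must be passed and none of them may be a null pointer, so temporary variables have to be passed in; otherwise an error is raised.

In that case, the kernel could check for a null pointer and skip the subsequent computation. Computing it anyway just wastes time.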

tangzhiyi11 · 2021-12-22T09:43:51Z

paddle/fluid/platform/device/xpu/xpu2_op_list.h

Please reorder these alphabetically.

Fixed.

taixiurong

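This effectively adds computation for some changed attributes of the batchnorm op during training. Test cases covering it can be added to TestBatchNormOpTraining in the unit tests.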

ZibinGuo · 2021-12-23T08:16:47Z

Added.

tangzhiyi11 · 2021-12-23T08:28:08Z

LGTM

QingshuChen

Add test=kunlun to the commit message to trigger the KL2 unit tests.

paddle/fluid/operators/batch_norm_op_xpu.cc

… and "roi_align_grad" op in xpu2 op list.

… and "roi_align_grad" op in xpu2 op list. test=kunlun

tangzhiyi11 reviewed Dec 20, 2021

taixiurong reviewed Dec 20, 2021

ZibinGuo force-pushed the develop branch 2 times, most recently from 4e1b8f6 to 99cf4c8 Compare December 22, 2021 09:41

tangzhiyi11 reviewed Dec 22, 2021

ZibinGuo force-pushed the develop branch 2 times, most recently from e86ddbb to 0d5fb59 Compare December 22, 2021 11:28

ZibinGuo changed the title ~~Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align"…~~ Fix the bug of batch_norm and batch_norm_grad op. Dec 22, 2021

taixiurong reviewed Dec 22, 2021

ZibinGuo force-pushed the develop branch from 0d5fb59 to 1699742 Compare December 23, 2021 08:14

QingshuChen reviewed Dec 28, 2021

ZibinGuo force-pushed the develop branch from d940096 to 70c7afd Compare December 28, 2021 11:48

Zibin and others added 2 commits December 29, 2021 06:39

Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align"…

f042d90

… and "roi_align_grad" op in xpu2 op list.

Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align"…

b2cd30b

… and "roi_align_grad" op in xpu2 op list. test=kunlun

ZibinGuo force-pushed the develop branch from 70c7afd to b2cd30b Compare December 29, 2021 06:40

QingshuChen approved these changes Dec 30, 2021

QingshuChen merged commit cc83c95 into PaddlePaddle:develop Dec 30, 2021

5 participants
