Skip to content

Conversation

@ZibinGuo
Copy link
Contributor

@ZibinGuo ZibinGuo commented Dec 20, 2021

PR types

Bug fixes

PR changes

OPs

Describe

For batch_norm op, fixed the bug that the output saved_mean and saved_variance could not be initialized correctly when using global_stats mode.

For batch_norm_grad op, global_stats and inplace modes have been added. And added the function of whether to calculate d_x, d_bias, d_sclae separately.

Add the "roi_align" and "roi_align_grad" op in xpu2 op list. test=kunlun

@CLAassistant
Copy link

CLAassistant commented Dec 20, 2021

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Zibin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@ZibinGuo
Copy link
Contributor Author

ZibinGuo commented Dec 20, 2021 via email

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果dx/dscale/dbias为空,应该调用的是这几个临时变量?下面接口调用里没用到

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对的,如果为空,即为dx/dscale/dbias不做回传,且从输出获得的指针为空。但xpu的api不支持单独计算dx/dscale/dbias的功能必须传入这三个参数且不能是空指针,只能传入临时变量,否则会报错。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d_x_data_tensor.mutable_data(ctx.GetPlace()); => d_x_data = d_x_data_tensor.mutable_data(ctx.GetPlace());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对的,如果为空,即为dx/dscale/dbias不做回传,且从输出获得的指针为空。但xpu的api不支持单独计算dx/dscale/dbias的功能必须传入这三个参数且不能是空指针,只能传入临时变量,否则会报错。

这种情况,可以在kernel里面增加判断,指针为空,就不要做后面的计算了,计算了,还耗时。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d_x_data_tensor.mutable_data(ctx.GetPlace()); => d_x_data = d_x_data_tensor.mutable_data(ctx.GetPlace());

已改

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

多余的注释去掉吧

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

多余的注释去掉吧

已改

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mean/scale/bias这些类型应该都是float

Copy link
Contributor Author

@ZibinGuo ZibinGuo Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里是给mean_out,variance_out输出变量分配内存

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

最好直接用float类型或者封装一层,后续是有fp16需求的,这时候 T == fp16,这用T就不对了

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

最好直接用float类型或者封装一层,后续是有fp16需求的,这时候 T == fp16,这用T就不对了

已修改

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这种定义的局部Tensor变量,考虑下生存周期,推荐使用RAII_GUARD.alloc_l3_or_gm 来分配一块内存。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这种定义的局部Tensor变量,考虑下生存周期,推荐使用RAII_GUARD.alloc_l3_or_gm 来分配一块内存。

已修改

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

能不能考虑把is_inplace, use_global_stats 作为参数,传给api, 在api侧,增加kernel实现功能呢?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

能不能考虑把is_inplace, use_global_stats 作为参数,传给api, 在api侧,增加kernel实现功能呢?

因batch_norm算子内部比较复杂,修改起来周期可能会比较长。另修改起来对现有接口改动比较大,可能会影响到其他使用该api的地方,如果要修改可能需要另行实现一个batch_norm_v2的op。计划目前先通过此方法绕过,后续在针对性能进行优化。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对的,如果为空,即为dx/dscale/dbias不做回传,且从输出获得的指针为空。但xpu的api不支持单独计算dx/dscale/dbias的功能必须传入这三个参数且不能是空指针,只能传入临时变量,否则会报错。

这种情况,可以在kernel里面增加判断,指针为空,就不要做后面的计算了,计算了,还耗时。

@ZibinGuo ZibinGuo force-pushed the develop branch 2 times, most recently from 4e1b8f6 to 99cf4c8 Compare December 22, 2021 09:41
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

顺序调整一下吧,按字母排序~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

顺序调整一下吧,按字母排序~

已修改

@ZibinGuo ZibinGuo force-pushed the develop branch 2 times, most recently from e86ddbb to 0d5fb59 Compare December 22, 2021 11:28
@ZibinGuo ZibinGuo changed the title Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align"… Fix the bug of batch_norm and batch_norm_grad op. Dec 22, 2021
Copy link
Contributor

@taixiurong taixiurong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个相当于增加了batchnorm 这个op训练中一些属性改变的计算,可以在单测中增加 TestBatchNormOpTraining, 增加测试用例。

@ZibinGuo
Copy link
Contributor Author

这个相当于增加了batchnorm 这个op训练中一些属性改变的计算,可以在单测中增加 TestBatchNormOpTraining, 增加测试用例。

已增加

@tangzhiyi11
Copy link
Contributor

LGTM

Copy link
Contributor

@QingshuChen QingshuChen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit信息增加test=kunlun, 触发KL2单测。

Zibin and others added 2 commits December 29, 2021 06:39
… and "roi_align_grad" op in xpu2 op list. test=kunlun
@QingshuChen QingshuChen merged commit cc83c95 into PaddlePaddle:develop Dec 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants