-
Notifications
You must be signed in to change notification settings - Fork 5.9k
Fix the bug of batch_norm and batch_norm_grad op. #38288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Zibin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
|
您的邮件已收到,我将尽快回信。谢谢您!Your email has been received and I will reply as soon as possible. Thank you!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
如果dx/dscale/dbias为空,应该调用的是这几个临时变量?下面接口调用里没用到
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
对的,如果为空,即为dx/dscale/dbias不做回传,且从输出获得的指针为空。但xpu的api不支持单独计算dx/dscale/dbias的功能必须传入这三个参数且不能是空指针,只能传入临时变量,否则会报错。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d_x_data_tensor.mutable_data(ctx.GetPlace()); => d_x_data = d_x_data_tensor.mutable_data(ctx.GetPlace());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
对的,如果为空,即为dx/dscale/dbias不做回传,且从输出获得的指针为空。但xpu的api不支持单独计算dx/dscale/dbias的功能必须传入这三个参数且不能是空指针,只能传入临时变量,否则会报错。
这种情况,可以在kernel里面增加判断,指针为空,就不要做后面的计算了,计算了,还耗时。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d_x_data_tensor.mutable_data(ctx.GetPlace()); => d_x_data = d_x_data_tensor.mutable_data(ctx.GetPlace());
已改
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
多余的注释去掉吧
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
多余的注释去掉吧
已改
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mean/scale/bias这些类型应该都是float
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里是给mean_out,variance_out输出变量分配内存
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
最好直接用float类型或者封装一层,后续是有fp16需求的,这时候 T == fp16,这用T就不对了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
最好直接用float类型或者封装一层,后续是有fp16需求的,这时候 T == fp16,这用T就不对了
已修改
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这种定义的局部Tensor变量,考虑下生存周期,推荐使用RAII_GUARD.alloc_l3_or_gm 来分配一块内存。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这种定义的局部Tensor变量,考虑下生存周期,推荐使用RAII_GUARD.alloc_l3_or_gm 来分配一块内存。
已修改
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
能不能考虑把is_inplace, use_global_stats 作为参数,传给api, 在api侧,增加kernel实现功能呢?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
能不能考虑把is_inplace, use_global_stats 作为参数,传给api, 在api侧,增加kernel实现功能呢?
因batch_norm算子内部比较复杂,修改起来周期可能会比较长。另修改起来对现有接口改动比较大,可能会影响到其他使用该api的地方,如果要修改可能需要另行实现一个batch_norm_v2的op。计划目前先通过此方法绕过,后续在针对性能进行优化。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
对的,如果为空,即为dx/dscale/dbias不做回传,且从输出获得的指针为空。但xpu的api不支持单独计算dx/dscale/dbias的功能必须传入这三个参数且不能是空指针,只能传入临时变量,否则会报错。
这种情况,可以在kernel里面增加判断,指针为空,就不要做后面的计算了,计算了,还耗时。
4e1b8f6 to
99cf4c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
顺序调整一下吧,按字母排序~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
顺序调整一下吧,按字母排序~
已修改
e86ddbb to
0d5fb59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个相当于增加了batchnorm 这个op训练中一些属性改变的计算,可以在单测中增加 TestBatchNormOpTraining, 增加测试用例。
已增加 |
|
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit信息增加test=kunlun, 触发KL2单测。
… and "roi_align_grad" op in xpu2 op list.
… and "roi_align_grad" op in xpu2 op list. test=kunlun
PR types
Bug fixes
PR changes
OPs
Describe
For batch_norm op, fixed the bug that the output saved_mean and saved_variance could not be initialized correctly when using global_stats mode.
For batch_norm_grad op, global_stats and inplace modes have been added. And added the function of whether to calculate d_x, d_bias, d_sclae separately.
Add the "roi_align" and "roi_align_grad" op in xpu2 op list. test=kunlun