Add Scalar Pow #4082

Merged
merged 16 commits into from
Jan 11, 2021

@MARD1NO MARD1NO commented Jan 6, 2021

Overview

Adds a scalar version of the Pow operation, modeled on torch.pow, which accepts either a tensor or a float as the exponent argument.
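To make the distinction between the tensor form and the scalar form concrete, here is a minimal pure-Python sketch. It is not the OneFlow kernel from this PR; the function names `elementwise_pow` and `scalar_pow` are hypothetical, and plain lists stand in for tensors.

```python
import math


def elementwise_pow(base, exponent):
    """Tensor-exponent form: one exponent per element (hypothetical helper)."""
    if len(base) != len(exponent):
        raise ValueError("base and exponent must have the same length")
    return [math.pow(x, y) for x, y in zip(base, exponent)]


def scalar_pow(base, exponent):
    """Scalar-exponent form: one float applied to every element (hypothetical helper)."""
    exponent = float(exponent)
    return [math.pow(x, exponent) for x in base]


x = [1.0, 2.0, 3.0]
squared = scalar_pow(x, 2)                    # the same exponent for all elements
mixed = elementwise_pow(x, [2.0, 3.0, 0.5])   # a separate exponent per element
```

The scalar form avoids materializing an exponent tensor of the same shape as the input, which is presumably the motivation for a dedicated op.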

Also fixes a whitespace issue in hardswish.

Feature Checklist

Op

  • Op SetBatchAxisInferFn
  • Op SetGetSbpFn
  • Op SetInputArgModifyFn
  • Op backward gradient registration
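For the backward item above, the gradient of y = x^a with respect to x, for a fixed scalar a, is a * x^(a-1), multiplied by the upstream gradient. The following is a sketch of that formula with a finite-difference check, not the gradient code registered in this PR; `scalar_pow_grad` and its signature are assumptions for illustration.

```python
import math


def scalar_pow_grad(values, exponent, upstream):
    """dx = dy * a * x**(a - 1) for each element (hypothetical helper).

    No special handling for x == 0 with exponent < 1, where math.pow raises.
    """
    a = float(exponent)
    return [dy * a * math.pow(x, a - 1.0) for x, dy in zip(values, upstream)]


def numeric_grad(x, exponent, h=1e-5):
    """Central finite difference of x**exponent, used only to check the formula."""
    return (math.pow(x + h, exponent) - math.pow(x - h, exponent)) / (2.0 * h)


x = [0.5, 2.0, 3.0]
analytic = scalar_pow_grad(x, 3.0, [1.0, 1.0, 1.0])
numeric = [numeric_grad(v, 3.0) for v in x]
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
```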

Kernel

The tensor version of pow only registers float32 and float64. For consistency, the float16 version is not registered here for now.

  • CPU in:float32

  • CPU in:float64

  • GPU in:float32

  • GPU in:float64

Python Wrapper

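  • Python API argument checking and error messages
  • API docstrings
  • Example

Tests

  • Single-node, single-device CPU Test Case
  • Single-node, single-device GPU Test Case
  • Single-node, multi-device CPU Test Case
  • Single-node, multi-device GPU Test Case
  • Distributed CPU Test Case
  • Distributed GPU Test Case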

GPU Effective Bandwidth

Theoretical bandwidth:

Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(GB/s)
   32000000			378.4

Measured bandwidth:

I0106 16:37:14.958042 125325 kernel.cpp:89] PROFILER::KERNEL::CUDA_MEMORY_BANDWIDTH op_name: pow elapsed(ms): 0.196544 memory_size(Byte): 50331648 bandwidth(GB/s): 238.496

This is about 63% of the theoretical bandwidth (238.496 / 378.4).
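The arithmetic behind these figures can be reproduced with a short sketch. The helper names are hypothetical. The logged 238.496 value is matched when a GB is taken as 2^30 bytes, so that divisor is used here; the comparison assumes the theoretical figure is in the same unit.

```python
def effective_bandwidth(memory_bytes, elapsed_ms, bytes_per_gb=2**30):
    """Bytes moved per second, expressed in GB/s (hypothetical helper)."""
    seconds = elapsed_ms / 1000.0
    return memory_bytes / seconds / bytes_per_gb


def utilization_percent(measured, theoretical):
    """Measured bandwidth as a percentage of the theoretical bandwidth."""
    return 100.0 * measured / theoretical


# Values copied from the profiler log line and the bandwidth test output above.
measured = effective_bandwidth(50331648, 0.196544)
percent = utilization_percent(measured, 378.4)
print(f"{measured:.3f} GB/s, {percent:.0f}% of theoretical")
```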

PR Checklist

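  • The PR title reads clearly, states what the PR does, and can be used directly as a changelog entry for a new release
  • Code has been formatted
  • Compiles locally
  • Changes have been tested locally
  • A type label has been added: (fill in the type label name, e.g. bug, enhancement, purge, feature, documentation)
  • A component label has been added: (fill in the component label name, e.g. op, system, eager, build, xla, python, ci, test, tooling, onnx)
  • Someone reviewed the PR before it was converted from draft to ready for review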

@MARD1NO MARD1NO requested a review from doombeaker January 6, 2021 08:08
@MARD1NO MARD1NO marked this pull request as ready for review January 6, 2021 08:08
@MARD1NO MARD1NO marked this pull request as draft January 6, 2021 08:08
@MARD1NO MARD1NO marked this pull request as ready for review January 6, 2021 08:39
@MARD1NO MARD1NO requested a review from oneflow-ci-bot January 6, 2021 08:39
@oneflow-ci-bot oneflow-ci-bot removed their request for review January 6, 2021 09:06
@MARD1NO MARD1NO requested a review from oneflow-ci-bot January 6, 2021 09:08
@oneflow-ci-bot oneflow-ci-bot removed their request for review January 6, 2021 09:39
.SetCreateFn<CpuHardSwishKernel<device, dtype>>() \
.SetIsMatchedHob((HobDeviceTag() == device) \
& (HobDataType("out", 0) == GetDataType<dtype>::value));
#define REGISTER_CPU_HARDSWISH_KERNEL(device, dtype) \

@doombeaker doombeaker Jan 6, 2021

Why is formatting of an unrelated operator included in this PR? I suggest splitting it into a separate PR.

Contributor Author

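=,= There was a stray space there, and the build printed a few lines of warnings about it. I'll split it out into a separate PR later.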

@oneflow-ci-bot oneflow-ci-bot removed their request for review January 11, 2021 04:58
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot January 11, 2021 08:24
@oneflow-ci-bot oneflow-ci-bot merged commit fef379e into master Jan 11, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the add_scalar_pow branch January 11, 2021 11:21
liujuncheng pushed a commit that referenced this pull request Jun 3, 2021
* Add Scalar Pow CPU version

* add backward

* add GPU kernel

* add test case

* add docs example

* fix if condition bug

* fix

* remove change

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: fef379e