add masked fill op #3515

Merged
merged 18 commits into from
Sep 2, 2020

Conversation

clackhan
Contributor

REGISTER_USER_OP("masked_fill")
.Input("x")
.Input("mask")
.Attr("value", UserOpAttrType::kAtFloat)
Collaborator

masked_fill supports multiple data types, and a float attribute cannot represent every fill value exactly. See scalar_mul for reference.
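
To illustrate the precision concern, here is a minimal C++ sketch. It assumes the scalar is carried as both an integer and a floating-point value, with flags that say which one is valid. `ScalarOperand`, its field names, `GetFillValue`, and `SurvivesFloatRoundTrip` are hypothetical names chosen for this example, not code from this PR.

```cpp
#include <cstdint>

// Hypothetical scalar operand: keeps an integer copy and a floating-point copy,
// plus flags saying which one is valid, so integer values never pass through float.
struct ScalarOperand {
  bool has_int_operand = false;
  bool has_float_operand = false;
  int64_t int_operand = 0;
  double float_operand = 0.0;
};

// Convert the stored operand to the kernel's element type T.
template<typename T>
T GetFillValue(const ScalarOperand& op) {
  if (op.has_int_operand) { return static_cast<T>(op.int_operand); }
  return static_cast<T>(op.float_operand);
}

// Returns true when an integer survives a round trip through float unchanged.
inline bool SurvivesFloatRoundTrip(int64_t v) {
  return static_cast<int64_t>(static_cast<float>(v)) == v;
}
```

For example, 16777217 (2^24 + 1) does not survive the round trip through a 32-bit float, while `GetFillValue<int64_t>` returns it unchanged when it is stored as an integer operand.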

@clackhan clackhan marked this pull request as draft August 28, 2020 08:24
@jackalcooper jackalcooper added this to the 0.1.11 milestone Aug 30, 2020
@clackhan clackhan marked this pull request as ready for review August 30, 2020 23:49

namespace {

__global__ void NaiveHalfFillGpu(const int64_t elem_cnt, const float16 x, float16* y) {
Collaborator

NewKernelUtil<device_type>::Fill already supports fp16. Wouldn't it be enough to handle the conversion of the operand to float16, rather than writing a new kernel?
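
The suggestion can be sketched in plain C++ as follows. This is a rough illustration only: `Fill` here is a hypothetical stand-in for a generic, type-templated fill routine, not OneFlow's `NewKernelUtil`, and `ConstantLike` is an invented wrapper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a generic fill routine templated on the element type.
template<typename T>
void Fill(std::size_t n, T value, T* y) {
  for (std::size_t i = 0; i < n; ++i) { y[i] = value; }
}

// Convert the operand to T once, then delegate to the generic fill.
// Supporting another element type (float16, for example) then only requires
// that static_cast<T>(double) is defined for it; no new kernel is needed.
template<typename T>
void ConstantLike(double operand, std::vector<T>* out) {
  Fill<T>(out->size(), static_cast<T>(operand), out->data());
}
```

With this shape, the only type-specific code is the conversion of the operand, which is the point the comment makes.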


REGISTER_HALF_CONSTANT_LIKE_KERNEL

} // namespace oneflow
Collaborator

Mind the blank line here.


namespace {

__global__ void HalfAddByScalarPtrGpu(const int64_t n, const half* x, const half* y, half* z) {
Collaborator

This should probably add the float16 type to XxxByScalarPtr instead, right?

namespace {

template<typename CondT>
__global__ void NaiveHalfWhere(const int64_t elem_cnt, const CondT* cond, const half* lhs,
Collaborator

Likewise, it should be enough to add float16 to where directly, right?


__global__ void HalfAddByScalarPtrGpu(const int64_t n, const half* x, const half* y, half* z) {
const half y_value = y[0];
CUDA_1D_KERNEL_LOOP(i, n) { z[i] = x[i] + y_value; }
Collaborator

The float16/half type cannot use + directly; use __hadd instead.
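
A possible shape for the kernel with that change is sketched below. It assumes `cuda_fp16.h` and a device with native half arithmetic (compute capability 5.3 or later). The plain grid-stride loop replaces the `CUDA_1D_KERNEL_LOOP` macro so the snippet stands alone, and `AddByScalarPtrRef` is a hypothetical host-side reference added only so results can be compared. The CUDA part is guarded so the file also compiles as ordinary C++.

```cpp
#include <cstdint>

#if defined(__CUDACC__)
#include <cuda_fp16.h>

// Adds the scalar stored at y[0] to every element of x, using the __hadd
// intrinsic instead of operator+.
__global__ void HalfAddByScalarPtrGpu(const int64_t n, const half* x, const half* y, half* z) {
  const half y_value = y[0];
  const int64_t start = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  const int64_t step = static_cast<int64_t>(blockDim.x) * gridDim.x;
  for (int64_t i = start; i < n; i += step) { z[i] = __hadd(x[i], y_value); }
}
#endif  // defined(__CUDACC__)

// Hypothetical host-side reference with the same semantics, computed in float.
inline void AddByScalarPtrRef(const int64_t n, const float* x, const float* y, float* z) {
  const float y_value = y[0];
  for (int64_t i = 0; i < n; ++i) { z[i] = x[i] + y_value; }
}
```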

@oneflow-ci-bot oneflow-ci-bot merged commit 4511d64 into master Sep 2, 2020
@oneflow-ci-bot oneflow-ci-bot deleted the dev_add_op_masked_fill branch September 2, 2020 17:16
4 participants