[Auto Parallel] add dropout spmd rules #70216
Conversation
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
// args : (Tensor x, Tensor seed_tensor, Scalar p, bool is_test, str mode,
//         int seed, bool fix_seed)
// output : Tensor(out), Tensor(mask)
SpmdInfo DropoutFwdInferSpmd(const DistMetaTensor& x,
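The excerpt above truncates the forward rule after its first parameter. A minimal sketch of what such a rule could look like is given below, assuming it follows the element-wise pattern used by the backward rule later in this diff; the parameter types are inferred from the args comment, the body is not taken from this PR, and the handling of seed_tensor's dist attr is omitted.

// Sketch only (not the body from this PR): dropout is element-wise, so out
// and mask can simply inherit x's sharding. Parameter types are assumed from
// the args comment above; seed_tensor's dist attr handling is omitted.
SpmdInfo DropoutFwdInferSpmd(const DistMetaTensor& x,
                             const DistMetaTensor& seed_tensor,
                             const Scalar& p,
                             bool is_test,
                             const std::string& mode,
                             int seed,
                             bool fix_seed) {
  // Derive out's dist attr from x via the element-wise unary rule.
  SpmdInfo info = ElementwiseUnaryInferSpmd(x);
  // mask has the same shape as out, so reuse the same output dist attr.
  info.second.push_back(info.second.back());
  return info;
}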
In general, the forward spmd rule is named DropoutInferSpmd.
Since a reverse rule no longer needs to be written, naming them Fwd and Bwd is more reasonable; some spmd rules are already named this way.
// args : (Tensor mask, Tensor out_grad, Scalar p, bool is_test, str mode)
// output : Tensor(x_grad)
SpmdInfo DropoutBwdInferSpmd(const DistMetaTensor& mask,
In general, the backward spmd rule is named DropoutGradInferSpmd.
Same as above.
                             bool is_test,
                             const std::string& mode) {
  // Dropout grad is element-wise over mask and out_grad, so reuse the
  // binary element-wise sharding rule.
  return ElementwiseBinaryInferSpmd(mask, out_grad);
}
Shall we add DropoutInferSpmdReverse to support the reverse spmd rule in the forward pass when using static mode?
Adding the reverse sharding inference rule is no longer needed now.
LGTM
LGTM
PR Category
Auto Parallel
PR Types
Performance
Description
Add sharding inference (spmd) rules for dropout.
Pcard-73145
The dropout sharding inference breaks bit-wise precision alignment under data parallelism (dp). GPT uses dropout, so the three dp-enabled GPT unit tests will fail. These three tests are disabled for now; after this PR is merged, the GPT tests in PaddleNLP will be updated.
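As a standalone illustration (not code from this PR or PaddleNLP) of why the alignment breaks, presumably because each dp rank samples its dropout mask from its own RNG state: uncoordinated masks differ element by element from the single-card run, so outputs cannot match bit-wise.

#include <iostream>
#include <random>
#include <vector>

// Illustration only: a "rank" samples a dropout keep-mask from its own RNG.
std::vector<int> SampleMask(unsigned seed, int n, double p) {
  std::mt19937 gen(seed);
  std::bernoulli_distribution keep(1.0 - p);
  std::vector<int> mask(n);
  for (int i = 0; i < n; ++i) mask[i] = keep(gen) ? 1 : 0;
  return mask;
}

int main() {
  const int n = 16;
  auto single_card = SampleMask(/*seed=*/0, n, /*p=*/0.5);
  auto dp_rank1 = SampleMask(/*seed=*/1, n, /*p=*/0.5);
  // With uncoordinated RNG states the masks almost surely differ, so dp
  // outputs cannot be bit-wise identical to the single-card baseline.
  std::cout << "masks identical: " << (single_card == dp_rank1) << std::endl;
  return 0;
}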