Refine code (momentum_op, rmsprop_op) #6000
Conversation
```diff
@@ -50,8 +50,7 @@ class MomentumOpKernel : public framework::OpKernel<T> {
       v_out.device(place) = v * mu + g;
       if (use_nesterov) {
-        p_out.device(place) = p - g * lr.broadcast(grad_dsize) +
-                              v_out * mu * lr.broadcast(grad_dsize);
+        p_out.device(place) = p - (g - v_out * mu) * lr.broadcast(grad_dsize);
```
Could `framework::EigenScalar` be used for `lr` instead of `broadcast`?
I've tried it, but the result is wrong.
This implementation is almost the same; it would be better to remove the `v_out.device` call.
Great, I also think it's better to remove `v_out.device`.
I find that `v_out` and `v` point to the same memory; that is, the writer uses an in-place method in the Python interface. Since `v_out` still needs to be updated in momentum_op, `v_out.device` can't be removed.
I find that the implementation of momentum_op seems to be wrong.
Code implementation: Paddle/paddle/operators/momentum_op.h, lines 51 to 57 at 728e8b1.
The writer uses an in-place method in the Python interface, that is: Paddle/python/paddle/v2/fluid/optimizer.py, lines 265 to 278 at 21053c1.
fix #6002