
Change customize_loss_grad to use_default_grad_scale. #10223


Merged
merged 9 commits on May 2, 2018
6 changes: 3 additions & 3 deletions paddle/fluid/framework/parallel_executor.cc
@@ -58,7 +58,7 @@ ParallelExecutor::ParallelExecutor(
const std::unordered_set<std::string> &bcast_vars,
const ProgramDesc &main_program, const std::string &loss_var_name,
Scope *scope, const std::vector<Scope *> &local_scopes, bool allow_op_delay,
- bool customize_scale_loss)
+ bool use_default_grad_scale)
: member_(new ParallelExecutorPrivate(places)) {
member_->global_scope_ = scope;

@@ -93,11 +93,11 @@ ParallelExecutor::ParallelExecutor(
#ifdef PADDLE_WITH_CUDA
details::MultiDevSSAGraphBuilder builder(
member_->places_, loss_var_name, params, member_->local_scopes_,
- customize_scale_loss, member_->nccl_ctxs_.get());
+ use_default_grad_scale, member_->nccl_ctxs_.get());
#else
details::MultiDevSSAGraphBuilder builder(member_->places_, loss_var_name,
params, member_->local_scopes_,
- customize_scale_loss);
+ use_default_grad_scale);
#endif
auto graph = builder.Build(main_program);

2 changes: 1 addition & 1 deletion paddle/fluid/framework/parallel_executor.h
@@ -40,7 +40,7 @@ class ParallelExecutor {
const ProgramDesc& main_program,
const std::string& loss_var_name, Scope* scope,
const std::vector<Scope*>& local_scopes,
- bool allow_op_delay, bool customize_scale_loss);
+ bool allow_op_delay, bool use_default_grad_scale);

~ParallelExecutor();

10 changes: 5 additions & 5 deletions paddle/fluid/pybind/pybind.cc
@@ -502,11 +502,11 @@ All parameter, weight, gradient are variables in Paddle.
const std::unordered_set<std::string> &bcast_vars,
const ProgramDesc &main_program, const std::string &loss_var_name,
Scope *scope, std::vector<Scope *> &local_scopes,
- bool allow_op_delay, bool customize_loss_grad) {
- new (&self) ParallelExecutor(num_threads, use_event, places,
- params, bcast_vars, main_program,
- loss_var_name, scope, local_scopes,
- allow_op_delay, customize_loss_grad);
+ bool allow_op_delay, bool use_default_grad_scale) {
+ new (&self) ParallelExecutor(
+ num_threads, use_event, places, params, bcast_vars,
+ main_program, loss_var_name, scope, local_scopes,
+ allow_op_delay, use_default_grad_scale);
})
.def("bcast_params", &ParallelExecutor::BCastParamsToGPUs)
// NOTE: even we return a vec<Scope*>* to Python use reference policy.
8 changes: 6 additions & 2 deletions python/paddle/fluid/parallel_executor.py
@@ -30,7 +30,7 @@ def __init__(self,
num_threads=None,
allow_op_delay=False,
share_vars_from=None,
- customize_loss_grad=False):
+ use_default_grad_scale=True):
"""
ParallelExecutor can run program in parallel.

@@ -46,6 +46,10 @@ def __init__(self,
improve performance in some cases, default False.
share_vars_from(ParallelExecutor, default None): If provided,
it will share variables from the specified ParallelExecutor.
+ use_default_grad_scale(bool, default True): If set True, a default
+ scale value equal to `1./device_count` would be multiplied to
+ the gradients. Otherwise, a customized scale value should be

Review comment (Contributor): to gradients of each device? and then aggregated?

Reply (Contributor Author): Thanks, followed the comment.

+ feeded to the network.

Review comment (Contributor): feeded->fed?

Reply (Contributor Author): Done.

Returns:
A ParallelExecutor object.
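To make the scaling described in the new docstring lines above concrete, here is a small numeric sketch (an illustration, not part of this PR): with `use_default_grad_scale=True`, each device's gradients are scaled by `1./device_count` before aggregation, so summing the per-device gradients yields their mean rather than their sum.

```python
import numpy as np

# Hypothetical per-device gradients for one parameter, computed on 4 devices
# (values are invented purely for illustration).
device_grads = [np.array([0.2, 0.4]), np.array([0.1, 0.3]),
                np.array([0.3, 0.1]), np.array([0.2, 0.2])]
device_count = len(device_grads)

# use_default_grad_scale=True: each device's gradient is scaled by
# 1./device_count before aggregation, so the summed result equals the mean.
scaled_sum = sum(g * (1.0 / device_count) for g in device_grads)
mean_grad = sum(device_grads) / device_count
assert np.allclose(scaled_sum, mean_grad)
print(scaled_sum)  # [0.2  0.25]
```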
@@ -124,7 +128,7 @@ def __init__(self,
scope,
local_scopes,
allow_op_delay,
- customize_loss_grad)
+ use_default_grad_scale)
self.scope = scope

def run(self, fetch_list, feed=None, feed_dict=None):
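A closing usage sketch (an illustration, not taken from the PR): as the flipped default (`customize_loss_grad=False` becoming `use_default_grad_scale=True`) suggests, the flag's sense is inverted along with the rename, so callers that previously passed `customize_loss_grad=True` would now pass `use_default_grad_scale=False`. The sketch assumes the Fluid Python API of that period, in particular that `ParallelExecutor` also accepts `use_cuda` and `loss_name`, and that a custom scale is supplied by feeding the loss gradient yourself.

```python
import paddle.fluid as fluid

# A tiny program so the sketch is self-contained (layer shapes are arbitrary).
x = fluid.layers.data(name='x', shape=[4], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
pred = fluid.layers.fc(input=x, size=1)
loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

# New default: gradients on every device are scaled by 1./device_count.
pe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)

# Rough equivalent of the old customize_loss_grad=True: disable the default
# scaling and feed a customized scale value (loss gradient) to the network.
pe_custom = fluid.ParallelExecutor(
    use_cuda=True, loss_name=loss.name, use_default_grad_scale=False)
```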