V2 optimizer #1214
Conversation
Forwarding @cxwangyi's earlier review:

> # Design Doc: New Paddle API For Updater
>
> In gradient-base optimization algorithms, the parameters are updated using the gradients in each iteration. We call the component that do update work Updater.
English grammar suggestion:
PaddlePaddle uses SGD to update parameters of a neural network. For each mini-batch, we implement the forward/backward algorithm, which computes activations and cost, in GradientMachine, and we update model parameters using Updater. This design doc is about Updater.
OK, this description is more accurate.
> In gradient-base optimization algorithms, the parameters are updated using the gradients in each iteration. We call the component that do update work Updater.
>
> The main method of an Updater is update(parameters), there may have more then one parameters is multi-layer neural network, the Updater cann
English grammar suggestion:
Updater::update updates parameters layer by layer.
> The main method of an Updater is update(parameters), there may have more then one parameters is multi-layer neural network, the Updater cann
> update each parameter one by one with updater.update(parameter)
Why isn't update a method of GradientMachine? Why do we need a new class Updater outside of GradientMachine?
The updater performs the update; GradientMachine holds an updater object, and after a complete forward/backward pass it calls updater.update as needed to update the parameters. That is the sequence.
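A minimal, self-contained sketch of the sequence described above (GradientMachine and Updater are the names from the design doc; every method, field, and the plain-SGD step here is an illustrative assumption, not the actual Paddle API):

```python
class Updater(object):
    """Performs the parameter update; here a plain SGD step."""
    def update(self, parameter, learning_rate=0.01):
        parameter['value'] -= learning_rate * parameter['gradient']


class GradientMachine(object):
    """Owns an Updater and calls it after each forward/backward pass."""
    def __init__(self, updater):
        self.updater = updater
        self.parameters = [{'value': 0.0, 'gradient': 0.0}]

    def forward_backward(self, batch):
        # Placeholder for computing activations, cost, and gradients.
        for p in self.parameters:
            p['gradient'] = 1.0

    def train_one_batch(self, batch):
        self.forward_backward(batch)
        for p in self.parameters:          # update each parameter one by one
            self.updater.update(p)


gm = GradientMachine(Updater())
gm.train_one_batch(batch=None)
```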
> update each parameter one by one with updater.update(parameter)
>
> gm = GradientMachine()
Is this example describing the current approach?
If so, it should probably be introduced with something like:
"The current usage of Updater::update is about the following:"
Also, since the later gm.forward call does not specify a network, the constructor presumably needs to take the network?
gm = paddle.GradientMachine(network)
> Usage:
>
> updater = paddle.v2.Updater(
I see that below, the optimizers are placed in a package called optimizer. Will there be multiple Updater implementations? If so, should they also go into a package?
> updater = paddle.v2.Updater(
>     learning_method=paddle.v2.optimizer.AdamOptimizer(),
>     learning_rate=1e-4,
I thought the step size should be specified on each call to Updater.update? At most the constructor would take a "reference step size" (learning_rate_hint)?
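A self-contained sketch of the idea in this comment (learning_rate_hint is the reviewer's term; the class body and the update rule are made up for illustration):

```python
class Updater(object):
    def __init__(self, learning_rate_hint=1e-4):
        # Only a reference step size is stored at construction time.
        self.learning_rate_hint = learning_rate_hint

    def update(self, parameter, learning_rate=None):
        # The effective step size is whatever the caller passes per update;
        # the constructor hint is only a fallback.
        lr = learning_rate if learning_rate is not None else self.learning_rate_hint
        parameter['value'] -= lr * parameter['gradient']


param = {'value': 1.0, 'gradient': 0.5}
Updater(learning_rate_hint=1e-4).update(param, learning_rate=1e-3)
```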
> updater = paddle.v2.Updater(
>     learning_method=paddle.v2.optimizer.AdamOptimizer(),
What does this optimizer mean? It was not mentioned in the earlier design doc. Optimizer sounds like the same thing as Trainer. If so, then logically the optimizer should create a gradient machine to do forward/backward, and also create an updater to update the parameters (in fact, the gradient machine already seems able to do forward/backward as well as update). But the logic here is the other way around: the updater creates an optimizer?
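A rough sketch of the structure this comment argues for, with the top-level Optimizer (or Trainer) creating both pieces; all class names, methods, and the update rule are assumptions for illustration only:

```python
class GradientMachine(object):
    def forward_backward(self, batch, parameters):
        for p in parameters:
            p['gradient'] = 1.0            # placeholder gradient computation


class Updater(object):
    def update(self, parameter, learning_rate=1e-4):
        parameter['value'] -= learning_rate * parameter['gradient']


class Optimizer(object):
    """Top-level object: creates a GradientMachine for forward/backward
    and an Updater for parameter updates (not the reverse)."""
    def __init__(self):
        self.gm = GradientMachine()
        self.updater = Updater()

    def train_one_batch(self, batch, parameters):
        self.gm.forward_backward(batch, parameters)
        for p in parameters:
            self.updater.update(p)


params = [{'value': 0.0, 'gradient': 0.0}]
Optimizer().train_one_batch(batch=None, parameters=params)
```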
> updater = paddle.v2.Updater(
>     learning_method=paddle.v2.optimizer.AdamOptimizer(),
>     learning_rate=1e-4,
>     model_average=paddle.v2.optimizer.ModelAverage(average_window=0.5),
The class name should be a noun form: ModelAverager, with an "r" at the end?
>     learning_method=paddle.v2.optimizer.AdamOptimizer(),
>     learning_rate=1e-4,
>     model_average=paddle.v2.optimizer.ModelAverage(average_window=0.5),
>     regularization=paddle.v2.optimizer.L2Regularization(rate=0.5))
Similarly, this should be:
paddle.v2.regularizer.L2(rate=0.5)
> _temp_optimizer_ = api.ParameterOptimizer.create(opt_config)
> enable_types = _temp_optimizer_.getParameterTypes()
> optimizer = paddle_v2.optimizer.Optimizer(
>     learning_method=paddle_v2.optimizer_types.AdamOptimizer(),
optimizer_types does not seem like the right name. Since there is already an optimizer package, just put these in there.
>     regularization=paddle_v2.optimizer_types.L2Regularization(rate=0.5))
>
> # Create Simple Gradient Machine.
> model_config = parse_network_config(network_config)
Now that our optimizers are standalone classes, could parsing this model_config be changed into a member function, e.g. optimizer.proto()?
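A small sketch of the member-function idea (proto() is the name proposed in the comment; the plain dict stands in for the real protobuf message, and the field names are illustrative):

```python
class AdamOptimizer(object):
    def __init__(self, learning_rate=1e-4):
        self.learning_rate = learning_rate

    def proto(self):
        # In the real code this would fill an optimization protobuf;
        # a plain dict stands in for it here.
        return {'learning_method': 'adam',
                'learning_rate': self.learning_rate}


opt_config = AdamOptimizer(learning_rate=1e-4).proto()
```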
> # Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
If at this stage we are purely copy-pasting, could we do it with a symlink instead?

@wangkuiyi The main consideration behind the symlink is that our existing Python code will keep being updated; a direct copy-paste would leave the two sides unable to stay in sync.
> class OptimizerConfig(object):
>     def __init__(self, **kwargs):
>         trainer_conf = TrainerConfig()
>         self.conf_proto = trainer_conf.opt_config
If it is just the conf_proto, it can be constructed directly from OptimizationConfig().
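A sketch of what this suggestion might look like, assuming OptimizationConfig is the protobuf message the existing config code already uses; the import path is a guess, not verified against this PR:

```python
# Assumed import path; the real location may differ in the tree.
from paddle.proto.TrainerConfig_pb2 import OptimizationConfig


class OptimizerConfig(object):
    def __init__(self, **kwargs):
        # Build the optimization proto directly instead of going through
        # a full TrainerConfig wrapper.
        self.conf_proto = OptimizationConfig()
        self.settings = dict(kwargs)
```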
>     def __init__(self, **kwargs):
>         trainer_conf = TrainerConfig()
>         self.conf_proto = trainer_conf.opt_config
>         self.settings = dict(
Since this function was copy-pasted anyway, there is no need to reuse the previous code, right? We could rewrite it from scratch.
If we copy-paste the code, the old code will keep gaining features and this copy will drift out of sync, so reusing the original logic is not really necessary either.
Right. To avoid the earlier rework problem, I copied the existing code out first; once we see what everyone thinks, I will revise it.
One issue: if we copy-paste code now to form the new Paddle.V2 API, then (1) the original code is still under active development, and while we build paddle.v2 the old code keeps gaining features; syncing the two sides afterwards looks troublesome. So a simple copy & paste seems to carry a sizable maintenance burden.

This is indeed a big problem; we need to settle it.
API design doc
Updater design doc
The optimizer-related parts of the existing code have been split out and moved into v2; instead of calling config_parser to parse, they now generate the corresponding proto config directly.
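For illustration, a hedged sketch of what "directly generating the proto config" could look like; the OptimizationConfig import path and the field names are assumptions, not verified against the code in this PR:

```python
from paddle.proto.TrainerConfig_pb2 import OptimizationConfig  # assumed path


def adam_optimization_config(learning_rate=1e-4, beta1=0.9, beta2=0.999):
    # Fill the optimization proto directly, with no config_parser round trip.
    conf = OptimizationConfig()
    conf.learning_method = 'adam'      # field names assumed for illustration
    conf.learning_rate = learning_rate
    conf.adam_beta1 = beta1
    conf.adam_beta2 = beta2
    return conf
```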