V2 optimizer #1214
Conversation
Forwarding @cxwangyi's earlier review:

> # Design Doc: New Paddle API For Updater
>
> In gradient-base optimization algorithms, the parameters are updated using the gradients in each iteration. We call the component that do update work Updater.
English grammar suggestion:
PaddlePaddle uses SGD to update parameters of a neural network. For each mini-batch, we implement the forward/backward algorithm, which computes activations and cost, in GradientMachine, and we update model parameters using Updater. This design doc is about Updater.
OK, this description is more accurate.
> In gradient-base optimization algorithms, the parameters are updated using the gradients in each iteration. We call the component that do update work Updater.
>
> The main method of an Updater is update(parameters), there may have more then one parameters is multi-layer neural network, the Updater cann
English grammar suggestion:
Updater::update updates parameters layer by layer.
> The main method of an Updater is update(parameters), there may have more then one parameters is multi-layer neural network, the Updater cann
> update each parameter one by one with updater.update(parameter)
Why isn't update a method of GradientMachine? Why do we need a new class Updater outside of GradientMachine?
The updater performs the update; GradientMachine holds an updater object, and after a complete forward/backward pass it calls updater.update as needed to update the parameters. That is the sequence.
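A minimal, self-contained sketch of the sequence described above (GradientMachine and Updater are the names from the design doc; every method, field, and the plain-SGD step here is an illustrative assumption, not the actual Paddle API):

```python
class Updater(object):
    """Performs the parameter update; here a plain SGD step."""
    def update(self, parameter, learning_rate=0.01):
        parameter['value'] -= learning_rate * parameter['gradient']


class GradientMachine(object):
    """Owns an Updater and calls it after each forward/backward pass."""
    def __init__(self, updater):
        self.updater = updater
        self.parameters = [{'value': 0.0, 'gradient': 0.0}]

    def forward_backward(self, batch):
        # Placeholder for computing activations, cost, and gradients.
        for p in self.parameters:
            p['gradient'] = 1.0

    def train_one_batch(self, batch):
        self.forward_backward(batch)
        for p in self.parameters:          # update each parameter one by one
            self.updater.update(p)


gm = GradientMachine(Updater())
gm.train_one_batch(batch=None)
```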
> update each parameter one by one with updater.update(parameter)
>
> gm = GradientMachine()
Is this example describing the current approach?
If so, it should probably be introduced with something like:
"The current usage of Updater::update is about the following:"
Also, since the later gm.forward call does not specify a network, the constructor presumably needs to take the network?
gm = paddle.GradientMachine(network)
> Usage:
>
> updater = paddle.v2.Updater(
I see that below, the optimizers are placed in a package called optimizer. Will there be multiple Updater implementations? If so, should they also go into a package?
> updater = paddle.v2.Updater(
>     learning_method=paddle.v2.optimizer.AdamOptimizer(),
>     learning_rate=1e-4,
I thought the step size should be specified on each call to Updater.update? At most the constructor would take a "reference step size" (learning_rate_hint)?
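A self-contained sketch of the idea in this comment (learning_rate_hint is the reviewer's term; the class body and the update rule are made up for illustration):

```python
class Updater(object):
    def __init__(self, learning_rate_hint=1e-4):
        # Only a reference step size is stored at construction time.
        self.learning_rate_hint = learning_rate_hint

    def update(self, parameter, learning_rate=None):
        # The effective step size is whatever the caller passes per update;
        # the constructor hint is only a fallback.
        lr = learning_rate if learning_rate is not None else self.learning_rate_hint
        parameter['value'] -= lr * parameter['gradient']


param = {'value': 1.0, 'gradient': 0.5}
Updater(learning_rate_hint=1e-4).update(param, learning_rate=1e-3)
```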
> updater = paddle.v2.Updater(
>     learning_method=paddle.v2.optimizer.AdamOptimizer(),
What does this optimizer mean? It was not mentioned in the earlier design doc. Optimizer sounds like the same thing as Trainer. If so, then logically the optimizer should create a gradient machine to do forward/backward, and also create an updater to update the parameters (in fact, the gradient machine already seems able to do forward/backward as well as update). But the logic here is the other way around: the updater creates an optimizer?
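A rough sketch of the structure this comment argues for, with the top-level Optimizer (or Trainer) creating both pieces; all class names, methods, and the update rule are assumptions for illustration only:

```python
class GradientMachine(object):
    def forward_backward(self, batch, parameters):
        for p in parameters:
            p['gradient'] = 1.0            # placeholder gradient computation


class Updater(object):
    def update(self, parameter, learning_rate=1e-4):
        parameter['value'] -= learning_rate * parameter['gradient']


class Optimizer(object):
    """Top-level object: creates a GradientMachine for forward/backward
    and an Updater for parameter updates (not the reverse)."""
    def __init__(self):
        self.gm = GradientMachine()
        self.updater = Updater()

    def train_one_batch(self, batch, parameters):
        self.gm.forward_backward(batch, parameters)
        for p in parameters:
            self.updater.update(p)


params = [{'value': 0.0, 'gradient': 0.0}]
Optimizer().train_one_batch(batch=None, parameters=params)
```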
> updater = paddle.v2.Updater(
>     learning_method=paddle.v2.optimizer.AdamOptimizer(),
>     learning_rate=1e-4,
>     model_average=paddle.v2.optimizer.ModelAverage(average_window=0.5),
The class name should be a noun form: ModelAverager, with an "r" at the end?
>     learning_method=paddle.v2.optimizer.AdamOptimizer(),
>     learning_rate=1e-4,
>     model_average=paddle.v2.optimizer.ModelAverage(average_window=0.5),
>     regularization=paddle.v2.optimizer.L2Regularization(rate=0.5))
Similarly, this should be:
paddle.v2.regularizer.L2(rate=0.5)
> _temp_optimizer_ = api.ParameterOptimizer.create(opt_config)
> enable_types = _temp_optimizer_.getParameterTypes()
> optimizer = paddle_v2.optimizer.Optimizer(
>     learning_method=paddle_v2.optimizer_types.AdamOptimizer(),
optimizer_types does not seem like the right name. Since there is already an optimizer package, just put these in there.
>     regularization=paddle_v2.optimizer_types.L2Regularization(rate=0.5))
>
> # Create Simple Gradient Machine.
> model_config = parse_network_config(network_config)
Now that our optimizers are standalone classes, could parsing this model_config be changed into a member function, e.g. optimizer.proto()?
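A small sketch of the member-function idea (proto() is the name proposed in the comment; the plain dict stands in for the real protobuf message, and the field names are illustrative):

```python
class AdamOptimizer(object):
    def __init__(self, learning_rate=1e-4):
        self.learning_rate = learning_rate

    def proto(self):
        # In the real code this would fill an optimization protobuf;
        # a plain dict stands in for it here.
        return {'learning_method': 'adam',
                'learning_rate': self.learning_rate}


opt_config = AdamOptimizer(learning_rate=1e-4).proto()
```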
> # Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
If at this stage we are purely copy-pasting, could we do it with a symlink instead?

@wangkuiyi The main consideration behind the symlink is that our existing Python code will keep being updated; a direct copy-paste would leave the two sides unable to stay in sync.
> class OptimizerConfig(object):
>     def __init__(self, **kwargs):
>         trainer_conf = TrainerConfig()
>         self.conf_proto = trainer_conf.opt_config
If it is just the conf_proto, it can be constructed directly from OptimizationConfig().
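A sketch of what this suggestion might look like, assuming OptimizationConfig is the protobuf message the existing config code already uses; the import path is a guess, not verified against this PR:

```python
# Assumed import path; the real location may differ in the tree.
from paddle.proto.TrainerConfig_pb2 import OptimizationConfig


class OptimizerConfig(object):
    def __init__(self, **kwargs):
        # Build the optimization proto directly instead of going through
        # a full TrainerConfig wrapper.
        self.conf_proto = OptimizationConfig()
        self.settings = dict(kwargs)
```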
>     def __init__(self, **kwargs):
>         trainer_conf = TrainerConfig()
>         self.conf_proto = trainer_conf.opt_config
>         self.settings = dict(
Since this function was copy-pasted anyway, there is no need to reuse the previous code, right? We could rewrite it from scratch.
If we copy-paste the code, the old code will keep gaining features and this copy will drift out of sync, so reusing the original logic is not really necessary either.
Right. To avoid the earlier rework problem, I copied the existing code out first; once we see what everyone thinks, I will revise it.
One issue: if we copy-paste code now to form the new Paddle.V2 API, then (1) the original code is still under active development, and while we build paddle.v2 the old code keeps gaining features; syncing the two sides afterwards looks troublesome. So a simple copy & paste seems to carry a sizable maintenance burden.

This is indeed a big problem; we need to settle it.
API design doc
Updater design doc
The optimizer-related parts of the existing code have been split out and moved into v2; instead of calling config_parser to parse, they now generate the corresponding proto config directly.
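For illustration, a hedged sketch of what "directly generating the proto config" could look like; the OptimizationConfig import path and the field names are assumptions, not verified against the code in this PR:

```python
from paddle.proto.TrainerConfig_pb2 import OptimizationConfig  # assumed path


def adam_optimization_config(learning_rate=1e-4, beta1=0.9, beta2=0.999):
    # Fill the optimization proto directly, with no config_parser round trip.
    conf = OptimizationConfig()
    conf.learning_method = 'adam'      # field names assumed for illustration
    conf.learning_rate = learning_rate
    conf.adam_beta1 = beta1
    conf.adam_beta2 = beta2
    return conf
```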