
Conversation

@jacquesqiao (Member) commented Jan 22, 2017

API design doc

Updater design doc

The optimizer-related code has been split out of the previous codebase and moved into v2. It no longer calls config_parser to parse a config; instead it generates the corresponding proto config directly.
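To make the "generate the proto config directly" idea concrete, here is a minimal sketch; the class and method names are hypothetical, not the actual Paddle API:

```python
# Hypothetical sketch: a v2 optimizer object emits its own optimization
# settings directly, so no config_parser round-trip is needed.
class AdamOptimizer(object):
    def __init__(self, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon

    def to_settings(self):
        # The fields this learning method contributes to the proto config.
        return {
            "learning_method": "adam",
            "adam_beta1": self.beta1,
            "adam_beta2": self.beta2,
            "adam_epsilon": self.epsilon,
        }

print(AdamOptimizer().to_settings())
```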

@jacquesqiao (Member, Author) commented:

@cxwangyi's earlier review:

There appear to be some inconsistencies in the code below:

```python
optimizer = paddle.v2.Optimizer(
    learning_method=paddle.optimizer.AdamOptimizer(),
    learning_rate=1e-4,
    model_average=paddle.optimizer.ModelAverage(average_window=0.5),
    regularization=paddle.optimizer.L2Regularization(rate=0.5))
```

- paddle.v2.Optimizer and paddle.optimizer.AdamOptimizer are not in the same package?
- Judging by its name, paddle.v2.Optimizer is a class. Why does its constructor take another optimizer (AdamOptimizer) as an argument? Which of the two is the optimizer we actually want?
- Is ModelAverage meant to be ModelAverager? Without the "er" it doesn't seem grammatical in English.
- Why does an optimizer need a model averager? Is it because this is a distributed optimizer?

I understand that many of these inconsistencies have nothing to do with @jacquesqiao's design and exist for historical reasons. But it looks like they must be fixed in our v2.

@@ -0,0 +1,44 @@
# Design Doc: New Paddle API For Updater

In gradient-base optimization algorithms, the parameters are updated using the gradients in each iteration. We call the component that do update work Updater.
Collaborator commented:

English grammar suggestion:

PaddlePaddle uses SGD to update parameters of a neural network. For each mini-batch, we implement the forward/backward algorithm, which computes activations and cost, in GradientMachine, and we update model parameters using Updater. This design doc is about Updater.

Member (Author) replied:

OK, this description is more accurate.


In gradient-base optimization algorithms, the parameters are updated using the gradients in each iteration. We call the component that do update work Updater.

The main method of an Updater is update(parameters), there may have more then one parameters is multi-layer neural network, the Updater cann
Collaborator commented:

English grammar suggestions:

Updater::update updates parameters layer by layer.

In gradient-base optimization algorithms, the parameters are updated using the gradients in each iteration. We call the component that do update work Updater.

The main method of an Updater is update(parameters), there may have more then one parameters is multi-layer neural network, the Updater cann
update each parameter one by one with updater.update(parameter)
Collaborator commented:

Why isn't update a method of GradientMachine? Why do we need a new class Updater outside of GradientMachine?

Member (Author) replied:

The updater performs the update. GradientMachine holds an updater object, and after a complete forward/backward pass it calls updater.update to update the parameters as needed; that is the sequence.
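A minimal runnable sketch of that sequence, with hypothetical names and plain SGD standing in for the real update rule:

```python
class Updater(object):
    def __init__(self, learning_rate=1e-4):
        self.learning_rate = learning_rate

    def update(self, param):
        # Plain SGD as a stand-in: w <- w - lr * grad.
        param["value"] = [v - self.learning_rate * g
                          for v, g in zip(param["value"], param["grad"])]


class GradientMachine(object):
    def __init__(self, parameters, updater):
        self.parameters = parameters
        self.updater = updater

    def train_one_batch(self, batch):
        # forward/backward would run here and fill each param["grad"]
        # (omitted); afterwards each parameter is updated one by one.
        for param in self.parameters:
            self.updater.update(param)
```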

update each parameter one by one with updater.update(parameter)

```python
gm = GradientMachine()
```
Collaborator commented:

Is this example describing the current approach?

If so, it should probably be described along these lines:

The current usage of Updater::update is about the following:

Also, since the later gm.forward call doesn't specify a network, the constructor should presumably take the network:

```python
gm = paddle.GradientMachine(network)
```


Usage:

```python
updater = paddle.v2.Updater(
```
Collaborator commented:

I see that below, the optimizers are placed in a package named optimizer. Will Updater have multiple implementations? If so, should they also go into a package?

```python
updater = paddle.v2.Updater(
    learning_method=paddle.v2.optimizer.AdamOptimizer(),
    learning_rate=1e-4,
```
Collaborator commented:

I thought the step size should be specified on each call to Updater.update? At most there should be a "reference step size" (learning_rate_hint) here?
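Sketched against the Updater above, the per-call step size the reviewer suggests could look like this (hypothetical signature, not existing Paddle API):

```python
class Updater(object):
    def __init__(self, learning_rate_hint=1e-4):
        # Only a reference step size is fixed at construction time.
        self.learning_rate_hint = learning_rate_hint

    def update(self, param, learning_rate=None):
        # The effective step size is chosen per call, falling back to the hint.
        lr = learning_rate if learning_rate is not None else self.learning_rate_hint
        param["value"] = [v - lr * g
                          for v, g in zip(param["value"], param["grad"])]
```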

Usage:

```python
updater = paddle.v2.Updater(
    learning_method=paddle.v2.optimizer.AdamOptimizer(),
```
Collaborator commented:

What does this optimizer mean? It wasn't mentioned in the earlier design doc. Optimizer sounds like the same thing as Trainer. If so, then logically the optimizer should create a gradient machine to do forward/backward, and also create an updater to update the parameters; in fact, the gradient machine alone seems able to do forward/backward as well as update. But the logic here is reversed: the updater creates an optimizer?

```python
updater = paddle.v2.Updater(
    learning_method=paddle.v2.optimizer.AdamOptimizer(),
    learning_rate=1e-4,
    model_average=paddle.v2.optimizer.ModelAverage(average_window=0.5),
```
Collaborator commented:

The class name should be in noun form, ModelAverager, with an "r" at the end?

```python
    learning_method=paddle.v2.optimizer.AdamOptimizer(),
    learning_rate=1e-4,
    model_average=paddle.v2.optimizer.ModelAverage(average_window=0.5),
    regularization=paddle.v2.optimizer.L2Regularization(rate=0.5))
```
Collaborator commented:

Similarly, this should be:

```python
paddle.v2.regularizer.L2(rate=0.5)
```

```python
_temp_optimizer_ = api.ParameterOptimizer.create(opt_config)
enable_types = _temp_optimizer_.getParameterTypes()
optimizer = paddle_v2.optimizer.Optimizer(
    learning_method=paddle_v2.optimizer_types.AdamOptimizer(),
```
Collaborator commented:

optimizer_types is not a great fit.

Since the optimizer package already exists, just put these in there.
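Spelled out, the reviewer's suggestion would turn the call above into the following (the same call as in the diff, with optimizer_types folded into the optimizer package; shown as a usage sketch, not runnable against any released API):

```python
optimizer = paddle_v2.optimizer.Optimizer(
    learning_method=paddle_v2.optimizer.AdamOptimizer(),
    regularization=paddle_v2.optimizer.L2Regularization(rate=0.5))
```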

```python
    regularization=paddle_v2.optimizer_types.L2Regularization(rate=0.5))

# Create Simple Gradient Machine.
model_config = parse_network_config(network_config)
```
Collaborator commented:

Since our optimizers are all independent classes now, couldn't parsing this model_config be turned into a member function?

```python
optimizer.proto()
```
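A minimal sketch of such a member function (hypothetical, in the spirit of the sketch near the top of this thread; a dict stands in for the real proto message):

```python
class Optimizer(object):
    def __init__(self, learning_method="adam", learning_rate=1e-4):
        self.learning_method = learning_method
        self.learning_rate = learning_rate

    def proto(self):
        # The optimizer assembles its own optimization config, so the
        # caller no longer runs a separate external parse step.
        return {"learning_method": self.learning_method,
                "learning_rate": self.learning_rate}

opt_config = Optimizer().proto()
```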

@@ -0,0 +1,226 @@
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
Collaborator commented:

If at this stage we're simply copy-pasting, could we use a symlink instead?

@wangkuiyi The main consideration for a symlink is that our earlier Python code will keep being updated; a plain copy-paste would leave the two sides unable to stay in sync.

```python
class OptimizerConfig(object):
    def __init__(self, **kwargs):
        trainer_conf = TrainerConfig()
        self.conf_proto = trainer_conf.opt_config
```
Collaborator commented:

If it's just the conf_proto, it can be constructed directly from OptimizationConfig().
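For reference, that direct construction would be roughly the following; the import path for the generated proto module is an assumption and may differ in the tree:

```python
# Assumed location of the generated proto module.
from paddle.proto.TrainerConfig_pb2 import OptimizationConfig

conf_proto = OptimizationConfig()  # instead of TrainerConfig().opt_config
```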

```python
    def __init__(self, **kwargs):
        trainer_conf = TrainerConfig()
        self.conf_proto = trainer_conf.opt_config
        self.settings = dict(
```
Collaborator commented:

Since this function was copy-pasted anyway, there's no need to reuse the earlier code, right? We could write it from scratch.

Because if the code is copy-pasted while the earlier code keeps gaining features, this copy will drift out of sync anyway. So reusing the original logic buys nothing.

Member (Author) replied:

Right. To avoid the rework problem we had before, I copied the existing code out first; once we hear everyone's thoughts, we'll revise it.

@reyoung (Collaborator) commented Jan 23, 2017

One question: if we copy-paste the code now to form the new paddle.v2 API, then:

1. The original code is still under active development; while we build paddle.v2, it keeps gaining features. Once we finish, syncing the two sides looks troublesome.
2. Our paddle.v2 is a library used from Python, whereas the original Python package is one that Paddle's C++ side can parse. After we build paddle.v2, do we still need to support the original binary execution mode (I lean toward yes)? Most users currently run Paddle as a binary. So how do we guarantee paddle.v2 can still be parsed by the C++ side?

So a simple copy & paste seems to carry fairly large maintenance problems.

@jacquesqiao (Member, Author) replied:

> 2. Our paddle.v2 is a library used from Python, whereas the original Python package is one that Paddle's C++ side can parse. After we build paddle.v2, do we still need to support the original binary execution mode (I lean toward yes)? Most users currently run Paddle as a binary. So how do we guarantee paddle.v2 can still be parsed by the C++ side?

This is indeed a big problem and needs to be settled.
