design doc for implementation parameters in CPP. #2249
Merged: wangkuiyi merged 5 commits into PaddlePaddle:develop from reyoung:feature/design_of_cpp_parameters_concept on May 25, 2017.
# Design Doc: The C++ Class `Parameters`

`Parameters` is a concept we designed in the Paddle V2 API. A `Parameters` object is a container of parameters and allows Paddle to share parameters between topologies. The usage of `Parameters` is described in [api.md](./api.md).

We implemented `Parameters` in Python when designing the V2 API. The current implementation has several defects:

* We simply use `memcpy` to share parameters between topologies, which is very inefficient.
* We do not share parameters during training; we only trigger a `memcpy` when training starts.

It is therefore necessary to implement `Parameters` on the C++ side. However, this requires some refactoring of Paddle, because Paddle was designed to train only one topology, i.e., each `GradientMachine` contains its `Parameter` objects as data members. In the current Paddle implementation, there are three concepts associated with `Parameters`:
1. `paddle::Parameter`. A `Parameters` object is a container of `paddle::Parameter` instances.
   It is evident that we should use `paddle::Parameter` when developing `Parameters`.
   However, the `Parameter` class contains many functions and does not have a clear interface.
   It mixes creating/storing parameters, serialization/deserialization, optimization (i.e., SGD), and randomization/zeroing.
   When developing `Parameters`, we only need the create/store functionality.
   We should extract the functionalities of `Parameter` into separate classes to clean up the Paddle C++ implementation (a rough sketch of such a split follows this list).

2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`.
   We should pass `Parameters` to `paddle::GradientMachine` in `forward/backward` to avoid `memcpy` between topologies.
   We should also handle multi-GPU/CPU training, because `forward` and `backward` run on multiple GPUs and CPUs.
   `Parameters` should dispatch the parameter values to each device and gather the parameter gradients from each device.

3. `paddle::ParameterUpdater`. The `ParameterUpdater` is used to update parameters in Paddle.
   So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (e.g., by SGD).
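The split described in concept 1 could look roughly like the following sketch. It is only an illustration, assuming hypothetical class names (`ParameterStorage`, `ParameterSerializer`, `ParameterOptimizer`, `ParameterInitializer`) that do not exist in the current code base, with the value type simplified to `std::vector<float>`:

```cpp
#include <iostream>
#include <vector>

// Hypothetical decomposition of paddle::Parameter's responsibilities.
// None of these classes exist yet; names and signatures are illustrative.

// The "create/store" part: holds the parameter value and its gradient.
class ParameterStorage {
 public:
  std::vector<float>& value() { return value_; }
  std::vector<float>& gradient() { return gradient_; }

 private:
  std::vector<float> value_;
  std::vector<float> gradient_;
};

// Serialization/deserialization, extracted out of Parameter.
class ParameterSerializer {
 public:
  void save(const ParameterStorage& param, std::ostream& os) const;
  void load(ParameterStorage* param, std::istream& is) const;
};

// Optimization (e.g. SGD), extracted out of Parameter.
class ParameterOptimizer {
 public:
  virtual ~ParameterOptimizer() = default;
  virtual void update(ParameterStorage* param, float learningRate) = 0;
};

// Initialization (randomize/zero), extracted out of Parameter.
class ParameterInitializer {
 public:
  void randomize(ParameterStorage* param);
  void zero(ParameterStorage* param);
};
```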
The step-by-step approach for implementing `Parameters` in the Paddle C++ core is listed below. Each step should be a separate PR and can be merged into Paddle one by one.

1. Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` into separate classes to prepare for the implementation of `Parameters`.

2. Implement a `Parameters` class. It simply stores `paddle::Parameter` objects inside, and `GradientMachine` uses `Parameters` as a class member (see the first sketch after this list).

3. Make `Parameters` support multi-CPU and multi-GPU training to prepare for sharing `Parameter` objects between topologies.
   Because we need to share `Parameters` between topologies, it is `Parameters`'s responsibility to exchange parameters between GPUs.
   `GradientMachine` should not handle how to exchange parameters, because a `GradientMachine` only trains one topology while Paddle needs to support training many topologies, i.e., many `GradientMachine` instances may use one `Parameters` object.
   * We should use a global function, not a member function of `Parameters`, to exchange parameters between GPUs. `MultiGradientMachine` invokes this function, which takes `Parameters` as input (see the second sketch after this list).
   * `MultiGradientMachine` contains many functionalities. Extracting the parameter-exchange logic makes `MultiGradientMachine` clearer and simpler.

4. Make `Parameters` an argument of the `forward/backward` functions, not a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle can share `Parameters` between topologies.

5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. At the end of this refactoring, we can change `ParameterUpdater` to use `Parameters` directly, which makes `ParameterUpdater`'s implementation clearer.
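A minimal sketch of what steps 2 and 4 might look like. The container is assumed to map parameter names to `paddle::Parameter` objects; the member names and the simplified `forward/backward` signatures below are assumptions for illustration, not the final interface:

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace paddle {

class Parameter;  // the existing paddle::Parameter

// Step 2: a container of paddle::Parameter keyed by parameter name, so that
// several topologies can refer to the same underlying parameters.
class Parameters {
 public:
  std::shared_ptr<Parameter> get(const std::string& name) const {
    auto it = params_.find(name);
    return it == params_.end() ? nullptr : it->second;
  }

  void add(const std::string& name, std::shared_ptr<Parameter> param) {
    params_[name] = std::move(param);
  }

 private:
  std::map<std::string, std::shared_ptr<Parameter>> params_;
};

// Step 4: forward/backward take Parameters as an argument instead of
// GradientMachine owning its parameters as data members. Other arguments
// (in/out arguments, callbacks) are elided here.
class GradientMachine {
 public:
  virtual ~GradientMachine() = default;
  virtual void forward(const Parameters& params /*, inArgs, outArgs, ... */) = 0;
  virtual void backward(Parameters* params /*, callback, ... */) = 0;
};

}  // namespace paddle
```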
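For step 3, the exchange logic could be a pair of free functions that `MultiGradientMachine` calls. The function names and the device-id representation below are made up for illustration; only the subset of parameters a topology actually uses is exchanged:

```cpp
#include <string>
#include <vector>

namespace paddle {

class Parameters;  // the container sketched above

// Hypothetical free functions for multi-GPU training (step 3). They are
// deliberately not member functions of Parameters or MultiGradientMachine,
// so the exchange logic stays out of both classes.

// Copy the values of the named parameters onto every device before forward().
void scatterParameters(const Parameters& params,
                       const std::vector<std::string>& usedParamNames,
                       const std::vector<int>& deviceIds);

// Sum the gradients computed on every device back into params after backward().
void gatherGradients(Parameters* params,
                     const std::vector<std::string>& usedParamNames,
                     const std::vector<int>& deviceIds);

}  // namespace paddle
```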
Is it that `GradientMachine` and `NeuralNetwork` do single-thread training, and `MultiGradientMachine` does concurrent training? If so, it seems that it is the responsibility of `MultiGradientMachine`, rather than `Parameters`, to sync up among threads. If `GradientMachine`, `NeuralNetwork`, and `MultiGradientMachine` all use `Parameters`, but only `MultiGradientMachine` does concurrent training and the other two classes do not, then multi-threading support should live in `MultiGradientMachine`, not in `Parameters`.
`MultiGradientMachine` can support only one topology while training, but `Parameters` may be shared by many topologies. I think `MultiGradientMachine` should invoke a `Parameters.exchangeToMultiGPU(used_parameter_names)` method when `MultiGradientMachine` uses only a subset of the `Parameters`. Or we could let another class do the exchange job, such as `exchanger = new ParameterExchanger(parameters, used_parameter_names)`, `exchanger.exchange()`.

Another reason I want to extract the parameter exchange/gather logic from `MultiGradientMachine` is that `MultiGradientMachine` is a super class: it mixes the multi-device computing logic, the parameter exchange/gather logic, and the synchronization logic in a single class. It would be better and clearer to extract some of this logic. The reasons are as follows:

1. `MultiGradientMachine` only handles training a single topology, while `Parameters` may be shared by multiple topologies during training. So multi-GPU parameter exchange differs from the single-topology case (not all parameters need to be exchanged; only a selected subset does). Of course, it should still be `MultiGradientMachine` that calls `Parameters.exchange` to do the exchange. A possible implementation approach is sketched below.

2. Another reason to extract the parameter-exchange logic is that `MultiGradientMachine` is a very heavy class that mixes multiple responsibilities, such as multi-device computation, parameter scatter/gather, and synchronization logic. If we factor the parameter aggregation logic out while writing `Parameters`, the code will become clearer.
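The `ParameterExchanger` alternative mentioned above might look roughly like this; the class and its interface are only illustrative and do not exist in the code base:

```cpp
#include <string>
#include <utility>
#include <vector>

namespace paddle {

class Parameters;

// Illustrative only: an object constructed once per topology that remembers
// which subset of Parameters the topology uses, so MultiGradientMachine can
// simply call exchange()/gather() without owning the exchange logic itself.
class ParameterExchanger {
 public:
  ParameterExchanger(Parameters* params,
                     std::vector<std::string> usedParamNames)
      : params_(params), usedParamNames_(std::move(usedParamNames)) {}

  void exchange();  // dispatch the used parameter values to each GPU
  void gather();    // gather the gradients of the used parameters back

 private:
  Parameters* params_;
  std::vector<std::string> usedParamNames_;
};

}  // namespace paddle
```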
Maybe a global function is better.
Done. Added this part to the design doc.