-
Notifications
You must be signed in to change notification settings - Fork 5.7k
design doc for implementation parameters in CPP. #2249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
design doc for implementation parameters in CPP. #2249
Conversation
doc/design/parameters_in_cpp.md
Outdated
@@ -0,0 +1,39 @@ | |||
# Parameters in CPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPP => C++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The title should be
# Design Doc: The C++ Class `Parameters`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
2. Implementation a `Parameters` class. It just stores the `paddle::Parameter` inside. Make `GradientMachine` uses `Parameters` as a class member. | ||
|
||
3. Make `Parameters` support Multi-CPU and Multi-GPU training to prepare for sharing `Parameter` between topologies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it that GradientMachine
and NeuralNetwork
does single-thread training, and MultiGradientMachine
does concurrent training? If so, it seems that it is the responsibility of MultiGradientMachine
, other than Parameters
, to sync up among threads.
如果 GradientMachine
, NeuralNetwork
, MultiGradientMachine
都用到 Parameters
,但是只有 MultiGradientMachine
做并发训练,前两个classes都不做,那么对多线程的支持应该在 MultiGradientMachine
里面,而不应该在 Parameters
里面。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MultiGradientMachine can support only one topology while training. But Parameters
may be shared by many topologies. I think MultiGradientMachine should invoke Parameters.exchangeToMultiGPU(used_parameter_names)
method when MultiGradientMachine is used only a subset of Parameters. Or let another class to do exchange job, such as exchanger = new ParameterExchanger(parameters, used_parameter_names)
, exchanger.exchange()
;
Another reason I want to extract Parameter Exchange/Gather
logic from MultiGradientMachine is the MultiGradientMachine is a super class. It mixes the Multi-Devices computing logic, Parameter Exchange/Gather logic, synchronization together in a Single Class. It should be better and clearer that we extract some logic.
原因有如下几点:
1、MultiGradientMachine只处理了训练一个拓扑结构的情况,而Parameters可能在训练中被多个拓扑结构共享。于是多卡参数交换就和单个拓扑结构情况下不同(不是所有参数都要进行交换,而是选择某些参数进行交换)。当然,还是应该由MultiGradientMachine调用Parameters.exchange进行交换。
可能的实现手法是:
// ParameterExchanger负责参数交换的全部逻辑
auto exchanger = new ParameterExchanger(parameters, used_parameter_names);
exchanger.exchange();
2、另一个想要把参数交换逻辑提取出来的原因是,MultiGradientMachine是一个非常重的类,揉和了多个功能。例如多设备的计算,参数聚合分发,同步逻辑等等。如果我们在写Parameters的时候,把参数聚合逻辑分解出来,会让代码逻辑变得更清晰。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe global function is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added this part into design doc.
The step by step approach for how to implement Parameters in CPP
Here may be better to review.