
design doc for implementation parameters in CPP. #2249


Conversation

reyoung
Collaborator

@reyoung reyoung commented May 24, 2017

A step-by-step approach for implementing `Parameters` in C++.

It may be easier to review it here.

@reyoung reyoung requested a review from wangkuiyi May 24, 2017 08:23
@@ -0,0 +1,39 @@
# Parameters in CPP
Collaborator

CPP => C++

Collaborator

The title should be

# Design Doc: The C++ Class `Parameters`

Collaborator Author

Done.


2. Implement a `Parameters` class that simply stores the `paddle::Parameter` objects inside, and make `GradientMachine` use `Parameters` as a class member (see the sketch after this list).

3. Make `Parameters` support Multi-CPU and Multi-GPU training to prepare for sharing `Parameter` between topologies.
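A minimal sketch of item 2, assuming `Parameters` simply owns `paddle::Parameter` objects keyed by name and `GradientMachine` holds a `Parameters` member; the map layout and the `get()` accessor are illustrative assumptions, not part of the design doc:

```cpp
#include <map>
#include <memory>
#include <string>

namespace paddle {

class Parameter;  // existing Paddle class; only declared here for the sketch

class Parameters {
 public:
  // Look up a stored parameter by name (assumed accessor, for illustration).
  std::shared_ptr<Parameter> get(const std::string& name) const {
    auto it = params_.find(name);
    if (it == params_.end()) return nullptr;
    return it->second;
  }

 private:
  // Item 2: Parameters just stores the underlying paddle::Parameter objects.
  std::map<std::string, std::shared_ptr<Parameter>> params_;
};

class GradientMachine {
  // Item 2: GradientMachine uses Parameters as a class member instead of
  // holding raw paddle::Parameter objects itself.
  std::shared_ptr<Parameters> parameters_;
};

}  // namespace paddle
```
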
Collaborator

Is it that GradientMachine and NeuralNetwork do single-threaded training, and MultiGradientMachine does concurrent training? If so, it seems that syncing up among threads is the responsibility of MultiGradientMachine rather than Parameters.

If GradientMachine, NeuralNetwork, and MultiGradientMachine all use Parameters, but only MultiGradientMachine does concurrent training and the first two classes do not, then multi-threading support belongs in MultiGradientMachine, not in Parameters.

Collaborator Author

MultiGradientMachine can support only one topology during training, but Parameters may be shared by many topologies. I think MultiGradientMachine should invoke a Parameters.exchangeToMultiGPU(used_parameter_names) method when it uses only a subset of Parameters. Or let another class do the exchange job, e.g. exchanger = new ParameterExchanger(parameters, used_parameter_names); exchanger->exchange();

Another reason I want to extract the parameter exchange/gather logic from MultiGradientMachine is that MultiGradientMachine is a very heavy class: it mixes multi-device computing logic, parameter exchange/gather logic, and synchronization in a single class. It would be better and clearer to extract some of that logic.

The reasons are as follows:
1. MultiGradientMachine only handles training a single topology, while Parameters may be shared by several topologies during training. So multi-GPU parameter exchange differs from the single-topology case (not all parameters need to be exchanged; only a selected subset does). Of course, MultiGradientMachine should still call Parameters.exchange to perform the exchange.
A possible implementation is:

// ParameterExchanger is responsible for all of the parameter-exchange logic.
auto exchanger = new ParameterExchanger(parameters, used_parameter_names);
exchanger->exchange();

2. Another reason for extracting the parameter-exchange logic is that MultiGradientMachine is a very heavy class that mixes many responsibilities, e.g. multi-device computation, parameter gather/scatter, and synchronization logic. If we factor the parameter-aggregation logic out while writing Parameters, the code will become much clearer.
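
To make the proposal above concrete, here is a rough C++ sketch of what such a ParameterExchanger might look like. The class name comes from the comment above, but the constructor signature, member layout, and empty exchange() body are illustrative assumptions, not an existing Paddle API:

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace paddle {

class Parameters;  // the C++ class proposed in this design doc

// Hypothetical helper that owns the multi-GPU exchange/gather logic,
// so MultiGradientMachine does not have to.
class ParameterExchanger {
 public:
  // Only the parameters actually used by the current topology are exchanged.
  ParameterExchanger(std::shared_ptr<Parameters> parameters,
                     std::vector<std::string> usedParameterNames)
      : parameters_(std::move(parameters)),
        usedParameterNames_(std::move(usedParameterNames)) {}

  // Scatter the selected parameters to each device before a mini-batch and
  // gather gradients back afterwards; the real logic is omitted here.
  void exchange() { /* multi-GPU scatter/gather would live here */ }

 private:
  std::shared_ptr<Parameters> parameters_;
  std::vector<std::string> usedParameterNames_;
};

}  // namespace paddle
```

MultiGradientMachine (or any topology-specific trainer) would then construct an exchanger with the parameter names its topology actually uses and call exchange() around each mini-batch.
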

Collaborator Author

Maybe a global function would be better.

Collaborator Author

Done. Added this part to the design doc.

@wangkuiyi wangkuiyi merged commit 5e94fa3 into PaddlePaddle:develop May 25, 2017
@reyoung reyoung deleted the feature/design_of_cpp_parameters_concept branch June 27, 2017 12:41