Project
https://github.com/PaddlePaddle/Paddle/projects/61
Design
- Add async update design doc (#9932)
- Add distributed training overview doc (#9937)
Operators
- VariableResponse: support deserializing a var into a local scope (#10060)
- Refine listen_and_serv op: separate RunSyncLoop into its own method, in preparation for RunAsyncLoop (#10080)
- Split optimization ops on the pserver into independent blocks (#10123)
- Create a sub-scope when necessary (#10124)
- Add RunAsyncUpdate (no barrier and no lock) to listen_and_serv_op (#9997); a minimal sketch of this per-parameter queue/worker pattern follows this list.
  - Prepare an optimization block and its PrepareContext for each parameter.
  - Add a BlockQueue for each parameter block. The queue stores the gradient VariableMessages for that parameter received from trainers.
  - Add one thread per parameter to run its optimization block.
  - Each thread reads a gradient from its BlockQueue, creates a sub-scope to deserialize it into, and then runs the optimization block with that sub-scope.
  - Add one thread to serve parameter get requests for trainers from the global scope. (We may need a thread pool to speed up the get path, but the gRPC interface seems to work in only one thread; this needs a test.)
  - Trainers send_vars to and read_vars from the pserver without send_barrier and get_barrier.
- Use multiple threads to do the update (#10228)
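The #9997 items above describe the no-barrier update path on the pserver. Below is a minimal Python sketch of that per-parameter queue/worker pattern; the real implementation lives in the C++ listen_and_serv_op, and `AsyncParamServer`, `push_gradient`, and `apply_gradient` are hypothetical names standing in for deserializing a gradient into a sub-scope and running that parameter's optimization block.

```python
import queue
import threading


class AsyncParamServer:
    """Sketch: one gradient queue and one worker thread per parameter."""

    def __init__(self, param_names, apply_gradient):
        # apply_gradient(param_name, grad) stands in for deserializing the
        # gradient into a fresh sub-scope and running that parameter's
        # optimization block (done in C++ inside listen_and_serv_op).
        self._apply_gradient = apply_gradient
        self._queues = {name: queue.Queue() for name in param_names}
        for name in param_names:
            threading.Thread(
                target=self._update_loop, args=(name,), daemon=True).start()

    def push_gradient(self, param_name, grad):
        # Called for every gradient received from a trainer; returns at once,
        # so there is no send_barrier and no global lock on the pserver.
        self._queues[param_name].put(grad)

    def _update_loop(self, param_name):
        q = self._queues[param_name]
        while True:
            grad = q.get()  # block until some trainer pushes a gradient
            self._apply_gradient(param_name, grad)
```

Parameter get requests are served from the global scope by a separate thread, so a trainer may read a parameter while another trainer's gradient is still being applied; that staleness is the trade-off the no-barrier, no-lock design accepts.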
Transpiler (#9997)
- Dist-transpile the async trainer program: no need to add the .trainer_n suffix to gradient blocks in async mode.
- Dist-transpile the async pserver program: no need to aggregate gradient blocks (a usage sketch follows this list).
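A hedged usage sketch of the async transpile path, assuming the Fluid DistributeTranspiler of this era exposes async mode through a sync_mode=False argument; the endpoints, environment variables, and role handling below are illustrative assumptions, and a model plus optimizer are assumed to have already been built in the default main program.

```python
import os

import paddle.fluid as fluid

# Illustrative cluster settings; adapt to the actual deployment.
pserver_endpoints = "127.0.0.1:6174"
current_endpoint = "127.0.0.1:6174"
trainers = 2
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
role = os.getenv("PADDLE_TRAINING_ROLE", "TRAINER")

t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id,
    pservers=pserver_endpoints,
    trainers=trainers,
    sync_mode=False)  # async: no .trainer_n suffix, no gradient aggregation

if role == "PSERVER":
    pserver_prog = t.get_pserver_program(current_endpoint)
    pserver_startup = t.get_startup_program(current_endpoint, pserver_prog)
    # run pserver_startup once, then serve pserver_prog via listen_and_serv
else:
    trainer_prog = t.get_trainer_program()
    # train with trainer_prog; gradients are sent as they are produced,
    # without send_barrier/get_barrier between trainers
```

With sync_mode=False the trainer program sends each raw gradient as soon as it is produced, and the pserver program applies it immediately instead of waiting to aggregate one copy per trainer.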
Considerations
- Need to consider how to add learning rate decay in asynchronous training. Do we need lr_decay at all?
Benchmark
- Benchmark of Fluid async training (#10180)