Closed
Description
- How should parallel testing be done?
Reference code: https://github.com/dzhwinter/benchmark/pull/91/files
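`ParallelExecutor` is currently used for training. How should it be used for testing? Would it look something like the following?

```python
import paddle.fluid as fluid

# Training: avg_cost comes from the network definition
exe = fluid.ParallelExecutor(loss_name=avg_cost.name, use_cuda=True)
for i in range(iterations):
    loss = exe.run([avg_cost.name])
```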
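```python
# Testing: same executor type, but built from the test program
test_exe = fluid.ParallelExecutor(
    loss_name=avg_cost.name, use_cuda=True, main_program=test_program)
for i in range(test_iterations):
    # Fetch into new names so the top1/top5 variables are not overwritten
    loss, top1_val, top5_val = test_exe.run(
        [avg_cost.name, top1.name, top5.name])
```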
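Using ParallelExecutor for testing currently has the following problems:
- The ParallelExecutor constructor always runs a startup_program.
  - Code: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/parallel_executor.cc#L56
  - Problem: what should the startup_program be for parallel testing? Should parameter initialization be removed from the startup_program?
  - Should this step be moved into Python?
    - Should the Python ParallelExecutor constructor decide whether to run initialization?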
```python
class ParallelExecutor(object):
    def __init__(self,
                 loss_name,
                 use_cuda,
                 num_threads=None,
                 main_program=None,
                 startup_program=None,
                 run_startup=True):
        # ...
        startup = startup_program if startup_program else framework.default_startup_program()
        # Run the startup program only when run_startup is set
        if run_startup:
            place = core.CUDAPlace(0) if use_cuda else core.CPUPlace()
            exe = executor.Executor(place)
            exe.run(startup)
```
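To make the concern about re-initialization concrete, here is a minimal toy sketch in plain Python. It does not use Paddle at all: `ToyParallelExecutor`, `toy_startup`, and the parameter name `fc.w` are hypothetical stand-ins, and the shared dictionary only imitates a parameter scope. It assumes a test executor shares parameters with the train executor, in which case running startup again would discard the trained values.

```python
def toy_startup(scope):
    # Stand-in for a startup program: (re)initializes every parameter.
    scope["fc.w"] = 0.0


class ToyParallelExecutor(object):
    """Hypothetical stand-in; only models the run_startup decision."""

    def __init__(self, scope, run_startup=True):
        self.scope = scope
        if run_startup:
            toy_startup(scope)

    def train_step(self):
        # Pretend one optimizer update changes the parameter.
        self.scope["fc.w"] += 1.0


scope = {}
train_exe = ToyParallelExecutor(scope, run_startup=True)
for _ in range(3):
    train_exe.train_step()

# The test executor shares the scope and skips startup,
# so the trained value survives.
test_exe = ToyParallelExecutor(scope, run_startup=False)
kept_value = test_exe.scope["fc.w"]

# Running startup again on a copy shows what would be lost otherwise.
clobbered = dict(scope)
ToyParallelExecutor(clobbered, run_startup=True)
```

In this toy model `kept_value` is the trained value, while `clobbered["fc.w"]` is back at its initial value, which is the behavior a test-time executor would presumably want to avoid.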
- Whatever Program is passed in, ParallelExecutor always creates grad vars and inserts `NCCLAllReduceOp` for gradient aggregation.
  - Code that inserts the gradient aggregation:
    - Creation of `MultiDevSSAGraphBuilder`: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/parallel_executor.cc#L75
  - Problem: parallel testing does not need these operations.
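The point that a test program needs no gradient aggregation can be illustrated with a toy graph builder. This is a sketch under stated assumptions, not the `MultiDevSSAGraphBuilder` logic: the op tuples, the helper `build_multi_device_ops`, and the `@GRAD` suffix used to recognize gradient variables are all illustrative choices. The builder inserts an all-reduce only after ops that write a parameter gradient, so a forward-only program passes through unchanged.

```python
GRAD_SUFFIX = "@GRAD"  # assumed naming convention for gradient variables


def build_multi_device_ops(ops, param_names):
    """Copy `ops`, adding an all-reduce after each op that writes a
    parameter gradient.

    `ops` is an ordered list of (op_type, output_var_names) tuples.
    """
    result = []
    for op_type, outputs in ops:
        result.append((op_type, outputs))
        for name in outputs:
            is_grad = name.endswith(GRAD_SUFFIX)
            if is_grad and name[:-len(GRAD_SUFFIX)] in param_names:
                result.append(("all_reduce", [name]))
    return result


params = {"fc.w"}
train_ops = [
    ("mul", ["fc.out"]),
    ("mean", ["loss"]),
    ("mean_grad", ["fc.out@GRAD"]),
    ("mul_grad", ["fc.w@GRAD"]),
]
test_ops = train_ops[:2]  # forward ops only

train_graph = build_multi_device_ops(train_ops, params)
test_graph = build_multi_device_ops(test_ops, params)
```

Here `train_graph` gains one `all_reduce` entry for `fc.w@GRAD`, and `test_graph` is identical to `test_ops` because no gradient is ever written.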
- ParallelExecutor does not support Python data readers. RecordIO reference code: https://github.com/dzhwinter/benchmark/pull/91/files

The train Program is defined as follows. The training data path, `./flowers.train.recordio`, and its related vars are part of the train Program.

```python
with fluid.program_guard(main, startup):
    reader = fluid.layers.open_recordio_file(
        filename='./flowers.train.recordio',
        shapes=[[-1, 3, 224, 224], [-1, 1]],
        lod_levels=[0, 0],
        dtypes=['float32', 'int64'])
    image, label = fluid.layers.read_file(reader)
    prediction, avg_cost, accuracy, accuracy5 = net_conf(image, label, class_dim)
```

The problem: at test time the data path is `'./flowers.test.recordio'`, which differs from training. How do we obtain the test Program?
`fluid.ParallelExecutor` takes a `loss_name` argument. How should it be specified at test time?
The code is at https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/parallel_executor.py#L24 .
Judging from https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/details/multi_devices_graph_builder.cc#L94 , `loss_name` appears to be a helper variable used to separate forward ops from backward ops and to insert the gradient aggregation ops.
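The reading of `loss_name` as a forward/backward divider can be sketched as follows. This is a toy model, not the builder's actual code: it assumes ops are listed in execution order, that the op producing the loss is the last forward op, and it makes `loss_name` optional, which is a hypothetical change. The helper `split_by_loss` and the op tuples are illustrative.

```python
def split_by_loss(ops, loss_name=None):
    """Split an ordered op list into (forward, backward) parts.

    `ops` is a list of (op_type, output_var_names) tuples. With no
    loss_name, every op is treated as a forward op.
    """
    if loss_name is None:
        return list(ops), []
    for idx, (_, outputs) in enumerate(ops):
        if loss_name in outputs:
            # The op producing the loss closes the forward part.
            return list(ops[:idx + 1]), list(ops[idx + 1:])
    raise ValueError("no op produces %r" % loss_name)


ops = [
    ("mul", ["fc.out"]),
    ("mean", ["avg_cost"]),
    ("mean_grad", ["fc.out@GRAD"]),
    ("mul_grad", ["fc.w@GRAD"]),
]

train_forward, train_backward = split_by_loss(ops, loss_name="avg_cost")
test_forward, test_backward = split_by_loss(ops[:2])
```

Under these assumptions a test program could simply omit `loss_name`: there is no backward part to find, so nothing would need gradient aggregation.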