typhoonzero
Contributor

@typhoonzero typhoonzero commented Apr 8, 2018

Resolves #8139

Sample code to run multi-GPU distributed training:

import numpy as np
import paddle.fluid as fluid
import paddle.fluid.core as core

# avg_cost, images, label, train_reader and args are assumed to be defined
# by the surrounding model script.
def train_loop_parallel(use_gpu, trainer_prog, trainer_id=0, bcast=False):
    place = core.CUDAPlace(0) if use_gpu else core.CPUPlace()
    # Initialize parameters once with a regular single-device executor.
    startup_exe = fluid.Executor(place)
    startup_exe.run(fluid.default_startup_program())
    exe = fluid.ParallelExecutor(use_gpu, avg_cost.name)

    feeder = fluid.DataFeeder(place=place, feed_list=[images, label])

    for pass_id in range(args.num_passes):
        for batch_id, data in enumerate(train_reader()):
            print("before run one...")
            loss, = exe.run([avg_cost.name], feed_dict=feeder.feed(data))
            if bcast:
                # Optionally broadcast parameters after each batch.
                exe.bcast_params()
            print("Pass %d, batch %d, loss %s" %
                  (pass_id, batch_id, np.array(loss)))
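As a conceptual aside, the optional `exe.bcast_params()` call in the loop above can be pictured as copying one replica's parameters to all the others so that they stay in sync. The sketch below is a plain-Python illustration of that idea only, not Paddle's implementation. `broadcast_params`, the `replicas` list, and the choice of replica 0 as the source are all assumptions made for the example.

```python
def broadcast_params(replicas, src=0):
    """Copy every parameter of replicas[src] into all other replicas.

    Hypothetical helper: each replica is a dict mapping a parameter name
    to a list of floats, standing in for one device's copy of the model.
    """
    source = replicas[src]
    for i, replica in enumerate(replicas):
        if i == src:
            continue
        for name, value in source.items():
            # Copy the list so that later in-place updates on one
            # replica do not leak into the others.
            replica[name] = list(value)
    return replicas


# Two replicas whose weights have drifted apart.
replicas = [
    {"w": [1.0, 2.0], "b": [0.5]},
    {"w": [9.0, 9.0], "b": [9.0]},
]
broadcast_params(replicas)
```

After the call, both replicas hold equal but independent copies of the parameters, so a later update on one replica does not silently change the other.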

@typhoonzero typhoonzero changed the title from "[WIP] [Feature] Enable multi gpu distributed training of fluid" to "[Feature] Enable multi gpu distributed training of fluid" Apr 11, 2018
@typhoonzero typhoonzero merged commit 652cf43 into PaddlePaddle:develop Apr 11, 2018
@typhoonzero typhoonzero deleted the multigpumultinode branch April 11, 2018 10:58