Support testing during training by ParallelExecutor. #9656
Conversation
LG Overall.
Have some thoughts about the API:
Currently, ParallelExecutor has so many arguments that it is not easy for the user to know which ones to set for training or for inference.
How about having a ParallelTrainExecutor and a ParallelInferExecutor that wrap ParallelExecutor?
I don't quite like sharing local_scopes. It is not something a normal program can do, and the user has no idea what it is or what effect it has. How about:
ParallelInferExecutor(share_vars_from=train_executor)
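A sketch of how the proposed split could look from the user's side; ParallelTrainExecutor and ParallelInferExecutor are only names suggested in this review, not an existing API, and every argument except share_vars_from is assumed for illustration:

# Hypothetical wrappers around ParallelExecutor, as suggested above.
train_exe = ParallelTrainExecutor(
    use_cuda=True,
    loss_name=loss.name,
    main_program=train_program)

# Inference needs no loss and no startup run; it only declares which
# executor it shares parameters with.
infer_exe = ParallelInferExecutor(
    main_program=test_program,
    share_vars_from=train_exe)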
// Create local scopes
for (size_t i = 0; i < member_->places_.size(); ++i) {
  member_->local_scopes_.push_back(&scope->NewScope());
if (local_scopes.size() == 0) {
local_scopes.empty()
Done.
- loss_name,
- use_cuda,
+ loss_name=None,
+ use_cuda=None,
Should this be True or False?
Modify the interface.
main_program=None,
startup_program=None,
local_scopes=None,
run_startup=True):
If startup_program is None, then startup is not run?
startup_program is always used, even for parallel testing; the code is here: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/parallel_executor.cc#L69
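To make the behaviour concrete, a minimal sketch; the fallback to fluid.default_startup_program() when startup_program is None follows the usual fluid convention and is an assumption here, not something stated in this thread (loss and test_program are assumed to be defined elsewhere):

import paddle.fluid as fluid

# startup_program=None does not mean startup is skipped: the constructor
# still uses a startup program internally (parallel_executor.cc#L69),
# presumably falling back to fluid.default_startup_program().
train_exe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)

# Whether that startup program is actually executed is controlled by
# run_startup; a testing executor passes run_startup=False so it does not
# re-initialize parameters the training executor has already set up.
# (Scope sharing via local_scopes is omitted here; see the usage sketch at
# the end of this page.)
test_exe = fluid.ParallelExecutor(
    use_cuda=True, main_program=test_program, run_startup=False)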
I don't quite like sharing local_scopes. It is not something a normal program can do, and the user has no idea what it is or what effect it has. How about:
ParallelInferExecutor(share_vars_from=train_executor)
Done.
Fix #9571
run_startup can now be set to False, and loss_name is optional for a testing ParallelExecutor. With this, the following testing code can run successfully; the correctness will be verified later.
The usage is as follows:
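The code snippet that followed is not preserved in this extract, so the sketch below reconstructs the intended usage from the arguments discussed in this PR (loss_name optional for the testing executor, local_scopes shared from the training executor, run_startup=False); the program construction and the name of the accessor for the trainer's local scopes are assumptions:

import numpy as np
import paddle.fluid as fluid

# Assume train_program and test_program have been built so that they share
# parameters, and loss is the loss variable of train_program.

# Training executor: has a loss, runs the startup program once.
train_exe = fluid.ParallelExecutor(
    use_cuda=True,
    loss_name=loss.name,
    main_program=train_program)

# Testing executor: no loss_name, reuses the trainer's local scopes so both
# executors see the same parameters, and does not run startup again.
test_exe = fluid.ParallelExecutor(
    use_cuda=True,
    main_program=test_program,
    local_scopes=train_exe.local_scopes,  # hypothetical accessor name
    run_startup=False)

for step in range(100):
    train_loss, = train_exe.run(fetch_list=[loss.name])
    test_loss, = test_exe.run(fetch_list=[loss.name])
    print(np.mean(train_loss), np.mean(test_loss))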