
Conversation


@qingqing01 qingqing01 commented Apr 4, 2018

Fix #9571

  • Use two ParallelExecutors: one for training, one for testing.
  • The ParallelExecutor for testing shares local scopes with the one for training.
  • When testing during training, set run_startup to False.
  • There is no need to set loss_name for the testing ParallelExecutor.

Now, the following testing code runs successfully. Correctness will be verified later.

The usage is as follows:

    image, label = fluid.layers.read_file(data_file)
    avg_cost, accuracy, accuracy5 = net_conf(image, label, class_dim)
    test_program = fluid.default_main_program().clone(for_test=True)

    optimizer = fluid.optimizer.Momentum(
        learning_rate=fluid.layers.piecewise_decay(
            boundaries=[100], values=[0.1, 0.2]),
        momentum=0.9,
        regularization=fluid.regularizer.L2Decay(1e-4))
    opts = optimizer.minimize(avg_cost)

    exe = fluid.ParallelExecutor(loss_name=avg_cost.name,
                                 use_cuda=True)
    test_exe = fluid.ParallelExecutor(use_cuda=True,
                                      main_program=test_program,
                                      run_startup=False,
                                      local_scopes=exe.local_scopes())
    def test():
        for i in xrange(10):
            loss, top1, top5 = test_exe.run([avg_cost.name, accuracy.name, accuracy5.name])
            l, t1, t5 = np.mean(np.array(loss)), np.mean(np.array(top1)), np.mean(np.array(top5))
            print('Test Loss {0}, Top1 {1}, Top5 {2}'.format(l, t1, t5))

    batch_id = 0
    time_record = []
    for i in xrange(20):
        loss, = exe.run([avg_cost.name])
        loss_v = np.mean(np.array(loss))
        print('Batch {0}, Loss {1}'.format(batch_id, loss_v))
        if batch_id % 10 == 0:
            test()
        batch_id += 1

@qingqing01 qingqing01 requested review from panyx0718 and reyoung April 4, 2018 11:09
@panyx0718 panyx0718 left a comment
LG Overall.

Have some thoughts about the API:
Currently, ParallelExecutor has so many arguments that it's not easy for users to know which ones to set for training or inference.
How about having ParallelTrainExecutor and ParallelInferExecutor that wrap ParallelExecutor?

I don't quite like sharing local_scopes. It isn't part of a normal program, and users have no idea what it is or what its effect is. How about:
ParallelInferExecutor(share_vars_from=train_executor)
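A plain-Python sketch of the proposed `share_vars_from` semantics (no Paddle required; the class and attribute names here are hypothetical, for illustration only): instead of passing raw local_scopes, the inference executor borrows the per-device scopes owned by the training executor, so trained parameters are visible to inference without copying.

```python
class ParallelExecutorSketch:
    """Toy model of an executor with one variable scope per device."""

    def __init__(self, num_places, share_vars_from=None):
        if share_vars_from is not None:
            # Borrow the trainer's per-device scopes: whatever has
            # been trained so far is immediately visible here.
            self.local_scopes = share_vars_from.local_scopes
        else:
            # Create one fresh scope (modeled as a dict) per device.
            self.local_scopes = [{} for _ in range(num_places)]


train_exe = ParallelExecutorSketch(num_places=2)
train_exe.local_scopes[0]["fc_w"] = "trained weights"

# The inference executor names the trainer, not its internals.
test_exe = ParallelExecutorSketch(num_places=2, share_vars_from=train_exe)
assert test_exe.local_scopes[0]["fc_w"] == "trained weights"
```

The user-facing difference is that `share_vars_from=train_exe` names a concept users already hold (the training executor), while `local_scopes=exe.local_scopes()` exposes an internal structure they never created themselves.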

    // Create local scopes
    for (size_t i = 0; i < member_->places_.size(); ++i) {
      member_->local_scopes_.push_back(&scope->NewScope());
      if (local_scopes.size() == 0) {
Contributor: local_scopes.empty()

Contributor Author: Done.

    -  loss_name,
    -  use_cuda,
    +  loss_name=None,
    +  use_cuda=None,
Contributor: should this be True or False?

Contributor Author: Modified the interface.

    main_program=None,
    startup_program=None,
    local_scopes=None,
    run_startup=True):
Contributor: If startup_program is None, then startup is not run?

Contributor Author: startup_program is always used, even for the parallel testing; the code is here: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/parallel_executor.cc#L69
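The exchange above distinguishes two things: the startup program (always needed to define initialization) and run_startup (whether it is actually executed). A minimal plain-Python sketch of why the testing executor must skip startup when it shares scopes (the function and names are hypothetical, not Paddle internals):

```python
def init_w(scope):
    # A startup op: (re)initializes a parameter. Re-running it
    # would clobber any trained value.
    scope["w"] = 0.5


def build_executor(startup_ops, run_startup=True, local_scopes=None):
    """Toy executor setup: reuse shared scopes if given, else create one."""
    scopes = local_scopes if local_scopes is not None else [{}]
    if run_startup:
        for op in startup_ops:
            for scope in scopes:
                op(scope)
    return scopes


train_scopes = build_executor([init_w], run_startup=True)
train_scopes[0]["w"] = 0.9  # training updates the parameter

# The test executor shares scopes and skips startup, so the
# trained value survives instead of being reset to 0.5.
test_scopes = build_executor([init_w], run_startup=False,
                             local_scopes=train_scopes)
assert test_scopes[0]["w"] == 0.9
```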

@qingqing01 qingqing01 left a comment

> I don't quite like sharing local_scopes. It isn't part of a normal program, and users have no idea what it is or what its effect is. How about:
> ParallelInferExecutor(share_vars_from=train_executor)

Done.

