Distributed training VGG16 benchmark questions #8500

@helinwang

In the Distributed training VGG16 benchmark:

  • Does Single Node Single Thread mean a single node? If so, the PServer Count: 10, Trainer Count: 20 below it should be removed, since it is confusing.
    • Answer: yes, it means a single node. The confusing part has been removed from the doc.
  • In Different Batch Size, there is Per trainer CPU Core: 1. Why is only 1 core used? We should probably be utilizing all the computing resources of one node.
    • Answer: it actually means the test uses MKL_NUM_THREADS=1 (the doc is now updated); the entire CPU is available to the trainer process. For details, please see here. @typhoonzero will update the GPU results.
  • Does 78.64% in Accelerate Rate mean the performance compared to the ideal scenario?
    • Answer: yes.
  • In Accelerate Rate, PaddlePaddle v2 is much better than Fluid in the trainer count = 20 case. Do you know why? Same question for the metric in Different Pserver Count, where v2 is much better than Fluid.
    • Answer: no, this still needs to be figured out.
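
For reference, here is a minimal sketch of how an Accelerate Rate figure could be read, assuming the "ideal scenario" means perfectly linear scaling from a single trainer. The function name `accelerate_rate` and the throughput numbers are hypothetical and are not taken from the benchmark doc.

```python
def accelerate_rate(single_trainer_throughput, measured_throughput, trainer_count):
    """Return measured throughput as a fraction of ideal linear scaling.

    Assumption: ideal throughput = single-trainer throughput * trainer count.
    """
    if single_trainer_throughput <= 0 or trainer_count <= 0:
        raise ValueError("throughput and trainer count must be positive")
    ideal_throughput = single_trainer_throughput * trainer_count
    return measured_throughput / ideal_throughput


# Hypothetical numbers, for illustration only (not benchmark results):
# one trainer reaches 10 samples/sec, 20 trainers together reach 150 samples/sec.
rate = accelerate_rate(10.0, 150.0, 20)
print(f"{rate:.2%}")  # 75.00%
```

Under this reading, a value of 78.64% would mean the cluster delivers 78.64% of the throughput that perfect linear scaling would give.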
