
ParallelDo performance on VGG #8719

Closed

Description

@tonyyang-svail

Major takeaways:

  1. Parameter copying is still a major bottleneck (with a large net such as VGG16, Memcpy takes up to 80% of the time).
  2. We do need multiple streams (the AllReduce kernel takes up 70% of the total kernel time); see the sketch after this list.
  3. NCCLInit should not be called at every iteration: it takes about 70 ms with one GPU and 90 ms with four GPUs.
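
A minimal C++ sketch of what takeaways 2 and 3 suggest, assuming NCCL 2 and four visible GPUs (the buffer size and iteration count are hypothetical, not from the issue): communicators are created once up front rather than per iteration, and AllReduce runs on dedicated per-device streams so it can overlap with compute kernels.

```cpp
// Sketch only: NCCL communicators created ONCE (takeaway 3), AllReduce
// issued on a dedicated per-device stream (takeaway 2).
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
  const int nDev = 4;
  int devs[4] = {0, 1, 2, 3};
  const size_t count = 1 << 20;  // hypothetical gradient buffer size

  ncclComm_t comms[4];
  cudaStream_t commStreams[4];
  float* grads[4];

  // Done once at startup -- NOT once per iteration (the issue measured
  // about 70-90 ms per NCCLInit call).
  ncclCommInitAll(comms, nDev, devs);
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaMalloc(&grads[i], count * sizeof(float));
    cudaStreamCreate(&commStreams[i]);  // dedicated communication stream
  }

  for (int iter = 0; iter < 100; ++iter) {
    // ... forward/backward kernels fill grads[i] on each device ...

    // Grouped AllReduce on the communication streams; compute kernels on
    // other streams can proceed concurrently.
    ncclGroupStart();
    for (int i = 0; i < nDev; ++i)
      ncclAllReduce(grads[i], grads[i], count, ncclFloat, ncclSum,
                    comms[i], commStreams[i]);
    ncclGroupEnd();

    for (int i = 0; i < nDev; ++i) {
      cudaSetDevice(devs[i]);
      cudaStreamSynchronize(commStreams[i]);
    }
    // ... apply gradients ...
  }

  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaFree(grads[i]);
    cudaStreamDestroy(commStreams[i]);
    ncclCommDestroy(comms[i]);
  }
  return 0;
}
```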

Background

test script, command line

Net: VGG16
Model size: 409 MB (the original definition of the VGG16 net is incorrect; see #8718)
Batch size: 16 per GPU
BatchNorm: OFF, since parallel_do does not support it

Inputs are randomly generated on each GPU, so there is no overhead from copying training data to the different devices.
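
For illustration, a minimal sketch of this setup in raw CUDA/cuRAND rather than the actual Paddle test script (the seed and the replica loop are hypothetical): each device fills its own input batch in place, so no host-to-device Memcpy of data is involved.

```cpp
// Sketch only: each GPU fills its own input batch in place with cuRAND,
// so no training data is copied host-to-device. Shapes follow the issue's
// setup (batch 16 per GPU, 3x224x224 VGG16 inputs); the seed is arbitrary.
#include <cuda_runtime.h>
#include <curand.h>

int main() {
  const int nDev = 4;
  const size_t batchElems = 16ull * 3 * 224 * 224;  // 16 images per GPU

  for (int dev = 0; dev < nDev; ++dev) {
    cudaSetDevice(dev);

    float* input;
    cudaMalloc(&input, batchElems * sizeof(float));

    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL + dev);
    curandGenerateUniform(gen, input, batchElems);  // generated on-device

    // ... run this device's replica of VGG16 on `input` ...

    curandDestroyGenerator(gen);
    cudaFree(input);
  }
  return 0;
}
```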

Result

Time unit: milliseconds.

| # GPUs | copy weights (ms) | forward and backward (ms) | merge gradient (ms) | apply gradient (ms) | total |
|---|---|---|---|---|---|
| 1 | N/A | 130 | N/A | 5 | |
| 1 (NCCL in backward) | N/A | 220 | N/A | 5 | |
| 4 | 350 | 130 | 350 | 5 | |
| 4 (NCCL in backward) | 350 | 650 (AllReduce takes about 70%) | N/A | 5 | |
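
As a methodology note (not from the issue itself), per-phase numbers like the ones above can be collected with CUDA events. A minimal sketch; `dummyPhase` is a hypothetical stand-in for the real work of each column (weight copy, forward/backward, merge, apply):

```cpp
// Sketch only: timing one phase of a training step with CUDA events.
// dummyPhase() is a placeholder for the real per-phase GPU work.
#include <cstdio>
#include <cuda_runtime.h>

static void dummyPhase() {
  // Stand-in for e.g. the weight-copy cudaMemcpyAsync calls or the
  // forward/backward kernel launches of one step.
  cudaDeviceSynchronize();
}

static float timePhaseMs(void (*phase)()) {
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start);
  phase();                      // enqueue the phase's GPU work
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);   // block until the phase has finished
  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms;
}

int main() {
  printf("phase took %.1f ms\n", timePhaseMs(dummyPhase));
  return 0;
}
```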
