[Speed] Refine parallel_do_grad #9072

chengduoZH · 2018-03-14T11:00:35Z

chengduoZH · 2018-03-14T11:16:38Z

model: SE-ResNet-150
Input: 3 x 224 x 224
batch_size: 12
Profiling machine

| Cards | ncclAllReduce | ncclReduce | ncclAllReduce / ncclReduce |
|---|---|---|---|
| 2 | 0.976438038 | 0.978825315 | 0.99756108 |
| 3 | 1.136392013 | 1.074367762 | 1.057730931 |
| 4 | 1.274742246 | 1.170658557 | 1.088910373 |

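The results show that as the number of cards increases, ncclReduce becomes faster than ncclAllReduce for parallel_do_grad. With 2 cards the two are nearly equal (ncclAllReduce is marginally lower); with 4 cards the ncclAllReduce measurement is about 1.09 times the ncclReduce one.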
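The last value in each row of the measurements matches the ncclAllReduce figure divided by the ncclReduce figure. As a minimal sketch for checking that arithmetic, the Python below recomputes the ratio from the reported numbers. The names `measurements` and `allreduce_to_reduce_ratio` are illustrative only and are not part of this PR or of PaddlePaddle; the units of the measurements are not stated in the PR, so the code treats them as opaque values where lower is better.

```python
# Measurements copied from the benchmark comment: cards -> (ncclAllReduce, ncclReduce).
# Units are not stated in the PR; lower is assumed to mean faster.
measurements = {
    2: (0.976438038, 0.978825315),
    3: (1.136392013, 1.074367762),
    4: (1.274742246, 1.170658557),
}


def allreduce_to_reduce_ratio(allreduce_value, reduce_value):
    """Return how many times larger the ncclAllReduce value is than the ncclReduce value."""
    return allreduce_value / reduce_value


# Ratio per card count; a value above 1.0 means ncclReduce had the lower measurement.
ratios = {
    cards: allreduce_to_reduce_ratio(allreduce_value, reduce_value)
    for cards, (allreduce_value, reduce_value) in measurements.items()
}

for cards in sorted(ratios):
    print(f"{cards} cards: ncclAllReduce / ncclReduce = {ratios[cards]:.6f}")
```

The ratio crosses 1.0 between 2 and 3 cards, which is the point where ncclReduce starts to come out ahead in these numbers.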

tonyyang-svail

Looks good to me.

refine parallel_do_grad

ef28e7d

chengduoZH requested review from QiJune, reyoung and tonyyang-svail March 14, 2018 11:00

tonyyang-svail approved these changes Mar 14, 2018

chengduoZH merged commit 11c43e5 into PaddlePaddle:develop Mar 15, 2018

chengduoZH changed the title ~~Refine parallel_do_grad~~ [Speed] Refine parallel_do_grad Mar 15, 2018
