@gongweibao
Contributor

@gongweibao gongweibao commented Jun 8, 2019

Fix sync_batch_norm_op ncclallreduce error
When the NCCL communicators are initialized with ncclCommInitAll, an error occurs when the allreduce op handle and sync_batch_norm_op call ncclAllReduce in parallel. This PR works around that by creating a new NCCL communicator for sync_batch_norm_op. This code should be polished later as part of a unified NCCL management layer.
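
To illustrate why a dedicated communicator can help, here is a toy model in plain Python. It does not use NCCL or Paddle code. The description above does not state the root cause, so the failure mode modeled here (ranks issuing collectives on a shared communicator in different orders) is an assumption, and `ToyComm` and both helper functions are hypothetical names.

```python
from typing import List


class ToyComm:
    """Hypothetical stand-in for a communicator (not the NCCL API).

    It only records, per rank, the order in which collectives were issued.
    """

    def __init__(self, nranks: int) -> None:
        self.issued: List[List[str]] = [[] for _ in range(nranks)]

    def all_reduce(self, rank: int, tag: str) -> None:
        # Record that `rank` issued the collective identified by `tag`.
        self.issued[rank].append(tag)

    def consistent(self) -> bool:
        # Assumption of this model: collectives on one communicator only
        # match up if every rank issued them in the same order.
        return all(seq == self.issued[0] for seq in self.issued)


def shared_comm_run() -> bool:
    """Two parallel callers share one communicator and race differently per rank."""
    comm = ToyComm(nranks=2)
    # Rank 0: the gradient allreduce happened to be issued first.
    comm.all_reduce(0, "grad")
    comm.all_reduce(0, "sync_bn")
    # Rank 1: the sync batch norm allreduce happened to be issued first.
    comm.all_reduce(1, "sync_bn")
    comm.all_reduce(1, "grad")
    return comm.consistent()


def separate_comm_run() -> bool:
    """Same interleaving, but each caller has its own communicator."""
    grad_comm = ToyComm(nranks=2)
    bn_comm = ToyComm(nranks=2)
    grad_comm.all_reduce(0, "grad")
    bn_comm.all_reduce(0, "sync_bn")
    bn_comm.all_reduce(1, "sync_bn")
    grad_comm.all_reduce(1, "grad")
    return grad_comm.consistent() and bn_comm.consistent()
```

In this model the shared communicator sees mismatched orders across ranks, while the two separate communicators each see a single consistent sequence, regardless of how the two callers interleave.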

Fix PaddlePaddle/models#2338

heavengate previously approved these changes Jun 8, 2019
Contributor

@heavengate heavengate left a comment

LGTM

@gongweibao gongweibao merged commit dd4cd35 into PaddlePaddle:develop Jun 8, 2019
@gongweibao gongweibao deleted the fixncclconn branch June 8, 2019 13:31

Development

Successfully merging this pull request may close these issues.

yolov3 training on 8 GPUs fails with an error

2 participants