@hitywt hitywt commented Oct 7, 2023

PR types

Others

PR changes

Others

Description

Pcard-70448
Merge distributed-communication-related functions into develop:

  1. Add a lazy comm initialization function.
  2. Add debugging support for comm hangs.
  3. Move dependency paths from fluid to phi, and adapt to the related file path changes.
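
For readers unfamiliar with the first item, lazy initialization generally means deferring creation of a communicator until the first operation that needs it. The sketch below only illustrates that general pattern; `FakeComm` and `LazyCommHolder` are hypothetical names invented for this example and are not taken from this PR or from Paddle.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a real communicator handle.
struct FakeComm {
  int rank;
};

// Hypothetical holder that creates the communicator on first use only.
class LazyCommHolder {
 public:
  explicit LazyCommHolder(int rank) : rank_(rank) {}

  // Returns the communicator, constructing it the first time it is requested.
  FakeComm& Get() {
    std::lock_guard<std::mutex> guard(mu_);
    if (!comm_) {
      comm_ = std::make_unique<FakeComm>(FakeComm{rank_});
      ++init_count_;
    }
    return *comm_;
  }

  // Number of times the communicator was actually constructed.
  int init_count() const { return init_count_; }

 private:
  int rank_;
  int init_count_ = 0;
  std::mutex mu_;
  std::unique_ptr<FakeComm> comm_;
};
```

Constructing a `LazyCommHolder` costs nothing communication-wise; the (here simulated) setup happens inside the first `Get()` call and is skipped on later calls.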

paddle-bot bot commented Oct 7, 2023

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the results of the automated CI tests first. See the Paddle CI Manual for details.

@hitywt hitywt force-pushed the merge_comm_into_develop branch from 4602d7c to 04aabd8 Compare October 9, 2023 08:32

#if defined(PADDLE_WITH_RCCL)
#include "paddle/phi/backends/dynload/rccl.h"
#else
Contributor

This should check `#elif defined(PADDLE_WITH_NCCL)` here.

Author

@hitywt hitywt Oct 16, 2023

> This should check `#elif defined(PADDLE_WITH_NCCL)` here.

OK. This appeared after resolving a merge conflict; it has been fixed.
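
The difference between a bare `#else` and an explicit `#elif defined(PADDLE_WITH_NCCL)` can be sketched as follows. This is a minimal illustration, not the code from the PR: `kCommBackend` and `HasCollectiveBackend` are hypothetical names, and string constants stand in for the real header includes.

```cpp
#include <cassert>
#include <cstring>

// Each branch names its backend explicitly. With a bare #else in place of
// the #elif, a build that defines neither macro would also take the NCCL path.
#if defined(PADDLE_WITH_RCCL)
constexpr const char* kCommBackend = "rccl";
#elif defined(PADDLE_WITH_NCCL)
constexpr const char* kCommBackend = "nccl";
#else
constexpr const char* kCommBackend = "none";
#endif

// True when one of the two collective backends was selected at compile time.
inline bool HasCollectiveBackend() {
  return std::strcmp(kCommBackend, "none") != 0;
}
```

Compiled without either macro, the snippet selects the fallback branch instead of silently assuming NCCL is available.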

const phi::DenseTensor& in_tensor UNUSED,
const BroadcastOptions& opts UNUSED,
bool sync_op UNUSED) {
bool sync_op UNUSED UNUSED) {
Contributor

Why is UNUSED used twice here?

Author

> Why is UNUSED used twice here?

Fixed.

const BroadcastOptions& opts UNUSED,
bool sync_op UNUSED,
bool use_calc_stream UNUSED) {
bool use_calc_stream) {
Contributor

Shouldn't UNUSED still be added here? Further down, the function raises the error "ProcessGroup does not support broadcast with sync_op and use_calc_stream flag." The later Broadcast looks like it does support the sync_op parameter, so should the UNUSED on sync_op be removed?

The sync_op parameter of the other communication functions seems to have the same problem.

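Author

> Shouldn't UNUSED still be added here? Further down, the function raises the error "ProcessGroup does not support broadcast with sync_op and use_calc_stream flag." The later Broadcast looks like it does support the sync_op parameter, so should the UNUSED on sync_op be removed?
>
> The sync_op parameter of the other communication functions seems to have the same problem.

The UNUSED markers are on the base class, and the error is raised only when the instance has no implementation of its own. These UNUSED markers were added on the develop branch; I will confirm this later.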
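
The base-class behaviour described in this reply can be sketched roughly as below. This is an assumption-laden illustration, not Paddle's code: `ProcessGroupBase`, `ProcessGroupWithBroadcast`, and `BroadcastSupported` are hypothetical, the signature is simplified, and the standard `[[maybe_unused]]` attribute stands in for the `UNUSED` macro on the assumption that the macro expands to an equivalent unused-parameter attribute.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical base class. The default body never reads its parameters and
// only reports that the operation is unsupported, hence the unused markers.
class ProcessGroupBase {
 public:
  virtual ~ProcessGroupBase() = default;

  virtual std::string Broadcast([[maybe_unused]] int root,
                                [[maybe_unused]] bool sync_op,
                                [[maybe_unused]] bool use_calc_stream) {
    throw std::runtime_error(
        "ProcessGroup does not support broadcast with sync_op and "
        "use_calc_stream flag.");
  }
};

// Hypothetical backend that overrides Broadcast and really uses the flags,
// so its parameters carry no unused markers.
class ProcessGroupWithBroadcast : public ProcessGroupBase {
 public:
  std::string Broadcast(int root, bool sync_op, bool use_calc_stream) override {
    return "root=" + std::to_string(root) +
           " sync_op=" + (sync_op ? "1" : "0") +
           " use_calc_stream=" + (use_calc_stream ? "1" : "0");
  }
};

// Returns false when the call falls through to the throwing default.
inline bool BroadcastSupported(ProcessGroupBase& pg) {
  try {
    pg.Broadcast(0, /*sync_op=*/true, /*use_calc_stream=*/false);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

Under these assumptions, the unused markers belong only on the throwing default; an override that consumes `sync_op` or `use_calc_stream` should not carry them.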

Contributor

@GhostScreaming GhostScreaming left a comment

LGTM

Contributor

@LiYuRio LiYuRio left a comment

LGTM

Member

@ForFishes ForFishes left a comment

LGTM

Contributor

@zhiqiu zhiqiu left a comment

LGTM

@zhiqiu zhiqiu merged commit e9fca77 into PaddlePaddle:develop Oct 23, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Oct 24, 2023
* fix merge conflicts

fix compile

tinyfix

* fix

* format code style

* fix code style

* fix PR-CI-APPROVAL problems

* fix PR-CI-APPROVAL problems

* fix PR-CI-Py3

* fix PR-CI-Codestyle-Check

* fix PR-CI-Codestyle-Check

* fix PR-CI-Kunlun-R200

* fix PR-CI-Codestyle-Check

* update

* fix ci-compile

* fix PR-CI-Coverage

* fix PR-CI-Coverage

* fix code style

* fix PR-CI-Kunlun-R200

* fix conflicts

* fix PR-CI-Kunlun-R200

* fix PR-CI-Py3

* fix PR-CI-Static-Check

* update trace func

* fix code style

* update

* fix compile

* fix compile

* cherry pick (PaddlePaddle#57260)

* tiny update

* tiny fix
jiahy0825 pushed a commit to jiahy0825/Paddle that referenced this pull request Oct 26, 2023
danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Nov 14, 2023
8 participants