Conversation

ziyoujiyi
Contributor

@ziyoujiyi ziyoujiyi commented Mar 25, 2022

PR types

Others

PR changes

Others

Describe

Add new features to HeterClient and HeterServer:

  • Support send & recv with shard & scope
  • Support send & listen with shard & scope
  • Add unit tests

@paddle-bot-old

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@ziyoujiyi ziyoujiyi requested review from zmxdream and removed request for zmxdream March 25, 2022 13:38
Contributor

@zmxdream zmxdream left a comment

LGTM

DEFINE_int32(pserver_sparse_table_shard_num, 1000,
"sparse table shard for save & load");

DEFINE_int32(heter_world_size, 100, "group size"); // configurable
Contributor

What does this setting mean? Could you make the comment more detailed?

Contributor Author

The number of NCCL communication groups.

heter_service_proto fleet_executor ${BRPC_DEP})
-set(DISTRIBUTE_COMPILE_FLAGS "-Wno-non-virtual-dtor -Wno-error=non-virtual-dtor -Wno-error=delete-non-virtual-dtor")
+set(DISTRIBUTE_COMPILE_FLAGS "-Wno-non-virtual-dtor -Wno-error=non-virtual-dtor -Wno-error=delete-non-virtual-dtor -Wno-error=parentheses")
if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0)
Contributor

With this added, does it still compile in non-heter builds?

Contributor Author

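Yes, it still compiles. The flag suppresses compiler warnings that come from the code of the third-party libraries we depend on.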
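
For context on the `-Wno-error=parentheses` flag in this thread: GCC's `-Wparentheses` diagnostic (enabled by `-Wall`) fires on expressions such as `&&` nested inside `||` without explicit parentheses, and `-Wno-error=parentheses` keeps that diagnostic a warning even when `-Werror` is in effect. The snippet below is a minimal sketch of code that triggers it; `should_retry` and its parameters are hypothetical and do not come from this PR or from any of its third-party dependencies.

```cpp
#include <cassert>

// Hypothetical helper (not from the PR). The return expression mixes
// && and || without parentheses, which GCC reports under -Wparentheses
// as "suggest parentheses around '&&' within '||'".
// With -Werror this would fail the build; adding -Wno-error=parentheses
// downgrades it back to a plain warning.
bool should_retry(bool timed_out, bool connected, bool forced) {
  // Parsed as (timed_out && connected) || forced.
  return timed_out && connected || forced;
}
```

Compiling this with something like `g++ -Wall -Werror -Wno-error=parentheses -c example.cpp` should print the warning while still producing an object file. Writing the parentheses explicitly would remove the warning altogether, which is the preferable fix when the code is your own rather than a dependency's.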

// std::vector<std::shared_ptr<brpc::Controller>> _cntls;
};

class HeterClient {
Contributor

It doesn't inherit from a base class?

Contributor Author

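This class has been a standalone class from the start. It has no notion of a table, it has many customized interfaces, and its vars are stored in the scope.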

Contributor Author

This client isn't actually used for PS communication; it handles communication between workers.

Contributor

@chenwhql chenwhql left a comment

LGTM for PADDLE_ENFORCE

Contributor

@zhwesky2010 zhwesky2010 left a comment

LGTM for change heter_cloud_comm_cpu_test

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

@fuyinno4 fuyinno4 merged commit 2f41f38 into PaddlePaddle:develop Mar 31, 2022
7 participants