Cpu mpi broadcast #5726

Merged
merged 20 commits into master from cpu_mpi_broadcast on Aug 6, 2021
Conversation

lixinqi
Contributor

@lixinqi lixinqi commented Aug 4, 2021

Add CPU support for to_consistent. The main technical changes:

  1. Implement the low-level eager_nccl_broadcast op; it will be renamed ccl_broadcast later.
  2. The eager_nccl_broadcast kernel calls the newly implemented low-level ccl::Broadcast operation, whose interface mimics nccl.
  3. Refactor the DeviceInferFn of the eager_nccl_broadcast op.

DeviceCtx* ctx) {
CHECK_EQ_OR_RETURN(parallel_desc->device_type(), DeviceType::kCPU);
static thread_local std::vector<int64_t> rank_heap{};
InitBroadcastRankHeap(&rank_heap, *parallel_desc, root);
Contributor Author

Treat the group of ranks as a binary heap: root is the root node, and each rank copies data from its parent node and then forwards it to its two child nodes.
This can be refactored into a faster scheme later.
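
A minimal sketch of this heap layout (hypothetical helper name, not the PR's actual InitBroadcastRankHeap):

#include <cstdint>
#include <vector>

// Sketch: put the broadcast root at index 0, followed by the remaining ranks.
// For index i in the resulting heap:
//   parent index  = (i - 1) / 2           (the rank data is received from)
//   child indices = 2 * i + 1, 2 * i + 2  (the ranks data is forwarded to)
void SketchBroadcastRankHeap(std::vector<int64_t>* rank_heap,
                             const std::vector<int64_t>& ranks, int64_t root) {
  rank_heap->clear();
  rank_heap->reserve(ranks.size());
  rank_heap->push_back(root);
  for (int64_t rank : ranks) {
    if (rank != root) { rank_heap->push_back(rank); }
  }
}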

static thread_local const auto& nccl_device = Device::New("nccl");
return nccl_device;
} else if (input_device->type() == "cpu") {
return input_device;
Contributor Author

The CPU stream in the virtual machine is asynchronous to begin with.
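
A minimal sketch of the branch shown above (the function name and the Maybe<Symbol<Device>> return type are assumptions about the surrounding DeviceInferFn, not the PR's exact code):

// Sketch: non-CPU inputs are routed to a shared "nccl" device, while CPU
// inputs reuse the input device directly, since the VM's CPU stream is
// already asynchronous.
Maybe<Symbol<Device>> SketchBroadcastDeviceInfer(Symbol<Device> input_device) {
  if (input_device->type() != "cpu") {
    static thread_local const auto& nccl_device = Device::New("nccl");
    return nccl_device;
  } else {
    return input_device;
  }
}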

.Broadcast(user_op::OpArg("out", 0))
.Build();
return Maybe<void>::Ok();
UNIMPLEMENTED_THEN_RETURN() << "consistent tensor are not supported";
Contributor
@clackhan clackhan Aug 4, 2021

Eager boxing (p2b) infers the SBP here, so it probably can't be written like this.

Contributor Author

I'll revert it.

const auto& ForEachRank = [&](const std::function<Maybe<void>(int64_t)>& DoEach) -> Maybe<void> {
return rank_group->ForEachRank(DoEach);
};
return AccessToOtherRanks<SendOrRecv, Prepare>(ForEachRank, token, ctx);
Contributor

What ForEachRank really wants here is the list of ranks that DoEach is applied to, right? So wouldn't it be clearer to pass a list directly instead of ForEachRank?

Contributor Author

ForEachRank is already equivalent to a list. I avoid an explicit list because it 1) saves constructing an object, 2) saves choosing a container type, and 3) saves the iteration over it.
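
A self-contained sketch (plain C++, hypothetical names) of the callback-style iteration being discussed: the caller supplies the per-rank action and the group drives the loop, so no rank container is ever materialized.

#include <cstdint>
#include <functional>
#include <iostream>

// Hypothetical stand-in for a rank group that exposes iteration via a
// callback instead of returning a list of ranks.
struct RankGroupSketch {
  int64_t size = 0;  // ranks 0 .. size-1 in this sketch
  void ForEachRank(const std::function<void(int64_t)>& DoEach) const {
    for (int64_t rank = 0; rank < size; ++rank) { DoEach(rank); }
  }
};

int main() {
  RankGroupSketch group{4};
  group.ForEachRank([](int64_t rank) { std::cout << "visit rank " << rank << "\n"; });
  return 0;
}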


} // namespace

class EagerCclBroadcastKernel final : public user_op::OpKernel {
Contributor

The only difference from EagerNcclBroadcastKernel is the missing N, which seems easy to mix up 😂 Is it worth picking a different name?

Contributor Author

Their contents are different anyway. Later I also want to replace the nccl in these names with ccl.

Contributor

"Later I also want to replace the nccl in these names with ccl."
That works, then they won't get mixed up.

*Cb = [] {};
return Maybe<void>::Ok();
});
JUST(RpcUtil::ReceiveDataFromParentInHeap(rank_heap, rpc_token, &rpc_ctx));
Contributor

Broadcast is currently implemented with rpc; later we could consider supporting it via commnet.

Contributor Author

This is already calling the commnet module.
It seems the word Rpc caused the confusion; I'll rename it to Transport.

Contributor

OK.

void Init(user_op::KernelInitContext* ctx) {
const std::string& parallel_conf_txt = ctx->Attr<std::string>("parallel_conf");
ParallelConf parallel_conf;
std::set<std::pair<int64_t, int64_t>> device_set;
Contributor

This variable is unused.

.Broadcast(user_op::OpArg("out", 0))
.Build();
return Maybe<void>::Ok();
UNIMPLEMENTED_THEN_RETURN() << "consistent tensor are not supported";
Contributor

This needs to be reverted; the consistent code also uses this op.

CHECK_EQ_OR_RETURN(parallel_desc->device_type(), DeviceType::kCPU);
static thread_local std::vector<int64_t> rank_heap{};
JUST(InitBroadcastRankHeap(&rank_heap, *parallel_desc, root));
TransportToken rpc_token = TransportToken::NewDataTransportToken();
Contributor

Rename rpc_token to transport_token, and the rpc_ctx below as well.

Optional<int64_t> current_rank_index{};
for (int i = 0; i < rank_heap.size(); ++i) {
if (rank_heap.at(i) == GlobalProcessCtx::Rank()) {
current_rank_index = i;
Contributor

Just return i directly here, and return an error after the for loop?

Contributor Author

Sounds good.
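
A minimal sketch of the suggested refactor (hypothetical helper, using std::optional instead of the PR's Maybe): return the index as soon as the current rank is found, and leave the error case to after the loop.

#include <cstdint>
#include <optional>
#include <vector>

std::optional<int64_t> FindRankIndexInHeap(const std::vector<int64_t>& rank_heap,
                                           int64_t current_rank) {
  for (int64_t i = 0; i < static_cast<int64_t>(rank_heap.size()); ++i) {
    if (rank_heap.at(i) == current_rank) { return i; }  // found: return immediately
  }
  return std::nullopt;  // not found: the caller turns this into an error
}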

template<>
Maybe<void> Broadcast<DeviceType::kCPU>(const void* in, void* out, size_t elem_cnt, DataType dtype,
int64_t root, Symbol<ParallelDesc> parallel_desc,
DeviceCtx* ctx) {
Contributor

This parameter isn't used.

Contributor Author

Right, it's unused. I kept it to align the interfaces, because later this will take a template parameter so that cpu, nccl, and other backends all go through the same function call.
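
A sketch of the interface alignment described here, assuming a device-type-templated ccl::Broadcast declaration (only the kCPU specialization appears in this diff; the generic declaration and the dispatch example are assumptions):

// Sketch: one templated ccl::Broadcast entry point, specialized per device,
// so CPU and NCCL backends share the same call signature, including the
// DeviceCtx* that the CPU specialization does not use yet.
namespace ccl {

template<DeviceType device_type>
Maybe<void> Broadcast(const void* in, void* out, size_t elem_cnt, DataType dtype, int64_t root,
                      Symbol<ParallelDesc> parallel_desc, DeviceCtx* ctx);

}  // namespace ccl

// Callers would then dispatch by template argument, e.g.
//   JUST(ccl::Broadcast<DeviceType::kCPU>(in, out, elem_cnt, dtype, root, parallel_desc, ctx));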

@@ -83,7 +91,10 @@ REGISTER_USER_OP("eager_nccl_reduce")
*ctx->OutputShape("out", 0) = ctx->InputShape("in", 0);
return Maybe<void>::Ok();
})
.SetGetSbpFn(user_op::GetSbpFnUtil::DefaultBroadcastToBroadcast)
Contributor

Does this need to be reverted?

Contributor

No, this one doesn't.

Contributor Author
@lixinqi lixinqi Aug 5, 2021

I'm not reverting this for now, because the original logic was wrong.

return Maybe<void>::Ok();
});
JUST(TransportUtil::ReceiveDataFromParentInHeap(rank_heap, transport_token, &transport_ctx));
JUST(TransportUtil::WaitUntilDoneOrTimeout(transport_ctx, TransportUtil::TimeoutSeconds()));
Contributor

😂 While implementing flow.load on top of this branch, I broadcast the shape from the src rank, and I suspect this line is wrong: by the time execution reaches here there is only a recv and no send, so shouldn't we only wait at line 72? I do get blocked if I keep this line, and deleting it fixes it.

@oneflow-ci-bot oneflow-ci-bot self-requested a review August 6, 2021 03:32
@github-actions
Contributor

github-actions bot commented Aug 6, 2021

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 139.7ms (= 6985.7ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 127.9ms (= 6394.1ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.09 (= 139.7ms / 127.9ms)

PyTorch resnet50 time: 85.6ms (= 4280.7ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.1ms (= 3705.3ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.16 (= 85.6ms / 74.1ms)

PyTorch resnet50 time: 57.3ms (= 2866.4ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.2ms (= 2359.0ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.22 (= 57.3ms / 47.2ms)

PyTorch resnet50 time: 47.6ms (= 2382.2ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 40.0ms (= 1999.2ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.19 (= 47.6ms / 40.0ms)

PyTorch resnet50 time: 44.2ms (= 2210.2ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 44.4ms (= 2221.6ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 0.99 (= 44.2ms / 44.4ms)

@oneflow-ci-bot oneflow-ci-bot merged commit 9e07e2c into master Aug 6, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the cpu_mpi_broadcast branch August 6, 2021 05:41