RankTaskGraph #9108

lixinqi · 2022-09-19T05:20:28Z

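Split the logic of TaskGraph into BoxingTaskGraph and RankTaskGraph. BoxingTaskGraph builds the boxing-related subgraph of the task graph and serializes it to BoxingTaskGraphProto. RankTaskGraph has two responsibilities: 1) build the CompTaskNodes of a given rank; 2) restore the boxing part of the subgraph from BoxingTaskGraphProto.
The overall flow of distributed compilation will be:

On the main thread (or master process), build BoxingTaskGraph from OpGraph and serialize it to BoxingTaskGraphProto;
On each worker thread in the thread pool (or each worker process), build the RankTaskGraph of that rank from OpGraph, BoxingTaskGraphProto, the rank, and related information, then generate that rank's plan.

This PR implements an intermediate version of separated compilation: BoxingTaskGraph is built on the main thread, while RankTaskGraph is built in the thread pool.
A follow-up PR will implement fully separated compilation, in which BoxingTaskGraph is built on the master process and RankTaskGraph on the worker processes.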
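
To make the two-stage flow concrete, here is a minimal sketch. Every type and function in it (`BoxingGraphBlob`, `RankPlan`, `BuildBoxingGraph`, `CompileRank`, and this `Loop`) is a hypothetical stand-in, not OneFlow's real API. The loop runs sequentially here; in the PR the per-rank body is meant to run on thread-pool workers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for BoxingTaskGraphProto: the serialized shared subgraph.
struct BoxingGraphBlob {
  std::string serialized;
};

// Hypothetical stand-in for the per-rank Plan.
struct RankPlan {
  std::size_t rank;
  std::string description;
};

// Stage 1 (main thread / master): build the boxing subgraph once and serialize it.
BoxingGraphBlob BuildBoxingGraph() { return BoxingGraphBlob{"boxing-subgraph"}; }

// Stage 2 (worker): combine the shared blob with the compute nodes of one rank.
RankPlan CompileRank(const BoxingGraphBlob& blob, std::size_t rank) {
  return RankPlan{rank, blob.serialized + "+comp-nodes-of-rank-" + std::to_string(rank)};
}

// Sequential loop; a thread-pool variant would dispatch body(i) to workers instead.
void Loop(std::size_t n, const std::function<void(std::size_t)>& body) {
  for (std::size_t i = 0; i < n; ++i) { body(i); }
}

std::vector<RankPlan> CompileAllRanks(std::size_t world_size) {
  const BoxingGraphBlob blob = BuildBoxingGraph();
  std::vector<RankPlan> plans(world_size);
  // Each iteration writes only its own slot, so the body is safe to parallelize.
  Loop(world_size, [&](std::size_t rank) { plans[rank] = CompileRank(blob, rank); });
  return plans;
}
```

Because each rank reads only the shared blob and writes only its own plan, the same body can run unchanged in a single-threaded loop or on a pool.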

lixinqi · 2022-09-23T12:34:54Z

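An environment variable is provided for switching modes:

ONEFLOW_LAZY_COMPILE_MODE=naive: the old compilation path, which compiles all ranks together.
ONEFLOW_LAZY_COMPILE_MODE=rank_per_thread: multi-threaded separated compilation, with each rank compiled in its own thread.
ONEFLOW_LAZY_COMPILE_MODE=rank_per_iter: single-threaded separated compilation, with each rank compiled in one iteration of a loop on the main thread.

If multi-threaded separated compilation hits a bug, fall back to single-threaded separated compilation and run again.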
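
A minimal sketch of reading the switch, for illustration only. The enum, the function names, and the fallback behaviour for an unset or unknown value are assumptions, not the PR's actual code; only the variable name and its three values come from the list above.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical enum mirroring the three documented values.
enum class CompileMode { kNaive, kRankPerThread, kRankPerIter };

// Maps a documented value to the enum; returns false for anything else.
bool ParseCompileMode(const std::string& value, CompileMode* mode) {
  if (value == "naive") { *mode = CompileMode::kNaive; return true; }
  if (value == "rank_per_thread") { *mode = CompileMode::kRankPerThread; return true; }
  if (value == "rank_per_iter") { *mode = CompileMode::kRankPerIter; return true; }
  return false;
}

// Reads ONEFLOW_LAZY_COMPILE_MODE. Using `fallback` when the variable is unset
// or unrecognized is an assumption made for this sketch.
CompileMode CompileModeFromEnv(CompileMode fallback) {
  const char* raw = std::getenv("ONEFLOW_LAZY_COMPILE_MODE");
  CompileMode mode = fallback;
  if (raw != nullptr) { ParseCompileMode(raw, &mode); }
  return mode;
}
```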

lixinqi · 2022-09-28T04:12:16Z

oneflow/core/graph/task_graph.cpp

@@ -709,12 +727,19 @@ void TaskGraph::EnableInplaceMemSharing(
    const std::function<bool(const std::string&, const std::string&)>&
        IsOpNameDataOrCtrlReachable) {
  ForEachGpuDeviceNodes([&](const HashSet<TaskNode*>& dev_nodes) {
-    InplaceObasInfo safe_inplace_obas_info;


This logic has been moved to

void TaskGraph::EnableInplaceMemSharing( const HashSet<TaskNode*>& dev_nodes, const std::function<bool(const std::string&, const std::string&)>& IsOpNameDataOrCtrlReachable);

lixinqi · 2022-09-28T04:12:58Z

oneflow/core/graph/task_graph.cpp

 void TaskGraph::ConnectCtrlEdges(const std::vector<CompTaskNode*>& src_task_nodes,
                                 const std::vector<CompTaskNode*>& dst_task_nodes) {
  CHECK_EQ(src_task_nodes.size(), dst_task_nodes.size());
  FOR_RANGE(int32_t, i, 0, src_task_nodes.size()) {
-    std::string regst_desc_name;


This logic has been moved to

void TaskGraph::ConnectCtrlEdge(CompTaskNode* src_task_node, CompTaskNode* dst_task_node);
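
One plausible shape for this split, sketched with a hypothetical `Node` type in place of CompTaskNode: the batch function keeps the size check and the loop, and delegates each pair to the single-edge helper. That the real ConnectCtrlEdges calls ConnectCtrlEdge this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CompTaskNode; only what the sketch needs.
struct Node {
  int id;
  std::vector<int> ctrl_successors;  // ids of nodes reached by a control edge
};

// Single-pair helper: records one control edge src -> dst.
void ConnectCtrlEdge(Node* src, Node* dst) { src->ctrl_successors.push_back(dst->id); }

// Batch wrapper: pairs the i-th source with the i-th destination.
void ConnectCtrlEdges(const std::vector<Node*>& srcs, const std::vector<Node*>& dsts) {
  assert(srcs.size() == dsts.size());
  for (std::size_t i = 0; i < srcs.size(); ++i) { ConnectCtrlEdge(srcs[i], dsts[i]); }
}
```

Exposing the single-pair form lets a per-rank graph connect only the pairs it owns instead of iterating over every rank's nodes.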

lixinqi · 2022-09-28T04:18:26Z

oneflow/core/graph/task_graph.h

@@ -94,9 +104,6 @@ class TaskGraph final : public Graph<TaskNode, TaskEdge> {
          IsOpNameDataOrCtrlReachable) const;
  void SetTaskRegstInplaceInfo(const InplaceObasInfo& obas_info,
                               const HashSet<TaskNode*>& dev_nodes) const;
-  void ForEachGpuDeviceNodes(


This function was not deleted; it was made public.

github-actions · 2022-11-22T08:19:19Z

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.


github-actions · 2022-11-22T08:36:55Z

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

lixinqi · 2022-11-23T06:15:18Z

oneflow/core/framework/nn_graph.cpp

+    std::vector<Plan> plans(GlobalProcessCtx::WorldSize());
+    JUST(OpGraph::WithSingleton(&job_, [&]() -> Maybe<void> {
+      Singleton<OpGraph>::Get()->UpdateCachedPredicatorIsReachable();
+      auto boxing_task_graph = JUST(BoxingTaskGraph::New());


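Compared with GlobalTaskGraph, BoxingTaskGraph contains only the backbone: 1) the boxing part, whose compiled TaskNodes are all TransportTaskNodes; 2) the upstream and downstream ComputeTaskNodes related to boxing.

Every TaskNode in the complement, GlobalTaskGraph - BoxingTaskGraph, can be obtained by splitting with Op + Sbp.

BoxingTaskGraph is the consensus shared by all ranks.

The root reasons why RankTaskGraph must contain the part of BoxingTaskGraph relevant to its own rank are:

Each rank must independently infer shape and dtype at compile time.

Each rank must know its upstream and downstream at compile time.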

lixinqi · 2022-11-23T06:40:29Z

oneflow/core/framework/nn_graph.cpp

+      auto boxing_task_graph = JUST(BoxingTaskGraph::New());
+      // reachable collective boxing task pairs,
+      std::vector<HashSet<std::pair<int64_t /*src task_id*/, int64_t /*dst task_id*/>>>
+          reachable_cb_pairs{GlobalProcessCtx::WorldSize()};


Collect the reachability relations among collective boxing tasks on each rank.
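
A small sketch of the data structure this describes: one set of (src task_id, dst task_id) pairs per rank. It uses `std::set` in place of the PR's HashSet so that no hash for `std::pair` is needed, and the helper names `MarkReachable` and `IsReachable` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using TaskIdPair = std::pair<int64_t /*src task_id*/, int64_t /*dst task_id*/>;
// Index i holds the reachable collective boxing task pairs of rank i.
using ReachablePairs = std::vector<std::set<TaskIdPair>>;

// Records that dst is reachable from src on the given rank.
void MarkReachable(ReachablePairs* pairs, std::size_t rank, int64_t src, int64_t dst) {
  (*pairs)[rank].insert(TaskIdPair(src, dst));
}

// Queries the relation; the pair is ordered, so (a, b) does not imply (b, a).
bool IsReachable(const ReachablePairs& pairs, std::size_t rank, int64_t src, int64_t dst) {
  return pairs[rank].count(TaskIdPair(src, dst)) > 0;
}
```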

lixinqi · 2022-11-23T06:56:12Z

oneflow/core/graph/task_graph.cpp

+bool IsComputTaskNodeDutyRank(int64_t current_rank, const ParallelDesc& parallel_desc,
+                              int64_t task_node_rank) {
+  if (current_rank == 0) {
+    // make sure master knows at least one op_node.


at least one compute task node.

lixinqi · 2022-11-23T07:16:33Z

oneflow/core/framework/nn_graph.cpp

+          reachable_cb_pairs{GlobalProcessCtx::WorldSize()};
+      Loop(GlobalProcessCtx::WorldSize(), [&](size_t i) {
+        auto boxing_task_graph_proto = std::make_shared<BoxingTaskGraphProto>();
+        auto PickTaskNode = [&]() -> std::function<bool(TaskNode*)> {


TransportTaskNode

lixinqi · 2022-11-23T08:04:00Z

oneflow/core/graph_impl/acc_compute_task_node.cpp

@@ -44,7 +46,7 @@ void AccCompTaskNode::BuildExecGphAndRegst() {
  exec_node->BindBnWithRegst(op()->SoleIbn(), in_regst);
  out_regst->AddLbi(op()->BnInOp2Lbi(op()->SoleObn()));
  exec_node->BindBnWithRegst(op()->SoleObn(), out_regst);
-  exec_node->InferBlobDescs(parallel_ctx());
+  (exec_node->*InferBlobDescs())(parallel_ctx());


Cheng Cheng: the name should be changed to GetExecNodeInferBlobDescsMethod()
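
For readers unfamiliar with the `(exec_node->*Getter())(args)` form in the diff: the getter returns a pointer to member function, and the caller invokes it on the node. The sketch below shows that pattern with a hypothetical `ExecNode` that has two inference methods; the method names `InferFull` and `InferLight` and the `use_light` flag are invented for illustration.

```cpp
#include <cassert>

// Hypothetical node with two alternative inference methods.
struct ExecNode {
  int full_calls = 0;
  int light_calls = 0;
  void InferFull(int /*parallel_id*/) { ++full_calls; }
  void InferLight(int /*parallel_id*/) { ++light_calls; }
};

// Pointer-to-member-function type shared by both methods.
using InferMethod = void (ExecNode::*)(int);

// Selects which method callers should invoke; named as the review suggests.
InferMethod GetExecNodeInferBlobDescsMethod(bool use_light) {
  return use_light ? &ExecNode::InferLight : &ExecNode::InferFull;
}

// Call site: the parentheses around `exec_node->*method` are required,
// because the call operator binds tighter than ->*.
void BuildExecNode(ExecNode* exec_node, bool use_light) {
  (exec_node->*GetExecNodeInferBlobDescsMethod(use_light))(/*parallel_id=*/0);
}
```

A getter name that says it returns a method makes such call sites easier to read than a name that looks like the method itself.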

lixinqi · 2022-11-28T09:31:06Z

oneflow/core/job/rank_compiler.cpp

+        sole_regst_desc = regst_desc;
+      });
+      auto* predefined = &regst_desc2predefined_regst_desc_id[sole_regst_desc];
+      *predefined = std::max(*predefined, comm_task_node->candidate_in_regst_desc_id());


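This resolves the problem of keeping regst_desc_id in sync when compiling across ranks.
First, only the input regst_desc_id of a CopyCommNetTaskNode crosses ranks, so that is the only place that needs handling. Cross-rank ctrl regst desc ids need no handling because they are already synchronized: they are created while the boxing task graph is built, so they reach every rank as the boxing task graph is distributed.

In this fix, CopyCommNetTaskNode holds a new int64_t candidate_in_regst_desc_id field, which is used to initialize the produced regst_desc_id of the TaskNode upstream of the CopyCommNetTaskNode. The field name includes "candidate" because an upstream TaskNode may have several downstream CopyCommNetTaskNodes, so the upstream TaskNode's produced regst_desc takes the largest candidate_in_regst_desc_id as its final regst_desc_id.

The CopyCommNetTaskNode::candidate_in_regst_desc_id field is synchronized globally as the boxing task graph is distributed. As a result, the produced regst_desc_id of any given upstream TaskNode is synchronized as well, because max(candidate_in_regst_desc_id) evaluates to the same value on every rank.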
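
The max-over-candidates rule can be sketched as follows. `CopyCommNetNode` and `ResolveRegstDescIds` are hypothetical names, and producers are identified by a plain integer id rather than by a RegstDesc pointer as in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for CopyCommNetTaskNode.
struct CopyCommNetNode {
  int64_t producer_id;                 // upstream TaskNode feeding this node
  int64_t candidate_in_regst_desc_id;  // distributed with the boxing task graph
};

// For each producer, the final regst_desc_id is the largest candidate among
// its downstream CopyCommNet nodes.
std::map<int64_t, int64_t> ResolveRegstDescIds(const std::vector<CopyCommNetNode>& nodes) {
  std::map<int64_t, int64_t> producer2regst_desc_id;
  for (const CopyCommNetNode& node : nodes) {
    auto it = producer2regst_desc_id.find(node.producer_id);
    if (it == producer2regst_desc_id.end()) {
      producer2regst_desc_id.emplace(node.producer_id, node.candidate_in_regst_desc_id);
    } else {
      it->second = std::max(it->second, node.candidate_in_regst_desc_id);
    }
  }
  return producer2regst_desc_id;
}
```

Since max is commutative and associative, every rank arrives at the same id regardless of the order in which it visits the nodes.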

lixinqi · 2022-11-28T09:37:55Z

oneflow/core/register/regst_desc_id_provider.cpp

+
+}  // namespace
+
+std::unique_ptr<RegstDescIdProvider> NewRegstDescIdProvider() {


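RegstDesc no longer holds an int64_t regst_desc_id_ directly; it holds a polymorphic RegstDescIdProvider field, regst_desc_id_provider_. The provider can be a ConstRegstDescIdProvider, which matches the naive mode, or a LazyInitRegstDescIdProvider, used for rank_per_iter, rank_per_thread, and similar modes, where setting regst_desc_id takes the downstream nodes of the producer task_node into account.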
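
A minimal sketch of the provider hierarchy, assuming a small virtual interface. The class names follow the comment above, but the member functions (`has_regst_desc_id`, `init_regst_desc_id`) and the `RegstDescSketch` holder are hypothetical and may differ from the real header.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

class RegstDescIdProvider {
 public:
  virtual ~RegstDescIdProvider() = default;
  virtual bool has_regst_desc_id() const = 0;
  virtual int64_t regst_desc_id() const = 0;
};

// Naive mode: the id is known at construction and never changes.
class ConstRegstDescIdProvider final : public RegstDescIdProvider {
 public:
  explicit ConstRegstDescIdProvider(int64_t id) : id_(id) {}
  bool has_regst_desc_id() const override { return true; }
  int64_t regst_desc_id() const override { return id_; }

 private:
  const int64_t id_;
};

// Separated modes: the id is filled in later, once downstream nodes are known.
class LazyInitRegstDescIdProvider final : public RegstDescIdProvider {
 public:
  bool has_regst_desc_id() const override { return initialized_; }
  int64_t regst_desc_id() const override {
    assert(initialized_);  // reading before initialization is a logic error
    return id_;
  }
  void init_regst_desc_id(int64_t id) {
    assert(!initialized_);  // the id is set exactly once
    id_ = id;
    initialized_ = true;
  }

 private:
  bool initialized_ = false;
  int64_t id_ = -1;
};

// Hypothetical holder showing how a RegstDesc could delegate to the provider.
struct RegstDescSketch {
  std::unique_ptr<RegstDescIdProvider> regst_desc_id_provider_;
  int64_t regst_desc_id() const { return regst_desc_id_provider_->regst_desc_id(); }
};
```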


implement RankTaskGraph

560a511

lixinqi requested review from jackalcooper and chengtbf as code owners September 19, 2022 05:20

RankCompiler

51034f0

lixinqi requested a review from strint as a code owner September 19, 2022 10:25

lixinqi added 8 commits September 20, 2022 09:21

fix compiler complaints

3736494

CompTaskNode::ConsumeFakeRegsts

5807882

TransportTaskProto::lbi

c07ea5e

makes sure all ranks know all var_op_names

1fd10ef

RankTaskGraph::ForEachDutyRank

74c96df

PortableCtrlEdge

e89143b

compile in MultiThreadLoop

1b10509

CompileMode

44bf12b

lixinqi added 9 commits September 26, 2022 10:14

rebuild new_task_id_ before ProduceRegst

bd50bc7

RankTaskGraph::InitRegstDescsConsumers()

7853956

PlanUtil::GenReachableTaskPairs

b725318

disable checking consumer_task_regst_desc_id_size

45bc629

TaskNode::InitConsumedRegstsFromProto

3c4ea9d

remove RegstDesc::InitConsumersFromProto

9880ba4

refactor CompTaskNode::ConsumeFakeRegstsIf

20175fc

refactor CompTaskNode::ConsumeFakeRegsts

fbff274

remove Plan::fake_consumed_regst_desc_id

ede3cd2

lixinqi commented Sep 28, 2022


lixinqi and others added 6 commits September 28, 2022 12:41

revert part of code in job/plan_util.cpp

3ba45e5

refacotr ParallelDesc::TryGetParallelId

2e9ab1a

cut boxing_task_graph by rank

93a7947

make sure TaskIdGenerator::Generator is thread safe

818d14d

atomic<int64_t> mem_block_id

8ca22bf

chunk id add lock

2adbb13

strint requested review from hjchen2 and liujuncheng as code owners November 22, 2022 07:36

fix conflict

7d69c25

strint added graph graph mode enhancement labels Nov 22, 2022

strint requested a review from oneflow-ci-bot November 22, 2022 08:17

auto format by CI

d4782a7

strint and others added 4 commits November 22, 2022 16:24

fix conflict

eb76987

Merge branch 'rank_task_graph' of https://github.com/Oneflow-Inc/oneflow into rank_task_graph

bb9e65e

fix conflict

6b575fc

auto format by CI

13ba2ac

strint added 2 commits November 22, 2022 16:38

fix conflict

92face0

fix

1b2edca

lixinqi commented Nov 23, 2022


address pr comments

3910af6

lixinqi commented Nov 28, 2022


strint changed the base branch from master to rank_task_graph_test_passed December 13, 2022 10:12

strint changed the base branch from rank_task_graph_test_passed to master December 13, 2022 10:59

strint force-pushed the rank_task_graph branch from 18e3bac to 3910af6 Compare December 13, 2022 11:03

mergify bot mentioned this pull request Dec 13, 2022

Rank task graph fix reg id #9602

Closed

strint and others added 3 commits December 13, 2022 19:17

fix bug

a37f9f8

Rank task graph fix (#9749)

132a8a7

To be fixed distributed test. --------- Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: cheng cheng <472491134@qq.com>

rm useless

f1352e6

This was referenced Feb 28, 2023

Plan seperation compile #9913

Closed

Plan separation compile #9920

Merged

strint closed this Apr 13, 2023

strint deleted the rank_task_graph branch April 13, 2023 08:24
