Plan separation compile #9913

Closed
wants to merge 43 commits into from
Changes from 1 commit
560a511
implement RankTaskGraph
lixinqi Sep 19, 2022
51034f0
RankCompiler
lixinqi Sep 19, 2022
3736494
fix compiler complaints
lixinqi Sep 20, 2022
5807882
CompTaskNode::ConsumeFakeRegsts
lixinqi Sep 20, 2022
c07ea5e
TransportTaskProto::lbi
lixinqi Sep 20, 2022
1fd10ef
makes sure all ranks know all var_op_names
lixinqi Sep 22, 2022
74c96df
RankTaskGraph::ForEachDutyRank
lixinqi Sep 22, 2022
e89143b
PortableCtrlEdge
lixinqi Sep 23, 2022
1b10509
compile in MultiThreadLoop
lixinqi Sep 23, 2022
44bf12b
CompileMode
lixinqi Sep 23, 2022
bd50bc7
rebuild new_task_id_ before ProduceRegst
lixinqi Sep 26, 2022
7853956
RankTaskGraph::InitRegstDescsConsumers()
lixinqi Sep 26, 2022
b725318
PlanUtil::GenReachableTaskPairs
lixinqi Sep 27, 2022
45bc629
disable checking consumer_task_regst_desc_id_size
lixinqi Sep 27, 2022
3c4ea9d
TaskNode::InitConsumedRegstsFromProto
lixinqi Sep 27, 2022
9880ba4
remove RegstDesc::InitConsumersFromProto
lixinqi Sep 27, 2022
20175fc
refactor CompTaskNode::ConsumeFakeRegstsIf
lixinqi Sep 27, 2022
fbff274
refactor CompTaskNode::ConsumeFakeRegsts
lixinqi Sep 27, 2022
ede3cd2
remove Plan::fake_consumed_regst_desc_id
lixinqi Sep 27, 2022
3ba45e5
revert part of code in job/plan_util.cpp
lixinqi Sep 28, 2022
2e9ab1a
refactor ParallelDesc::TryGetParallelId
lixinqi Sep 28, 2022
93a7947
cut boxing_task_graph by rank
lixinqi Sep 29, 2022
818d14d
make sure TaskIdGenerator::Generator is thread safe
lixinqi Oct 8, 2022
8ca22bf
atomic<int64_t> mem_block_id
lixinqi Oct 8, 2022
2adbb13
chunk id add lock
strint Oct 8, 2022
ccf9bea
get chunk proto with lock
strint Oct 9, 2022
2c577df
create chunk with lock
strint Oct 9, 2022
fa49459
mutable std::mutex
lixinqi Oct 9, 2022
a4e67b0
Rank task graph merge master (#9440)
strint Nov 22, 2022
7d69c25
fix conflict
strint Nov 22, 2022
d4782a7
auto format by CI
oneflow-ci-bot Nov 22, 2022
eb76987
fix conflict
strint Nov 22, 2022
bb9e65e
Merge branch 'rank_task_graph' of https://github.com/Oneflow-Inc/onef…
strint Nov 22, 2022
6b575fc
fix conflict
strint Nov 22, 2022
13ba2ac
auto format by CI
oneflow-ci-bot Nov 22, 2022
92face0
fix conflict
strint Nov 22, 2022
1b2edca
fix
strint Nov 22, 2022
3910af6
address pr comments
lixinqi Nov 24, 2022
a37f9f8
fix bug
strint Dec 13, 2022
132a8a7
Rank task graph fix (#9749)
strint Feb 28, 2023
f1352e6
rm useless
strint Feb 28, 2023
6b13581
fix multi thread merge bug
strint Feb 28, 2023
035e83c
Plan sep compile merge master (#9915)
strint Mar 1, 2023
8 changes: 4 additions & 4 deletions oneflow/core/framework/nn_graph.cpp
@@ -278,13 +278,13 @@ Maybe<void> NNGraph::RankPerThreadCompile() {
std::vector<HashSet<PortableCtrlEdge>> portable_ctrl_edges{GlobalProcessCtx::WorldSize()};
// there are no op_names in plan, we must hold them by a container.
HashMap<int64_t, std::string> comp_task_id2op_name;
- for (int i = 0; i < GlobalProcessCtx::WorldSize(); ++i) {
+ MultiThreadLoop(GlobalProcessCtx::WorldSize(), [&](size_t i) {
Plan rank_plan;
auto* plan = (i > 0) ? &rank_plan : &plan_;
double start = GetCurTime();
// TODO(chengcheng): new memory reused by chunk
- JUST(RankCompiler(boxing_task_graph_proto, i)
- .Compile(variable_op_names_, &job_, plan, &comp_task_id2op_name));
+ CHECK_JUST(RankCompiler(boxing_task_graph_proto, i)
+ .Compile(variable_op_names_, &job_, plan, &comp_task_id2op_name));
PlanUtil::GenMemBlockAndChunkWithVariableOpNames4Plan(plan, variable_op_names_);

VLOG(1) << "Graph name: " << name_
@@ -304,7 +304,7 @@ Maybe<void> NNGraph::RankPerThreadCompile() {
std::string plan_name = "plan:" + job_name() + ":" + std::to_string(i);
Singleton<CtrlClient>::Get()->PushKV(plan_name, *plan);
}
- }
+ });
{
// use multi-thread to merge all ctrl edges into portable_ctrl_edges[0], which belongs to
// master.
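
For readers following the change: the hunk replaces a serial `for` loop over ranks with a `MultiThreadLoop` call that takes a lambda, and swaps `JUST` for `CHECK_JUST` inside that lambda (presumably because the lambda does not return a `Maybe`, so an error cannot be propagated by early return). The sketch below illustrates the general pattern only. `ParallelFor`, `CompileResult`, and `CompileAllRanks` are hypothetical names invented for this example and are not OneFlow APIs. It assumes one thread per rank, gives each rank its own pre-sized output slot, and guards the one shared map with a mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a multi-thread loop helper: runs fn(i) for every
// i in [0, n), one std::thread per index. This is NOT OneFlow's implementation.
void ParallelFor(std::size_t n, const std::function<void(std::size_t)>& fn) {
  std::vector<std::thread> workers;
  workers.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    workers.emplace_back([&fn, i] { fn(i); });
  }
  for (auto& worker : workers) { worker.join(); }
}

// Hypothetical result type: one plan name per rank plus a map shared by all ranks.
struct CompileResult {
  std::vector<std::string> plan_names;
  std::unordered_map<int, std::string> id2op_name;
};

CompileResult CompileAllRanks(std::size_t world_size) {
  CompileResult result;
  // Pre-size the vector so each thread only assigns to its own element.
  result.plan_names.resize(world_size);
  std::mutex map_mutex;
  ParallelFor(world_size, [&](std::size_t rank) {
    // Per-rank slot: no lock needed, since no two threads touch the same index.
    result.plan_names[rank] = "plan:job:" + std::to_string(rank);
    {
      // Shared container: every insertion must hold the lock.
      std::lock_guard<std::mutex> lock(map_mutex);
      result.id2op_name[static_cast<int>(rank)] = "op_" + std::to_string(rank);
    }
  });
  assert(result.plan_names.size() == world_size);
  return result;
}
```

Calling `CompileAllRanks(4)` fills four per-rank slots and four map entries. The design point is that per-index outputs need no synchronization once the container is sized up front, while any container written by several threads (such as a map keyed by task id) needs a lock or a per-thread copy merged afterwards.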