Feat graph compile progress bar #9537

Merged
merged 11 commits into from
Dec 6, 2022

Conversation

strint
Contributor

@strint strint commented Dec 5, 2022

By default, a graph prints no information. Calling debug(0) or setting the ONEFLOW_NNGRAPH_ENABLE_PROGRESS_BAR environment variable shows a compile progress bar on rank 0.

# enable progress bar
# graph is a nn.Graph instance
graph.debug(0)

or

ONEFLOW_NNGRAPH_ENABLE_PROGRESS_BAR=1

graph_progress_bar
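
The environment variable can also be set from Python instead of the shell, which covers every graph in the process at once. This is a minimal sketch, assuming the variable is read no earlier than `import oneflow` (the PR does not state when it is read); `enable_graph_progress_bar` is a hypothetical helper, not an OneFlow API.

```python
import os

# Variable name taken from the PR description.
PROGRESS_BAR_ENV = "ONEFLOW_NNGRAPH_ENABLE_PROGRESS_BAR"


def enable_graph_progress_bar() -> None:
    """Hypothetical helper: request the compile progress bar for this process."""
    os.environ[PROGRESS_BAR_ENV] = "1"


enable_graph_progress_bar()
# Assumption: import oneflow and build nn.Graph instances only after this point,
# so the setting is in place before any graph is compiled.
```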

Feature request: #9217

@github-actions
Contributor

github-actions bot commented Dec 5, 2022

Speed stats:
GPU Name: GeForce GTX 1080 









❌ OneFlow resnet50 time: 139.8ms (= 13984.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.8ms (= 16275.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.8ms / 139.8ms)

OneFlow resnet50 time: 85.0ms (= 8503.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.4ms (= 10139.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.4ms / 85.0ms)

OneFlow resnet50 time: 57.8ms (= 11551.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.9ms (= 17586.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 87.9ms / 57.8ms)

OneFlow resnet50 time: 44.7ms (= 8935.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.9ms (= 14176.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 70.9ms / 44.7ms)

OneFlow resnet50 time: 39.9ms (= 7985.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.9ms (= 13974.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.75 (= 69.9ms / 39.9ms)

@github-actions
Contributor

github-actions bot commented Dec 5, 2022

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9537/

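@jackalcooper
Collaborator

Calling debug(0) shows a compile progress bar on rank 0.
How about using an environment variable instead? debug(0) has to be set on each graph separately, which gets tedious when there are multiple graphs.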

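@strint
Contributor Author

strint commented Dec 6, 2022

Calling debug(0) shows a compile progress bar on rank 0.

How about using an environment variable instead? debug(0) has to be set on each graph separately, which gets tedious when there are multiple graphs.

Good point.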

@github-actions
Contributor

github-actions bot commented Dec 6, 2022

Speed stats:

return Maybe<void>::Ok();
}

const static thread_local uint64_t progress_total_num = 60;
Contributor

Was the 60 here counted by hand? If stages are added or removed later, won't this have to be changed again?

Contributor Author

Was the 60 here counted by hand? If stages are added or removed later, won't this have to be changed again?

It was counted by CostCounter.

Yes, it does need to be updated. The passes, plan, init, for loops, and so on follow irregular logic, so the number of stages is only known at run time and cannot be determined at compile time.
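
The trade-off described above (a hand-maintained total checked against a count observed at run time) can be sketched as follows. This is an illustration only: `StageCounter`, its methods, and the stage names are hypothetical and do not reflect the CostCounter implementation in OneFlow.

```python
class StageCounter:
    """Hypothetical stand-in for a cost counter: tallies stages as they finish."""

    def __init__(self, expected_total: int) -> None:
        # Hand-maintained constant, analogous to progress_total_num = 60.
        self.expected_total = expected_total
        self.finished = 0
        self.last_stage = ""

    def count(self, stage_name: str) -> None:
        # Called once per finished stage; the real number is only known at run time.
        self.finished += 1
        self.last_stage = stage_name

    def fraction(self) -> float:
        # Clamp so an outdated total never reports more than 100%.
        return min(self.finished / self.expected_total, 1.0)

    def drift(self) -> int:
        # Positive: more stages ran than expected; negative: fewer.
        return self.finished - self.expected_total


counter = StageCounter(expected_total=3)
for name in ["pass_a", "pass_b", "plan", "init"]:  # placeholder stage names
    counter.count(name)
```

A nonzero `drift()` after compilation would indicate that the hard-coded total is stale and should be updated.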

JUST(JobPass4Name("DumpBlobParallelConfPass")(job, &job_pass_ctx));
compile_tc->Count("[GraphCompile]" + job_name + " DumpBlobParallelConfPass", 1);
compile_tc->Count("[GraphCompile]" + job_name + " DumpBlobParallelConfPass", 1, true);
Contributor

Could we avoid showing every pass? Some passes mean nothing to users even when they see them. What users care about is which phase is currently running: the Graph pass optimization phase (with a few milestone events such as graph construction finished, autograd backward graph expansion, amp, zero, mlir, Checkpointing, pipeline, and so on), the physical graph compilation phase (Compiler task node build, memory reuse), the runtime init phase, and so on.

Contributor

Of course, this can be left as a TODO item.

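Contributor Author

@strint strint Dec 6, 2022

The progress bar is updated in place on a single line, so it occupies one line no matter how many stages there are.

60 stages looks like a lot, but a large count only means that each stage is being monitored. If a stage does not stall, the user will not notice it.

So having many stages does no harm.

graph_progress_bar

A sleep was added here so that the progress is visible; a small graph finishes in a few tenths of a second.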
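
The single-line update described above can be sketched with a carriage return. This is a generic illustration, not the code merged in this PR; `render_progress`, `report`, and the bar layout are hypothetical.

```python
import io


def render_progress(done: int, total: int, stage: str, width: int = 20) -> str:
    """Build one progress line that starts with a carriage return."""
    done = min(done, total)  # never draw past 100%
    filled = width * done // total
    bar = "#" * filled + "-" * (width - filled)
    # Pad the stage name so a shorter name overwrites a longer previous one.
    return f"\r[{bar}] {done}/{total} {stage:<40}"


def report(stream, done: int, total: int, stage: str) -> None:
    # No trailing newline: the next call overwrites this line in place.
    stream.write(render_progress(done, total, stage))
    stream.flush()


# Demonstrate with an in-memory stream; a terminal would show a single line.
buf = io.StringIO()
report(buf, 1, 60, "stage_a")  # placeholder stage names
report(buf, 2, 60, "stage_b")
```

Because each write returns the cursor to the start of the line, the stage name left on screen when compilation hangs is the one currently running.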

Collaborator

No sleep is needed. The goal is not to make the progress easy to watch, but to know which stage is running when compilation hangs.

Collaborator

Showing all the passes is a good thing. Many of them flashing by makes compilation look fast, which is good.

Contributor Author

No sleep is needed. The goal is not to make the progress easy to watch, but to know which stage is running when compilation hangs.

The sleep was used only to capture the screenshot; the code to be merged does not contain it.

Contributor

Showing all the passes is a good thing. Many of them flashing by makes compilation look fast, which is good.

OK.

@strint strint requested review from oneflow-ci-bot and removed request for oneflow-ci-bot December 6, 2022 06:37
@github-actions
Contributor

github-actions bot commented Dec 6, 2022

Speed stats:
GPU Name: GeForce GTX 1080 









❌ OneFlow resnet50 time: 140.3ms (= 14030.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.1ms (= 16208.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.1ms / 140.3ms)

OneFlow resnet50 time: 84.9ms (= 8491.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10232.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.3ms / 84.9ms)

OneFlow resnet50 time: 57.2ms (= 11449.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.1ms (= 15423.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.1ms / 57.2ms)

OneFlow resnet50 time: 44.4ms (= 8878.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.4ms (= 14278.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.61 (= 71.4ms / 44.4ms)

OneFlow resnet50 time: 39.5ms (= 7901.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.1ms (= 13627.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 68.1ms / 39.5ms)

@github-actions
Contributor

github-actions bot commented Dec 6, 2022

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9537/

@mergify mergify bot merged commit 116ec78 into master Dec 6, 2022
@mergify mergify bot deleted the feat_graph_prog_bar branch December 6, 2022 11:12

3 participants