Multi-Client LogicalRun degenerate to PhysicalRun #5479

Merged: 5 commits into master from dev_cc_multi_client_logical_run, Jul 14, 2021

chengtbf (Contributor) commented Jul 13, 2021

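Under Multi-Client, the virtual machine (VM) instruction LogicalRun degenerates directly into PhysicalRun. Each rank receives and processes only its own instructions, so there is no longer any instruction-level synchronization between ranks (the role ClusterInstruction played under the earlier Single-Client mode).

Addendum:

  • The Logical version of the VM's IdGenerator also degenerates into the Physical version.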
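To make the description above concrete, here is a minimal toy sketch of what "degenerating" could look like. It is not OneFlow code: `ToyVM`, `PhysicalIdGenerator`, `logical_run`, `physical_run`, and `sync_count` are hypothetical names invented for illustration, and the cross-rank synchronization is reduced to a counter.

```python
import itertools
from typing import Callable, List

# All names below (ToyVM, PhysicalIdGenerator, ...) are hypothetical
# stand-ins for illustration; they are not OneFlow APIs.

Build = Callable[[List[str]], None]  # fills an instruction list in place


class PhysicalIdGenerator:
    """Allocates ids from a rank-local counter, with no cross-rank agreement."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def new_id(self) -> int:
        return next(self._counter)


class ToyVM:
    def __init__(self, rank: int, multi_client: bool) -> None:
        self.rank = rank
        self.multi_client = multi_client
        self.executed: List[str] = []
        self.sync_count = 0  # simulated instruction-level syncs between ranks
        self.id_generator = PhysicalIdGenerator()

    def physical_run(self, build: Build) -> None:
        # Build the instructions and execute them on this rank only.
        instructions: List[str] = []
        build(instructions)
        self.executed.extend(instructions)

    def logical_run(self, build: Build) -> None:
        if not self.multi_client:
            # Single-Client: stand-in for the cross-rank synchronization step.
            self.sync_count += 1
        # Multi-Client: nothing to synchronize, so this is just a physical run.
        self.physical_run(build)


vm = ToyVM(rank=0, multi_client=True)
vm.logical_run(lambda instructions: instructions.append("new_object"))
print(vm.executed, vm.sync_count)
```

In this sketch a Multi-Client `logical_run` call leaves `sync_count` untouched, which mirrors the claim that no rank-to-rank instruction synchronization remains.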

@chengtbf chengtbf requested a review from oneflow-ci-bot July 14, 2021 02:54
strint (Contributor) commented Jul 14, 2021

This looks fine. If the tests pass it should be OK; test_graph.py exercises this code path.

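chengtbf (Contributor, Author) commented

> This looks fine. If the tests pass it should be OK; test_graph.py exercises this code path.

Is test_graph.py actually covered by the unit tests right now? @strint @jackalcooper I could not find a matching entry in the CI workflow. I think only tests under the test/modules and test/tensor directories get triggered, right?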

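strint (Contributor) commented Jul 14, 2021

> This looks fine. If the tests pass it should be OK; test_graph.py exercises this code path.
>
> Is test_graph.py actually covered by the unit tests right now? @strint @jackalcooper I could not find a matching entry in the CI workflow. I think only tests under the test/modules and test/tensor directories get triggered, right?

The make_scope test in test_graph.py triggers MultiClient + LogicalRun execution, though still in a single process.
There was probably no existing case for multi-process + MultiClient + LogicalRun. jianhao likely ran into it because only the ddp development work hits this scenario.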

@oneflow-ci-bot oneflow-ci-bot removed their request for review July 14, 2021 04:12
@oneflow-ci-bot oneflow-ci-bot self-requested a review July 14, 2021 04:12
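chengtbf (Contributor, Author) commented

> The make_scope test in test_graph.py triggers MultiClient + LogicalRun execution, though still in a single process.
> There was probably no existing case for multi-process + MultiClient + LogicalRun. jianhao likely ran into it because only the ddp development work hits this scenario.

Right. There are currently no multi-process unit tests under Multi-Client. What I am curious about is whether the unit tests we write under oneflow/python/test/graph/ actually get run by CI. @strint @jackalcooper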

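strint (Contributor) commented Jul 14, 2021

> The make_scope test in test_graph.py triggers MultiClient + LogicalRun execution, though still in a single process.
> There was probably no existing case for multi-process + MultiClient + LogicalRun. jianhao likely ran into it because only the ddp development work hits this scenario.
>
> Right. There are currently no multi-process unit tests under Multi-Client. What I am curious about is whether the unit tests we write under oneflow/python/test/graph/ actually get run by CI. @strint @jackalcooper

#5482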

@chengtbf chengtbf removed the request for review from oneflow-ci-bot July 14, 2021 06:56
@chengtbf chengtbf requested a review from oneflow-ci-bot July 14, 2021 06:56
@oneflow-ci-bot oneflow-ci-bot removed their request for review July 14, 2021 08:21
@oneflow-ci-bot oneflow-ci-bot merged commit 809944f into master Jul 14, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the dev_cc_multi_client_logical_run branch July 14, 2021 08:22

5 participants