stateful local kernel supports consistent #5789

Merged
merged 14 commits into from
Aug 10, 2021

daquexian
Contributor

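When the eager consistent op interpreter calls a stateful local opkernel, it now passes the consistent tensor meta, which acts as the provider of the sbp, logical shape, and related information. This fixes the crash of ops such as logical slice on consistent tensors.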
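
To illustrate why a local kernel needs the logical shape and sbp, here is a minimal Python sketch of mapping a logical slice onto one rank's shard. It is not OneFlow code: `ConsistentMeta`, `local_range`, and `local_slice` are hypothetical names, and the balanced split along one axis is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ConsistentMeta:
    """Hypothetical stand-in for the meta passed to a local kernel."""
    logical_shape: Tuple[int, ...]  # shape of the whole (logical) tensor
    split_axis: int                 # axis the tensor is split along
    parallel_num: int               # number of ranks holding a shard


def local_range(meta: ConsistentMeta, rank: int) -> Tuple[int, int]:
    """Return the [begin, end) range on split_axis owned by `rank`.

    Assumes a balanced split: the first `rem` ranks get one extra element.
    """
    dim = meta.logical_shape[meta.split_axis]
    base, rem = divmod(dim, meta.parallel_num)
    begin = rank * base + min(rank, rem)
    end = begin + base + (1 if rank < rem else 0)
    return begin, end


def local_slice(meta: ConsistentMeta, rank: int, start: int,
                stop: int) -> Optional[Tuple[int, int]]:
    """Map the logical slice [start, stop) to local indices on `rank`.

    Returns None when the slice does not touch this rank's shard.
    """
    begin, end = local_range(meta, rank)
    lo, hi = max(start, begin), min(stop, end)
    if lo >= hi:
        return None
    return lo - begin, hi - begin
```

Without the logical shape and the split information, a kernel that only sees its local shard cannot compute `begin` and `end`, so it cannot tell which part of a logical slice it owns.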

Signed-off-by: daquexian <daquexian566@gmail.com>
@daquexian force-pushed the local_kernel_support_consistent branch 2 times, most recently from 0961cde to 8e1cd8a (August 10, 2021 07:50)
@@ -17,16 +17,15 @@
import os
import unittest

import oneflow.compatible.single_client.unittest
Contributor Author

Moved python/oneflow/compatible/single_client/test/ops/test_stateful_local_kernel.py here, since it should be a multi-client test. Also added tests that cover the newly implemented interfaces.


namespace oneflow {
namespace one {

template<class T>
class InputAndOutputListScope {
Contributor Author

This class is not general-purpose, so I removed it outright.

Signed-off-by: daquexian <daquexian566@gmail.com>
@daquexian force-pushed the local_kernel_support_consistent branch from 8e1cd8a to eaad668 (August 10, 2021 07:53)
@@ -183,6 +198,11 @@ Maybe<Symbol<Device>> GetDevice4CurrentProcessCtx(Symbol<ParallelDesc> parallel_
  return device_iter->second;
}

std::shared_ptr<ParallelContext> GetParallelContext4CurrentProcessCtx(
    Symbol<ParallelDesc> parallel_desc) {
  return DECORATE(&RawGetParallelContext4CurrentProcessCtx, ThreadLocalCopiable)(parallel_desc);
Contributor

Use ThreadLocal.
ThreadLocal should be the default choice: it checks that every argument is a scalar, which rules out potential problems.
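
The idea behind this suggestion can be sketched in Python as a per-thread memoizing decorator that refuses non-scalar arguments. This is only an illustration, not OneFlow's `DECORATE` or `ThreadLocal` implementation: `thread_local_cached` is a hypothetical name, and the set of types treated as scalar here is an assumption.

```python
import functools
import threading


def thread_local_cached(fn):
    """Cache results of `fn` per thread and reject non-scalar arguments."""
    storage = threading.local()
    # Assumption for this sketch: these immutable types count as "scalar".
    scalar_types = (int, float, bool, str, type(None))

    @functools.wraps(fn)
    def wrapper(*args):
        for arg in args:
            if not isinstance(arg, scalar_types):
                raise TypeError(f"non-scalar argument: {arg!r}")
        # Each thread lazily creates its own cache dictionary.
        cache = getattr(storage, "cache", None)
        if cache is None:
            cache = storage.cache = {}
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]

    return wrapper
```

Restricting cache keys to cheap, immutable values means a lookup can never be invalidated by a caller mutating an argument after the call, which is the kind of latent problem the check is meant to exclude.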

Signed-off-by: daquexian <daquexian566@gmail.com>
@oneflow-ci-bot self-requested a review (August 10, 2021 12:00)
@oneflow-ci-bot removed their request for review (August 10, 2021 13:19)
@oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot (August 10, 2021 13:19)
@oneflow-ci-bot merged commit 05e40d7 into master (Aug 10, 2021)
@oneflow-ci-bot deleted the local_kernel_support_consistent branch (August 10, 2021 16:24)
3 participants