-
Notifications
You must be signed in to change notification settings - Fork 825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fea/nn graph/graph build ctx #5420
Conversation
…ter MultiClientSession created in multi-client
@@ -160,18 +160,18 @@ Maybe<AttrVal> MakeCppAttrValueFromProtoOrCfgAttrValue(const ProtoT& cfg_attr_va | |||
} | |||
} | |||
|
|||
/*static*/ Maybe<AttrVal> AttrValueUtil::ToCppAttrValue(const AttrValue& proto_attr_value) { | |||
/* static */ Maybe<AttrVal> AttrValueUtil::ToCppAttrValue(const AttrValue& proto_attr_value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里为啥有这么多的格式调整?之前的 pr 没有 format 这里吗?现在 format 不是自动的吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这种注释format没有检查,代码里面有两种风格,统一到了和tf一样的风格
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这种注释不适合放在这个位置吧,只见过放在函数参数边上的,用来提示这个参数是哪个,比如:
ToCppAttrValue(/* proto_attr_value */ attr)
函数开头写个 /* static */ 看不明白是要表达什么
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个是google的代码风格,静态成员函数的实现在函数开头加注释/* static */,表明这个函数是类的静态成员函数
@@ -49,6 +54,10 @@ def name(self): | |||
def training(self): | |||
return self.config.training | |||
|
|||
@property | |||
def _graph_proto(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个 property 不对外暴露吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
暂时只是为了我们自己开发调试,没想好是否要开放,所以暂时写成私有的
session.TryInit() | ||
|
||
with graph_build_util.graph_build_context(self.config.proto, session): | ||
outputs = self.build(*args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里掉用 build 前后是不是有一些 todo
- parameter block 转换成 variable op
- input eager tensor 转换成 input op 并返回 lazy tensor
还有这里返回的 outputs 后面用的着吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里掉用 build 前后是不是有一些 todo
- parameter block 转换成 variable op
- input eager tensor 转换成 input op 并返回 lazy tensor
没错,要依赖成诚的pr补全这块,所以这里都还没处理,只是为其准备好执行的graph_build_context
还有这里返回的 outputs 后面用的着吗?
按现在的设计,用得着,返回的lazy tensor,会再走lazy interpreter运行一遍,添加output op到 job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputs 会被转换成 eager tensor,这里仍然需要构造 FetchOutputOpExpr 来实现。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputs 转换成 eager tensor 是 launch 后做的,不是 compile 这里做的?
oneflow/init.py
Outdated
@@ -75,6 +76,7 @@ | |||
session_ctx.OpenDefaultSession( | |||
MultiClientSession(oneflow._oneflow_internal.NewSessionId()) | |||
) | |||
scope_util.InitScopeStack() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个init.py文件调试时,改动始终没有生效。。。
解决方式是要make 一下,把它copy到build目录,以同步改动,其它python文件不用。原因参见oneflow.cmake里面的python文件的copy和软链接这两种处理方式。 from @caishenghang
这里的改动不是本次pr的,为了解决冲突,先合并了 #5469
内部的 Close + Open 未来也可以提供,前提是 Multi-ClientSession 还需要记录当前 Session 中仍然存在的 NNGraph 对象,只有NNGraph 对象都下线了,才能 Close。 目前我自己觉得即使不隔离,也不会影响单测。 |
Context for graph build in nn.Graph._compile()