manipulate lazy interface blobs in eager #3226
I couldn't find the logic that makes eager and lazy execution mutually exclusive.
6d0e9bb to 144a758
144a758 to 8d319f7
@oneflow_export("experimental.get_interface_blob_value")
def GetInterfaceBlobValue(op_name):
    flow.sync_default_session()
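These two user-facing APIs call sync default session at the very start to wait for the lazy job to finish, and both APIs are synchronous. Doesn't that mean eager and lazy execution can no longer overlap?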
I don't see any code that makes lazy wait for eager execution to finish.
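These two user-facing APIs are synchronous. Isn't that effectively the same as lazy waiting for eager to finish, since any subsequent code can only run after these two APIs return?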
Actually, you can't assume that eager APIs are synchronous :-)
Eager APIs are generally asynchronous; only the numpy family of interfaces is synchronous.
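To make the concern concrete, here is a minimal sketch of why an asynchronous eager call needs an explicit wait before a lazy job reads the same blob. `ToyEagerRuntime`, `async_assign`, `sync`, and `run_lazy_job` are all hypothetical stand-ins written for illustration; none of them is OneFlow's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyEagerRuntime:
    """Hypothetical stand-in for an asynchronous eager runtime (not OneFlow)."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []
        self.blob = 0

    def async_assign(self, value):
        # Returns immediately; the write happens later on the worker thread.
        def work():
            time.sleep(0.05)
            self.blob = value

        self._pending.append(self._pool.submit(work))

    def sync(self):
        # Block until every queued eager operation has finished.
        for future in self._pending:
            future.result()
        self._pending.clear()


def run_lazy_job(runtime):
    # Stand-in for a lazy job that reads the blob.
    return runtime.blob


runtime = ToyEagerRuntime()
runtime.async_assign(42)
seen_without_sync = run_lazy_job(runtime)  # racy: may still see the old value
runtime.sync()                             # the missing "lazy waits for eager" step
seen_after_sync = run_lazy_job(runtime)    # guaranteed to see the new value
```

Only the read after `sync()` is guaranteed to observe the write, which is the ordering guarantee the reviewer is asking for between eager and lazy execution.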
def AsyncFeedValueToInterfaceBlob(Yield):
    def build(builder):
        blob_object = builder.MakeLazyRefBlobObject(op_name)
        push_util.FeedValueToEagerBlob(
This uses FeedContext, which doesn't seem to support multi-machine execution yet?
field_number = op_conf_util.OperatorConf.DESCRIPTOR.fields_by_name[
    op_type_field
].number
return oneflow_internal.IsInterfaceOpTypeCase(field_number)
Error handling is a must here.
Yes.
Re "Error handling is a must here":
The C++ function called here, IsInterfaceOpTypeCase, cannot fail, so is error handling really needed?
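Independently of whether the C++ call can fail, the Python-side lookup `fields_by_name[op_type_field]` is a dict access and raises `KeyError` for an unknown field name. A minimal sketch of guarding that lookup follows; `field_number_for`, the `foo_conf`/`bar_conf` names, and the field numbers are hypothetical, and `SimpleNamespace` objects stand in for protobuf field descriptors.

```python
from types import SimpleNamespace


def field_number_for(fields_by_name, op_type_field):
    """Return the field number for op_type_field, with a clear error if unknown.

    fields_by_name is any mapping from field name to an object with a
    .number attribute, shaped like a protobuf Descriptor.fields_by_name.
    """
    field = fields_by_name.get(op_type_field)
    if field is None:
        raise ValueError("unknown op type field: {!r}".format(op_type_field))
    return field.number


# Hypothetical stand-in for OperatorConf.DESCRIPTOR.fields_by_name.
fake_fields_by_name = {
    "foo_conf": SimpleNamespace(number=101),
    "bar_conf": SimpleNamespace(number=102),
}

number = field_number_for(fake_fields_by_name, "foo_conf")
```

Raising a `ValueError` that names the offending field is one option; the PR may prefer a different error type or its own check macro.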
for k, v in _ONEFLOW_DTYPE_TO_NUMPY_DTYPE.items():
    if v == numpy_dtype:
        return k
Too inefficient. Build a dict ahead of time.
One problem here: because of a numpy bug (https://stackoverflow.com/questions/35293672/why-do-these-dtypes-compare-equal-but-hash-different), using a numpy dtype as a dict key behaves unexpectedly:
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> d = {np.float32: 'float32'}
>>> a = np.array((), dtype=np.float32)
>>> a.dtype
dtype('float32')
>>> a.dtype == np.float32
True
>>> d[np.float32]
'float32'
>>> d[a.dtype]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: dtype('float32')
So the lookup can only be implemented in this roundabout way.
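One possible way to still get a constant-time dict lookup, offered as a sketch and not as what this PR does, is to normalize both the keys and the query through `np.dtype(...)`, so that the scalar type `np.float32` and the instance `dtype('float32')` hash to the same key. The names `_NUMPY_DTYPE_TO_NAME` and `name_of_numpy_dtype` are hypothetical, and the string values stand in for OneFlow dtype objects.

```python
import numpy as np

# Hypothetical forward table; strings stand in for OneFlow dtype objects.
_NAME_TO_NUMPY_DTYPE = {
    "float32": np.float32,
    "int32": np.int32,
}

# Reverse table built once, keyed by normalized np.dtype instances.
_NUMPY_DTYPE_TO_NAME = {
    np.dtype(numpy_type): name for name, numpy_type in _NAME_TO_NUMPY_DTYPE.items()
}


def name_of_numpy_dtype(numpy_dtype):
    # np.dtype() accepts scalar types, dtype instances, and strings alike,
    # so every spelling of the same dtype maps to the same dict key.
    return _NUMPY_DTYPE_TO_NAME[np.dtype(numpy_dtype)]


a = np.array((), dtype=np.float32)
from_scalar_type = name_of_numpy_dtype(np.float32)
from_instance = name_of_numpy_dtype(a.dtype)
```

Both calls resolve to the same entry, which avoids the linear scan while sidestepping the hash mismatch shown in the session above.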
…eflow into access_lazy_blob_in_eager
…est in cpu only build
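Could you briefly describe in the description how this PR works and what it does?
OK, added.
This adds a virtual machine instruction that creates eager BlobObject objects for the blobs produced by a lazy job's interface ops, so that existing eager functionality can be reused to read and modify the values of those blobs.
One use case is assigning a value to a variable blob and reading a variable blob's value, replacing part of what checkpoint does. Example: https://github.com/Oneflow-Inc/oneflow/pull/3226/files#diff-7c7472d3085b6b5f7fbf35bd736b02ff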