manipulate lazy interface blobs in eager #3226

Merged on Aug 13, 2020 (47 commits)

Conversation

Contributor

@daquexian daquexian commented Jul 17, 2020

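Adds a virtual machine instruction that creates eager BlobObject objects for the blobs produced by a lazy job's interface ops, so that existing eager functionality can be reused to read and modify the values of those blobs.

One use case is assigning values to variable blobs and reading their values back, which replaces part of what checkpoints do. Example: https://github.com/Oneflow-Inc/oneflow/pull/3226/files#diff-7c7472d3085b6b5f7fbf35bd736b02ff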

Contributor

@lixinqi lixinqi left a comment

I could not find the logic that makes eager and lazy execution mutually exclusive.

@daquexian daquexian changed the base branch from develop to master August 3, 2020 03:29
@daquexian daquexian force-pushed the access_lazy_blob_in_eager branch from 6d0e9bb to 144a758 Compare August 3, 2020 03:33
@daquexian daquexian force-pushed the access_lazy_blob_in_eager branch from 144a758 to 8d319f7 Compare August 3, 2020 04:52

@oneflow_export("experimental.get_interface_blob_value")
def GetInterfaceBlobValue(op_name):
    flow.sync_default_session()
Contributor Author

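These two user-facing APIs sync the default session at the very start, which waits for the lazy job to finish, and both APIs are synchronous. Doesn't that mean eager and lazy execution can never overlap?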

Contributor

I don't seem to see any code that makes lazy wait for eager execution to finish.

Contributor Author

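These two user-facing APIs are synchronous. Doesn't that have the same effect as lazy waiting for eager to finish, since any subsequent code can only run after these two APIs return?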

Contributor

Actually, you can't assume that eager APIs are synchronous :-)
In general, eager APIs are asynchronous; only the numpy family of interfaces is synchronous.
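
The point about asynchronous eager APIs can be illustrated with a toy model. The sketch below is not OneFlow code: `AsyncInstructionQueue`, `submit`, and `sync` are hypothetical names standing in for an asynchronous eager instruction stream. It only shows that a call which enqueues work can return before the work has run, so whatever runs next must wait explicitly.

```python
import queue
import threading


class AsyncInstructionQueue:
    """Toy stand-in (hypothetical) for an asynchronous eager instruction stream."""

    def __init__(self):
        self._queue = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # Execute queued instructions one by one on a background thread.
        while True:
            instruction = self._queue.get()
            try:
                instruction()
            finally:
                self._queue.task_done()

    def submit(self, instruction):
        # Returns immediately; the instruction runs later on the worker.
        self._queue.put(instruction)

    def sync(self):
        # Block until every submitted instruction has finished.
        self._queue.join()


blob = {"value": 0}
eager = AsyncInstructionQueue()
gate = threading.Event()  # holds the write back so the ordering is deterministic


def write_blob():
    gate.wait()
    blob["value"] = 42


eager.submit(write_blob)
value_before_sync = blob["value"]  # submit() has returned, the write has not run

gate.set()
eager.sync()
value_after_sync = blob["value"]  # the write is now guaranteed to be visible
```

In this model, a lazy job launched right after `submit` could observe the old value; only the explicit `sync` makes the ordering safe.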

def AsyncFeedValueToInterfaceBlob(Yield):
    def build(builder):
        blob_object = builder.MakeLazyRefBlobObject(op_name)
        push_util.FeedValueToEagerBlob(
Contributor Author

This uses FeedContext, which doesn't seem to support multiple machines yet?

@daquexian daquexian changed the title [draft] manipulate lazy blobs in eager manipulate lazy blobs in eager Aug 7, 2020
    field_number = op_conf_util.OperatorConf.DESCRIPTOR.fields_by_name[
        op_type_field
    ].number
    return oneflow_internal.IsInterfaceOpTypeCase(field_number)
Contributor

Error handling is a must here.

Contributor

Yes.

Contributor Author

> Error handling is a must here.

The C++ function IsInterfaceOpTypeCase called here cannot fail, so is error handling perhaps unnecessary?
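
Even if the C++ call itself cannot fail, the dictionary-style lookup by field name on the Python side can, for example when `op_type_field` is misspelled. A minimal sketch of guarding that lookup follows. It uses a plain dict in place of the protobuf descriptor's `fields_by_name`, and the helper name `lookup_field_number`, the field names, and the field numbers are all hypothetical.

```python
from types import SimpleNamespace

# Hypothetical stand-in for OperatorConf.DESCRIPTOR.fields_by_name:
# a mapping from field name to an object with a `.number` attribute.
FIELDS_BY_NAME = {
    "input_conf": SimpleNamespace(number=1),
    "variable_conf": SimpleNamespace(number=2),
}


def lookup_field_number(fields_by_name, op_type_field):
    """Return the field number, or raise a descriptive error for unknown names."""
    try:
        field = fields_by_name[op_type_field]
    except KeyError:
        known = ", ".join(sorted(fields_by_name))
        raise ValueError(
            f"unknown op type field {op_type_field!r}; expected one of: {known}"
        ) from None
    return field.number
```

With this guard, a bad field name produces a message that lists the valid names instead of a bare `KeyError`.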

Comment on lines +151 to +153
    for k, v in _ONEFLOW_DTYPE_TO_NUMPY_DTYPE.items():
        if v == numpy_dtype:
            return k
Contributor

Too inefficient. Build the dict ahead of time.

Contributor Author

One problem here is that, because of a numpy bug (https://stackoverflow.com/questions/35293672/why-do-these-dtypes-compare-equal-but-hash-different), using numpy dtypes as dictionary keys behaves unexpectedly:

Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> d = {np.float32: 'float32'}
>>> a = np.array((), dtype=np.float32)
>>> a.dtype
dtype('float32')
>>> a.dtype == np.float32
True
>>> d[np.float32]
'float32'
>>> d[a.dtype]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: dtype('float32')

So the lookup can only be implemented in this roundabout way.
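
One possible way to get both a precomputed dict and correct lookups is to normalize every key with `np.dtype(...)`, both when building the reverse table and when querying it, since `np.dtype` instances for the same type hash consistently. The sketch below assumes numpy is installed; the table contents use plain strings in place of OneFlow dtype objects, and the function name is hypothetical rather than the one used in the PR.

```python
import numpy as np

# Hypothetical stand-in for the real table; strings replace OneFlow dtype objects.
_ONEFLOW_DTYPE_TO_NUMPY_DTYPE = {
    "float32": np.float32,
    "float64": np.float64,
    "int32": np.int32,
    "int64": np.int64,
}

# Reverse table built once, keyed by normalized np.dtype instances
# rather than by scalar types such as np.float32.
_NUMPY_DTYPE_TO_ONEFLOW_DTYPE = {
    np.dtype(v): k for k, v in _ONEFLOW_DTYPE_TO_NUMPY_DTYPE.items()
}


def convert_numpy_dtype_to_oneflow_dtype(numpy_dtype):
    # Normalize the query the same way, so np.float32, "float32",
    # and array.dtype all map to the same key.
    key = np.dtype(numpy_dtype)
    if key not in _NUMPY_DTYPE_TO_ONEFLOW_DTYPE:
        raise NotImplementedError(f"unsupported numpy dtype: {key}")
    return _NUMPY_DTYPE_TO_ONEFLOW_DTYPE[key]
```

The lookup then costs one `np.dtype` construction and one hash lookup instead of a linear scan, and it accepts both the scalar type and an array's `dtype` attribute.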

@jackalcooper jackalcooper added this to the 0.1.9 milestone Aug 13, 2020
@leaves-zwx
Contributor

Could you briefly describe in the description how this PR works and what it does?

@daquexian daquexian changed the title manipulate lazy blobs in eager manipulate lazy interface blobs in eager Aug 13, 2020
@daquexian
Contributor Author

> Could you briefly describe in the description how this PR works and what it does?

Sure, added.

@lixinqi lixinqi merged commit b5b4855 into master Aug 13, 2020
@lixinqi lixinqi deleted the access_lazy_blob_in_eager branch August 13, 2020 11:48
5 participants