
refine tensor doc and add module to_global doc #7823

Merged
merged 16 commits into master from dev_refine_to_global_doc on Mar 19, 2022

Conversation

hjchen2
Contributor

@hjchen2 hjchen2 commented Mar 17, 2022

  • Improve the Tensor.to_global API docs and add a description of grad_sbp


  • Improve the Tensor.to_local API docs


  • Add the Tensor Attributes documentation


  • Add docs for the Module.to_consistent and Module.to_global APIs


Cast a local tensor to global tensor or cast a
global tensor to another global tensor with
different sbp or placement
Convert a local tensor to global tensor or convert a global tensor to another global tensor with
Contributor

Split this into two paragraphs.

sbp (flow.sbp.sbp or tuple of flow.sbp.sbp, optional): the desired sbp descriptor of returned global tensor. Default: if None, the input tensor must be consistent one and use its own sbp.
placement (flow.placement, optional): the desired placement of returned global tensor. Default: if None, the input tensor must be global and use its own placement.
sbp (flow.sbp.sbp or tuple of flow.sbp.sbp, optional): the desired sbp descriptor of returned global tensor. Default: if None, the input tensor must be global and use its own sbp.
grad_sbp (flow.sbp.sbp or tuple of flow.sbp.sbp, optional): manually specify the gradient sbp of the operation in the backward pass. Default: if None, the gradient sbp will be inferred automatically.
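For illustration only (not part of the diff), a minimal sketch of how these arguments fit together; it assumes a two-rank launch, e.g. python3 -m oneflow.distributed.launch --nproc_per_node 2:

import oneflow as flow

# local tensor -> global tensor: both placement and sbp must be given
x = flow.tensor([0., 1.], dtype=flow.float32)
placement = flow.placement("cpu", ranks=[0, 1])
x_global = x.to_global(placement=placement, sbp=flow.sbp.broadcast)

# global tensor -> global tensor: placement may be omitted, its own placement is reused
y_global = x_global.to_global(sbp=flow.sbp.split(0))
print(y_global.sbp)  # (oneflow.sbp.split(axis=0),)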
Contributor

The Example section needs at least one example added.

local_tensor -> global tensor with S(0), showing the resulting tensor shape

This can be borrowed from the release notes.
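A rough sketch of what such an example might look like (illustrative only, not copied from the release notes; assumes a two-rank launch):

import oneflow as flow

# Each rank creates its own local tensor of shape [2].
x = flow.tensor([0., 1.], dtype=flow.float32) + flow.env.get_rank() * 2
print(x.shape)  # oneflow.Size([2]) on every rank

# Interpreting the per-rank local tensors as shards along dim 0 (split(0)),
# the resulting global tensor has shape [4].
placement = flow.placement("cpu", ranks=[0, 1])
x_global = x.to_global(placement=placement, sbp=flow.sbp.split(0))
print(x_global.shape)  # oneflow.Size([4])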

Contributor

The constraints for S(0);
The constraints for B; Note: with B, the data on rank 0 overwrites the data on the other ranks (see the sketch below)
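A rough illustration of the B point above (again assuming a two-rank launch; behavior as described in this comment, worth double-checking):

import oneflow as flow

# rank 0 holds [0., 1.], rank 1 holds [2., 3.]
x = flow.tensor([0., 1.], dtype=flow.float32) + flow.env.get_rank() * 2
placement = flow.placement("cpu", ranks=[0, 1])

# With sbp = broadcast, rank 0's data is broadcast to all ranks,
# so rank 1's original local data is overwritten.
x_b = x.to_global(placement=placement, sbp=flow.sbp.broadcast)
print(x_b.to_local())  # tensor([0., 1.]) on both ranks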

Contributor

Use the NOTE directive of the API docs to describe the key constraints above.

Contributor Author

Updated, please review again.

hjchen2 and others added 4 commits March 17, 2022 15:15
Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: Yao Chi <later@usopp.net>
@@ -156,7 +156,8 @@ OneFlow Tensor Class
tan,
tanh,
tile,
to,
to_consistent,
Contributor

Is this list sorted lexicographically, or randomly? to_consistent is a deprecated interface, so it might be better to put it below to_global. If it is sorted lexicographically, no change is needed.

Contributor Author

Moved it below to_local.



Note:
This method modifies the module in-place.
Contributor
@chengtbf chengtbf Mar 18, 2022

Does this need to distinguish cases? If the module originally holds only local tensors, then this isn't in-place, is it? local tensor -> global tensor

Contributor

Yes, local to global cannot be done in-place.

Contributor Author

This means the module is modified in-place, not the tensors.

Contributor

The in-place here refers to the module; from the module's point of view, it is an operation that mutates the module itself.

>>> m = flow.nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)
>>> m.to_global(placement=flow.placement("cpu", ranks=[0]), sbp=[flow.sbp.split(0)])
>>> m.weight.is_global
True
Contributor

Add one more:

>>> m.bias.is_global
True

@chengtbf
Contributor

Overall I think this is already very well written; worth promoting.

add_docstr(
oneflow.nn.Module.to_consistent,
"""
This interface is no longer available, please use :func:`oneflow.nn.Module.to_global` instead
Contributor

instead.

A period is missing.

Contributor Author

Added.

Args:
placement (flow.placement, optional): the desired placement of returned global tensor. Default: None
sbp (flow.sbp.sbp or tuple of flow.sbp.sbp, optional): the desired sbp of returned global tensor. Default: None
grad_sbp (flow.sbp.sbp or tuple of flow.sbp.sbp, optional): manually specify the gradient sbp of this operation in the backward pass. If None, the gradient sbp will be inferred automatically. It is only used if this tensor is a global tensor. Default: None
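For grad_sbp, a minimal sketch of what it controls (my understanding only; assumes a two-rank launch and that factory ops accept placement/sbp keywords):

import oneflow as flow

placement = flow.placement("cpu", ranks=[0, 1])
x = flow.ones(4, 4, placement=placement, sbp=flow.sbp.split(0), requires_grad=True)

# Forward: recast to broadcast; backward: ask the gradient of this op
# to be produced with sbp split(0) instead of letting it be inferred.
y = x.to_global(sbp=flow.sbp.broadcast, grad_sbp=flow.sbp.split(0))
y.sum().backward()
print(x.grad.sbp)  # x.grad is produced directly as split(0) (illustrative)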
Contributor

specify the sbp of this tensor's grad tensor in the backward pass ?

Contributor Author

done


>>> # results on rank 0
oneflow.Size([2])
tensor([0., 1.], placement=oneflow.placement(type="cpu", ranks=[0, 1]), sbp=(oneflow.sbp.split(axis=0),), dtype=oneflow.float32)
Contributor

Here, for B to S, nothing seems to change except the sbp.

It feels like we should show that the local tensor inside has changed, i.e. that the split was done automatically?

Contributor

This is already shown in the to_local API docs; wouldn't mentioning to_local here be a bit out of scope?
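For reference, what that could look like if we did want to show it here (a sketch, two-rank launch assumed):

import oneflow as flow

placement = flow.placement("cpu", ranks=[0, 1])
x_b = flow.tensor([0., 1.]).to_global(placement=placement, sbp=flow.sbp.broadcast)

# B -> S(0): each rank now holds only its own slice of the data.
x_s = x_b.to_global(sbp=flow.sbp.split(0))
print(x_s.to_local())
# rank 0: tensor([0.], dtype=oneflow.float32)
# rank 1: tensor([1.], dtype=oneflow.float32)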


``oneflow.sbp`` includes three types:

- oneflow.sbp.split(axis)
Contributor

I noticed an issue with the split interface here: under torch this is almost always called dim; only under tf is it called axis.

https://pytorch.org/docs/stable/search.html?q=dim&check_keywords=yes&area=default#

https://stackoverflow.com/questions/62333053/whats-the-difference-between-dim-in-pytorch-and-axis-in-tensorflow

If we are following torch's conventions, we had better change this to dim.

Comment on lines +40 to +42
A ``oneflow.sbp`` is an object representing that how the data of the global tensor is distributed across the ranks of the ``Tensor`` placement.

``oneflow.sbp`` includes three types:
Contributor
@lixinqi lixinqi Mar 18, 2022

The wording here is a bit odd: oneflow.sbp is a Python module, while oneflow.sbp.sbp is a class.

A oneflow.sbp.sbp is a distribution descriptor object representing how the data of the global tensor is distributed across the ranks of the Tensor placement.

There are three types of distribution descriptor instances in module oneflow.sbp:
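The three descriptor instances in question, for reference (a quick sketch; the exact reprs may differ slightly between versions):

import oneflow as flow

print(flow.sbp.split(0))     # oneflow.sbp.split(axis=0)
print(flow.sbp.broadcast)    # oneflow.sbp.broadcast
print(flow.sbp.partial_sum)  # oneflow.sbp.partial_sum
# all three are instances of the class oneflow.sbp.sbp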

@clackhan
Contributor

  • Add the Tensor Attributes documentation (device); a short sketch of these attributes follows after this list


  • Improve the Tensor.device API docs


  • Improve the Tensor.placement API docs


  • Improve the Tensor.sbp API docs


  • Improve the Tensor.is_global API docs


  • the oneflow.to() API
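A quick sketch of how these attributes show up (single-rank, so it runs without a distributed launch; exact reprs may vary):

import oneflow as flow

# A local tensor has a device and no placement/sbp.
x = flow.tensor([1., 2.])
print(x.device)      # the local device, e.g. cpu
print(x.is_global)   # False

# A global tensor has placement and sbp instead of a single device.
g = x.to_global(placement=flow.placement("cpu", ranks=[0]), sbp=flow.sbp.broadcast)
print(g.is_global)   # True
print(g.placement)   # oneflow.placement(type="cpu", ranks=[0])
print(g.sbp)         # (oneflow.sbp.broadcast,)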

@hjchen2 hjchen2 requested a review from oneflow-ci-bot March 18, 2022 12:23
@hjchen2 hjchen2 enabled auto-merge (squash) March 18, 2022 15:38
@github-actions

CI failed when running job: cuda-module-distributed-rank-1. PR label automerge has been removed

@github-actions

CI failed when running job: cuda-module-distributed-rank-0. PR label automerge has been removed

@hjchen2 hjchen2 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot March 18, 2022 16:33
@github-actions

Speed stats:

@clackhan clackhan requested review from oneflow-ci-bot and removed request for oneflow-ci-bot March 19, 2022 00:43
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot March 19, 2022 00:46
@hjchen2 hjchen2 merged commit 42fe221 into master Mar 19, 2022
@hjchen2 hjchen2 deleted the dev_refine_to_global_doc branch March 19, 2022 03:00