Support combined_margin_loss op in flow.nn.modules #5830

Merged
merged 9 commits into master from move_combined_margin_loss on Aug 16, 2021

Conversation

tingkuanpei
Contributor

@tingkuanpei tingkuanpei commented Aug 11, 2021

[Screenshot attached: 2021-08-13 23:25:00]

@tingkuanpei tingkuanpei requested a review from doombeaker August 11, 2021 03:54
@CLAassistant

CLAassistant commented Aug 11, 2021

CLA assistant check
All committers have signed the CLA.

@tingkuanpei tingkuanpei force-pushed the move_combined_margin_loss branch 3 times, most recently from b63597e to 31cac18 Compare August 11, 2021 12:12
@@ -64,7 +64,7 @@ def test_combined_margin_loss(
):
assert device_type in ["gpu", "cpu"]
flow.clear_default_session()
flow.config.gpu_device_num(4)
flow.config.gpu_device_num(1)
Contributor

@doombeaker doombeaker Aug 11, 2021


The single_client directory is kept only for compatibility with 0.4.0 and earlier code, and is no longer updated. So no changes are needed under that directory, and there is no need to write the old lazy-mode tests based on @global_function anymore.

Completing the test file under python/oneflow/test/modules/test_combined_margin_loss.py is enough.

Contributor Author


Understood. The code is not finished yet; there is some debugging code left in, which will be removed once development is done.
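
For reference, a minimal sketch of what an eager-mode test under python/oneflow/test/modules/test_combined_margin_loss.py could look like; the flow.nn.CombinedMarginLoss interface, the decorator, and the shape check here are illustrative assumptions, not the PR's actual test:

import unittest

import numpy as np
import oneflow as flow


@flow.unittest.skip_unless_1n1d()
class TestCombinedMarginLoss(flow.unittest.TestCase):
    def test_forward_shape(test_case):
        # random inputs: 10 samples, 6 classes (sizes chosen for illustration)
        np_x = np.random.uniform(low=-1, high=1, size=(10, 6)).astype(np.float32)
        np_label = np.random.randint(0, 6, size=(10,)).astype(np.int32)
        x = flow.Tensor(np_x, dtype=flow.float32)
        label = flow.Tensor(np_label, dtype=flow.int32)
        # interface assumed: Module-style class with m1, m2, m3 constructor args
        loss = flow.nn.CombinedMarginLoss(0.3, 0.5, 0.4)
        out = loss(x, label)
        # the op rescales logits per class, so the output shape matches the input
        test_case.assertEqual(out.shape, x.shape)


if __name__ == "__main__":
    unittest.main()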

@tingkuanpei tingkuanpei force-pushed the move_combined_margin_loss branch 5 times, most recently from c65e116 to b2c8791 Compare August 13, 2021 06:35
@tingkuanpei tingkuanpei force-pushed the move_combined_margin_loss branch from b2c8791 to c24b428 Compare August 13, 2021 06:36
>>> np_label = np.random.randint(0, 6, size=(10)).astype(np.int32)
>>> x = flow.Tensor(x, dtype=flow.float32)
>>> label = flow.Tensor(label, dtype=flow.int32)
>>> out = flow.combined_margin_loss(x, label, 0.3, 0.5, 0.4)
Contributor

@doombeaker doombeaker Aug 13, 2021


To check this docstring, switch to the docs directory with

cd ~/oneflow/docs

install the dependencies from that directory's requirements.txt, and then run

make html SPHINXOPTS="-W --keep-going"

This produces the docs/build/html directory, where you can inspect the generated documentation.

If the docs build reports any errors (including warnings), please check and fix them as well (CI also requires the docs to build without formatting errors before the PR can be merged).

Contributor


To check the doctest, you only need to run this file directly:

python combined_margin_loss.py
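
Running the file this way works when the module ends with the usual doctest guard; a minimal sketch of that pattern (assuming combined_margin_loss.py follows the same convention as the other module files):

if __name__ == "__main__":
    import doctest

    # run all docstring examples in this module and raise on the first failure
    doctest.testmod(raise_on_error=True)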

>>> import numpy as np
>>> np_x = np.random.uniform(low=-1, high=1, size=(10, 6)).astype(np.float32)
>>> np_label = np.random.randint(0, 6, size=(10)).astype(np.int32)
>>> x = flow.Tensor(x, dtype=flow.float32)
Contributor


Suggested change
>>> x = flow.Tensor(x, dtype=flow.float32)
>>> x = flow.Tensor(np_x, dtype=flow.float32)

>>> np_x = np.random.uniform(low=-1, high=1, size=(10, 6)).astype(np.float32)
>>> np_label = np.random.randint(0, 6, size=(10)).astype(np.int32)
>>> x = flow.Tensor(x, dtype=flow.float32)
>>> label = flow.Tensor(label, dtype=flow.int32)
Contributor


Suggested change
>>> label = flow.Tensor(label, dtype=flow.int32)
>>> label = flow.Tensor(np_label, dtype=flow.int32)

>>> x = flow.Tensor(x, dtype=flow.float32)
>>> label = flow.Tensor(label, dtype=flow.int32)
>>> out = flow.combined_margin_loss(x, label, 0.3, 0.5, 0.4)

Contributor


An >>> out line with its expected output should also be added here; otherwise the doctest has no expected output and loses its purpose as a test.
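
For illustration, a sketch of what the completed doctest could look like; the fixed seed and the shape-based expected output are illustrative assumptions rather than the PR's final docstring:

>>> import numpy as np
>>> import oneflow as flow
>>> np.random.seed(0)  # make the random inputs reproducible for the doctest
>>> np_x = np.random.uniform(low=-1, high=1, size=(10, 6)).astype(np.float32)
>>> np_label = np.random.randint(0, 6, size=(10)).astype(np.int32)
>>> x = flow.Tensor(np_x, dtype=flow.float32)
>>> label = flow.Tensor(np_label, dtype=flow.int32)
>>> out = flow.combined_margin_loss(x, label, 0.3, 0.5, 0.4)
>>> out.shape == x.shape
True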

return y


def combined_margin_loss_op(x, label, m1: float = 1, m2: float = 0, m3: float = 0):
Contributor


I feel that, following the PyTorch Module programming convention, there is no need to export this functional-style op interface. Exporting the CombinedMarginLoss class directly is enough (see the classes in loss.py for reference).

If you do want to keep a functional-style interface, then there is no need to implement a CombinedMarginLoss class that must be instantiated before being called; callers could just invoke flow.F.combined_margin_loss directly.

I lean toward the first approach; what do you think?

Contributor Author


I'll change it following the first approach then, and also move class CombinedMarginLoss(Module) into loss.py?
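
For reference, a minimal sketch of what the Module-style interface discussed above might look like in loss.py; the constructor signature and the keyword arguments passed to flow.F.combined_margin_loss are assumptions inferred from the diff context, not the PR's exact code:

import oneflow as flow
from oneflow.nn.module import Module


class CombinedMarginLoss(Module):
    def __init__(self, m1: float = 1.0, m2: float = 0.0, m3: float = 0.0):
        super().__init__()
        self.m1 = m1
        self.m2 = m2
        self.m3 = m3

    def forward(self, x, label):
        depth = x.shape[1]
        # the functional kernel returns (y, theta); theta is unused here
        (y, _) = flow.F.combined_margin_loss(
            x, label, m1=self.m1, m2=self.m2, m3=self.m3, depth=depth
        )
        return y

Usage would then follow the usual Module pattern, e.g. loss = CombinedMarginLoss(0.3, 0.5, 0.4) followed by out = loss(x, label).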

@@ -1162,6 +1162,48 @@ def forward(self, input, target) -> Tensor:
return flow.mean(loss)


class CombinedMarginLoss(Module):
"""The operation implement "loss_name == 'margin_softmax'" in insightface.
Contributor


The background for this loss is not really "common knowledge", is it? How about citing a reference for it? There are several places like this in PyTorch and in our own source, where operators cite a reference such as

http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

@@ -1162,6 +1162,48 @@ def forward(self, input, target) -> Tensor:
return flow.mean(loss)


class CombinedMarginLoss(Module):
"""The operation implement "loss_name == 'margin_softmax'" in insightface.
Contributor


Suggested change
"""The operation implement "loss_name == 'margin_softmax'" in insightface.
"""The operation implements "margin_softmax" in InsightFace.

Then add a reference link as well; wouldn't that make it easier for readers to follow?


def forward(self, x, label):
depth = x.shape[1]
(y, theta) = flow.F.combined_margin_loss(x, label,
Contributor


Suggested change
(y, theta) = flow.F.combined_margin_loss(x, label,
(y, _) = flow.F.combined_margin_loss(x, label,

Some static-analysis tools report warnings for unused variables; by convention, I suggest using _ as the name for unused variables.

@@ -1162,6 +1162,48 @@ def forward(self, input, target) -> Tensor:
return flow.mean(loss)


class CombinedMarginLoss(Module):
"""The operation implement "loss_name == 'margin_softmax'" in insightface.
insightface's margin_softmax loss implement by several operators, we combined them for speed up.
Contributor


Suggested change
insightface's margin_softmax loss implement by several operators, we combined them for speed up.
The implementation of margin_softmax in InsightFace is composed of multiple operators. We fuse them for speed up.

I see that operator fusion usually uses the word "fuse", which is probably a bit more accurate than "combine" here (this can be double-checked with Guo Ran).
Also, please make the capitalization of InsightFace consistent throughout.

@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 16, 2021 05:48
@oneflow-ci-bot oneflow-ci-bot self-requested a review August 16, 2021 06:28
@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 146.7ms (= 7332.9ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.5ms (= 6427.2ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.14 (= 146.7ms / 128.5ms)

PyTorch resnet50 time: 84.4ms (= 4219.9ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.5ms (= 3723.5ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.13 (= 84.4ms / 74.5ms)

PyTorch resnet50 time: 57.4ms (= 2872.2ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.3ms (= 2362.9ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.22 (= 57.4ms / 47.3ms)

PyTorch resnet50 time: 50.6ms (= 2530.9ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 39.4ms (= 1968.4ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.29 (= 50.6ms / 39.4ms)

PyTorch resnet50 time: 43.0ms (= 2152.3ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 36.7ms (= 1833.1ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.17 (= 43.0ms / 36.7ms)

@oneflow-ci-bot oneflow-ci-bot merged commit 04bb36c into master Aug 16, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the move_combined_margin_loss branch August 16, 2021 07:31