AutoMatedTest support test module.parameter.grad #6043

wyg1997 · 2021-08-25T08:51:54Z

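Extend AutoMatedTest so that it also compares the values of module.parameter.grad
Fix the bn layer's grad computation not matching PyTorch
Fix the bug where randperm creates a 0-shape tensor, and fix the corresponding unit test
Fix the BatchNorm CPU computation not matching PyTorch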

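TODO:

Add a CPU kernel for bn instead of composing the computation in Python
Have the automated test report which parameter of which module does not match
Move num_batches_tracked into the bn UserOp
Normalization must support weight and bias being None, which requires changing the functor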

wyg1997 · 2021-08-25T08:54:17Z

python/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py

+                        dual_objects_to_test.append(
+                            GetDualObject(
+                                "unused",
+                                getattr(x.pytorch, key).grad,
+                                getattr(x.oneflow, key).grad,
                            )
                        )


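Add comparison of parameter gradients

@leaves-zwx This adds a comparison of module parameter gradients here; without it we could not catch cases where momentum does not match and the parameters get updated incorrectly.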
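A minimal pure-Python sketch of what the diff does: for every named parameter, pair the PyTorch-side gradient with the OneFlow-side gradient so both go into the comparison set. The helper name collect_param_grad_pairs and the SimpleNamespace stand-ins are hypothetical; the real code wraps each pair with GetDualObject instead of returning tuples.

```python
from types import SimpleNamespace


def collect_param_grad_pairs(x, param_names):
    """Return (name, pytorch_grad, oneflow_grad) for each named parameter.

    Assumes `x` exposes `.pytorch` and `.oneflow` objects whose attributes
    are parameters carrying a `.grad` field, mirroring the getattr calls
    in the diff.
    """
    pairs = []
    for key in param_names:
        pairs.append(
            (key, getattr(x.pytorch, key).grad, getattr(x.oneflow, key).grad)
        )
    return pairs


# Hypothetical dual module: plain objects holding gradient lists.
dual = SimpleNamespace(
    pytorch=SimpleNamespace(
        weight=SimpleNamespace(grad=[1.0, 2.0]), bias=SimpleNamespace(grad=[0.5])
    ),
    oneflow=SimpleNamespace(
        weight=SimpleNamespace(grad=[1.0, 2.0]), bias=SimpleNamespace(grad=[0.5])
    ),
)
pairs = collect_param_grad_pairs(dual, ["weight", "bias"])
```

Comparing only forward outputs would not reveal a wrong gradient, which is why the grads themselves are added to the set.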

python/oneflow/test/modules/test_batchnorm.py

wyg1997 · 2021-08-25T12:10:10Z

The bn CPU implementation still does not match PyTorch. Torch computes it as:

  /// Collect the linear and constant terms regarding the input.
  /// output(n, c, h, w)
  ///     = (input(n, c, h, w) - mean(c)) / sqrt(var(c) + eps) * weight(c)
  ///         + bias(c)
  ///     = input(n, c, h, w) * inv_var(c) * weight(c)
  ///         - mean(c) * inv_var(c) * weight(c) + bias(c),
  /// where inv_var(c) = 1 / sqrt(var(c) + eps).
  /// So the linear term, alpha(c) = inv_var(c) * weight(c),
  ///   the constant term beta(c) = bias(c) - mean(c) * inv_var(c) * weight(c)
  /// Note that this is only a good idea if (input_size >> c), in degenerate
  /// cases where image_size == 1 && batch_size == 1, it is slow.
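A small sketch, assuming scalar per-channel values, that evaluates both forms of the formula quoted above: the direct form and the alpha/beta linear form. They are algebraically identical, but they round differently in floating point, so tiny differences can appear. Python floats are double precision, so here the two agree to within a very tight tolerance; the function names are illustrative only.

```python
import math


def bn_direct(x, mean, var, weight, bias, eps=1e-5):
    # (x - mean) / sqrt(var + eps) * weight + bias
    return (x - mean) / math.sqrt(var + eps) * weight + bias


def bn_linear(x, mean, var, weight, bias, eps=1e-5):
    # alpha = inv_var * weight, beta = bias - mean * alpha; output = x * alpha + beta
    inv_var = 1.0 / math.sqrt(var + eps)
    alpha = inv_var * weight
    beta = bias - mean * alpha
    return x * alpha + beta
```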

hjchen2 · 2021-08-25T12:23:52Z

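Our CPU implementation should already match this formula, shouldn't it? The latter equality (the alpha/beta form) may introduce precision differences.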

wyg1997 · 2021-08-25T12:25:56Z

I checked again: it is running_mean and running_var that are wrong.

wyg1997 · 2021-08-25T13:23:45Z

python/oneflow/nn/modules/batchnorm.py

-                self.__setattr__("running_mean", running_mean)
-                self.__setattr__("running_var", running_var)
+                # use unbiased variance to update running_var
+                unbiased_variance = x.var(dim=reduce_axis, unbiased=True, keepdim=False)


running_var is updated with the unbiased variance estimate, while the normalization itself uses the actual (biased) variance.
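A minimal single-channel sketch of that rule in plain Python. It assumes the PyTorch momentum convention (new = (1 - momentum) * old + momentum * batch_stat, default momentum 0.1) and omits the affine weight and bias; function names are illustrative, not OneFlow's API.

```python
import math


def batch_mean_var(values):
    """Return (mean, biased_var, unbiased_var) for one channel (needs len >= 2)."""
    n = len(values)
    mean = sum(values) / n
    sq = sum((v - mean) ** 2 for v in values)
    biased_var = sq / n          # used to normalize the current batch
    unbiased_var = sq / (n - 1)  # used to update running_var
    return mean, biased_var, unbiased_var


def bn_train_step(values, running_mean, running_var, momentum=0.1, eps=1e-5):
    mean, biased_var, unbiased_var = batch_mean_var(values)
    # Normalize with the biased (population) variance of this batch.
    out = [(v - mean) / math.sqrt(biased_var + eps) for v in values]
    # PyTorch-style running update with the unbiased variance.
    running_mean = (1 - momentum) * running_mean + momentum * mean
    running_var = (1 - momentum) * running_var + momentum * unbiased_var
    return out, running_mean, running_var
```

For a batch [1, 2, 3, 4] the normalization uses variance 1.25, while running_var is pulled toward 5/3.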

BBuf · 2021-08-25T13:38:08Z

How do you plan to have the automated test report which parameter of which module does not match?

wyg1997 · 2021-08-25T14:03:06Z

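Free-standing tensors are hard to handle, but parameters inside a module all have names. We can pass the name in when building the set of objects to compare, and print it when a comparison fails.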
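A hedged sketch of that plan: carry the parameter name alongside each pair and report the names whose values differ. find_mismatches, mismatch_message, the tolerances and the "bn.weight.grad"-style names are all hypothetical; note that math.isclose uses max(rel_tol * max(|a|, |b|), abs_tol), which differs slightly from numpy.allclose.

```python
import math


def find_mismatches(pairs, rtol=1e-4, atol=1e-5):
    """pairs: iterable of (name, torch_values, flow_values); return mismatching names."""
    bad = []
    for name, a, b in pairs:
        if a is None or b is None:
            # A missing grad on only one side counts as a mismatch.
            if a is not b:
                bad.append(name)
            continue
        if len(a) != len(b) or not all(
            math.isclose(x, y, rel_tol=rtol, abs_tol=atol) for x, y in zip(a, b)
        ):
            bad.append(name)
    return bad


def mismatch_message(names):
    return "parameter grads not aligned: " + ", ".join(names)
```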

github-actions · 2021-08-25T18:10:31Z

CI failed, removing label automerge

MARD1NO · 2021-08-26T09:23:21Z

Should we also add tests for other batchnorm arguments, such as affine?

If affine is False, this currently fails at runtime; the functor's gamma and beta need to be made Optional.

wyg1997 · 2021-08-26T09:42:03Z

That needs support in the functor; I'll record a TODO and change it in a separate PR.
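A pure-Python analogue of what "gamma and beta as Optional" means: with affine=False there is no weight or bias, so the scale and shift are skipped rather than required. This is only an illustration under that assumption; the actual change is to the C++ functor signature, which is not shown here, and batch_norm_channel is a hypothetical name.

```python
import math
from typing import List, Optional


def batch_norm_channel(
    values: List[float],
    mean: float,
    var: float,
    weight: Optional[float] = None,
    bias: Optional[float] = None,
    eps: float = 1e-5,
) -> List[float]:
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [(v - mean) * inv_std for v in values]
    # affine=False: weight and bias are None, so no scale or shift is applied.
    if weight is not None:
        out = [o * weight for o in out]
    if bias is not None:
        out = [o + bias for o in out]
    return out
```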

chengtbf · 2021-08-26T10:16:19Z

python/oneflow/nn/modules/batchnorm.py

@@ -158,6 +164,8 @@ def forward(self, x):
        else:
            if self.training:
                is_training = True
+                if self.track_running_stats:
+                    self.num_batches_tracked += 1


You can't write += here... It triggers an Inplace Add, and Consistent SBP inference for in-place add currently has a bug.
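Why += matters: in Python, x += 1 calls __iadd__ when the class defines it, and tensor classes typically map that to an in-place add. Writing the update as self.num_batches_tracked = self.num_batches_tracked + 1 goes through __add__ and rebinds the attribute instead. The sketch below uses a hypothetical ToyTensor that just logs which operator ran; it is not how the PR's commits necessarily fixed it.

```python
class ToyTensor:
    """Hypothetical tensor-like class that records which operator was used."""

    def __init__(self, value, log):
        self.value = value
        self.log = log

    def __add__(self, other):
        # Out-of-place: returns a new object, the original is untouched.
        self.log.append("add (out-of-place)")
        return ToyTensor(self.value + other, self.log)

    def __iadd__(self, other):
        # In-place: mutates and returns the same object.
        self.log.append("iadd (in-place)")
        self.value += other
        return self
```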

github-actions · 2021-08-26T16:58:07Z

Speed stats:

GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 141.7ms (= 7087.3ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.2ms (= 6412.2ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.11 (= 141.7ms / 128.2ms)

PyTorch resnet50 time: 83.8ms (= 4192.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.6ms (= 3731.3ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.12 (= 83.8ms / 74.6ms)

PyTorch resnet50 time: 62.4ms (= 3118.8ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.4ms (= 2371.7ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.32 (= 62.4ms / 47.4ms)

PyTorch resnet50 time: 47.9ms (= 2396.6ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 39.3ms (= 1963.4ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.22 (= 47.9ms / 39.3ms)

PyTorch resnet50 time: 43.7ms (= 2182.6ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 33.4ms (= 1672.5ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.31 (= 43.7ms / 33.4ms)
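The bot's figures follow simple arithmetic: per-iteration time is the total divided by 50 iterations, and relative speed is the PyTorch time divided by the OneFlow time, so values above 1 mean OneFlow was faster in that run. A small helper reproducing the first row:

```python
def per_iter_ms(total_ms, iters=50):
    # Average time per iteration in milliseconds.
    return total_ms / iters


def relative_speed(pytorch_ms, oneflow_ms):
    # > 1.0 means OneFlow took less time than PyTorch.
    return pytorch_ms / oneflow_ms
```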

feat(AutoMatedTest): AutoMatedTest support test module.parameter.grad

51aa9ad

wyg1997 commented Aug 25, 2021

fix(randperm): fix randperm 0shape bug and fix test bug

754ea1a

wyg1997 requested review from hjchen2 and BBuf August 25, 2021 09:39

wyg1997 added automerge bug eager enhancement labels Aug 25, 2021

hjchen2 reviewed Aug 25, 2021

python/oneflow/test/modules/test_batchnorm.py (outdated, resolved)

fix(BatchNorm): fix batch_norm cpu bug

aaf6e29

wyg1997 commented Aug 25, 2021

hjchen2 self-requested a review August 25, 2021 13:25

hjchen2 approved these changes Aug 25, 2021

Merge branch 'master' into feat-autotest_param_grad

2a671cc

wyg1997 requested a review from oneflow-ci-bot August 25, 2021 13:36

BBuf approved these changes Aug 25, 2021

Merge branch 'master' into feat-autotest_param_grad

ea32efb

oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 25, 2021 14:38

Merge branch 'master' into feat-autotest_param_grad

d4915ac

oneflow-ci-bot self-requested a review August 25, 2021 17:00

wyg1997 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 26, 2021 08:37

oneflow-ci-bot removed their request for review August 26, 2021 09:15

Merge branch 'master' into feat-autotest_param_grad

6a854f2

oneflow-ci-bot self-requested a review August 26, 2021 09:15

chengtbf reviewed Aug 26, 2021

fix(BatchNorm): do not use inplace add for num_batches_tracked

2f0dc9d

wyg1997 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 26, 2021 10:27

chengtbf removed the automerge label Aug 26, 2021

remove num_batches_tracked from batchnorm module

fde22f9

chengtbf added the automerge label Aug 26, 2021

chengtbf approved these changes Aug 26, 2021

add comment for batchnorm functor

f8a4ea4

wyg1997 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 26, 2021 11:16

oneflow-ci-bot removed their request for review August 26, 2021 11:52

Merge branch 'master' into feat-autotest_param_grad

f92effa

oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 26, 2021 11:52

Merge branch 'master' into feat-autotest_param_grad

c306db3

oneflow-ci-bot self-requested a review August 26, 2021 14:31

Merge branch 'master' into feat-autotest_param_grad

7798f00

oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 26, 2021 16:03

oneflow-ci-bot merged commit 7fccccf into master Aug 26, 2021

oneflow-ci-bot deleted the feat-autotest_param_grad branch August 26, 2021 17:04
