dev adamw graph config #5745

Merged
oneflow-ci-bot merged 14 commits into master from dev_adamw_graph_conf
Aug 11, 2021

Conversation


@MARD1NO MARD1NO commented Aug 5, 2021

No description provided.

@MARD1NO MARD1NO requested a review from Ldpe2G August 10, 2021 03:01

# TODO(): optimizer_conf need to have loss_scale_factor field to support multi scale factor
base_scale = train_conf.loss_scale_factor()
assert math.isclose(base_scale, 1, rel_tol=1e-4), "nn.Graph only support one scale factor at the moment, base_scale {} vs scale {}".format(

Is it still necessary to restrict base_scale here? I see that PR #5821 removed this check.

Indeed, it is no longer necessary.

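To sync the conclusions from that discussion:

  • Loss scale is one of the AMP features. Its purpose is to keep a very small loss from being truncated in low-precision floating point, which would make the back-propagated gradients vanish. So the loss is first enlarged (loss * scale), and the gradients are divided back afterwards (grad / scale).
  • Loss scale is not an optimizer parameter, so the scale in the optimizers was all cleaned up in #5821.
  • The scale argument that the optimizer passes when calling model_update_op is an interface for scaling the weight: a general-purpose interface used internally to adjust weights.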
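
For reference, the first point can be illustrated with a small standalone sketch. This is not OneFlow code: it only uses the Python standard library to round values to half precision, and the gradient value, the scale factor, and the helper name `to_fp16` are all assumptions chosen for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


# Hypothetical gradient that is too small for fp16 (below half of the
# smallest fp16 subnormal, 2**-24), so it rounds to zero.
true_grad = 1e-8
unscaled_fp16 = to_fp16(true_grad)

# Hypothetical loss scale factor. Scaling the loss by this factor scales
# every gradient by the same factor (chain rule), so we model it directly.
loss_scale = 65536.0
scaled_fp16 = to_fp16(true_grad * loss_scale)

# Unscale in higher precision before the weight update (grad / scale).
recovered_grad = scaled_fp16 / loss_scale

print("fp16 gradient without loss scale:", unscaled_fp16)
print("gradient recovered with loss scale:", recovered_grad)
```

Without scaling, the gradient is lost in fp16. With scaling, it survives the low-precision step and is restored by the division, which is why the factor belongs to the AMP logic rather than to the optimizer.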

@MARD1NO MARD1NO requested review from Ldpe2G, wyg1997 and strint August 11, 2021 06:11
@MARD1NO MARD1NO marked this pull request as ready for review August 11, 2021 06:11
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 11, 2021 07:25
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 11, 2021 09:56
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 11, 2021 11:36
@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.7ms (= 7033.6ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.3ms (= 6416.3ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 140.7ms / 128.3ms)

PyTorch resnet50 time: 84.5ms (= 4224.9ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.3ms (= 3715.7ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.14 (= 84.5ms / 74.3ms)

PyTorch resnet50 time: 57.2ms (= 2860.7ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.8ms (= 2392.1ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.20 (= 57.2ms / 47.8ms)

PyTorch resnet50 time: 48.9ms (= 2445.2ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 46.1ms (= 2304.5ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.06 (= 48.9ms / 46.1ms)

PyTorch resnet50 time: 44.6ms (= 2231.0ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 40.3ms (= 2014.9ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.11 (= 44.6ms / 40.3ms)

@oneflow-ci-bot oneflow-ci-bot merged commit e9f1e23 into master Aug 11, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the dev_adamw_graph_conf branch August 11, 2021 12:58