dev adamw graph config #5745
Conversation
python/oneflow/nn/optimizer/adamw.py
Outdated
# TODO(): optimizer_conf needs to have a loss_scale_factor field to support multiple scale factors
base_scale = train_conf.loss_scale_factor()
assert math.isclose(base_scale, 1, rel_tol=1e-4), "nn.Graph only supports one scale factor at the moment, base_scale {} vs scale {}".format(
    base_scale, scale
)
Is it still necessary to restrict base_scale here? I see it was removed in PR #5821.
Indeed, it is no longer necessary.
To sync up the conclusions from that discussion:
- Loss scaling is one of AMP's features. The point is to keep a small loss from being truncated in floating point, which would make the backpropagated gradients vanish, so the loss is first enlarged as loss * scale and the gradients are later divided back as grad / scale (see the sketch after this list).
- Loss scale is not an optimizer parameter, so the scale fields in the optimizers were all cleaned up in #5821.
- The scale argument the optimizer passes to model_update_op is an interface for scaling weights, a general-purpose interface for adjusting weights internally.
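As a minimal, framework-agnostic sketch of the loss-scaling rationale above (not OneFlow code; the gradient value and scale factor are made-up numbers for illustration): a tiny gradient underflows to zero in float16, but multiplying the loss by a scale before backward keeps the scaled gradient representable, and dividing it back in float32 recovers the original value.

```python
import numpy as np

# Illustrative values only (assumptions, not from the PR):
true_grad = 1e-8          # gradient too small for float16 (min subnormal ~6e-8)
loss_scale = 2.0 ** 14    # scale applied as loss * scale before backward

without_scaling = np.float16(true_grad)                # underflows to 0.0, gradient vanishes
scaled_grad_fp16 = np.float16(true_grad * loss_scale)  # survives in float16
recovered = np.float32(scaled_grad_fp16) / loss_scale  # grad / scale, done in float32

print(without_scaling)  # 0.0
print(recovered)        # ~1e-8
```

This is why the scale belongs to the AMP/loss-scaling machinery rather than to the optimizer itself.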
No description provided.