Conversation

Contributor

@wangxicoding wangxicoding commented Jan 5, 2021

PR types

Bug fixes

PR changes

APIs

Describe

  1. Fix AdamW weight_decay not being applied when AMP is enabled. See "Static-graph AMP is incompatible with paddle.optimizer.AdamW" #29794.
  2. Fix AdamW weight_decay using a fixed learning rate when an LRScheduler is set.
  3. Fix Adam step() missing the imperative_base.no_grad decorator, which could cause gradients to be generated inside the optimizer.
  4. Optimize the computation: the formula param = param - param * lr * coeff is rewritten as
    param = param * (1.0 - lr * coeff)
    (see the sketch after this list).
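
A minimal NumPy sketch (hypothetical values, not the actual PaddlePaddle kernel) showing that the factored form computes the same decayed parameter while folding lr * coeff into a single scalar multiplier:

    import numpy as np

    # Hypothetical values, for illustration only.
    param = np.array([0.5, -1.2, 3.0], dtype=np.float32)
    lr = 0.001     # learning rate
    coeff = 0.01   # weight-decay coefficient

    # Original form: two elementwise multiplies plus a subtraction.
    decayed_a = param - param * lr * coeff

    # Factored form from this PR: the scalar 1.0 - lr * coeff is computed
    # once, then applied with a single elementwise multiply.
    decayed_b = param * (1.0 - lr * coeff)

    assert np.allclose(decayed_a, decayed_b)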


paddle-bot-old bot commented Jan 5, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@wangxicoding wangxicoding force-pushed the fix_adamw_apply_gradient branch from e0f6567 to 7b8a46f Compare January 6, 2021 05:37
    assert param.dtype == paddle.fluid.core.VarDesc.VarType.FP32, \
        "the type of coeff(float) and parameter(%s) is not consistent."%(param.dtype)
else:
    assert self._coeff.dtype == param.dtype, \
Contributor

@guoshengCS guoshengCS Jan 6, 2021

It feels a bit inconvenient for users who work with double. When coeff is a float, could we avoid requiring a specific param.dtype? And would the decay_coeff = 1.0 - self._coeff * learning_rate below necessarily be affected, rather than using learning_rate's dtype?

Contributor Author

Do you mean dropping this whole check entirely?

Contributor Author

Done.
1.0 - self._coeff * learning_rate returns a tensor whose dtype is that of self._coeff; if self._coeff and learning_rate have different dtypes, a cast is inserted automatically to convert the type.
With the check removed, any dtype is supported; the expression is changed to 1.0 - learning_rate * self._coeff so that it uses learning_rate's dtype (see the sketch below).
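
A minimal sketch of the idea using standalone paddle tensors with hypothetical values (not the actual AdamW source):

    import paddle

    # Hypothetical values, for illustration only.
    param = paddle.to_tensor([0.5, -1.2, 3.0], dtype='float32')
    learning_rate = paddle.to_tensor(0.001, dtype='float32')
    coeff = 0.01  # Python float coefficient, like self._coeff

    # Putting learning_rate first means the intermediate tensor follows
    # learning_rate's dtype, so no dtype check on the parameter is needed
    # and no extra cast is inserted for coeff.
    decay_coeff = 1.0 - learning_rate * coeff
    param = param * decay_coeff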

guoshengCS previously approved these changes Jan 6, 2021
Contributor

@guoshengCS guoshengCS left a comment

LGTM

@wangxicoding wangxicoding requested a review from swtkiwi January 7, 2021 03:56
swtkiwi previously approved these changes Jan 7, 2021
Contributor

@swtkiwi swtkiwi left a comment

LGTM

@wangxicoding wangxicoding dismissed stale reviews from swtkiwi and guoshengCS via 4c0c9d6 January 7, 2021 07:28
Contributor

@swtkiwi swtkiwi left a comment

LGTM

@wangxicoding wangxicoding merged commit 619c62b into PaddlePaddle:develop Jan 7, 2021
wangxicoding added a commit to wangxicoding/Paddle that referenced this pull request Jan 7, 2021
fuyinno4 pushed a commit that referenced this pull request Jan 10, 2021
@wangxicoding wangxicoding deleted the fix_adamw_apply_gradient branch January 21, 2021 10:38