[Hackathon 7th PPSCI No.12] Add amsgrad support to the Adam and AdamW optimizers #949
Conversation
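@sunzhongkai588 @luotao1 Could you please take a look: was the RFC submitted in the wrong place? PaddlePaddle/Paddle#67603 still shows my status as "registered" ~ 🫠
It looks like the script forgot to monitor the community repository.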
LGTM
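- Add an `amsgrad` option to Paddle's `Adam` and `AdamW` interfaces so that they support the `AMSGrad` algorithm.
- Add an `amsgrad` option to PaddleScience's `Adam` and `AdamW` interfaces so that they support the `AMSGrad` algorithm.

> **Note** PaddleScience's `Adam` and `AdamW` optimizers are implemented by calling the corresponding Paddle optimizers, and Paddle's `Adam` and `AdamW` algorithms are in turn implemented by calling backend C++ operators. The `maximum of the historical squared gradients` that `AMSGrad` needs must also be implemented in those C++ operators. Supporting `AMSGrad` therefore requires modifying Paddle's `Adam` and `AdamW` optimizer interfaces; it cannot be added in PaddleScience alone. (Implementing an `AMSGrad` optimizer separately and only in PaddleScience is outside the scope of this document.)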
Yes, this feature mainly depends on support in the framework's C++ operators.
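To make the "maximum of the historical squared gradients" concrete, here is a minimal scalar sketch of one Adam step with an optional AMSGrad branch. It assumes the commonly used bias-corrected formulation; the function name `adam_step` and its argument names are illustrative only, and Paddle's C++ kernel may differ in details such as where `eps` is applied.

```python
import math


def adam_step(param, grad, m, v, v_max, t, lr=0.001, beta1=0.9,
              beta2=0.999, eps=1e-8, amsgrad=False):
    """One scalar Adam update (illustrative sketch, not Paddle's kernel).

    m, v   : running first and second moment estimates
    v_max  : running maximum of v, only updated when amsgrad is True
    t      : 1-based step counter used for bias correction
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    if amsgrad:
        # AMSGrad: never let the second-moment estimate shrink.
        v_max = max(v_max, v)
        second = v_max
    else:
        second = v

    m_hat = m / (1.0 - beta1 ** t)
    v_hat = second / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v, v_max


# A large gradient followed by a zero gradient: v decays, v_max does not.
state = adam_step(0.0, 1.0, 0.0, 0.0, 0.0, t=1, amsgrad=True)
state = adam_step(state[0], 0.0, state[1], state[2], state[3], t=2, amsgrad=True)
```

The extra `v_max` state is what has to live next to `moment1` and `moment2` in the operator, which is why the change cannot be confined to the Python wrapper.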
1000,
find_master,
False,
self._amsgrad,  # flag
Does amsgrad need to be a member variable of self here? Perhaps using amsgrad directly would be enough?
I quoted too little code here; I don't think using amsgrad directly would work ... 😅
class Adam(Optimizer):
    def __init__(
        self,
        learning_rate: float | LRScheduler = 0.001,
        beta1: float | Tensor = 0.9,
        beta2: float | Tensor = 0.999,
        epsilon: float | Tensor = 1e-8,
        parameters: (
            Sequence[Tensor] | Sequence[_AdamParameterConfig] | None
        ) = None,
        weight_decay: float | WeightDecayRegularizer | None = None,
        grad_clip: GradientClipBase | None = None,
        lazy_mode: bool = False,
        multi_precision: bool = False,
        use_multi_tensor: bool = False,
        amsgrad: bool = False,  # flag
        name: str | None = None,
    ) -> None:
        ...
        self._amsgrad = amsgrad  # flag
        ...

    def _append_optimize_op(self, block, param_and_grad):
        ...
        _ = _C_ops.adam_(  # call the underlying operator
            param_and_grad[0],
            param_and_grad[1],
            lr,
            moment1,
            moment2,
            moment2_max,  # input argument: the maximum
            beta1_pow_acc,
            beta2_pow_acc,
            master_weight,
            found_inf,
            _beta1,
            _beta2,
            self._epsilon,
            self._lazy_mode,
            1000,
            find_master,
            False,
            self._amsgrad,  # flag
        )
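The underlying operator is called in a different method. amsgrad is passed in when Adam is initialized, so a private attribute (self._amsgrad) is needed to carry it over ~
I have added the code to the document ~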
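The point about needing a private attribute can be shown with a small stand-alone sketch. `ToyAdam` and `fake_adam_kernel` are hypothetical names invented for this illustration; they are not Paddle classes or operators, and the "kernel" here does a plain gradient step purely as a placeholder.

```python
def fake_adam_kernel(param, grad, lr, amsgrad):
    """Hypothetical placeholder for the low-level operator call."""
    return {"param": param - lr * grad, "amsgrad": amsgrad}


class ToyAdam:
    """Hypothetical stand-in for an optimizer; not Paddle's Adam class."""

    def __init__(self, learning_rate=0.001, amsgrad=False):
        self._learning_rate = learning_rate
        # Store the constructor argument so other methods can read it later.
        self._amsgrad = amsgrad

    def _append_optimize_op(self, param, grad):
        # The name `amsgrad` from __init__ is out of scope in this method;
        # only the stored attribute self._amsgrad is available.
        return fake_adam_kernel(
            param, grad, self._learning_rate, self._amsgrad
        )


opt = ToyAdam(amsgrad=True)
result = opt._append_optimize_op(1.0, 0.5)
```

Writing a bare `amsgrad` inside `_append_optimize_op` would raise a `NameError`, because the constructor's local variable no longer exists once `__init__` returns.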
Do the actual code changes involve any distributed logic?
They involve changes to the corresponding spmd rule code; I have added this to the document ~
LGTM
PR types
Others
PR changes
Docs
Description
[Hackathon 7th No.12] Design document for adding amsgrad support to the Adam and AdamW optimizers
Please review ~