Modify the calculation logic of LambOptimizer #29313
Conversation
Commits:
- test=develop
- test=develop
- test=develop
- test=develop
- …into lamb_temp

Sorry to inform you that 6464340's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
T mom1 = moment1_[i];
T mom2 = moment2_[i];
T p = param_[i];
T beta1_pow = *beta1_pow_;
Why do you need to initialize `T beta1_pow` and `T beta2_pow` here? Can you use the pointer directly?
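To illustrate the question, here is a minimal hypothetical sketch (the struct name and members are placeholders modeled on the diff, not the actual Paddle functor): since `beta1_pow_` points at a single value, it can be dereferenced at the point of use instead of being copied into a local `T` first.

```cpp
#include <cstddef>

// Hypothetical functor fragment, only to illustrate the review comment.
template <typename T>
struct LambMomentSketch {
  const T* beta1_pow_;  // single-element tensor data
  const T* moment1_;
  T* moment1_out_;

  void operator()(std::size_t i) const {
    // Instead of: T beta1_pow = *beta1_pow_;
    // dereference the pointer directly where the value is needed.
    moment1_out_[i] = moment1_[i] / (static_cast<T>(1) - *beta1_pow_);
  }
};
```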
moment2_out_[i] = mom2;
trust_ratio_div_[i] = mom1 / (sqrt(mom2) + epsilon_) + weight_decay_ * p;

mom1_unbiased = mom1 / (1 - beta1_pow);
Write `T mom1_unbiased = mom1 / (1 - beta1_pow);` then you can do less constructing for `T`, same for `mom2_unbiased`.
Thanks, done.
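For reference, a minimal self-contained sketch of the suggested pattern (the function name is hypothetical and the variable names only mirror the diff; this is not the actual Paddle kernel): declaring and initializing in one statement constructs each `T` exactly once, with the bias correction this PR adds.

```cpp
#include <cmath>

// Hypothetical helper illustrating the declare-and-initialize style,
// including the bias correction of both moments.
template <typename T>
T TrustRatioDivSketch(T mom1, T mom2, T beta1_pow, T beta2_pow, T p,
                      T epsilon, T weight_decay) {
  // One construction per variable; no default-construct-then-assign.
  T mom1_unbiased = mom1 / (static_cast<T>(1) - beta1_pow);
  T mom2_unbiased = mom2 / (static_cast<T>(1) - beta2_pow);
  return mom1_unbiased / (std::sqrt(mom2_unbiased) + epsilon) +
         weight_decay * p;
}
```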
    self._cache_founf_inf = True
else:
    if optimizer.__class__.__name__ in ["Lamb", "LambOptimizer"]:
        from paddle.fluid.clip import ClipGradByGlobalNorm
Can we import at the file header?
Thanks, done.
def run_simple_conv(inp_np, use_scaler=True):
    paddle.seed(10)
    paddle.framework.random._manual_program_seed(10)
    with fluid.dygraph.guard():
On Paddle 2.0, should we use paddle.disable_static() / paddle.enable_static() instead of dygraph.guard?
Thanks, this unittest has been deleted.
optimizer = paddle.optimizer.Lamb(
    learning_rate=0.01, parameters=model.parameters())
scaler = fluid.dygraph.AmpScaler(init_loss_scaling=1024)
data = fluid.dygraph.to_variable(inp_np)
Same for these fluid APIs: should we use the Paddle 2.0 API?
Thanks, this unittest has been deleted.
class TestSparseLambOp(unittest.TestCase):
class gTestSparseLambOp(unittest.TestCase):
Please remove the meaningless `g` in this class name.
Thanks, done.
framework::TensorCopy(beta1_pow, platform::CUDAPlace(), &beta1_pow_gpu);
framework::TensorCopy(beta2_pow, platform::CUDAPlace(), &beta2_pow_gpu);
beta1_pow and beta2_pow each hold only one value. You don't need to do a copy here. Please refer to Adam.
Thanks, done.
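A rough sketch of what the comment suggests (the helper name is hypothetical, and it assumes the test keeps `beta1_pow`/`beta2_pow` as single-element CPU tensors): because each tensor holds exactly one value, the test can read that value on the host and pass it along directly, without a device copy.

```cpp
#include "paddle/fluid/framework/tensor.h"

// Hypothetical test helper: read the single beta_pow value on the host
// instead of copying the one-element tensor to the GPU with TensorCopy.
template <typename T>
T GetBetaPow(const paddle::framework::Tensor& beta_pow) {
  // Assumes beta_pow lives on CPU and holds exactly one element.
  return beta_pow.data<T>()[0];
}
```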
LGTM for op_function_generator.cc
LGTM.
LGTM
* Modify the calculation logic of LambOptimizer
PR types: Performance optimization
PR changes: OPs
Describe:
Modify the calculation logic of Lamb so that it corresponds to the paper.
The calculation logic from the paper is shown in the first figure, and the current Lamb calculation logic in Paddle is shown in the second figure (figures omitted). Compared with the paper, Paddle's implementation lacks the step marked by the red box in the first figure, i.e. the bias correction of the first and second moments.
So the main purpose of this PR is to complete the calculation of LambOptimizer.
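For reference, this is the per-parameter update rule from the LAMB paper that the PR brings the implementation in line with; the bias-correction step on the second line is the part that was previously missing (notation follows the paper, not Paddle's variable names):

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t, &
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^{\,t}}, &
\hat{v}_t &= \frac{v_t}{1-\beta_2^{\,t}}
\quad\text{(bias correction, previously missing)} \\
r_t &= \frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon} + \lambda\,w_{t-1}, &
w_t &= w_{t-1} - \eta_t\,\frac{\phi(\lVert w_{t-1}\rVert)}{\lVert r_t\rVert}\,r_t
\end{aligned}
```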
Document preview link: http://10.136.157.23:8090/documentation/docs/zh/api/index_cn.html?reviewVersion=jenkins-doc-review-167
Lamb: http://10.136.157.23:8090/documentation/docs/en/api/paddle/optimizer/Lamb_en.html
LambOptimizer: http://10.136.157.23:8090/documentation/docs/en/api/paddle/fluid/optimizer/LambOptimizer_en.html