
Conversation

@gfwm2013 (Contributor) commented Dec 3, 2020

PR types

Performance optimization

PR changes

OPs

Describe

Modify the calculation logic of Lamb so that it corresponds to the paper.

[Figure: the original LAMB calculation logic as given in the paper]
The original calculation logic from the paper is shown in the figure above.

[Figure: the current Lamb calculation logic in Paddle]
The current Lamb calculation logic in Paddle is shown in the figure above. Compared with the original, it lacks the calculation marked by the red box in the first figure.

So the main purpose of this PR is to complete the calculation in LambOptimizer.
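
For reference, below is a sketch of the full LAMB update as described in the paper (my own summary and notation, not the figure itself); the bias-corrected moments appear to be the missing piece the red box marks:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\hat{m}_t = m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t)
r_t = \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) + \lambda w_t
w_{t+1} = w_t - \eta \cdot (\phi(\lVert w_t \rVert) / \lVert r_t \rVert) \cdot r_t

Here \lambda is the weight decay, \eta the learning rate, and \phi the norm-scaling function; in the reviewed code below, \hat{m}_t and \hat{v}_t correspond to mom1_unbiased and mom2_unbiased.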

Document preview link: http://10.136.157.23:8090/documentation/docs/zh/api/index_cn.html?reviewVersion=jenkins-doc-review-167
Lamb: http://10.136.157.23:8090/documentation/docs/en/api/paddle/optimizer/Lamb_en.html
LambOptimizer: http://10.136.157.23:8090/documentation/docs/en/api/paddle/fluid/optimizer/LambOptimizer_en.html

@paddle-bot-old

Sorry to inform you that 6464340's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

T mom1 = moment1_[i];
T mom2 = moment2_[i];
T p = param_[i];
T beta1_pow = *beta1_pow_;
Member

Why do you need to initialize `T beta1_pow` and `T beta2_pow` here? Can you use the pointer directly?

moment2_out_[i] = mom2;
trust_ratio_div_[i] = mom1 / (sqrt(mom2) + epsilon_) + weight_decay_ * p;

mom1_unbiased = mom1 / (1 - beta1_pow);
Member

Write `T mom1_unbiased = mom1 / (1 - beta1_pow)`, then you can do less constructing for T; same for `mom2_unbiased`.

Contributor Author

Thanks, done.
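
For illustration, a minimal sketch of the change as suggested, assuming the functor members from the snippet above (beta2_pow_, epsilon_, weight_decay_, trust_ratio_div_) and that beta2_pow is read the same way as beta1_pow:

T beta2_pow = *beta2_pow_;  // assumed to mirror the beta1_pow read above
// Construct each unbiased moment as T exactly once, per the review suggestion.
T mom1_unbiased = mom1 / (static_cast<T>(1) - beta1_pow);
T mom2_unbiased = mom2 / (static_cast<T>(1) - beta2_pow);
trust_ratio_div_[i] =
    mom1_unbiased / (sqrt(mom2_unbiased) + epsilon_) + weight_decay_ * p;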

self._cache_founf_inf = True
else:
if optimizer.__class__.__name__ in ["Lamb", "LambOptimizer"]:
from paddle.fluid.clip import ClipGradByGlobalNorm
Member

Can we import this at the file header?

Contributor Author

Thanks, done.

def run_simple_conv(inp_np, use_scaler=True):
paddle.seed(10)
paddle.framework.random._manual_program_seed(10)
with fluid.dygraph.guard():
Member

On Paddle 2.0, should we use paddle.disable_static() / paddle.enable_static() instead of dygraph.guard?

Contributor Author

Thanks, this unit test has been deleted.

optimizer = paddle.optimizer.Lamb(
learning_rate=0.01, parameters=model.parameters())
scaler = fluid.dygraph.AmpScaler(init_loss_scaling=1024)
data = fluid.dygraph.to_variable(inp_np)
Member

Same for these fluid APIs: should we use the Paddle 2.0 APIs?

Contributor Author

Thanks, this unit test has been deleted.



class TestSparseLambOp(unittest.TestCase):
class gTestSparseLambOp(unittest.TestCase):
Member

Please remove the meaningless `g` in this class name.

Contributor Author

Thanks, done.

Comment on lines 261 to 262
framework::TensorCopy(beta1_pow, platform::CUDAPlace(), &beta1_pow_gpu);
framework::TensorCopy(beta2_pow, platform::CUDAPlace(), &beta2_pow_gpu);
Contributor

beta1_pow and beta2_pow each hold only one value. You don't need to do a copy here. Please refer to Adam.

Contributor Author

Thanks, done.
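
For context, a minimal sketch of the reviewer's point, assuming beta1_pow and beta2_pow are one-element framework::Tensor objects already accessible on the CPU (as in the Adam reference):

// Each tensor holds a single scalar, so read it directly instead of doing a
// framework::TensorCopy of the whole tensor to CUDAPlace.
T beta1_pow_value = beta1_pow.data<T>()[0];
T beta2_pow_value = beta2_pow.data<T>()[0];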

Comment on lines 329 to 330
framework::TensorCopy(beta1_pow, platform::CUDAPlace(), &beta1_pow_gpu);
framework::TensorCopy(beta2_pow, platform::CUDAPlace(), &beta2_pow_gpu);
Contributor

beta1_pow and beta2_pow each hold only one value. You don't need to do a copy here. Please refer to Adam.

Contributor Author

Thanks, done.

zhiqiu previously approved these changes Jan 15, 2021
@zhiqiu (Contributor) left a comment

LGTM for op_function_generator.cc

@gfwm2013 requested a review from zhiqiu January 16, 2021 11:27
@wzzju (Contributor) left a comment

LGTM.

@TCChenlong (Contributor) left a comment

LGTM

@gfwm2013 merged commit 11e78eb into PaddlePaddle:develop Jan 17, 2021
gfwm2013 added a commit to gfwm2013/Paddle that referenced this pull request Jan 17, 2021
* Modify the calculation logic of LambOptimizer
lanxianghit pushed a commit that referenced this pull request Jan 18, 2021
…30510)

* Modify the calculation logic of LambOptimizer (#29313)

* Modify the calculation logic of LambOptimizer

* Modify the calculation logic of LambOptimizer

* Modify the calculation logic of LambOptimizer