Add LARS support #10374
Conversation
def append_LARS(params_grads, learning_rate, weight_decay):
    """Applies LARS (LAYER-WISE ADAPTIVE RATE SCALING) to learning rate for
Maybe we can add the link to the paper here.
Please add a unit test for this learning rate scheduler strategy.
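The paper being referred to is "Large Batch Training of Convolutional Networks" (You et al., arXiv:1708.03888). As a rough sketch of the layer-wise scaling a function like `append_LARS` is meant to apply (the names below are illustrative, not taken from this diff), the global learning rate is rescaled per layer by the ratio of the parameter norm to the gradient norm:

```python
# Minimal NumPy sketch of the core LARS scaling (the paper's trust
# coefficient is omitted for brevity); not the PR's implementation.
import numpy as np

def lars_scaled_lr(param, grad, base_lr, weight_decay):
    """Scale the global learning rate for one layer."""
    param_norm = np.linalg.norm(param)
    grad_norm = np.linalg.norm(grad)
    # local_lr = base_lr * ||w|| / (||g|| + weight_decay * ||w||)
    return base_lr * param_norm / (grad_norm + weight_decay * param_norm)

# A unit test for the scheduler could check this ratio directly, e.g.:
# assert lars_scaled_lr(np.ones(4), np.ones(4), 1.0, 0.0) == 1.0
```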
def __init__(self,
             learning_rate,
             regularization=None,
             LARS_weight_decay=0.0):
According to the paper, I think the default value of LARS_weight_decay may be 1.0.
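For context on how the new argument is expected to be used, here is a hedged sketch of constructing an optimizer with it; the module path and optimizer class are assumptions, and only `LARS_weight_decay` itself comes from this diff:

```python
import paddle.fluid as fluid

# With the default LARS_weight_decay=0.0 the optimizer behaves as before;
# a value greater than 0 enables the layer-wise scaling (see the PR notes).
optimizer = fluid.optimizer.SGD(
    learning_rate=0.1,
    LARS_weight_decay=0.0005)  # example value, not a recommendation
```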
The program cannot be transpiled to the distributed version correctly when LARS is used; still debugging.
… lars_scheduler
@jacquesqiao Can we merge this for now, so I can test it with NCCL2 distributed training?
LGTM!
Fix #6811
Related: #7788
To use, add LARS_weight_decay=[some value greater than 0] to enable LARS. LARS also works together with the current learning rate schedulers, e.g. "polynomial_decay", as sketched below.
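A hedged sketch of that combination, assuming the fluid learning-rate scheduler and optimizer APIs of the time (exact module paths may differ):

```python
import paddle.fluid as fluid

# Decay the global learning rate with an existing scheduler...
decayed_lr = fluid.layers.polynomial_decay(
    learning_rate=0.1,
    decay_steps=10000,
    end_learning_rate=0.01,
    power=1.0)

# ...and let LARS rescale it per layer on top of that decay.
optimizer = fluid.optimizer.Momentum(
    learning_rate=decayed_lr,
    momentum=0.9,
    LARS_weight_decay=0.0005)  # any value > 0 enables LARS
```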