
Dropout+layerwisedecay #36482

Closed
wants to merge 17 commits into from

Conversation

sljlp (Contributor) commented Oct 16, 2021

PR types

PR changes

Describe

  1. Following "add layerwise learning rate for adamw" #35569:
     paddle.optimizer.AdamW has a parameter lr_ratio, which is equivalent to layerwise_lr_scale (see the sketch after the code block below).
  2. Following "[hybrid enhance] add flag to control the avg position for grad merge under pipeline mode" #36384:
     when num_pp > 2, setting DistributedStrategy.gradient_scale_configs["scale_gradient"] = True aligns the results more closely with single-card training. Usage:
strategy = paddle.distributed.fleet.DistributedStrategy()
strategy.gradient_scale_configs = {
    'scale_strategy': 'avg',
    'scale_gradient': True
}
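
The strategy configured above is then typically passed to fleet.distributed_optimizer(optimizer, strategy=strategy) so that gradient_scale_configs takes effect under pipeline parallelism.

As a rough illustration of point 1 (not part of this PR's diff), here is a minimal sketch of using lr_ratio, assuming it accepts a callable that maps each parameter to a learning-rate multiplier, as introduced in #35569. The layer-name matching below is purely hypothetical.

import paddle

# Toy two-layer model; the split into "lower" and "upper" layers is only
# for illustration.
model = paddle.nn.Sequential(
    paddle.nn.Linear(16, 16),
    paddle.nn.Linear(16, 2),
)

# Hypothetical layerwise decay: scale the base learning rate down for the
# first Linear layer. Matching on param.name is an assumption made in this
# sketch, not something defined by this PR.
def layerwise_lr_scale(param):
    return 0.1 if param.name.startswith("linear_0") else 1.0

opt = paddle.optimizer.AdamW(
    learning_rate=1e-3,
    parameters=model.parameters(),
    weight_decay=0.01,
    lr_ratio=layerwise_lr_scale,  # assumed signature: Parameter -> float
)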

@paddle-bot-old

❌ The PR is not created using PR's template. You can refer to this Demo.
Please use the PR template; it helps save our maintainers' time so that more developers get helped.

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

@sljlp sljlp closed this Nov 17, 2021
@sljlp sljlp deleted the dropout+layerwisedecay branch November 17, 2021 04:13