Use consistent markdown formatting for the AdamW paper #2722

Merged: 1 commit, Aug 8, 2022
14 changes: 6 additions & 8 deletions tensorflow_addons/optimizers/weight_decay_optimizers.py
@@ -25,7 +25,7 @@
class DecoupledWeightDecayExtension:
"""This class allows to extend optimizers with decoupled weight decay.

- It implements the decoupled weight decay described by Loshchilov & Hutter
+ It implements the decoupled weight decay described by [Loshchilov & Hutter]
(https://arxiv.org/pdf/1711.05101.pdf), in which the weight decay is
decoupled from the optimization steps w.r.t. to the loss function.
For SGD variants, this simplifies hyperparameter search since it decouples
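(Reviewer aside: the distinction this docstring draws between decoupled weight decay and L2 regularization can be shown in a minimal sketch. This is an illustrative toy step, not the library's implementation; the function names and hyperparameter values are assumptions made for the example.)

```python
# Illustrative sketch of one decoupled-weight-decay step (not the library code).
# A plain gradient step only follows the loss gradient; the decoupled variant
# additionally shrinks the weights by a factor independent of that gradient.
def sgd_step(w, grad, lr=0.01):
    return w - lr * grad

def sgdw_step(w, grad, lr=0.01, weight_decay=1e-4):
    # Weight decay is applied directly to w, not folded into the loss gradient.
    return w - lr * grad - weight_decay * w
```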
@@ -334,7 +334,7 @@ class OptimizerWithDecoupledWeightDecay(
This class computes the update step of `base_optimizer` and
additionally decays the variable with the weight decay being
decoupled from the optimization steps w.r.t. to the loss
- function, as described by Loshchilov & Hutter
+ function, as described by [Loshchilov & Hutter]
(https://arxiv.org/pdf/1711.05101.pdf). For SGD variants, this
simplifies hyperparameter search since it decouples the settings
of weight decay and learning rate. For adaptive gradient
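(Reviewer aside: as a quick illustration of the factory route this class backs, the sketch below builds a weight-decay variant of a Keras optimizer with `tfa.optimizers.extend_with_decoupled_weight_decay`; the hyperparameter values are placeholders.)

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Create an SGD class extended with decoupled weight decay, then instantiate it.
# Note the extra `weight_decay` argument alongside the usual SGD arguments.
SGDWLike = tfa.optimizers.extend_with_decoupled_weight_decay(tf.keras.optimizers.SGD)
optimizer = SGDWLike(weight_decay=1e-4, learning_rate=0.01, momentum=0.9)
```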
@@ -358,9 +358,8 @@ class SGDW(DecoupledWeightDecayExtension, tf.keras.optimizers.SGD):
"""Optimizer that implements the Momentum algorithm with weight_decay.

This is an implementation of the SGDW optimizer described in "Decoupled
- Weight Decay Regularization" by Loshchilov & Hutter
- (https://arxiv.org/abs/1711.05101)
- ([pdf])(https://arxiv.org/pdf/1711.05101.pdf).
+ Weight Decay Regularization" by [Loshchilov & Hutter]
+ (https://arxiv.org/pdf/1711.05101.pdf).
It computes the update step of `tf.keras.optimizers.SGD` and additionally
decays the variable. Note that this is different from adding
L2 regularization on the variables to the loss. Decoupling the weight decay
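(Reviewer aside: for context, a minimal SGDW usage sketch; the hyperparameter values are illustrative only.)

```python
import tensorflow_addons as tfa

# SGDW: momentum SGD whose weight decay is applied to the variables directly,
# rather than being added to the loss as L2 regularization.
optimizer = tfa.optimizers.SGDW(
    weight_decay=1e-4, learning_rate=0.01, momentum=0.9
)
```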
@@ -438,9 +437,8 @@ class AdamW(DecoupledWeightDecayExtension, tf.keras.optimizers.Adam):
"""Optimizer that implements the Adam algorithm with weight decay.

This is an implementation of the AdamW optimizer described in "Decoupled
- Weight Decay Regularization" by Loshchilov & Hutter
- (https://arxiv.org/abs/1711.05101)
- ([pdf])(https://arxiv.org/pdf/1711.05101.pdf).
+ Weight Decay Regularization" by [Loshchilov & Hutter]
+ (https://arxiv.org/pdf/1711.05101.pdf).

It computes the update step of `tf.keras.optimizers.Adam` and additionally
decays the variable. Note that this is different from adding L2
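(Reviewer aside: the corresponding AdamW sketch, used as a drop-in Keras optimizer; the model and hyperparameter values are placeholders for illustration.)

```python
import tensorflow as tf
import tensorflow_addons as tfa

# AdamW: Adam with decoupled weight decay, passed to Keras like any optimizer.
optimizer = tfa.optimizers.AdamW(weight_decay=1e-4, learning_rate=1e-3)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=optimizer, loss="mse")
```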