
Support exclude_from_weight_decay in AdamW #1903

Closed

Description

@jarednielsen

Describe the feature and the current behavior/state.

The LAMB optimizer has this option, but AdamW does not. It is needed to train transformer models correctly with AdamW, since weight decay should not be applied to bias and LayerNorm parameters.
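A minimal sketch of what the feature could look like, modeled on the name-based filtering that `tfa.optimizers.LAMB` already exposes; the `AdamW` parameter name and the helper below are assumptions, not an existing API:

```python
import re

# Sketch of the name-based filtering LAMB already uses: skip weight decay
# for any variable whose name matches one of the exclusion patterns.
def _do_use_weight_decay(var_name, exclude_from_weight_decay):
    if exclude_from_weight_decay:
        for pattern in exclude_from_weight_decay:
            if re.search(pattern, var_name) is not None:
                return False
    return True

# Hypothetical usage once AdamW supports the same argument
# (mirrors the existing LAMB signature; not currently available):
# optimizer = tfa.optimizers.AdamW(
#     weight_decay=0.01,
#     learning_rate=1e-4,
#     exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"],
# )
```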

Relevant information

  • Are you willing to contribute it (yes/no): no
  • Are you willing to maintain it going forward? (yes/no): no
  • Is there a relevant academic paper? (if so, where):
  • Is there already an implementation in another framework? (if so, where): Yes, TF 1.
  • Was it part of tf.contrib? (if so, where): no

Which API type would this fall under (layer, metric, optimizer, etc.)

Optimizer

Who will benefit with this feature?

Users training NLP models with LayerNorm layers.
