
Conversation


@tjruwase tjruwase commented Aug 9, 2021

API for obtaining the global unclipped gradient norm across all parameter groups. Based off #1286.
Optimizers are solely responsible for computing gradient norms. Gradient norms are computed (or refreshed) in optimizer.step().

@stas00 FYI
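
A minimal usage sketch of the intended workflow (not taken from this PR's diff): it assumes the DeepSpeed engine exposes a `get_global_grad_norm()`-style accessor that is populated during `optimizer.step()`, as described above; the accessor name and config values are illustrative assumptions.

```python
# Sketch only: assumes an engine-level accessor for the global unclipped
# gradient norm, refreshed inside optimizer.step(). Names are assumptions.
import torch
import deepspeed

model = torch.nn.Linear(10, 10)
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config={"train_batch_size": 8, "fp16": {"enabled": True}},
)

inputs = torch.randn(8, 10, device=engine.device, dtype=torch.half)
loss = engine(inputs).sum()
engine.backward(loss)
engine.step()  # gradient norms are computed (or refreshed) here

# Query the global, unclipped gradient norm across all parameter groups.
grad_norm = engine.get_global_grad_norm()  # assumed accessor name
print(f"global grad norm: {grad_norm}")
```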


stas00 commented Aug 9, 2021

Thank you for working on this, @tjruwase

and then we will need Shaden's deepspeedai/Megatron-DeepSpeed#8 on the Megatron side once this is merged.


@ShadenSmith ShadenSmith left a comment


@samyam may also want to take a look

@tjruwase tjruwase merged commit cce85b8 into big-science Aug 9, 2021
jeffra pushed a commit that referenced this pull request Sep 8, 2021
* FP16 fused and unfused grad norm query.

* API for obtaining global unclipped gradient norm across parameter groups

* Use global norm not group norms

Co-authored-by: Shaden Smith <shaden.smith@microsoft.com>
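
The last bullet refers to clipping against one norm computed over all parameter groups rather than clipping each group against its own norm. A rough illustrative sketch of that aggregation follows; it is not the PR's actual implementation, and the helper names are made up for illustration.

```python
# Illustrative sketch: combine per-parameter-group L2 norms into a single
# global norm, then scale all gradients by the same factor.
import math

def global_grad_norm(param_groups):
    # ||g||_2 over all groups = sqrt(sum of squared per-parameter norms)
    total_sq = 0.0
    for group in param_groups:
        for p in group["params"]:
            if p.grad is not None:
                total_sq += p.grad.float().norm(2).item() ** 2
    return math.sqrt(total_sq)

def clip_by_global_norm(param_groups, max_norm):
    norm = global_grad_norm(param_groups)
    scale = max_norm / (norm + 1e-6)
    if scale < 1.0:
        for group in param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.grad.mul_(scale)
    return norm  # return the unclipped global norm
```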
@mrwyattii mrwyattii deleted the olruwase/global_gradient_norm branch July 7, 2023 02:40