max_grad_norm schedule #375
Labels: enhancement (New feature or request)
Status: Closed
Darktex added a commit to Darktex/opacus that referenced this issue on Jan 20, 2023:
Summary: This diff introduces gradient clipping schedulers that can be used to vary gradient clipping throughout training. Addresses pytorch#375 in OSS. Differential Revision: D42644261 fbshipit-source-id: 50dc48a29254b33d581f858a09fc4e03ec9a598a
Darktex added a commit to Darktex/opacus that referenced this issue on Jan 20, 2023:
Summary: Pull Request resolved: pytorch#556 This diff introduces gradient clipping schedulers that can be used to vary gradient clipping throughout training. Addresses pytorch#375 in OSS. Differential Revision: D42644261 fbshipit-source-id: 7480e2f81432bd4a05d58af420ee4a783bdd63f1
Darktex added a commit to Darktex/opacus that referenced this issue on Jan 20, 2023:
Summary: Pull Request resolved: pytorch#556 This diff introduces gradient clipping schedulers that can be used to vary gradient clipping throughout training. Addresses pytorch#375 in OSS. Differential Revision: D42644261 fbshipit-source-id: 0af54b977d41b164531c18d4804a2506a359ce28
Darktex added a commit to Darktex/opacus that referenced this issue on Jan 24, 2023:
Summary: Pull Request resolved: pytorch#556 This diff introduces gradient clipping schedulers that can be used to vary gradient clipping throughout training. Addresses pytorch#375 in OSS. Reviewed By: karthikprasad Differential Revision: D42644261 fbshipit-source-id: 91bedb4c3dd68f336917d16cec42f939ace02406
Darktex added a commit to Darktex/opacus that referenced this issue on Jan 24, 2023:
Summary: Pull Request resolved: pytorch#556 This diff introduces gradient clipping schedulers that can be used to vary gradient clipping throughout training. Addresses pytorch#375 in OSS. Reviewed By: karthikprasad Differential Revision: D42644261 fbshipit-source-id: 94f5fb756dce8ec576cce2ff003c2054eb926e27
Darktex added a commit to Darktex/opacus that referenced this issue on Jan 24, 2023:
Summary: Pull Request resolved: pytorch#556 This diff introduces gradient clipping schedulers that can be used to vary gradient clipping throughout training. Addresses pytorch#375 in OSS. Reviewed By: karthikprasad Differential Revision: D42644261 fbshipit-source-id: a571569785697b369c4c6f9709d35a5d3ff7e73c
Darktex added a commit to Darktex/opacus that referenced this issue on Jan 24, 2023:
Summary: Pull Request resolved: pytorch#556 This diff introduces gradient clipping schedulers that can be used to vary gradient clipping throughout training. Addresses pytorch#375 in OSS. Reviewed By: karthikprasad Differential Revision: D42644261 fbshipit-source-id: 57c87ba5f0b012359761fa015d046edc5c13da88
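As a usage-level illustration, the gradient clipping schedulers introduced in the commits above could be stepped once per epoch, much like a learning-rate scheduler. This is a sketch only: the import path, the class name StepGradClip, and its constructor arguments are assumptions modeled on Opacus's existing noise-multiplier schedulers (e.g. StepNoise); the actual names and signatures are defined in pytorch#556. model, optimizer, and train_loader are assumed to come from PrivacyEngine.make_private, and train_one_epoch / num_epochs are placeholders.

```python
# Sketch only: class name and signature are assumptions mirroring Opacus's
# noise-multiplier schedulers; see pytorch#556 for the actual grad-clip API.
from opacus.schedulers import StepGradClip  # assumed name and location

# `optimizer` is the DP optimizer returned by PrivacyEngine.make_private.
scheduler = StepGradClip(optimizer, step_size=5, gamma=0.9)  # assumed signature

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer, train_loader)  # placeholder for the usual DP-SGD loop
    scheduler.step()  # intended effect: shrink optimizer.max_grad_norm on the configured schedule
```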
max_grad_norm is functionally very similar to noise_multiplier: it is also stored as a scalar field on the optimizer and never updated. We have a convenient scheduling option for the optimizer, but not for max_grad_norm.

I can't point to any particular paper proposing this, but some (e.g. https://arxiv.org/pdf/2108.01624.pdf) show that over time the signal-to-noise ratio can decline due to a natural decline in the gradient norm (expected in many training scenarios).
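To make the request concrete: since the clipping threshold is just a scalar field on the DP optimizer (as noted above), the desired behavior can be approximated by overwriting optimizer.max_grad_norm between epochs. This is a minimal sketch, not an Opacus API; the geometric decay factor and epoch count are arbitrary choices for illustration.

```python
# Minimal sketch: decay the per-sample clipping threshold once per epoch.
# Assumes `optimizer` is the DP optimizer produced by PrivacyEngine.make_private,
# which stores max_grad_norm as a plain scalar attribute (see description above).

def train_with_clip_schedule(model, optimizer, criterion, train_loader, epochs=10, gamma=0.9):
    for epoch in range(epochs):
        for data, target in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()
        # Gradient norms tend to shrink as training progresses; shrinking the
        # clipping threshold with them also shrinks the injected noise
        # (whose std scales with max_grad_norm), helping the signal-to-noise ratio.
        optimizer.max_grad_norm *= gamma
```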