max_grad_norm schedule #375

Open
ffuuugor opened this issue Mar 11, 2022 · 0 comments
Labels
enhancement New feature or request

Comments

@ffuuugor
Contributor

max_grad_norm is functionally very similar to noise_multiplier - it is also stored as a scalar field on the optimizer and never updated.

We already have a convenient scheduling option for the optimizer's noise_multiplier, but not for max_grad_norm.
I can't point to any particular paper proposing this, but some (e.g. https://arxiv.org/pdf/2108.01624.pdf) show that the signal-to-noise ratio can decline over time due to the natural decline in gradient norms expected in many training scenarios.
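For illustration, a minimal sketch of what such a scheduler could look like, mirroring how the noise_multiplier schedulers simply update a scalar field on the optimizer. The class name and constructor below are hypothetical and not part of the current API:

```python
# Hypothetical sketch: a max_grad_norm scheduler analogous to the existing
# noise_multiplier schedulers. Not part of the current Opacus API.
from opacus.optimizers import DPOptimizer


class ExponentialGradClipScheduler:
    """Multiplies the optimizer's max_grad_norm by `gamma` each time step() is called."""

    def __init__(self, optimizer: DPOptimizer, *, gamma: float):
        self.optimizer = optimizer
        self.gamma = gamma

    def step(self):
        # DPOptimizer keeps the clipping threshold as a plain scalar attribute,
        # so scheduling it only requires updating this field (e.g. once per epoch).
        self.optimizer.max_grad_norm *= self.gamma
```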

@ffuuugor ffuuugor added the enhancement New feature or request label Mar 11, 2022
Darktex added a commit to Darktex/opacus that referenced this issue Jan 20, 2023
Summary:
This diff introduces gradient clipping schedulers that can be used to vary gradient clipping throughout training.

Addresses pytorch#375 in OSS.

Differential Revision: D42644261

fbshipit-source-id: 50dc48a29254b33d581f858a09fc4e03ec9a598a
facebook-github-bot pushed a commit that referenced this issue Jan 24, 2023
Summary:
Pull Request resolved: #556

This diff introduces gradient clipping schedulers that can be used to vary gradient clipping throughout training.

Addresses #375 in OSS.

Reviewed By: karthikprasad

Differential Revision: D42644261

fbshipit-source-id: 7e200d704d97d0b0f5432153af32753c1d4e6204
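A hedged usage sketch of the feature this PR adds, assuming it exposes an ExponentialGradClip scheduler in opacus.schedulers named by analogy with the ExponentialNoise scheduler; the exact class and argument names should be checked against the released API:

```python
# Usage sketch, assuming PR #556 exposes ExponentialGradClip in opacus.schedulers;
# exact names are assumptions based on the noise scheduler naming convention.
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine
from opacus.schedulers import ExponentialGradClip

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8
)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

# Decay the clipping threshold by 1% per epoch.
scheduler = ExponentialGradClip(optimizer, gamma=0.99)

criterion = torch.nn.CrossEntropyLoss()
for epoch in range(3):
    for x, y in data_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    scheduler.step()  # one scheduler step per epoch
```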