Add SGD with Momentum Optimizer, addresses #298 #369
Merged
This PR adds momentum via a new SGDMomentum object. The default values for SGDMomentum match those of SGD, so when left at the defaults the SGDMomentum optimizer behaves the same way as the base SGD optimizer. SGDMomentum optimizers must be declared with `var`, like Adam and unlike SGD. Additionally, an SGDMomentum object has a parameter that lets it use Nesterov momentum rather than regular momentum; I've included a link to the paper that proposes Nesterov momentum in the documentation.

This implementation of SGD with momentum agrees with the PyTorch implementation to within 3-4 decimal places in both the Nesterov and plain-momentum cases. I have included a PyTorch output for three SGD test cases (without momentum, with momentum, and with Nesterov momentum) as well as one for Adam. Tests for SGDMomentum are located in tests/nn/test_optimizers.nim. The included PyTorch script is a modification of the following legacy PyTorch test: https://github.com/pytorch/pytorch/blob/master/test/optim/test.py.
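For reference, the two momentum variants being compared against can be sketched with plain scalars. This is a minimal illustration of the PyTorch-style update rules (velocity accumulates the gradient; Nesterov looks one step ahead along the velocity), not the actual Arraymancer or PyTorch code, and the function name and defaults here are hypothetical:

```python
def sgd_momentum_step(param, grad, velocity,
                      lr=0.01, momentum=0.9, nesterov=False):
    """One SGD-with-momentum step on a scalar parameter (sketch).

    PyTorch-style classic momentum:  v <- momentum * v + grad
    then the parameter moves by -lr * v.
    Nesterov momentum instead applies -lr * (grad + momentum * v),
    i.e. it evaluates the step as if already moved along the velocity.
    """
    velocity = momentum * velocity + grad
    if nesterov:
        update = grad + momentum * velocity  # look-ahead correction
    else:
        update = velocity
    return param - lr * update, velocity
```

With momentum left at 0 the update reduces to `param - lr * grad`, which matches the claim that a default-constructed SGDMomentum behaves like plain SGD.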
It should be possible to automate these tests; right now I've checked them manually by eye. If the output of the PyTorch implementation were saved to a file, the Arraymancer test could load the results and check that, at each optimizer step, the new values of the (x, y) model points are "close enough" to the PyTorch output.
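The comparison step of that automated test could look something like the following sketch. The tolerance of 1e-4 reflects the "3-4 decimal places" agreement observed above; the function name and the idea of dumping the (x, y) trajectory per optimizer step to a file are assumptions about how the harness would be wired up, not existing code:

```python
import numpy as np

def check_trajectories(reference, candidate, atol=1e-4):
    """Compare two optimizer trajectories step by step (sketch).

    `reference` would be the (x, y) values dumped by the PyTorch
    script after each optimizer step; `candidate` the values produced
    by the Arraymancer optimizer under test. Returns True when every
    step agrees within the given absolute tolerance.
    """
    reference = np.asarray(reference)
    candidate = np.asarray(candidate)
    assert reference.shape == candidate.shape, "trajectories differ in length"
    return np.allclose(reference, candidate, atol=atol)
```

The PyTorch side would only need something like `np.save` on the stacked per-step values, and the Nim test could emit its trajectory in a format the comparison script can load.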