Add SGD with Momentum Optimizer, addresses #298 #369
Merged
This PR adds momentum via a new SGDMomentum object. The default values for SGDMomentum match those of SGD, so when left at the defaults the SGDMomentum optimizer behaves the same way as the base SGD optimizer. SGDMomentum optimizers must be declared with `var`, like Adam and unlike SGD. Additionally, an SGDMomentum object has a parameter that lets it use Nesterov momentum rather than regular momentum; I've included a link to the paper that proposes Nesterov momentum in the documentation.

This implementation of SGD with momentum agrees with the PyTorch implementation to within 3-4 decimal places in both the Nesterov and plain-momentum cases. I have included a PyTorch output for three SGD test cases (without momentum, with momentum, and with Nesterov momentum) as well as one for Adam. Tests for SGDMomentum are located in tests/nn/test_optimizers.nim. The included PyTorch script is a modification of the following legacy PyTorch test: https://github.com/pytorch/pytorch/blob/master/test/optim/test.py.
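For reference, the two momentum variants being compared against can be sketched with plain scalars. This is a minimal illustration of the PyTorch-style update rules (velocity accumulates the gradient; Nesterov looks one step ahead along the velocity), not the actual Arraymancer or PyTorch code, and the function name and defaults here are hypothetical:

```python
def sgd_momentum_step(param, grad, velocity,
                      lr=0.01, momentum=0.9, nesterov=False):
    """One SGD-with-momentum step on a scalar parameter (sketch).

    PyTorch-style classic momentum:  v <- momentum * v + grad
    then the parameter moves by -lr * v.
    Nesterov momentum instead applies -lr * (grad + momentum * v),
    i.e. it evaluates the step as if already moved along the velocity.
    """
    velocity = momentum * velocity + grad
    if nesterov:
        update = grad + momentum * velocity  # look-ahead correction
    else:
        update = velocity
    return param - lr * update, velocity
```

With momentum left at 0 the update reduces to `param - lr * grad`, which matches the claim that a default-constructed SGDMomentum behaves like plain SGD.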
It should be possible to automate these tests; right now I've checked them manually by eye. If the output of the PyTorch implementation were saved to a file, the Arraymancer test could load the results and check that, at each optimizer step, the new values of the (x, y) model points are "close enough" to the PyTorch output.
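The comparison step of that automated test could look something like the following sketch. The tolerance of 1e-4 reflects the "3-4 decimal places" agreement observed above; the function name and the idea of dumping the (x, y) trajectory per optimizer step to a file are assumptions about how the harness would be wired up, not existing code:

```python
import numpy as np

def check_trajectories(reference, candidate, atol=1e-4):
    """Compare two optimizer trajectories step by step (sketch).

    `reference` would be the (x, y) values dumped by the PyTorch
    script after each optimizer step; `candidate` the values produced
    by the Arraymancer optimizer under test. Returns True when every
    step agrees within the given absolute tolerance.
    """
    reference = np.asarray(reference)
    candidate = np.asarray(candidate)
    assert reference.shape == candidate.shape, "trajectories differ in length"
    return np.allclose(reference, candidate, atol=atol)
```

The PyTorch side would only need something like `np.save` on the stacked per-step values, and the Nim test could emit its trajectory in a format the comparison script can load.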