
Added Momentum and Nesterov modifications #148


Merged
merged 23 commits into from
Jul 14, 2023

Conversation

Spnetic-5
Collaborator

@Spnetic-5 Spnetic-5 commented Jun 22, 2023

Solves #137

@Spnetic-5 Spnetic-5 requested a review from milancurcic June 23, 2023 00:00
@Spnetic-5
Collaborator Author

@milancurcic Please review the implementation of the suggested changes to the code; I can see the difference in the MSE of SGD after adding the momentum and nesterov arguments.

@Spnetic-5 Spnetic-5 marked this pull request as ready for review June 23, 2023 21:36
@milancurcic
Member

@Spnetic-5 I refactored the subroutines in the quadratic_fit example to accept xtest and ytest. This allows the subroutines to report the RMSE every given number of epochs. For example, this is what the output looks like now:

...
Stochastic gradient descent
----------------------------------
Epoch:  100/1000, RMSE =  0.132655
Epoch:  200/1000, RMSE =  0.131642
Epoch:  300/1000, RMSE =  0.131308
Epoch:  400/1000, RMSE =  0.131156
Epoch:  500/1000, RMSE =  0.131071
Epoch:  600/1000, RMSE =  0.131018
Epoch:  700/1000, RMSE =  0.130981
Epoch:  800/1000, RMSE =  0.130953
Epoch:  900/1000, RMSE =  0.130932
Epoch: 1000/1000, RMSE =  0.130915
 
Batch gradient descent
----------------------------------
Epoch:  100/1000, RMSE =  0.474867
Epoch:  200/1000, RMSE =  0.406169
Epoch:  300/1000, RMSE =  0.356077
Epoch:  400/1000, RMSE =  0.320101
Epoch:  500/1000, RMSE =  0.294075
Epoch:  600/1000, RMSE =  0.274908
Epoch:  700/1000, RMSE =  0.260488
Epoch:  800/1000, RMSE =  0.249400
Epoch:  900/1000, RMSE =  0.240700
Epoch: 1000/1000, RMSE =  0.233746

Minibatch gradient descent
----------------------------------
Epoch:  100/1000, RMSE =  0.189657
Epoch:  200/1000, RMSE =  0.186826
Epoch:  300/1000, RMSE =  0.185389
Epoch:  400/1000, RMSE =  0.183885
Epoch:  500/1000, RMSE =  0.181658
Epoch:  600/1000, RMSE =  0.177690
Epoch:  700/1000, RMSE =  0.170329
Epoch:  800/1000, RMSE =  0.159604
Epoch:  900/1000, RMSE =  0.149710
Epoch: 1000/1000, RMSE =  0.143441
 
RMSProp optimizer
----------------------------------
Epoch:  100/1000, RMSE =  0.198733
Epoch:  200/1000, RMSE =  0.167841
Epoch:  300/1000, RMSE =  0.138085
Epoch:  400/1000, RMSE =  0.134165
Epoch:  500/1000, RMSE =  0.132821
Epoch:  600/1000, RMSE =  0.132069
Epoch:  700/1000, RMSE =  0.131591
Epoch:  800/1000, RMSE =  0.131288
Epoch:  900/1000, RMSE =  0.131106
Epoch: 1000/1000, RMSE =  0.130999
 

This allows us to see how the RMSE evolves during training.

While doing this, I also discovered a small but serious error in the calculation of xtest. Before the fix, it was evaluated as an integer expression, so the xtest values were only zeros and ones, with no values in between. That means we had so far been evaluating at only those two points, which gave us a false idea of the RMSE.
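The exact Fortran expression isn't shown in this thread, but the failure mode is easy to reproduce; here is a Python sketch of the integer-division bug and its fix:

```python
n = 10

# Buggy: integer division truncates, so every point collapses to 0
# except the endpoint, which becomes 1.
xtest_bad = [i // n for i in range(n + 1)]

# Fixed: real division gives evenly spaced points in [0, 1].
xtest_good = [i / n for i in range(n + 1)]

print(sorted(set(xtest_bad)))  # [0, 1]
print(xtest_good[:3])          # [0.0, 0.1, 0.2]
```

Evaluating only at x = 0 and x = 1 makes the reported RMSE meaningless for the rest of the interval, which is why the fix changed the numbers above so much.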

@milancurcic
Member

I'll continue with the remainder of the review over the weekend.

@milancurcic
Member

I've added the necessary data structures to the optimizer type definitions to allow keeping track of velocity in the case of SGD and RMS gradients in the case of RMSProp. The code doesn't work yet--we need to correctly initialize these structures and correctly reference them during each successive minimization call. The main challenge is that the minimize method is currently an elemental subroutine that works on a per-layer basis, so it's not aware of which layer it's working on. I'm not yet clear on the best approach, but here are some ideas I've been thinking about:

  1. Rather than minimizing at the layer level, minimize at the network level; we loop over layers and have all the information we need to reference the velocity/RMS values correctly--basically the RMSProp subroutine approach in examples/quadratic.f90.
  2. Pass the layer number/name to optimizer % minimize() to let it know which layer it's on. The subroutine can then correctly address the velocity/RMS arrays.
  3. Possibly the simplest approach: treat all model parameters and gradients as 1-d arrays, get their size via net % get_num_params(), and update the weights via net % set_params(). The downside is that getting the gradients from all layers in one place will require a new allocation (maybe, let's see).
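A minimal sketch of approach 3, in Python for brevity (the layer shapes and the TinyNet name are invented for illustration; the real network exposes get_num_params, get_params, and set_params):

```python
import numpy as np

class TinyNet:
    """Stand-in for the network type: two 'layers' with arbitrary shapes."""

    def __init__(self):
        self.layers = [np.ones((2, 3)), np.ones(3)]

    def get_num_params(self):
        return sum(w.size for w in self.layers)

    def get_params(self):
        # Gathering everything into one flat array needs a new
        # allocation -- the downside mentioned above.
        return np.concatenate([w.ravel() for w in self.layers])

    def set_params(self, flat):
        offset = 0
        for i, w in enumerate(self.layers):
            self.layers[i] = flat[offset:offset + w.size].reshape(w.shape)
            offset += w.size

net = TinyNet()
params = net.get_params()
grads = np.full(net.get_num_params(), 0.5)  # stand-in gradients
net.set_params(params - 0.1 * grads)        # one flat SGD step
```

The appeal of this approach is that the optimizer never needs to know which layer it is updating; it sees a single flat parameter vector and a matching flat gradient vector.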

@milancurcic
Member

I took approach 3 above for the time being. We also need layer % get_gradients() and network % get_gradients() methods, which are not implemented yet, but they will be essentially the same procedures as layer % get_params() and network % get_params().

Because velocity in SGD and rms_gradient in RMSProp are allocatable arrays, they can't be used in the arithmetic inside elemental minimize methods, so minimize is no longer elemental.

Once getting the layer gradients is implemented and everything is hooked up correctly, it should just work for both SGD (with or without classic or Nesterov momentum) and RMSProp. Once it does, this huge PR can be merged and we'll make a new minor release.
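For reference, the update rules being wired in can be sketched in Python as follows (these are the standard textbook formulations; the function and parameter names are illustrative, not the library's API):

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=0.1, momentum=0.9, nesterov=False):
    """One SGD step with classic or Nesterov momentum."""
    velocity = momentum * velocity - lr * grad
    if nesterov:
        # Look-ahead form: apply the momentum term once more on top
        # of the plain gradient step.
        w = w + momentum * velocity - lr * grad
    else:
        w = w + velocity
    return w, velocity

def rmsprop_step(w, grad, rms_grad, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp step; rms_grad is the running mean of squared gradients."""
    rms_grad = decay * rms_grad + (1 - decay) * grad**2
    w = w - lr * grad / (np.sqrt(rms_grad) + eps)
    return w, rms_grad

# Minimize f(w) = w**2 (gradient 2w) with Nesterov momentum.
w, v = np.array([1.0]), np.zeros(1)
for _ in range(50):
    w, v = sgd_step(w, 2 * w, v, nesterov=True)
```

Both rules make the need for persistent state obvious: velocity and rms_grad must survive from one call to the next, which is exactly why the freshly-created-optimizer pattern discussed later in this thread doesn't work.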

@milancurcic
Member

We're almost there. Now we just need to add a test suite for the optimizer classes to ensure that they both converge on a simple problem, e.g. the one in examples/simple.f90.

@Spnetic-5
Collaborator Author

Sure, I'll try to implement the test suite.

@milancurcic
Member

@Spnetic-5 Thanks! To clarify: since the default SGD is already tested implicitly in e.g. examples/test_dense_network.f90 and elsewhere (the simple and sine examples), for this test suite let's focus on:

  • SGD with classic momentum
  • SGD with momentum + Nesterov
  • RMSProp
  • A minimal training example, like the one in examples/simple.f90, which fits 2 constant inputs to 3 constant outputs.

For ideas on how to write the test suite program, follow how other test programs are written.
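A convergence test along these lines can be sketched as follows (Python for illustration; the shapes mirror the 2-input, 3-output setup mentioned above, while the data values, learning rate, and tolerance are invented):

```python
import numpy as np

x = np.array([0.2, 0.4])        # 2 constant inputs
y = np.array([0.1, 0.5, 0.9])   # 3 constant outputs

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 2))     # single linear layer, 3x2 weights
b = np.zeros(3)
v_w, v_b = np.zeros_like(w), np.zeros_like(b)
lr, momentum = 0.5, 0.9

# Train with classic-momentum SGD and check the loss drops below
# a tolerance -- the shape of the test, one block per optimizer.
for _ in range(200):
    err = w @ x + b - y
    v_w = momentum * v_w - lr * np.outer(err, x)
    v_b = momentum * v_b - lr * err
    w, b = w + v_w, b + v_b

loss = np.mean((w @ x + b - y) ** 2)
assert loss < 1e-3, "SGD with momentum failed to converge"
```

Each optimizer variant (classic momentum, Nesterov, RMSProp) would get its own block of this form, differing only in the update rule inside the loop.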

@milancurcic
Member

BTW, I wrote in d27bf09 that the tests pass in debug mode but segfault in release mode. After I pushed it, both debug and release modes passed in CI. Then I went back and saw that I had been working locally with GFortran v9. When I switched to v10, both modes passed the tests.

@Spnetic-5
Collaborator Author

Spnetic-5 commented Jul 12, 2023

@milancurcic I have added a draft test suite; please confirm whether I'm on the right track and share any suggestions before I proceed further.

Also, are we continuing the test suite in this PR, or should I open a separate one?

@milancurcic
Member

Thanks @Spnetic-5, yes, the structure of the program is good. I see that you're not actually testing for convergence yet; I assume that's on a TODO.

I realized that for optimizers that need to store state (velocity in SGD with momentum, and RMSProp in general), we actually need to store the optimizer instance as part of the network type so that it can preserve state between successive net % update() calls. Currently we always pass a freshly created optimizer instance, so no state is stored. As you're working on the test suite, can you also add a

class(optimizer_base_type), allocatable :: optimizer

member to the network type? Then, in train and update, we need to store it on the first call but leave it untouched on successive calls.

You can use allocated(self % optimizer) to check whether it has already been allocated, to determine whether it needs to be set for the first time.
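The store-on-first-call logic looks roughly like this (a Python analogue, where None plays the role of the unallocated member; the class and method names are stand-ins, not the library's exact API):

```python
class SGD:
    """Stand-in optimizer whose state must survive between updates."""

    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate
        self.calls = 0  # stands in for stored velocity state

    def minimize(self):
        self.calls += 1

class Network:
    def __init__(self):
        self.optimizer = None  # like an unallocated optimizer member

    def update(self, optimizer=None):
        # Store the optimizer on the first call only; later calls
        # reuse the stored instance so its state persists.
        if self.optimizer is None:
            self.optimizer = optimizer if optimizer is not None else SGD()
        self.optimizer.minimize()

net = Network()
net.update(SGD(learning_rate=0.1))  # stored on first call
net.update()                        # reuses the same instance and state
```

Because the second call reuses the stored instance, any velocity or RMS-gradient arrays it holds carry over between updates, which is the whole point of making the optimizer a member of the network.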

@milancurcic
Member

@Spnetic-5 I went ahead and added the optimizer instance as a member of the network as it may not be a trivial thing to do, so you can focus on implementing the convergence tests.

@milancurcic milancurcic added the enhancement New feature or request label Jul 12, 2023
@Spnetic-5
Collaborator Author

> @Spnetic-5 I went ahead and added the optimizer instance as a member of the network as it may not be a trivial thing to do, so you can focus on implementing the convergence tests.

Thanks @milancurcic, sure I'm working on convergence tests.

Member

@milancurcic milancurcic left a comment


Thank you @Spnetic-5; I left a few comments. We're almost there.

@Spnetic-5 Spnetic-5 requested a review from milancurcic July 14, 2023 05:49
Member

@milancurcic milancurcic left a comment


Thank you @Spnetic-5, this is now in good shape.
