Added Momentum and Nesterov modifications #148
Conversation
@milancurcic Please review the implementation of the suggested changes to the code; I see a difference in the MSE of SGD after adding momentum.
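For reference, here is a minimal sketch of the update rule in question, covering both the classic and the Nesterov ("look-ahead") variants; the names (`velocity`, `learning_rate`, `momentum`) are illustrative, not necessarily the final neural-fortran API:

```fortran
! Sketch of SGD with momentum; illustrative names, not the final API.
pure subroutine sgd_update(param, gradient, velocity, learning_rate, &
                           momentum, nesterov)
  implicit none
  real, intent(in out) :: param(:), velocity(:)
  real, intent(in) :: gradient(:)
  real, intent(in) :: learning_rate, momentum
  logical, intent(in) :: nesterov

  ! classic momentum: a decaying accumulation of past gradients
  velocity = momentum * velocity - learning_rate * gradient

  if (nesterov) then
    ! Nesterov: take the step as if another momentum step followed
    param = param + momentum * velocity - learning_rate * gradient
  else
    param = param + velocity
  end if
end subroutine sgd_update
```

With `momentum = 0` both branches reduce to plain SGD, which is one way to sanity-check the MSE difference.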
@Spnetic-5 I refactored the subroutines in the quadratic_fit example so that we can see how the RMSE evolves during the training. While doing this, I also discovered a small but serious error in the calculation of the RMSE.
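For context, the RMSE must take the square root after averaging the squared errors; a minimal sketch (the actual fix in the example may differ):

```fortran
! RMSE over a batch of predictions; sqrt is applied to the mean,
! not element-wise to the individual errors.
pure real function rmse(y_true, y_pred)
  implicit none
  real, intent(in) :: y_true(:), y_pred(:)
  rmse = sqrt(sum((y_true - y_pred)**2) / size(y_true))
end function rmse
```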
I'll continue with the remainder of the review over the weekend.
I've added the necessary data structures in the optimizer type definitions to allow keeping track of velocity in the case of SGD and of RMS gradients in the case of RMSProp. The code doesn't work yet; we still need to correctly initialize these structures and correctly reference them on each successive minimization call. The main challenge is that the optimizer state must persist between those calls.
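Roughly, the added state looks like this; a hypothetical sketch with a minimal stand-in for the base type, not the exact neural-fortran source:

```fortran
module optimizer_state_sketch
  implicit none

  ! minimal stand-in for the base class referenced in this thread
  type, abstract :: optimizer_base_type
    real :: learning_rate = 0.01
  end type optimizer_base_type

  type, extends(optimizer_base_type) :: sgd
    real :: momentum = 0
    logical :: nesterov = .false.
    real, allocatable :: velocity(:)      ! must persist across update calls
  end type sgd

  type, extends(optimizer_base_type) :: rmsprop
    real :: decay_rate = 0.9
    real, allocatable :: rms_gradient(:)  ! running mean of squared gradients
  end type rmsprop

end module optimizer_state_sketch
```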
I took approach 3 above for the time being. We also need a way to get the layer gradients. Once we have that implemented and everything is hooked up correctly, it should just work for both SGD (with or without classic or Nesterov momentum) and RMSProp. Once it does, this huge PR can be merged and we'll make a new minor release.
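For completeness, here is the RMSProp update under its usual formulation; a sketch with illustrative names, where `eps` guards against division by zero:

```fortran
pure subroutine rmsprop_update(param, gradient, rms_gradient, &
                               learning_rate, decay_rate)
  implicit none
  real, intent(in out) :: param(:), rms_gradient(:)
  real, intent(in) :: gradient(:)
  real, intent(in) :: learning_rate, decay_rate
  real, parameter :: eps = 1e-8

  ! decaying average of squared gradients, then a normalized step
  rms_gradient = decay_rate * rms_gradient + (1 - decay_rate) * gradient**2
  param = param - learning_rate * gradient / sqrt(rms_gradient + eps)
end subroutine rmsprop_update
```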
We're almost there. Now we just need to add a test suite for the optimizer classes to ensure that they both converge on a simple problem, e.g. the one in examples/simple.f90.
Sure, I'll try to implement the test suite.
@Spnetic-5 Thanks! To clarify: since the default SGD is already being tested implicitly in e.g. examples/test_dense_network.f90 and elsewhere (the simple and sine examples), for this test suite let's focus on:

- SGD with classic momentum
- SGD with Nesterov momentum
- RMSProp
For ideas on how to write the test suite program, follow how the other test programs are written.
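A hypothetical skeleton for such a test, modeled on the style of the existing test programs; the network/optimizer API shown here (network, forward, backward, update, predict) is assumed from the examples and may differ in detail:

```fortran
program test_optimizers
  use nf, only: dense, input, network, sgd
  implicit none
  type(network) :: net
  type(sgd) :: optimizer
  real :: x(3), y(2)
  real, parameter :: tolerance = 1e-3
  integer, parameter :: num_iterations = 1000
  integer :: n

  net = network([input(3), dense(5), dense(2)])
  x = [0.2, 0.4, 0.6]
  y = [0.123456, 0.246802]

  ! construct the optimizer once so its state persists across iterations
  optimizer = sgd(learning_rate=1., momentum=0.9)

  do n = 1, num_iterations
    call net % forward(x)
    call net % backward(y)
    call net % update(optimizer)
    if (all(abs(net % predict(x) - y) < tolerance)) exit
  end do

  if (n > num_iterations) then
    print *, 'SGD with momentum did not converge... failed'
  else
    print *, 'SGD with momentum converged in', n, 'iterations'
  end if
end program test_optimizers
```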
BTW, I wrote in d27bf09 that the tests pass in debug mode but segfault in release mode. After I pushed it, both debug and release modes passed in CI. Then I went back and saw that I had been working locally with GFortran v9; when I switched to v10, both modes passed locally as well.
@milancurcic I have added a draft test suite; please confirm whether I'm on the right track, and let me know of any suggestions before I proceed further. Also, are we continuing …
Thanks @Spnetic-5, yes, the structure of the program is good. I see that you're not actually testing for convergence yet; I assume that's on your TODO list. I realized that for optimizers that need to store state (velocity for SGD with momentum, and RMSProp in general), we actually need to store the optimizer instance as part of the network type so that it can preserve state between successive calls. That means adding a class(optimizer_base_type), allocatable :: optimizer member to the network type. You can use the …
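For the state-preservation part, a hypothetical sketch of how the stored instance could be used; the actual method name and signature may differ:

```fortran
! Assumes the network type now has:
!   class(optimizer_base_type), allocatable :: optimizer
subroutine update(self, optimizer)
  class(network), intent(in out) :: self
  class(optimizer_base_type), intent(in), optional :: optimizer

  if (present(optimizer)) then
    ! replace the stored optimizer (this also resets its state)
    if (allocated(self % optimizer)) deallocate(self % optimizer)
    allocate(self % optimizer, source=optimizer)
  end if

  ! ... apply self % optimizer to the layer parameters; its velocity or
  ! RMS-gradient state survives to the next call because it lives on
  ! the network ...
end subroutine update
```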
@Spnetic-5 I went ahead and added the optimizer instance as a member of the network, as it may not be a trivial thing to do, so you can focus on implementing the convergence tests.
Thanks @milancurcic, sure I'm working on convergence tests. |
Thank you @Spnetic-5; I left a few comments. We're almost there.
Thank you @Spnetic-5, this is now in good shape.
Solves #137