Apply optimizer to model weights without data copy #222
An alternative approach to #184.
Currently implemented only for the dense layer, so a lot of other things don't work yet. We now send pointers to each layer's weights and biases to the optimizer, where they are updated in place (see the sketch below).
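For illustration, here is a minimal, self-contained sketch of the idea, assuming a plain SGD update; the names (`sgd_t`, `dense_layer_t`, `dw`, `db`, `minimize_1d`/`minimize_2d`) are hypothetical and only stand in for the library's actual types:

```fortran
! Sketch only: illustrates in-place, per-layer updates; not the actual API.
module inplace_update_sketch
  implicit none

  type :: sgd_t
    real :: learning_rate = 0.01
  contains
    procedure :: minimize_1d, minimize_2d
    generic :: minimize => minimize_1d, minimize_2d
  end type sgd_t

  type :: dense_layer_t
    real, allocatable :: weights(:,:), biases(:)
    real, allocatable :: dw(:,:), db(:)   ! gradients of weights and biases
  end type dense_layer_t

contains

  ! The parameter array arrives by reference and is updated in place;
  ! there is no flattening of the whole network into a temporary copy.
  subroutine minimize_1d(self, param, gradient)
    class(sgd_t), intent(in) :: self
    real, intent(inout) :: param(:)
    real, intent(in) :: gradient(:)
    param = param - self % learning_rate * gradient
  end subroutine minimize_1d

  subroutine minimize_2d(self, param, gradient)
    class(sgd_t), intent(in) :: self
    real, intent(inout) :: param(:,:)
    real, intent(in) :: gradient(:,:)
    param = param - self % learning_rate * gradient
  end subroutine minimize_2d

end module inplace_update_sketch

program demo
  use inplace_update_sketch
  implicit none
  type(dense_layer_t) :: layer
  type(sgd_t) :: optimizer

  allocate(layer % weights(3, 2), layer % dw(3, 2))
  allocate(layer % biases(2), layer % db(2))
  layer % weights = 1.0
  layer % biases = 0.5
  layer % dw = 0.1
  layer % db = 0.1

  ! Layer-by-layer update: each layer hands its own arrays to the
  ! optimizer, so the storage it owns is modified directly.
  call optimizer % minimize(layer % weights, layer % dw)
  call optimizer % minimize(layer % biases, layer % db)

  print *, layer % weights(1, 1), layer % biases(1)
end program demo
```

Because Fortran passes the arrays by reference, `optimizer % minimize(layer % weights, layer % dw)` modifies the layer's own storage, which is the copy-free behavior this PR is after.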
Since this approach runs the optimizer on a layer-by-layer basis, optimizer memory such as velocity, RMS gradients, etc. is not implemented yet, because it originally assumed a whole-network update. Additional bookkeeping is needed there. One possible approach is to require the caller of `optimizer % minimize()` to pass explicit start and end indices over which the optimizer will run (see the sketch below). This may seem tedious, but helper functions can make it easy.
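To make that bookkeeping concrete, here is a sketch assuming an SGD-with-momentum optimizer that keeps one shared velocity buffer for the whole network; the `istart`/`iend` arguments, the `init` routine, and all names are hypothetical, not the current interface:

```fortran
! Sketch only: one possible design for per-layer calls into shared state.
module momentum_bookkeeping_sketch
  implicit none

  type :: sgd_momentum_t
    real :: learning_rate = 0.01
    real :: momentum = 0.9
    ! One velocity buffer for the whole network, allocated once to the
    ! total number of parameters; each layer owns a contiguous slice.
    real, allocatable :: velocity(:)
  contains
    procedure :: init
    procedure :: minimize
  end type sgd_momentum_t

contains

  subroutine init(self, num_params)
    class(sgd_momentum_t), intent(inout) :: self
    integer, intent(in) :: num_params
    allocate(self % velocity(num_params))
    self % velocity = 0
  end subroutine init

  ! The caller passes the slice [istart, iend] of the optimizer state that
  ! belongs to the parameters being updated, so per-layer calls still share
  ! one persistent velocity buffer.
  subroutine minimize(self, param, gradient, istart, iend)
    class(sgd_momentum_t), intent(inout) :: self
    real, intent(inout) :: param(:)
    real, intent(in) :: gradient(:)
    integer, intent(in) :: istart, iend

    associate (v => self % velocity(istart:iend))
      v = self % momentum * v - self % learning_rate * gradient
      param = param + v
    end associate
  end subroutine minimize

end module momentum_bookkeeping_sketch
```

A small helper could compute each layer's `(istart, iend)` pair from cumulative parameter counts, so user code would never have to track these indices by hand.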
On MNIST training (`examples/dense_mnist`), the difference is only ~1% in run time. However, running the profiler on both the main branch `dense_mnist` and this branch's `dense_mnist` shows that the data copies (mostly in `get_params`; two redundant copies of all model parameters happen there) are now gone:

Main branch `dense_mnist`: (profiler output)

This PR's `dense_mnist`: (profiler output)

The above runs were compiled with gfortran 14.2.0 and the `-pg -Ofast` flags.