This repository contains a superset of the second-order optimization code associated with the paper:
The code is generic so that it can be used in any desired optimization application. The ptychography-specific application is available at https://github.com/saugatkandel/ptychoSampling.
TensorFlow reverse-mode optimization routines that use a damped generalized Gauss-Newton (GGN) matrix. The methods included are:
- Levenberg-Marquardt with projected gradient for convex constraints
- Curveball
- Nonlinear conjugate gradient (Polak-Ribière)
- Backtracking and adaptive linesearch methods
- An interface to the scipy optimizer class for gradient-based or full Newton-type algorithms.
Examples for all of these can be found in sopt/tests/tensorflow2
(https://github.com/saugatkandel/sopt/tree/master/sopt/tests/tensorflow2)
Basics:
We can write an optimization problem with m parameters and n data points as a composition of the "model" f: R^m -> R^n and the "loss" L: R^n -> R:

    min_x c(x) = L(f(x)).

The generalized Gauss-Newton matrix then takes the form

    G = J^T H J,

where J is the Jacobian of the model f and H is the Hessian of the loss L.
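For a least-squares loss (H = I), a GGN-vector product G v = J^T (J v) can be computed with reverse-mode tools alone, using the vjp-of-vjp "trick" from the references below. The following is a minimal TensorFlow 2 sketch of that idea, not the library's internal implementation; the names (`ggn_vector_product`, `f`, `x`, `v`) are chosen just for this illustration.

```python
import tensorflow as tf


def ggn_vector_product(f, x, v):
    """Return G v = J^T (J v), where J is the Jacobian of the model f at x
    and the loss Hessian H is the identity (least-squares loss).

    Only reverse-mode autodiff is used: the forward product J v is obtained
    as the vjp, with respect to a dummy vector u, of the vjp J^T u.
    """
    u = tf.ones_like(f(x))                                # dummy vector; its value is irrelevant
    with tf.GradientTape() as outer:
        outer.watch(u)
        with tf.GradientTape() as inner:
            inner.watch(x)
            y = f(x)
        vjp = inner.gradient(y, x, output_gradients=u)    # J^T u, linear in u
    jv = outer.gradient(vjp, u, output_gradients=v)       # (J^T)^T v = J v

    with tf.GradientTape() as tape:
        tape.watch(x)
        y = f(x)
    return tape.gradient(y, x, output_gradients=jv)       # J^T (J v) = G v


# Tiny check with a nonlinear model f: R^3 -> R^2
x = tf.constant([0.1, -0.3, 0.7])
v = tf.constant([1.0, 0.0, 0.0])
f = lambda z: tf.stack([tf.reduce_sum(z ** 2), tf.sin(z[0]) * z[1]])
print(ggn_vector_product(f, x, v).numpy())
```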
Notes:

- We can either manually supply the Hessian of the loss function (if the Hessian is a diagonal matrix), or have the optimizers calculate the Hessian-vector products (hvps) internally. By default, the hvps are calculated internally. For least-squares loss functions, the Hessian of the loss L is simply an identity matrix, so we can supply the parameter `diag_hessian_fn = lambda x: 1.0` to save some (minuscule) computational effort.
- When the input variable is not 1-D, it is difficult to keep track of the correct shapes for the various matrix-vector products. While this is certainly doable, I am avoiding the extra work for now by assuming that the input variable and the predicted and measured data are all 1-D. It is simpler to just reshape the variable as necessary for any other calculation/analysis.
- For now, the optimizers support only a single variable, not a list of variables. If I want to use a list of variables, I would either create a separate optimizer for each and use an alternating minimization algorithm, or use a single concatenated variable that contains all of them (see the second sketch after this list).
- The optimizers require callable functions, instead of just the tensors, to calculate the predicted data and the loss value (see the first sketch after this list).
- To see example usages, check the tests module.
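As a concrete illustration of the callable-functions requirement and the 1-D convention, the sketch below sets up a toy least-squares problem. The names `predictions_fn`, `loss_fn`, and `diag_hessian_fn` follow the conventions described in the notes above, but the optimizer class and its constructor arguments at the end are assumptions; the examples in sopt/tests/tensorflow2 show the actual usage.

```python
import tensorflow as tf

n = 32 * 32
true_var = tf.random.normal([n])
measured_data = true_var ** 2                     # 1-D measured data

var = tf.Variable(tf.random.normal([n]))          # single 1-D optimization variable


def predictions_fn(x):
    # A callable (not a precomputed tensor) mapping the 1-D variable to the
    # 1-D predicted data; reshape internally if 2-D structure is needed.
    image = tf.reshape(x, (32, 32))
    return tf.reshape(image ** 2, [-1])


def loss_fn(predictions):
    # Least-squares loss on the 1-D predicted data.
    return 0.5 * tf.reduce_sum((predictions - measured_data) ** 2)


# For a least-squares loss the Hessian of L is the identity, so its diagonal
# can be supplied as a constant instead of having the optimizer compute hvps:
diag_hessian_fn = lambda x: 1.0

# Hypothetical construction (class and argument names assumed, not guaranteed;
# see sopt/tests/tensorflow2 for the real signatures):
# opt = LMA(input_var=var, predictions_fn=predictions_fn, loss_fn=loss_fn,
#           diag_hessian_fn=diag_hessian_fn)
# for _ in range(10):
#     opt.minimize()
```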
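For the single-variable restriction, here is a minimal sketch of the concatenated-variable workaround mentioned above; the sizes and the toy forward model are made up for illustration.

```python
import tensorflow as tf

# Two logical unknowns (say, an "object" and a "probe") packed into one flat
# variable so that a single optimizer instance can update both jointly.
n_obj, n_probe = 100, 20
joint_var = tf.Variable(tf.random.normal([n_obj + n_probe]))


def predictions_fn(x):
    # Split the concatenated 1-D variable back into its logical parts.
    obj, probe = x[:n_obj], x[n_obj:]
    # Toy forward model that combines both parts into 1-D predicted data.
    return obj * tf.reduce_sum(probe ** 2)
```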
Warning: the following are deprecated:
- the Autograd code.
- the TensorFlow 1.x code
- the examples and the benchmarks modules.
Todo:
- Consistent naming for `loss` and `objective`.
References:
- https://j-towns.github.io/2017/06/12/A-new-trick.html ("trick" for reverse-mode jacobian-vector product calculations)
- https://arxiv.org/abs/1805.08095 (theory for the Curveball method)
- https://github.com/jotaf98/curveball (a mixed-mode (forward + reverse) implementation of Curveball)
- https://arxiv.org/pdf/1201.5885.pdf (for the LM diagonal scaling)
- Martens, J. (2016). Second-Order Optimization for Neural Networks. PhD thesis, University of Toronto. (for the preconditioning)
- the SciPy LM code (for the LM xtol and gtol conditions)
- the manopt package (for the linesearches)
- Numerical Optimization by Nocedal and Wright (the LM, CG, and linesearch codes all rely on this)