use sphinx references instead of hardcoded links.
several advantages:

  * looks nicer
  * doesn't break when the links change (and warns if the endpoint changes)
  * references the current version of the documentation (e.g., local links
    when building locally, and the latest version when building on readthedocs etc.)
fabianp committed Oct 9, 2024
1 parent 7f9bc71 commit eefb79a
Showing 4 changed files with 11 additions and 11 deletions.
8 changes: 4 additions & 4 deletions examples/gradient_accumulation.ipynb
@@ -21,7 +21,7 @@
"\n",
"One example where this is useful is to simulate training with a larger batch size than would fit into the available device memory. Another example is in the context of multi-task learning, where batches for different tasks may be visited in a round-robin fashion. Gradient accumulation makes it possible to simulate training on one large batch containing all of the tasks together.\n",
"\n",
"In this example, we give an example of implementing gradient accumulation using `optax.MultiSteps`. We start by bringing in some imports and defining some type annotations."
"In this example, we give an example of implementing gradient accumulation using {py:func}`optax.MultiSteps`. We start by bringing in some imports and defining some type annotations."
]
},
{
@@ -118,7 +118,7 @@
" optimizer: optax.GradientTransformation,\n",
" params: optax.Params,\n",
" batches: Iterable[dict[str, jnp.ndarray]],\n",
") -\u003e optax.Params:\n",
") -> optax.Params:\n",
" \"\"\"Executes a train loop over the train batches using the given optimizer.\"\"\"\n",
"\n",
" train_step = build_train_step(optimizer)\n",
@@ -243,11 +243,11 @@
"id": "Ub0GHPvvhIKI"
},
"source": [
"## Interaction of `optax.MultiStep` with schedules.\n",
"## Interaction of {py:func}`optax.MultiStep` with schedules.\n",
"\n",
"The snippet below is identical to the snippet above, except we additionally introduce a learning rate schedule. As above, the second call to `fit` is using gradient accumulation. Similarly to before, we find that both train loops compute compute identical outputs (up to numerical errors).\n",
"\n",
"This happens because the learning rate schedule in `optax.MultiStep` is only updated once for each of the _outer_ steps. In particular, the state of the inner optimizer is only updated each time `every_k_schedule` optimizer steps have been taken."
"This happens because the learning rate schedule in {py:func}`optax.MultiStep` is only updated once for each of the _outer_ steps. In particular, the state of the inner optimizer is only updated each time `every_k_schedule` optimizer steps have been taken."
]
},
{
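For reference, here is a minimal sketch of the `optax.MultiSteps` pattern described in the notebook above, including a schedule on the inner optimizer. The loss, data shapes, and hyperparameters are made up for illustration and are not taken from the notebook.

```python
import jax
import jax.numpy as jnp
import optax

# Inner optimizer with a schedule; under MultiSteps the schedule only advances
# once per *outer* (applied) step, not once per accumulated mini-step.
schedule = optax.linear_schedule(
    init_value=1e-3, end_value=1e-4, transition_steps=100)
inner = optax.adam(learning_rate=schedule)

# Accumulate gradients over 4 calls before applying one real update.
multi_steps = optax.MultiSteps(inner, every_k_schedule=4)

params = {'w': jnp.zeros(3)}
opt_state = multi_steps.init(params)

def loss_fn(params, batch):
  return jnp.mean((batch['x'] @ params['w'] - batch['y']) ** 2)

@jax.jit
def train_step(params, opt_state, batch):
  grads = jax.grad(loss_fn)(params, batch)
  # Updates are all-zero until every_k_schedule gradients have been accumulated.
  updates, opt_state = multi_steps.update(grads, opt_state, params)
  params = optax.apply_updates(params, updates)
  return params, opt_state
```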
10 changes: 5 additions & 5 deletions examples/lbfgs.ipynb
@@ -75,7 +75,7 @@
"source": [
"### Using L-BFGS as a gradient transformation\n",
"\n",
"The function [optax.scale_by_lbfgs](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_lbfgs) implements the update of the preconditioning matrix given a running optimizer state $s_k$. Given $(g_k, s_k, w_k)$, this function returns $(P_kg_k, s_{k+1})$. We illustrate its performance below on a simple convex quadratic."
"The function {py:func}`optax.scale_by_lbfgs` implements the update of the preconditioning matrix given a running optimizer state $s_k$. Given $(g_k, s_k, w_k)$, this function returns $(P_kg_k, s_{k+1})$. We illustrate its performance below on a simple convex quadratic."
]
},
{
@@ -147,7 +147,7 @@
"where $c_1$ is some constant set to $10^{-4}$ by default. Consider for example the update direction to be $u_k = -g_k$, i.e., moving along the negative gradient direction. In that case the criterion above reduces to $f(w_k - \\eta_k g_k) \\leq f(w_k) - c_1 \\eta_k ||g_k||_2^2$. The criterion amounts then to choosing the stepsize such that it decreases the objective by an amount proportional to the squared gradient norm.\n",
"\n",
"As long as the update direction is a *descent direction*, that is, $\\langle u_k, g_k\\rangle < 0$ the above criterion is guaranteed to be satisfied by some sufficiently small stepsize.\n",
"A simple linesearch technique to ensure a sufficient decrease is then to decrease a candidate stepsize by a constant factor up until the criterion is satisfied. This amounts to the backtracking linesearch implemented in [optax.scale_by_backtracking_linesearch](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_backtracking_linesearch) and briefly reviewed below.\n",
"A simple linesearch technique to ensure a sufficient decrease is then to decrease a candidate stepsize by a constant factor up until the criterion is satisfied. This amounts to the backtracking linesearch implemented in {py:func}`optax.scale_by_backtracking_linesearch` and briefly reviewed below.\n",
"\n",
"#### Small curvature (Strong wolfe criterion)\n",
"\n",
@@ -158,7 +158,7 @@
"\\leq |\\langle \\nabla f(w_k), u_k\\rangle|.\n",
"$$\n",
"\n",
"See Chapter 3 of [Nocedal and Wright, Numerical Optimization, 1999](https://www.math.uci.edu/~qnie/Publications/NumericalOptimization.pdf) for some illustrations of this criterion. A linesearch method that can ensure both criterions require some form of bisection method implemented in optax with the [optax.scale_by_zoom_linesearch](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_zoom_linesearch) method. Several other linesearch techniques exist, see e.g. https://github.com/JuliaNLSolvers/LineSearches.jl. It is generally recommended to combine L-BFGS with a line-search ensuring both sufficient decrease and small curvature, which the [optax.scale_by_zoom_linesearch](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_zoom_linesearch) ensures.\n",
"See Chapter 3 of [Nocedal and Wright, Numerical Optimization, 1999](https://www.math.uci.edu/~qnie/Publications/NumericalOptimization.pdf) for some illustrations of this criterion. A linesearch method that can ensure both criterions require some form of bisection method implemented in optax with the {py:func}`optax.scale_by_zoom_linesearch` method. Several other linesearch techniques exist, see e.g. https://github.com/JuliaNLSolvers/LineSearches.jl. It is generally recommended to combine L-BFGS with a line-search ensuring both sufficient decrease and small curvature, which the {py:func}`optax.scale_by_zoom_linesearch` ensures.\n",
"\n"
]
},
@@ -170,7 +170,7 @@
"source": [
"### Linesearches in practice\n",
"\n",
"To find a stepsize satisfying the above criterions, a linesearch needs to access the value and potentially the gradient of the function. So linesearches in optax are implemented as [optax.GradientTransformationExtraArgs](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.GradientTransformationExtraArgs), which take the current value, gradient of the objective as well as the function itself. We illustrate this below with [optax.scale_by_backtracking_linesearch](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_backtracking_linesearch)."
"To find a stepsize satisfying the above criterions, a linesearch needs to access the value and potentially the gradient of the function. So linesearches in optax are implemented as {py:func}`optax.GradientTransformationExtraArgs`, which take the current value, gradient of the objective as well as the function itself. We illustrate this below with {py:func}`optax.scale_by_backtracking_linesearch`."
]
},
{
@@ -527,7 +527,7 @@
},
"source": [
"By simply taking a maximum of 50 steps of the linesearch instead of 15, we ensured that the first stepsize taken provided a sufficient decrease and the solver worked well.\n",
"Additional debugging information can be found in the source code accessible from the docs of [optax.scale_by_zoom_linesearch](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_zoom_linesearch)."
"Additional debugging information can be found in the source code accessible from the docs of {py:func}`optax.scale_by_zoom_linesearch`."
]
}
],
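For orientation, here is a rough sketch of how the transformations referenced in this notebook fit together, following optax's documented L-BFGS and linesearch pattern. The objective, iteration count, and linesearch budget below are illustrative only, not the notebook's code.

```python
import jax.numpy as jnp
import optax

def f(w):
  # Simple convex quadratic, in the spirit of the notebook's illustration.
  return jnp.sum((w - 1.0) ** 2)

# optax.lbfgs combines scale_by_lbfgs with a zoom linesearch; a larger
# linesearch budget (e.g. 50 steps instead of the default 15) can be requested
# by passing the linesearch explicitly.
opt = optax.lbfgs(
    linesearch=optax.scale_by_zoom_linesearch(max_linesearch_steps=50))

w = jnp.zeros(3)
state = opt.init(w)

# The linesearch is a GradientTransformationExtraArgs: update() also receives
# the current value, the current gradient, and the objective itself.
value_and_grad = optax.value_and_grad_from_state(f)
for _ in range(20):
  value, grad = value_and_grad(w, state=state)
  updates, state = opt.update(
      grad, state, w, value=value, grad=grad, value_fn=f)
  w = optax.apply_updates(w, updates)
```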
2 changes: 1 addition & 1 deletion examples/linear_assignment_problem.ipynb
@@ -177,7 +177,7 @@
"id": "f0db2563-131f-4f5d-8bbd-91e49c6457f9",
"metadata": {},
"source": [
"To solve the problem, we call `optax.assignment.hungarian_algorithm` on the cost matrix."
"To solve the problem, we call {py:func}`optax.assignment.hungarian_algorithm` on the cost matrix."
]
},
{
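A quick, hedged illustration of the call mentioned above. The cost matrix is made up, and the unpacking below assumes the function returns row and column index arrays in the style of `scipy.optimize.linear_sum_assignment`.

```python
import jax.numpy as jnp
import optax

# cost[i, j]: cost of assigning worker i to job j (illustrative values).
cost = jnp.array([
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
])

# Assumed to return index arrays (i, j) describing the optimal assignment,
# analogous to scipy.optimize.linear_sum_assignment.
i, j = optax.assignment.hungarian_algorithm(cost)
print('assignment cost:', cost[i, j].sum())
```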
2 changes: 1 addition & 1 deletion examples/lookahead_mnist.ipynb
@@ -10,7 +10,7 @@
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.sandbox.google.com/github/google-deepmind/optax/blob/main/examples/lookahead_mnist.ipynb)\n",
"\n",
"This notebook trains a simple Convolution Neural Network (CNN) for hand-written digit recognition (MNIST dataset) using the [Lookahead optimizer](https://arxiv.org/pdf/1907.08610v1.pdf)."
"This notebook trains a simple Convolution Neural Network (CNN) for hand-written digit recognition (MNIST dataset) using {py:func}`optax.lookahead`."
]
},
{
