use sphinx references instead of hardcoded links.
several advantages:

  * looks nicer
  * doesn't break when the links change (and warns if the endpoint changes)
  * references the current version of the documentation (e.g., local links
    when building locally, and the latest version when building on readthedocs etc.)
fabianp committed Oct 9, 2024
1 parent 7f9bc71 commit eefb79a
Showing 4 changed files with 11 additions and 11 deletions.
8 changes: 4 additions & 4 deletions examples/gradient_accumulation.ipynb
@@ -21,7 +21,7 @@
"\n",
"One example where this is useful is to simulate training with a larger batch size than would fit into the available device memory. Another example is in the context of multi-task learning, where batches for different tasks may be visited in a round-robin fashion. Gradient accumulation makes it possible to simulate training on one large batch containing all of the tasks together.\n",
"\n",
"In this example, we give an example of implementing gradient accumulation using `optax.MultiSteps`. We start by bringing in some imports and defining some type annotations."
"In this example, we give an example of implementing gradient accumulation using {py:func}`optax.MultiSteps`. We start by bringing in some imports and defining some type annotations."
]
},
{
@@ -118,7 +118,7 @@
" optimizer: optax.GradientTransformation,\n",
" params: optax.Params,\n",
" batches: Iterable[dict[str, jnp.ndarray]],\n",
") -\u003e optax.Params:\n",
") -> optax.Params:\n",
" \"\"\"Executes a train loop over the train batches using the given optimizer.\"\"\"\n",
"\n",
" train_step = build_train_step(optimizer)\n",
@@ -243,11 +243,11 @@
"id": "Ub0GHPvvhIKI"
},
"source": [
"## Interaction of `optax.MultiStep` with schedules.\n",
"## Interaction of {py:func}`optax.MultiStep` with schedules.\n",
"\n",
"The snippet below is identical to the snippet above, except we additionally introduce a learning rate schedule. As above, the second call to `fit` is using gradient accumulation. Similarly to before, we find that both train loops compute compute identical outputs (up to numerical errors).\n",
"\n",
"This happens because the learning rate schedule in `optax.MultiStep` is only updated once for each of the _outer_ steps. In particular, the state of the inner optimizer is only updated each time `every_k_schedule` optimizer steps have been taken."
"This happens because the learning rate schedule in {py:func}`optax.MultiStep` is only updated once for each of the _outer_ steps. In particular, the state of the inner optimizer is only updated each time `every_k_schedule` optimizer steps have been taken."
]
},
{
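For reference, here is a minimal sketch of the `optax.MultiSteps` pattern described in the notebook above, including a schedule on the inner optimizer. The loss, data shapes, and hyperparameters are made up for illustration and are not taken from the notebook.

```python
import jax
import jax.numpy as jnp
import optax

# Inner optimizer with a schedule; under MultiSteps the schedule only advances
# once per *outer* (applied) step, not once per accumulated mini-step.
schedule = optax.linear_schedule(
    init_value=1e-3, end_value=1e-4, transition_steps=100)
inner = optax.adam(learning_rate=schedule)

# Accumulate gradients over 4 calls before applying one real update.
multi_steps = optax.MultiSteps(inner, every_k_schedule=4)

params = {'w': jnp.zeros(3)}
opt_state = multi_steps.init(params)

def loss_fn(params, batch):
  return jnp.mean((batch['x'] @ params['w'] - batch['y']) ** 2)

@jax.jit
def train_step(params, opt_state, batch):
  grads = jax.grad(loss_fn)(params, batch)
  # Updates are all-zero until every_k_schedule gradients have been accumulated.
  updates, opt_state = multi_steps.update(grads, opt_state, params)
  params = optax.apply_updates(params, updates)
  return params, opt_state
```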
10 changes: 5 additions & 5 deletions examples/lbfgs.ipynb
@@ -75,7 +75,7 @@
"source": [
"### Using L-BFGS as a gradient transformation\n",
"\n",
"The function [optax.scale_by_lbfgs](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_lbfgs) implements the update of the preconditioning matrix given a running optimizer state $s_k$. Given $(g_k, s_k, w_k)$, this function returns $(P_kg_k, s_{k+1})$. We illustrate its performance below on a simple convex quadratic."
"The function {py:func}`optax.scale_by_lbfgs` implements the update of the preconditioning matrix given a running optimizer state $s_k$. Given $(g_k, s_k, w_k)$, this function returns $(P_kg_k, s_{k+1})$. We illustrate its performance below on a simple convex quadratic."
]
},
{
@@ -147,7 +147,7 @@
"where $c_1$ is some constant set to $10^{-4}$ by default. Consider for example the update direction to be $u_k = -g_k$, i.e., moving along the negative gradient direction. In that case the criterion above reduces to $f(w_k - \\eta_k g_k) \\leq f(w_k) - c_1 \\eta_k ||g_k||_2^2$. The criterion amounts then to choosing the stepsize such that it decreases the objective by an amount proportional to the squared gradient norm.\n",
"\n",
"As long as the update direction is a *descent direction*, that is, $\\langle u_k, g_k\\rangle < 0$ the above criterion is guaranteed to be satisfied by some sufficiently small stepsize.\n",
"A simple linesearch technique to ensure a sufficient decrease is then to decrease a candidate stepsize by a constant factor up until the criterion is satisfied. This amounts to the backtracking linesearch implemented in [optax.scale_by_backtracking_linesearch](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_backtracking_linesearch) and briefly reviewed below.\n",
"A simple linesearch technique to ensure a sufficient decrease is then to decrease a candidate stepsize by a constant factor up until the criterion is satisfied. This amounts to the backtracking linesearch implemented in {py:func}`optax.scale_by_backtracking_linesearch` and briefly reviewed below.\n",
"\n",
"#### Small curvature (Strong wolfe criterion)\n",
"\n",
@@ -158,7 +158,7 @@
"\\leq |\\langle \\nabla f(w_k), u_k\\rangle|.\n",
"$$\n",
"\n",
"See Chapter 3 of [Nocedal and Wright, Numerical Optimization, 1999](https://www.math.uci.edu/~qnie/Publications/NumericalOptimization.pdf) for some illustrations of this criterion. A linesearch method that can ensure both criterions require some form of bisection method implemented in optax with the [optax.scale_by_zoom_linesearch](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_zoom_linesearch) method. Several other linesearch techniques exist, see e.g. https://github.com/JuliaNLSolvers/LineSearches.jl. It is generally recommended to combine L-BFGS with a line-search ensuring both sufficient decrease and small curvature, which the [optax.scale_by_zoom_linesearch](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_zoom_linesearch) ensures.\n",
"See Chapter 3 of [Nocedal and Wright, Numerical Optimization, 1999](https://www.math.uci.edu/~qnie/Publications/NumericalOptimization.pdf) for some illustrations of this criterion. A linesearch method that can ensure both criterions require some form of bisection method implemented in optax with the {py:func}`optax.scale_by_zoom_linesearch` method. Several other linesearch techniques exist, see e.g. https://github.com/JuliaNLSolvers/LineSearches.jl. It is generally recommended to combine L-BFGS with a line-search ensuring both sufficient decrease and small curvature, which the {py:func}`optax.scale_by_zoom_linesearch` ensures.\n",
"\n"
]
},
@@ -170,7 +170,7 @@
"source": [
"### Linesearches in practice\n",
"\n",
"To find a stepsize satisfying the above criterions, a linesearch needs to access the value and potentially the gradient of the function. So linesearches in optax are implemented as [optax.GradientTransformationExtraArgs](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.GradientTransformationExtraArgs), which take the current value, gradient of the objective as well as the function itself. We illustrate this below with [optax.scale_by_backtracking_linesearch](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_backtracking_linesearch)."
"To find a stepsize satisfying the above criterions, a linesearch needs to access the value and potentially the gradient of the function. So linesearches in optax are implemented as {py:func}`optax.GradientTransformationExtraArgs`, which take the current value, gradient of the objective as well as the function itself. We illustrate this below with {py:func}`optax.scale_by_backtracking_linesearch`."
]
},
{
@@ -527,7 +527,7 @@
},
"source": [
"By simply taking a maximum of 50 steps of the linesearch instead of 15, we ensured that the first stepsize taken provided a sufficient decrease and the solver worked well.\n",
"Additional debugging information can be found in the source code accessible from the docs of [optax.scale_by_zoom_linesearch](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_zoom_linesearch)."
"Additional debugging information can be found in the source code accessible from the docs of {py:func}`optax.scale_by_zoom_linesearch`."
]
}
],
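For orientation, here is a rough sketch of how the transformations referenced in this notebook fit together, following optax's documented L-BFGS and linesearch pattern. The objective, iteration count, and linesearch budget below are illustrative only, not the notebook's code.

```python
import jax.numpy as jnp
import optax

def f(w):
  # Simple convex quadratic, in the spirit of the notebook's illustration.
  return jnp.sum((w - 1.0) ** 2)

# optax.lbfgs combines scale_by_lbfgs with a zoom linesearch; a larger
# linesearch budget (e.g. 50 steps instead of the default 15) can be requested
# by passing the linesearch explicitly.
opt = optax.lbfgs(
    linesearch=optax.scale_by_zoom_linesearch(max_linesearch_steps=50))

w = jnp.zeros(3)
state = opt.init(w)

# The linesearch is a GradientTransformationExtraArgs: update() also receives
# the current value, the current gradient, and the objective itself.
value_and_grad = optax.value_and_grad_from_state(f)
for _ in range(20):
  value, grad = value_and_grad(w, state=state)
  updates, state = opt.update(
      grad, state, w, value=value, grad=grad, value_fn=f)
  w = optax.apply_updates(w, updates)
```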
2 changes: 1 addition & 1 deletion examples/linear_assignment_problem.ipynb
@@ -177,7 +177,7 @@
"id": "f0db2563-131f-4f5d-8bbd-91e49c6457f9",
"metadata": {},
"source": [
"To solve the problem, we call `optax.assignment.hungarian_algorithm` on the cost matrix."
"To solve the problem, we call {py:func}`optax.assignment.hungarian_algorithm` on the cost matrix."
]
},
{
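A quick, hedged illustration of the call mentioned above. The cost matrix is made up, and the unpacking below assumes the function returns row and column index arrays in the style of `scipy.optimize.linear_sum_assignment`.

```python
import jax.numpy as jnp
import optax

# cost[i, j]: cost of assigning worker i to job j (illustrative values).
cost = jnp.array([
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
])

# Assumed to return index arrays (i, j) describing the optimal assignment,
# analogous to scipy.optimize.linear_sum_assignment.
i, j = optax.assignment.hungarian_algorithm(cost)
print('assignment cost:', cost[i, j].sum())
```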
2 changes: 1 addition & 1 deletion examples/lookahead_mnist.ipynb
@@ -10,7 +10,7 @@
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.sandbox.google.com/github/google-deepmind/optax/blob/main/examples/lookahead_mnist.ipynb)\n",
"\n",
"This notebook trains a simple Convolution Neural Network (CNN) for hand-written digit recognition (MNIST dataset) using the [Lookahead optimizer](https://arxiv.org/pdf/1907.08610v1.pdf)."
"This notebook trains a simple Convolution Neural Network (CNN) for hand-written digit recognition (MNIST dataset) using {py:func}`optax.lookahead`."
]
},
{
