From 6e94e59afd8ff518c4f793b985d66193e0ffdc06 Mon Sep 17 00:00:00 2001
From: Fredrik Bagge Carlson
Date: Tue, 3 Dec 2019 15:27:44 +0800
Subject: [PATCH 1/2] Improve docs for decay optimisers

---
 src/optimise/optimisers.jl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/optimise/optimisers.jl b/src/optimise/optimisers.jl
index c9c40764be..888d308774 100644
--- a/src/optimise/optimisers.jl
+++ b/src/optimise/optimisers.jl
@@ -444,7 +444,7 @@ end
 """
     InvDecay(γ)
 
-Applies inverse time decay to an optimiser
+Applies inverse time decay to an optimiser, i.e., the step effective step size at iteration `n` is `eta / (1 + γ * n)` where `eta` is the initial step size. The wrapped optimisers step size is not modified.
 
 ## Parameters
   - gamma (γ): Defaults to `0.001`
@@ -472,7 +472,7 @@ end
 """
     ExpDecay(eta, decay, decay_step, clip)
 
-Discount the learning rate `eta` by `decay` every `decay_step` till a minimum of `clip`.
+Discount the learning rate `eta` by `decay` every `decay_step` till a minimum of `clip`. The wrapped optimisers step size is being modified by the outer optimiser.
 
 ## Parameters
   - Learning Rate (eta): Defaults to `0.001`.

From e67f09c06d73bc8e0b0702732f63a77eee26e151 Mon Sep 17 00:00:00 2001
From: Fredrik Bagge Carlson
Date: Tue, 3 Dec 2019 15:32:23 +0800
Subject: [PATCH 2/2] Correct some comments in decay docs

---
 src/optimise/optimisers.jl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/optimise/optimisers.jl b/src/optimise/optimisers.jl
index 888d308774..fb3b9fc534 100644
--- a/src/optimise/optimisers.jl
+++ b/src/optimise/optimisers.jl
@@ -444,7 +444,7 @@ end
 """
     InvDecay(γ)
 
-Applies inverse time decay to an optimiser, i.e., the step effective step size at iteration `n` is `eta / (1 + γ * n)` where `eta` is the initial step size. The wrapped optimisers step size is not modified.
+Applies inverse time decay to an optimiser, i.e., the effective step size at iteration `n` is `eta / (1 + γ * n)` where `eta` is the initial step size. The wrapped optimiser's step size is not modified.
 
 ## Parameters
   - gamma (γ): Defaults to `0.001`
@@ -472,7 +472,7 @@ end
 """
     ExpDecay(eta, decay, decay_step, clip)
 
-Discount the learning rate `eta` by `decay` every `decay_step` till a minimum of `clip`. The wrapped optimisers step size is being modified by the outer optimiser.
+Discount the learning rate `eta` by a multiplicative factor `decay` every `decay_step` until a minimum of `clip`.
 
 ## Parameters
   - Learning Rate (eta): Defaults to `0.001`.
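The two schedules these docstrings describe can be sketched independently of Flux. Below is a minimal Python illustration of the formulas only; the function names `inv_decay_eta` and `exp_decay_eta` are made up for this sketch and are not part of the Flux API:

```python
def inv_decay_eta(eta, gamma, n):
    """Effective step size of InvDecay at iteration n: eta / (1 + γ·n)."""
    return eta / (1 + gamma * n)

def exp_decay_eta(eta, decay, decay_step, clip, n):
    """Effective step size of ExpDecay at iteration n, per the docstring:
    eta is discounted by `decay` once every `decay_step` iterations,
    never falling below `clip`."""
    return max(eta * decay ** (n // decay_step), clip)

# InvDecay with γ = 0.001: the step size shrinks like 1/n.
print(inv_decay_eta(0.1, 0.001, 0))     # 0.1
print(inv_decay_eta(0.1, 0.001, 1000))  # 0.05

# ExpDecay: 0.1 is halved every 1000 steps, floored at 1e-4.
print(exp_decay_eta(0.1, 0.5, 1000, 1e-4, 2500))  # 0.025
```

As the corrected docstrings note, InvDecay applies its factor on top of the wrapped optimiser's own step size, whereas ExpDecay modifies `eta` itself; the sketch above only reproduces the resulting schedules.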