Don't use state anywhere for the whole state tree #136

Merged · 1 commit · Apr 12, 2023

docs/src/index.md (6 changes: 3 additions & 3 deletions)
@@ -45,13 +45,13 @@
image = rand(Float32, 224, 224, 3, 1) |> gpu; # dummy data
@show sum(model(image)); # dummy loss function

rule = Optimisers.Adam() # use the Adam optimiser with its default settings
-state = Optimisers.setup(rule, model); # initialise this optimiser's momentum etc.
+state_tree = Optimisers.setup(rule, model); # initialise this optimiser's momentum etc.

∇model, _ = gradient(model, image) do m, x # calculate the gradients
sum(m(x))
end;

-state, model = Optimisers.update(state, model, ∇model);
+state_tree, model = Optimisers.update(state_tree, model, ∇model);
@show sum(model(image)); # reduced

```
@@ -60,7 +60,7 @@
Notice that a completely new instance of the model is returned. Internally, this
is handled by [Functors.jl](https://fluxml.ai/Functors.jl), where we do a walk over the
tree formed by the model and update the parameters using the gradients.

-There is also [`Optimisers.update!`](@ref) which similarly returns a new model and new state,
+There is also [`Optimisers.update!`](@ref) which similarly returns a new model,
but is free to mutate arrays within the old one for efficiency.
(The method of `apply!` above is likewise free to mutate arrays within its state;
they are defensively copied when this rule is used with `update`.)
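For readers skimming this diff, here is a minimal sketch of the non-mutating `update` versus mutating `update!` paths described in the docs above. The toy model, the `Descent` rule, and the loss function are invented for illustration and are not part of this PR:

```julia
using Optimisers, Zygote

model = (w = rand(Float32, 3), b = zeros(Float32, 3))          # toy two-array "model"
state_tree = Optimisers.setup(Optimisers.Descent(0.1), model)  # one Leaf per trainable array

∇model = gradient(m -> sum(abs2, m.w .+ m.b), model)[1]        # structural gradient (a NamedTuple)

# Non-mutating: returns a fresh state tree and model; the originals are left untouched.
state_tree2, model2 = Optimisers.update(state_tree, model, ∇model)

# Mutating variant: may overwrite arrays inside the old model and state tree for speed,
# so only the returned values should be used afterwards.
state_tree3, model3 = Optimisers.update!(state_tree, model, ∇model)
```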
src/Optimisers.jl (10 changes: 6 additions & 4 deletions)
@@ -66,7 +66,7 @@
init
###

"""
-    Optimisers.setup(rule, model) -> tree
+    Optimisers.setup(rule, model) -> state_tree

Initialises the given optimiser for every trainable parameter within the model.
Returns a tree of the relevant states, which must be passed to [`update`](@ref)
@@ -141,6 +141,7 @@
This is used in exactly the same manner as [`update`](@ref), but because it may mutate
arrays within the old model (and the old state), it will be faster for models of ordinary
`Array`s or `CuArray`s. However, you should not rely on the old model being fully updated
but rather use the returned model.
+(The original state tree is always mutated, as each `Leaf` is mutable.)

# Example

@@ -149,9 +150,10 @@
julia> using StaticArrays, Zygote, Optimisers

julia> m = (x = [1f0, 2f0], y = SA[4f0, 5f0]); # partly mutable model

-julia> t = Optimisers.setup(Momentum(1/30, 0.9), m);
+julia> t = Optimisers.setup(Momentum(1/30, 0.9), m) # tree of states
+(x = Leaf(Momentum{Float64}(0.0333333, 0.9), Float32[0.0, 0.0]), y = Leaf(Momentum{Float64}(0.0333333, 0.9), Float32[0.0, 0.0]))

-julia> g = gradient(m -> sum(abs2.(m.x .+ m.y)), m)[1]
+julia> g = gradient(m -> sum(abs2.(m.x .+ m.y)), m)[1] # structural gradient
(x = Float32[10.0, 14.0], y = Float32[10.0, 14.0])

julia> t2, m2 = Optimisers.update!(t, m, g);
@@ -165,7 +167,7 @@
true
julia> m # original should be discarded, may be mutated but no guarantee
(x = Float32[0.6666666, 1.5333333], y = Float32[4.0, 5.0])

-julia> t == t2 # original state is in fact guaranteed to be mutated
+julia> t == t2 # original state tree is guaranteed to be mutated
true
```
"""
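As a rough sketch of what the renamed `state_tree` contains, `setup` returns a tree mirroring the model, with one mutable `Leaf` (rule plus any momentum buffers) at each parameter array. The nested toy model below is invented for illustration and is not taken from the PR:

```julia
using Optimisers

model = (layer1 = (W = rand(Float32, 2, 2), b = zeros(Float32, 2)),
         layer2 = (W = rand(Float32, 2, 2),))

state_tree = Optimisers.setup(Optimisers.Momentum(1/100, 0.9), model)

# The state tree has the same nested shape as the model,
# with a Leaf at each parameter array.
state_tree.layer1.W isa Optimisers.Leaf   # true
state_tree.layer2.W isa Optimisers.Leaf   # true
```

Because each `Leaf` is mutable, `update!` can write new momenta into this tree in place, which is what the docstring addition above now states explicitly.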