RFC: basic sketch of scheduling #15

Open
wants to merge 6 commits into
base: master
Changes from 3 commits
67 changes: 67 additions & 0 deletions src/schedulers.jl
@@ -0,0 +1,67 @@
# f(t, p) = (2 / π) * abs(asin(sin(π * (t - 1) / p)))
# sine(t, p = 1.) = (2 / π) * abs(asin(sin(π * (t - 1) / p)))

# Simple scheduling can happen as a basic closure.
triangle(t) = (1 - 2 * abs(round(Int, t/2) - t/2))

mutable struct Schedule{O,F}
f::F
opt::O
cursor::Float32
end

"""
Schedule(f, opt)

Create a scheduled optimiser whose update rule is controlled by `f`.

`f` can be any callable, while `opt` can be any optimiser.

See also [next](@ref), [init](@ref)
"""
function Schedule(f, opt)
Schedule(f, opt, 1.f0 + eps(Float32))
end


# Timestep: t - always depends on run
# Phase: p - most likely constant

# What we want to "schedule" - everything (so will have to let people mess with things)
# Most likely learning rate
# If `f` relies on only time - it can be generated on the fly

"""
next(s::Schedule, state)

Returns a new optimiser as described by the scheduler function
and the optimiser. This allocates a new optimiser on the stack, keeping the original one intact.

The state is a tuple of the state as defined by the scheduling function.
"""
function next(s::Schedule{O}, (cursor, cursor_step)) where O
cursor += cursor_step
ADAM(s.opt, eta = s.f(cursor) * s.opt.eta) #replace with O(..)
end

init(f, x) = (1.f0, 0.1f0)
init(s::Schedule, x) = (init(s.f, x), init(s.opt, x))

function apply(s::Schedule, x, dx, st)
schedst, optst = st
cursor, cursor_step = schedst
o = next(s, schedst)
Member

Instead of next here, I would suggest the following:

struct Schedule{F, O}
  schedule::F
  opt::O
end

# usage example 1
Schedule(f, eta -> Momentum(eta, 0.9))

# usage example 2
Schedule(f, (o, eta) -> Momentum(eta, o.rho))

Then in apply:

# ...
o = s.opt(s.schedule(cursor))
# ...
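
To make the suggestion concrete, here is a minimal, self-contained sketch of how `apply` might look with a closure-based `Schedule`. The `Descent` type and its `apply` method are hypothetical stand-ins introduced only for illustration, and the state layout `((cursor, cursor_step), optst)` is assumed to match the one used in the diff.

```julia
# Hypothetical stand-in optimiser (an assumption, not the package's type).
struct Descent
    eta::Float32
end

# Stand-in update rule: scale the gradient and pass the state through.
apply(o::Descent, x, dx, st) = (o.eta .* dx, st)

struct Schedule{F,O}
    schedule::F   # maps the cursor to a hyperparameter value
    opt::O        # maps that value to a concrete optimiser
end

# State is assumed to be ((cursor, cursor_step), optimiser_state).
function apply(s::Schedule, x, dx, st)
    (cursor, cursor_step), optst = st
    o = s.opt(s.schedule(cursor))      # build the optimiser for this step
    Δ, optst2 = apply(o, x, dx, optst)
    return Δ, ((cursor + cursor_step, cursor_step), optst2)
end

# Usage: inverse decay of the learning rate, advancing the cursor by 1 per step.
s = Schedule(t -> 1 / (1 + 0.1f0 * t), eta -> Descent(eta))
st = ((1f0, 1f0), nothing)
Δ, st = apply(s, [1f0, 2f0], [0.5f0, 0.5f0], st)
```

In this sketch the schedule only produces a number and the closure decides which hyperparameter receives it, so `apply` never needs to know the optimiser's field names.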

Member Author

While I like the simplicity of apply, having the anonymous function there seems verbose. You would also have to be more careful about what the schedule returns to avoid cryptic method errors.

Perhaps next is poorly named? We want to segregate the steps of generating a new optimiser to update the fields and state from the step of applying the scheduler.

Member

I'm not sure what you mean by the method errors. Could you elaborate?

> We want to segregate the steps of generating a new optimiser to update the fields and state from the step of applying the scheduler.

I agree, but what I am suggesting is that, instead of next, which restricts the step of generating a new optimizer to modifying the LR only, we use an anonymous function for this step. Verbosity can be avoided by defining:

Schedule(f, o::Momentum) = Schedule(f, eta -> Momentum(eta, o.rho))

It would be just as succinct as the current API in the case of LR, as well as allow someone to schedule something other than the LR if necessary.
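
As a rough illustration of that convenience method, the sketch below defines a hypothetical two-field `Momentum` struct (an assumption; the real optimiser may have different fields) alongside the closure-based `Schedule`. The extra method is more specific than the default constructor, so passing an optimiser instance wraps it in a closure that schedules `eta` and keeps `rho` fixed.

```julia
# Hypothetical optimiser with two hyperparameters (field names assumed).
struct Momentum
    eta::Float32
    rho::Float32
end

struct Schedule{F,O}
    schedule::F
    opt::O
end

# Convenience method: given an optimiser instance, schedule its learning rate
# and reuse its momentum coefficient unchanged.
Schedule(f, o::Momentum) = Schedule(f, eta -> Momentum(eta, o.rho))

# Short form: schedules eta only.
s_lr = Schedule(t -> 0.5f0, Momentum(0.1f0, 0.9f0))
o_lr = s_lr.opt(s_lr.schedule(1))

# Explicit closure: schedules rho instead, with eta held fixed.
s_rho = Schedule(t -> 0.8f0, rho -> Momentum(0.01f0, rho))
o_rho = s_rho.opt(s_rho.schedule(1))
```

The short form covers the common learning-rate case, while the explicit closure remains available for any other hyperparameter.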

Member Author

Why does it restrict it to lr?

Member Author

You could give it any field to modify; this is just an example. I think it's fair to ask for a shorter syntax than overloading next, though.

Member

next specifically updates opt.eta? You could of course add an argument to next to allow any field; then it is a question of whether we prefer the closure way of specifying the field to update or the symbol way. I think the closure is much clearer and less opaque, and it doesn't require the user to look up the struct field names.

Δ, optst2 = apply(o, x, dx, optst)
Δ, ((cursor .+ cursor_step, cursor_step), optst2)
end

struct InvDecay{T}
decay::T
end

init(inv::InvDecay, x) = (1, 1)
(inv::InvDecay)(t) = 1 / (1 + inv.decay * t)

InvDecay(s = 0.1f0) = InvDecay{typeof(s)}(s)

sine(p) = t -> sin(Float32(π) * t / p)