
RFC: basic sketch of scheduling #15

Open · wants to merge 6 commits into base: master
Conversation

DhairyaLGandhi
Member

@darsnack (whose ParameterSchedulers.jl has implemented a bunch of these, and whose design choices I tried to understand): one concern that I do have is that this abstraction doesn't give us much of a view into the running stats of a training loop.

We also generally want to avoid carrying extra state as part of the scheduler, which would add an extra line to the apply function. We have a decent view of how to implement a bunch of different routines with this design: it allows a few different kinds of callables, and there's usually little boilerplate.

It would look something like

Schedule(InvDecay(0.5f0), ADAM())

Or

Schedule(sine(0.5f0), opt)

We would also want to think about which standard schedules to introduce. Users can customise behaviour by working with apply and next, which gives a clean separation of concerns.
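A minimal sketch of how these pieces could fit together (the struct layout and the next overload below are illustrative assumptions, not the final API):

struct Schedule{F, O}
    schedule::F   # any callable t -> value
    opt::O
end

# `next` builds the optimiser to use at the current cursor; overloading it per
# optimiser type is how one would schedule a field other than the learning rate.
next(s::Schedule{<:Any, <:Descent}, (cursor, step)) = Descent(s.schedule(cursor))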

@DhairyaLGandhi
Member Author

While going through PS, I felt there was some extra boilerplate needed for simple utilities, so I want to keep the API minimal. I think it's fair to say that if you want to do more than adjust the learning rate, you'll need to overload next; otherwise, a function/closure/callable is standard and self-explanatory about what the different parts do.

darsnack (Member) left a comment

I think this approach looks good to me in general (it is what I would call the "scheduler"). AFAICT it follows a similar design to Scheduler from the original FluxML/Flux.jl#1506, so I am totally cool with the design proposed here. I left some more minor comments.

Just for clarity, I think there is a misconception about what ParameterSchedulers.jl does. None of the functionality in this PR is intended to be part of ParameterSchedulers.jl. You can think of ParameterSchedulers.jl as just providing standard f to pass to Schedule.

src/schedulers.jl:
function apply(s::Schedule, x, dx, st)
schedst, optst = st
cursor, cursor_step = schedst
o = next(s, schedst)
Member

Instead of next here, I would suggest the following:

struct Schedule{F, O}
  schedule::F
  opt::O
end

# usage example 1
Schedule(f, eta -> Momentum(eta, 0.9))

# usage example 2
Schedule(f, (o, eta) -> Momentum(eta, o.rho))

Then in apply:

# ...
o = s.opt(s.schedule(cursor))
# ...

Member Author

While I like the simplicity of the apply, having the function there seems verbose. You would also have to return things more carefully from the schedule to avoid cryptic method errors.

Maybe next is poorly named? We want to separate the step of generating a new optimiser (updating its fields and state) from the step of applying the scheduler.

Member

I'm not sure what you mean by the method errors. Could you elaborate?

We want to segregate the steps of generating a new optimiser to update the fields and state from the step of applying the scheduler.

I agree, but what I am suggesting is that instead of next, which restricts the step of generating a new optimizer to modifying the LR only, we use an anonymous function for this step. Verbosity can be avoided by defining:

Schedule(f, o::Momentum) = Schedule(f, eta -> Momentum(eta, o.rho))

It would be just as succinct as the current API in the case of LR, as well as allow someone to schedule something other than the LR if necessary.

Member Author

Why does it restrict it to lr?

Member Author

You could give it any field to modify, this is an example. I think it's fair to ask for a shorter syntax than overloading next though.

Member

next specifically updates opt.eta? You could of course add an argument to next to allow any field, then it is a question of whether we prefer the closure way of specifying the field to update or the symbol way. I think the closure is much more clear, doesn't require the user to look up the struct field names, and less opaque.

@darsnack
Member

darsnack commented Feb 16, 2021

While going through PS, I felt there was some extra boilerplate needed for simple utilities, so I want to keep the API minimal. I think it's fair to say that if you want to do more than adjust the learning rate, you'll need to overload next; otherwise, a function/closure/callable is standard and self-explanatory about what the different parts do.

I think there is some confusion. Like I mention above, ParameterSchedulers.jl is not meant to provide the functionality in this PR. It provides various standard fs that can be used with this PR. I have no problems with the design of this PR, since it is pretty much the design I've been trying to push.

The main difference appears to be in defining f itself. Here is the example from this PR vs. the same implementation in PS:

### Approach outlined in this PR

struct InvDecay{T}
   decay::T
end

# (start, step) -> consider step to be a function/ vector of step sizes?
init(inv::InvDecay, x) = (1, 1)
(inv::InvDecay)(t) = 1 / (1 + inv.decay * t)

### Approach in ParameterSchedulers

struct Inv{T} <: AbstractSchedule
    decay::T
end

Base.getindex(inv::Inv, t) = 1 / (1 + inv.decay * t)
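For what it's worth, the two definitions compute the same value; only the call syntax differs. Assuming both structs exactly as written above:

inv_pr = InvDecay(0.5)    # PR-style callable
inv_ps = Inv(0.5)         # ParameterSchedulers-style getindex
inv_pr(10) == inv_ps[10]  # both give 1 / (1 + 0.5 * 10) == 1/6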

As you can see, there isn't really much of a difference, so I don't see the point about boilerplate. But this PR treats f as adhering to the optimizer interface. There are a couple reasons that I don't like this:

  • There is the extra boilerplate init(inv, x) that needs to be defined
  • Schedules only implement init but not apply? Looks like abusing the optimizer interface to define scheduling policies to me.

I would suggest looking at the interface requirements in https://darsnack.github.io/ParameterSchedulers.jl/dev/docs/interfaces/generic.html. It is exactly two functions and quite flexible. Implementing a new schedule requires no more boilerplate than what's being proposed here, and you unlock a lot more functionality/reuse than the approach outlined here.

Take for example if my schedule function was f(t) = t^3 + 0.5 * t^2 - 0.1. Now I want to apply that schedule periodically every 10 iterations/epochs and after 100 epochs, I want to switch to an exponential decay. Obviously, I am pushing this example to the extreme to highlight the differences. In this approach, it would require a lot of boilerplate to support this schedule. In ParameterSchedulers.jl, I could just do:

s = Sequence(schedules = [Loop(Lambda(f = t -> t^3 + 0.5 * t^2 - 0.1), period = 10), Exp(0.1)],
             step_sizes = [100, nepochs - 100])

In other words, there is no boilerplate required (you don't even need to define a new struct). And more importantly, you could use the s above with this PR (or a modified version with my review comments):

opt = Momentum()
scheduler = Schedule(s, opt)

PLUS you can still do

for (eta, epoch) in zip(s, 1:nepochs)
  # ...
end

and others. It's not impossible to do the same stuff with this PR, but it requires more boilerplate. You could reduce some of that boilerplate, but I feel it will still be more than ParameterSchedulers.jl because it uses the optimizers interface which isn't meant for scheduling policies.

On Slack, there was some concern over the abstract types in ParameterSchedulers.jl. I agree it is better to avoid abstract types just for the sake of defining a type hierarchy. But that's not their role in ParameterSchedulers.jl. They are actually used for dispatch to reduce boilerplate code. I can look into whether they can be eliminated. I would prefer no abstract types, but I think they are required.

@darsnack
Member

Also mentioned on Slack: how do we implement time-step state consistently for a group of parameters?

There is no need to worry about this. As long as we define init(s::Schedule, x::AbstractArray), state(s, xs) should take care of xs that are weird collections. Correspondingly, for Params specifically, #1481 from Flux.jl should define init(o, xs::Params) = [init(o, x) for x in xs]. Either way, my point is that we do not need to worry about this for Schedule specifically. The time step state is just like the state of any other optimizer.

@DhairyaLGandhi
Member Author

DhairyaLGandhi commented Feb 16, 2021

Maybe something is misunderstood. This is currently a minimal sketch; we would expect to flesh out details for the examples you suggest. You don't need to overload init in most cases. What would you have apply do? Initialising a scheduler doesn't seem like abuse to me, since it's not tied to apply in any way.

no more boilerplate than what's being proposed here, and you unlock a lot more functionality/reuse than the approach outlined here.

I would say that the interface you suggest is possible. Not to mention, this isn't complete, so I would not comment on its current functionality as being its final form. Why do you think we wouldn't be able to reuse code?

Btw, I was pointing to the use of startval, endval, cycle, etc. in my previous comments; apologies, I should have been clearer.

Loop(Lambda(f = t -> t^3 + 0.5 * t^2 - 0.1), period = 10)

I would typically not want syntax like this, but would replace it with a generator or an existing iterator from Base to get this behavior.

@darsnack
Member

darsnack commented Feb 16, 2021

What would you have apply do?

My point exactly. It seems weird to me that some structs in Optimisers.jl will implement init/apply and some will only implement init. My larger point is why are the scheduling policies in Optimisers.jl? Don't they belong in a separate package like Schedules.jl (or ParameterSchedulers.jl 😉)?

I would say that the interface you suggest is possible

Yes, I totally agree that it is all possible in this PR. I am not trying to suggest it isn't possible. If it isn't clear, I'm trying to highlight the fact that all the infrastructure is already implemented in ParameterSchedulers.jl. Why are we trying to reimplement the same functionality all over again?

Btw i was pointing to use of startval, endval, cycle etc in my previous comments, apologies I should have been clearer.

Those are optional interfaces. The minimal interface requirements (if you read the page I linked) are just a single function provided by Base. The optional interfaces are meant to reduce boilerplate code across "decay" type schedules and "cyclic" type schedules. If you look at the cyclic optional interface, there is a common bit that all "cyclic" type schedules share. The optional interface allows a user to avoid having to repeat that common bit over and over for every different kind of cyclic schedule and just define the part that's unique. The purpose of the optional interfaces is to reduce boilerplate code.

Just as a note, I didn't arbitrarily decide to make optional interfaces for "decay" and "cyclic" types. Those types are based on published papers on scheduling policies. I didn't make up those definitions, they are defined in the literature. We can remove the DecaySchedule and CyclicSchedule interfaces from ParameterSchedulers.jl if that's what we want. It's not a massive rewrite, it's a simple find and replace style change. But I think those optional interfaces are useful. Either way, it's not a big deal to change that part.

@DhairyaLGandhi
Member Author

I am trying to steer the conversation into finding out what building blocks we need to support to write proper scheduling routines.

This isn't about PS and its choices. I don't understand why it should be that way either.

@darsnack
Member

darsnack commented Feb 16, 2021

I am trying to steer the conversation into finding out what building blocks we need to support to write proper scheduling routines.

This isn't about PS and it's choices. I don't understand why it should be that way either.

I agree 100%. I guess the earlier comments about PS confused me into thinking we were conflating the two.

In which case, aside from my review comments, I think the design for applying schedules to optimizers presented here is a good one. In my view it is the correct abstraction, so we can just discuss the finer points in the comment threads.

As for designing the schedules themselves (e.g. Exp, Inv, etc.), I don't think I have any major problems. I just don't think it makes sense to use init or for these pieces to use the Optimisers.jl interface specifically. The schedule itself is a simple one function interface regardless of implementation. Any schedule is just a struct with some fields (e.g. decay parameter) and a function that returns the schedule value at "cursor" t. Right now, Schedule assumes that f can be called as f(t). Why not just leave it at that? It's the part of the InvDecay struct that I don't really like about this approach.

@darsnack
Member

darsnack commented Feb 16, 2021

Thinking about this a little more and trying to be more concrete about my thoughts
(TL;DR - merge Schedule after review, don't write schedule policies like Inv in Optimisers.jl):

First, I think that the schedules (i.e. policies like Inv) and the scheduler (i.e. the thing that sets opt.eta to the current schedule value) should be separate. This PR takes that approach, so +1 from me.


Second, I think that the schedules should be defined by simply "value of s at iteration t." In PS, I wrote that as s[t], in this PR, it is s(t). I think the specific syntax in this PR makes it so that a regular function has the same syntax as a struct. Whereas the s[t] syntax requires something like Lambda(f = ...) as a wrapper. So, I think the syntax in this PR is better.


Third, I do not like init(s::Inv, x). I think the only part that should implement init is Schedule. By default, you'll almost always want init(s::Inv, x) = (1, 1) (i.e. the initial iteration is 1 and you step by 1 each time). If the user did want to do something like step every 2, then overriding init is not a nice way for the user to do that. For starters, defining Optimisers.init(s::Inv, x) = (1, 2) is opaque in my user code as to what the intent is for the program. Second, it also means that now all Inv must step by 2. If I wanted both an Inv that stepped by 1 and an Inv that stepped by 2, I would need to define a second struct and copy the source code for Inv or wrap an Inv. A better approach might look something like:

struct Throttle{S, T<:Integer}
  schedule::S
  rate::T
end

(s::Throttle)(t) = s.schedule(s.rate * t)

Starting at a specific t other than 1 also isn't really something most users will want. If that is what they really want, then we can implement a similar generic struct like Throttle above. More likely, what they want is to be able to start at an arbitrary initial value. In the case of Inv, that would be the addition of a scale parameter:

struct Inv{T}
  scale::T
  decay::T
end

(s::Inv)(t) = s.scale / (1 + s.decay * t)
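Assuming the two sketches above exactly as written, usage would look like:

s = Inv(0.1, 0.5)     # scale = 0.1, decay = 0.5
s(0)                  # == 0.1
Throttle(s, 2)(3)     # == s(6) == 0.1 / (1 + 0.5 * 6) == 0.025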

Fourth (this ties into my previous point), the schedule policies should not be tied to optimizers only. These are generic annealing policies that can be used in a variety of contexts and domains. This is why I feel that the appropriate location for them is not in Flux or Optimisers.jl. Flux/Optimisers should be one specific user of these schedule policies (arguably the most important user from our perspective), but we shouldn't design everything as if we are the only consumers of the package. That's why I don't like relying on init/apply at all. What's important for us to decide is what the expected contract to get "the value of s at t" is. I think s(t) is a nice one.

@DhairyaLGandhi
Member Author

So now we should be able to write things like

Schedule(f, opt)
Schedule(Iterators.cycle(f(x) for x in 1:period), opt) (we can extend it to any iterable, but maybe best to reserve that syntax for a scheduling policy that says "move to next in a vector once you've exhausted the current iterable". It is currently allowed to chuck in any vector with literal schedules.)
Schedule(triangle(period), opt)

I am also working on getting things like vectors of scheduling functions working.

Schedule([f,g,[1,2,3]], opt)

I think it might be reasonable to move scheduling functions from PS into here (or a separate Schedulers.jl)

I'm thinking the design should allow for a scheduler to either be passed into a callback and produce the optimiser state correctly, or be linked into the update step.

Having it as a separate call is not particularly useful (we allow it still). Maybe if we want to chuck out the new optimiser and schedule state, that would be a way forward, but that forces the presence of a schedule to begin with, which is a strong assumption in a training loop IMO.

opt, schedst = next(s, schedst)

@darsnack
Member

darsnack commented Feb 20, 2021

I think it might be reasonable to move scheduling functions from PS into here (or a separate Schedulers.jl)

Or just keep them in PS and use them here and elsewhere (that would be equivalent to having a separate Schedulers.jl, no?)

Schedule(f, opt)
Schedule(Iterators.cycle(f(x) for x in 1:period), opt) (we can extend it to any iterable, but maybe best to reserve that syntax for a scheduling policy that says "move to next in a vector once you've exhausted the current iterable". It is currently allowed to chuck in any vector with literal schedules.)
Schedule(triangle(period), opt)

For things like Sequence and Loop, I tried (again) using the Base.Iterators approach that you suggested. Indeed you can write everything using cycle, take, and repeated. But then you are limited to only using the schedule with iterate. s(t) will no longer work. If we have our own type that wraps the Base.Iterators, then we can write s(t) but the implementation will require collecting the iterator until t. In the end, Base.Iterators is only really designed to be used in an iteration like context. It's a shame that we can't rely on it, but I don't really think that's terrible.

People can still write cycle(Inv(x) for x in 1:100) if they want to. We compose perfectly fine with all the iteration utilities in Base. We just can't use them for our implementations.

I am also working on getting things like vectors of scheduling functions working.

Also already implemented in PS. I just want to be clear that I am not trying to force PS. I am just suggesting that a lot of completed work is being redone.

I'm thinking the design should allow for a scheduler to either be passed into a callback and produce the optimiser state correctly, or be linked into the update step.

Having it as a separate call is not particularly useful (we allow it still). Maybe if we want to chuck out the new optimiser and schedule state, that would be a way forward, but that forces the presence of a schedule to begin with, which is a strong assumption in a training loop IMO.

Can you explain this a bit more? I didn't quite follow what you were getting at. I'm not against a schedule being passed as a callback, but I think the Schedule(f, opt) implementing the optimizer interface is more usable in different contexts.

@darsnack
Member

I am just suggesting that a lot of completed work is being redone.

Have you seen the darsnack/rm-optim branch of PS? It incorporates almost all of your suggestions, and I think it matches what you are implementing here w.r.t. the schedule policies. The Schedule(f, opt) syntax belongs in Optimisers.jl IMO.

@DhairyaLGandhi
Member Author

The reason I want to hook in to the update (and therefore apply) is that it allows people to pass in a scheduled optimiser without changing any code in Flux.

The reason it is being done here is simple: this is where some of the infra is written, and PS is thought of as a collection of scheduling functions, so it can consume this API. If we are to use PS and go through the trouble of transferring and so on, I would also want to ensure the code structure is such that PS simply consumes the API defined here. (I am deliberately trying to make them compatible.)

I also don't like collecting the generators; it makes things slower than they should be.

Schedule(f, opt) syntax belongs in Optimisers.jl IMO.

Right, and by extension, the Schedule([f,g], opt) does too, as does the notion of what f might be.

@darsnack
Member

The reason I want to hook in to the update (and therefore apply) is that it allows people to pass in a scheduled optimiser without changing any code in Flux.

I completely agree, I'm only trying to point out that you don't need all the schedules to also hook into apply/update. The definition of Schedule here which calls f(t) to get the parameter value, return the new optimizer, and then call apply on the new optimizer is what I want too. All I'm saying is that this interface only requires that f is callable like f(t). You don't need f itself to do anything related to the optimizer interface other than be callable.

code structure is that PS simply consumes the API defined here

I agree, that's why the updated darsnack/rm-optim branch consumes this API. That branch modifies the existing code in PS to use the f(t) interface that you have defined here. I also got rid of the type hierarchy and other syntax that you didn't like.

Right, and by extension, the Schedule([f,g], opt) does too, as does the notion of what f might be.

Sure, I have no problem with that. Though in general, for most schedules, f and g are not exhaustible. Scheduling policies are generally infinite iterators. That's why PS has the ability to define Sequence(f => T1, g => T2, ...) which specifies how long to run f before switching to g. If Schedule([f, g], opt) defines this sequential like behavior, then I think it is better to just define it with a composite schedule. We could do Schedule([f => T1, g => T2], opt) but that would limit the sequence of schedules to be used with an opt. Defining this using a composite schedule like Sequence allows us to use the sequence in more contexts.

Unless you meant for some other functionality from Schedule([f, g], opt). I have no problem with the API. Just that if it is used for the behavior described above, then I think Sequence is better.

@DhairyaLGandhi
Member Author

So the implementation of Sequence is essentially this, right?

Sure, scheduling functions are infinite (as long as they are functions), but Sequence is still used to build a composite scheduling policy. Re:

move to next in a vector once you've exhausted the current iterable

Sequence is just a generalised f. You can't have s(t), but you can have next(s, st), since the state can tell you where you are in the schedule. Think of Schedule([f, g], opt) as [Schedule(f, opt), Schedule(g, opt)].
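One hedged reading of "Sequence is just generalised f" with a next-style interface, where the state carries the active schedule index and a local cursor (all names and the state layout here are illustrative):

# st = (index of active schedule, local cursor); step_sizes[i] says how long
# schedule i runs before handing over to the next one.
function next(s::Sequence, st)
    i, t = st
    val = s.schedules[i](t)
    st′ = t < s.step_sizes[i] ? (i, t + 1) : (min(i + 1, length(s.schedules)), 1)
    return val, st′
end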

@darsnack
Member

scheduling functions are infinite (as long as they are functions)

Not just arbitrary functions. Almost all common schedules like exponential decay, inverse decay, cosine annealing, cyclic LR, etc. are infinite iterators.

You can't have s(t) but you can have a next(s, st)

Then Sequence is not a schedule as defined by the interface here. Composite schedules need to also adhere to the s(t) interface. If they only adhere to iteration interfaces, then you can't swap out a schedule for a composite schedule. An analogy: it would be weird if I couldn't treat Optimiser(opts...) the same as Momentum(...).

Think of it as Schedule([f,g], opt) as [Schedule(f, opt), Schedule(g, opt)]

This isn't necessarily a sequence. This could read as opt.eta follows the values f(1), g(1), f(2), g(2), ... or g(f(1)), g(f(2)), ..., or f(1), f(2), f(T), g(1), g(2), ....

@darsnack
Member

Also, sequences aren't the only composite schedule. Looping is a composite schedule. Throttling/accelerating is a composite schedule. It seems weird that sequences get the special Schedule([f, g], opt) syntax (assuming that's what you are proposing). And even if we were okay with that asymmetry, restricting sequences to Schedule([f, g], opt) means I can only define and use a sequence of schedules with an optimizer? What if I want to use the schedule to adjust a hyper-parameter completely orthogonal to the optimizer?

@DhairyaLGandhi
Member Author

Almost all common schedules like exponential decay, inverse decay, cosine annealing, cyclic LR, etc. are infinite iterators

Those are handled as usual, so I guess it's fine to table that

then you can't swap out a schedule for a composite schedule

You totally could; the change would only have to be handled internally. Nothing changes from the consumer's point of view, be that PS or an end user writing a custom scheduler or training loop.

What if I want to use the schedule to adjust a hyper-parameter completely orthogonal to the optimizer

Could you write an MWE? The current implementation of this or PS should not have access to state beyond that of the scheduler. But if you want to mess with a parameter, we can totally add that too. The optimiser is in its own little space in this abstraction; what you're modifying is of no concern here.

@darsnack
Member

darsnack commented Feb 20, 2021

You totally could; the change would only have to be handled internally. Nothing changes from the consumer's point of view, be that PS or an end user writing a custom scheduler or training loop.

Currently, with the proposed interface, I can write the following:

s = Inv(...)
for t in 1:100
  # some code
  my_object.param = s(t)
  # some other code
end

If you define Sequence(f => T1, g => T2, ...) = Iterators.flatten(take(f, T1), take(g, T2), ...) (syntax etc. is just pseudo-code), then you cannot do

s = Sequence(Inv(...) => 50, Exp(...) => 50)
for t in 1:100
  # some code
  my_object.param = s(t)
  # some other code
end

Could you write an MWE? The current implementation of this or PS should not have access to state beyond that of the scheduler. But if you want to mess with a parameter, we can totally add that too. The optimiser is in its own little space in this abstraction; what you're modifying is of no concern here.

I work on spiking neural networks. Being another learning system, these networks have various hyper-parameters. I want to schedule them the same way you might schedule the learning rate in deep learning. The proposed syntax of Schedule([Inv(...), Exp(...)], opt) doesn't allow me to use the sequence of schedules with anything other than opt.

Another example from my work. In adversarial ML, there are various hyper-parameters for the attack algorithm. I want to schedule these hyper-parameters. They don't belong to any optimizer, so the syntax Schedule([Inv(...), Exp(...)], opt) is again meaningless in my situation.

@DhairyaLGandhi
Member Author

I think there is a misunderstanding: I am proposing that the opt be replaced by some other entity you want to tweak, be that an array, the size of something, or whatever.

Re Sequence: you can do that if you track the state as well, which we can technically hide from user code, but I don't know if it's necessary. Rather than give it infinite schedules like Inv, you could give it a generator. I guess we can add sugar to make it easier to write, but I'm hoping for one of two things: either there is a regular function in the vector of schedules to finish the remainder of the training (or we just add an identity block at the end), or we break the training loop once the schedules are exhausted. I think we can handle both with some tweaks.

@DhairyaLGandhi
Member Author

I think I understand the point about spiking nets and why you might shy away from an explicit state. In those cases, if you're opting in to scheduling, we can also store the state internally, but I am not sure it's a deal breaker.

@darsnack
Member

Re Sequence: you can do that if you track the state as well, which we can technically hide from user code, but I don't know if it's necessary. Rather than give it infinite schedules like Inv, you could give it a generator. I guess we can add sugar to make it easier to write, but I'm hoping for one of two things: either there is a regular function in the vector of schedules to finish the remainder of the training (or we just add an identity block at the end), or we break the training loop once the schedules are exhausted. I think we can handle both with some tweaks.

Or you could just use an immutable version of Sequence which is exactly what PS provides. It's not like the code for Sequence or any of the composite optimizers is complex. Seems like a lot of extra effort just to use Iterators.flatten, etc.
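A minimal immutable, callable Sequence along those lines might look like this (field names are illustrative, and PS's actual implementation may differ):

struct Sequence{S, T}
    schedules::S    # e.g. (Inv(...), Exp(...))
    step_sizes::T   # e.g. (50, 50)
end

# s(t) walks the step sizes to find the schedule active at global time t,
# re-basing t onto that schedule's local clock.
function (s::Sequence)(t)
    for (f, n) in zip(s.schedules, s.step_sizes)
        t <= n && return f(t)
        t -= n
    end
    return last(s.schedules)(t)  # let the final schedule run past its window
end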

I am proposing that the opt be replaced by some other entity you want to tweak, be that an array, the size of something or whatever

Having something like Schedule(f, x) for any object x is very interesting. I guess you could make x return a re-initialized struct (assumes initialization is cheap). If we could do this well for any x, not just optimizers, then that should belong with the scheduling package, I think. We would still define apply etc. for Schedule in here. Overall, this is interesting but seems harder than just supporting optimizers.

Just to be clear, I support your proposal here for Schedule(f, opt). Looks like the right approach to me. The whole aside with PS is pointing out that the Inv, Exp, Sequence, etc. in PS already use the interface proposed here, and so they will compose well, and we don't need to re-write code. We would still need to do Schedule(f, opt) in Optimisers.jl like you are proposing.

@DhairyaLGandhi
Member Author

The branch you mention does seem to reduce code bloat. While you're at it, I would also definitely remove the need for defining IteratorLength (?) etc. Straight callables don't need it, and limited iterables carry state explicitly to handle that sort of thing.

I do think Sequence isn't the biggest deal breaker, since you could get that effect in a loop anyway. It doesn't seem to belong in PS, IMO.

@darsnack
Member

I would also definitely remove the need for defining IteratorLength (?) etc.

I can do that. These are part of the optional iteration interface in Base. I don't really know how they are used in practice though.

I do think Sequence isn't the biggest deal breaker since you could get that effect in a loop anyway. It doesn't seem to belong in PS imo.

I feel like all composite schedules belong with the basic schedules. For example, having Sequence relate to basic schedules the way Optimiser relates to Descent means that you can do Loop(Sequence(...), ...) etc. Sequence + looping + arbitrary functions gives all the building blocks you want for composable schedules.

@DhairyaLGandhi
Member Author

Right exactly. These building blocks belong here, and the consumption in PS.

@darsnack
Member

But none of those blocks relates specifically to optimizers; they are just more schedules. Ergo, the schedule package makes the most sense to me.

@darsnack
Member

Going back to my examples with the spiking neural network: why should I need to use Optimisers.jl/Flux.jl AND ParameterSchedulers.jl when all I need is hyper-parameter schedules, which fit neatly in PS? I don't need nor want anything related to optimization.
