Add simple Schedulers #1434
base: master
Conversation
Is the intention here to also add the other schedulers that PyTorch implements?
Happy to have help with this! Having said that, the design of the optimisers seems to lend itself fine to adding schedulers. Note that we would want to keep a fairly shallow type hierarchy. We should probably talk about which features schedulers might need that fall out of this design naturally, and what its limitations are.
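For reference, Flux's existing optimiser composition already hints at where a scheduler would slot in. The snippet below only composes built-in pieces and is illustrative, not the design being proposed in this PR:

```julia
using Flux

# Flux already chains rate modifiers with optimisers through the same flat,
# apply!-based interface; a scheduler would be another link in this chain
# rather than a new branch of a type hierarchy.
opt = Flux.Optimiser(ExpDecay(), ADAM())
```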
I find the method for determining the current epoch number a bit janky, which I would love to improve upon as a start.
I think this should be reconciled with https://github.com/darsnack/ParameterSchedulers.jl somehow. cc @darsnack and @lorenzoh
ParameterSchedulers.jl should have most of PyTorch's functionality already implemented. The only ones that I'd want to double check are … @CarloLucibello mentioned integrating ParameterSchedulers.jl into Flux. I would have already made a PR, but I ran head first into Flux's limited optimizer interface. The main issue is that I am currently writing up what a future interface should look like, and I will post it soon. I'll follow up with a PR to Flux that implements it, based on the discussion that follows that post. PS: This really only presents a problem for …
Looking at the source of this PR, it will hit the same issue that …
I'm aware of the bug, which isn't difficult to fix, but this was more for design discussion than final implementation. You'll notice the janky name :)
Good to hear; a design discussion is exactly what I think is needed too.
Do post your thoughts here @darsnack; that's kind of the motivation for this PR. On composition: we can make it safe to call apply on a number of parameters together, but it's not very intuitive. To be clear, I want to move these over to Optimisers.jl now, so we have a unified interface. It's written with that in mind.
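To illustrate why calling apply per parameter makes the bookkeeping unintuitive (the "janky" step counting mentioned earlier), here is a purely hypothetical sketch, not the PR's implementation: the scheduler only advances its step once it starts a new pass over the parameters.

```julia
# Hypothetical sketch (not the PR's code): with a per-parameter apply!, a
# scheduler cannot simply increment a counter inside apply!, because apply!
# runs once per parameter per update step. One workaround is to advance the
# step only when a parameter is seen for the second time, i.e. when a new
# pass over the parameters has started.
mutable struct StepCounter
  seen::IdDict{Any,Nothing}   # parameters visited in the current pass
  t::Int                      # completed update steps
end
StepCounter() = StepCounter(IdDict{Any,Nothing}(), 0)

function tick!(c::StepCounter, x)
  if haskey(c.seen, x)        # revisiting x means a new pass has begun
    empty!(c.seen)
    c.t += 1
  end
  c.seen[x] = nothing
  return c.t
end
```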
Perfect, I was thinking the exact same thing. |
I do think the interaction between the learning rate schedule and the optimizer will be very different moving from the current Flux interface to the mostly stateless one in Optimisers.jl. For example, ScheduledOptim.update_func can no longer be a purely mutating function, and I wonder if a PyTorch-esque interface is the best way to go. I know I keep mentioning it, but it's illustrative to see what a mutation-free library like Optax does here.
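As a point of comparison, here is a rough sketch of how a schedule might look against a stateless, Optimisers.jl-style interface. The init/apply functions below are stand-ins for whatever the real interface ends up being, not Optimisers.jl's actual API; the key point is that the step counter is threaded through the state rather than mutated on the rule.

```julia
# Rough sketch against a hypothetical stateless interface (stand-in names):
# the schedule's step counter lives in the state that `apply` returns, so
# nothing on the rule itself is mutated.
struct ScheduledDescent{F}
  schedule::F               # step -> learning rate
end

init(o::ScheduledDescent, x) = (t = 0,)

function apply(o::ScheduledDescent, state, x, dx)
  t = state.t + 1
  η = o.schedule(t)
  return (t = t,), η .* dx  # new state plus the rescaled gradient
end
```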
I posted my thoughts in a discussion; I'm interested to hear how people think the interface can be improved. Personally, I'm still unsatisfied with hyperparameter access, but I can't figure out anything better.
Simple design for scheduling learning rate decay using the Optimiser interface
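For concreteness, here is a minimal sketch of the kind of design described: a wrapper that sets the inner optimiser's learning rate from a schedule before delegating to it. It assumes Flux's apply!(opt, x, Δ) convention and that the wrapped optimiser stores its rate in an eta field; the type and field names are illustrative, not the PR's exact code.

```julia
using Flux

# Illustrative sketch, not the PR's exact code: a scheduler as just another
# optimiser that wraps an inner one and rewrites its learning rate each call.
mutable struct Scheduled{O,F}
  opt::O        # wrapped optimiser, e.g. Descent or ADAM (assumed to have an `eta` field)
  schedule::F   # step -> learning rate
  t::Int        # naive call counter
end

function Flux.Optimise.apply!(s::Scheduled, x, Δ)
  s.t += 1
  s.opt.eta = s.schedule(s.t)            # set the rate for this call
  return Flux.Optimise.apply!(s.opt, x, Δ)
end

# Example: drop the rate by 10x every 1000 steps.
opt = Scheduled(Descent(0.1), t -> 0.1 * 0.1^(t ÷ 1000), 0)
```

Note that this naive counter ticks once per apply! call, i.e. once per parameter rather than once per update step, which is exactly the step-counting wrinkle discussed in the conversation above.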