Optimizer handling of infinite loss #821
Just dropping it feels like a risky move, but I think it might actually be the right one. There are loss functions that have a degenerate case of `-Inf`. OTOH, this is a bit risky, since this could also be a result of a user mistake in the loss function definition.
Dropping everything on `NaN`?
Not gradients, losses.
If it's really losses you're concerned about, that's independent of the optimisers, which only ever see gradients. It's less clear what the action item is here, since normally it's up to the user to decide what to do with a loss. We could perhaps add a check in `train!`.
I used DiffEqFlux to estimate parameters of a differential equation whose solution has a finite escape time in some areas of the parameter space.
So even from a mathematical point of view there is no well-behaved way to handle this. If I currently return an infinity in my loss function, DiffEqFlux (and the underlying machinery) throws an error, halting the training. It would be really helpful if there were a way to tell Flux to ignore this loss and possibly stay away from this area (optional, but nice to have in my case, where bad parameters form a region). I am not here to debate whether this should be a +Inf or a NaN or something similar. If I had to give an action item, it would be:
My proposal would look like this:
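The proposed code did not survive extraction, but purely as a hypothetical illustration of the use case described above (all names here — `solve_model`, `successful`, `data` — are invented, not a real API), a loss for a model with a finite escape time might look like:

```julia
# Hypothetical sketch only: signal "this parameter region is unusable"
# by returning +Inf instead of erroring out.
function loss(p)
    sol = solve_model(p)            # solve the ODE at parameters p
    successful(sol) || return Inf   # finite escape: no finite loss exists here
    return sum(abs2, sol .- data)   # ordinary squared error otherwise
end
```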
Dropping NaN would be bad. I am coming round to the argument that one should just use a custom training loop. -Inf doesn't always mean perfect.
This counter-argument makes sense. I also get the position "if there are weird values, throw an error". It is a design decision. I think I have communicated my needs, and I hand the discussion back to the experts on this code base until asked for more. I trust you will come up with something good. Thanks!
Sorry, did not mean to come across as hostile. On further thought I don't think this needs a custom loop, if your original loss function was
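The rest of that suggestion was truncated, but one way to avoid a custom loop is to wrap the existing loss so non-finite values contribute nothing. A hedged sketch (`original_loss` stands in for whatever the user already has):

```julia
# The gradient of `zero(l)` is zero, so when the loss is non-finite
# the optimizer step becomes a no-op for that batch.
function safe_loss(args...)
    l = original_loss(args...)
    return isfinite(l) ? l : zero(l)
end
```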
Potentially we could add `Flux.skip()`.
If `Flux.skip()` could be called from the loss function, that would work too.
Yes, that's probably what you need, because even if it's not
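With the `Flux.skip()` that was eventually merged (PR #1232 below), calling it throws a skip exception that `train!` catches, abandoning the current step. A sketch of calling it from the loss, as discussed here, assuming `model`, `opt`, and `data` are already set up:

```julia
using Flux

# If the loss is non-finite, abandon this training step instead of
# letting the optimizer see a bad gradient.
loss(x, y) = begin
    l = Flux.Losses.mse(model(x), y)
    isfinite(l) || Flux.skip()   # train! catches the skip and moves on
    l
end

Flux.train!(loss, Flux.params(model), data, opt)
```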
In Optim (because this was mentioned above) we rely on users setting the objective value of "bad" inputs to Inf, but that's slightly different. There, a "bad" input is typically an input from a region in parameter space that you cannot step into (model not defined, model can't be solved successfully, ...), so we just use that information to backtrack into a "nice" region and do the line search there. "Optimizers" or "trainers" are a bit different in the Flux world, because line search is rarely used. In Optim, if it's not possible to backtrack into a finite-valued region, we simply halt.
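For concreteness, the Optim.jl convention described here looks roughly like this (the objective itself is made up for illustration):

```julia
using Optim

# Return Inf for parameters where the model is undefined; the line
# search then backtracks into the finite-valued region.
function objective(p)
    p[1] <= 0 && return Inf            # "bad" region: model undefined here
    return (log(p[1]) - 1)^2 + p[2]^2
end

result = optimize(objective, [2.0, 0.5], BFGS())
```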
Yes.
Poke:
We haven't added that yet. FWIW, you can also do something like
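The code for that "something like" was lost, but it presumably means guarding the update in a hand-written loop. A sketch, with `loss`, `model`, `opt`, and `data` assumed to exist:

```julia
using Flux, Zygote

ps = Flux.params(model)
for (x, y) in data
    # Compute the loss value and a pullback in one forward pass.
    l, back = Zygote.pullback(() -> loss(x, y), ps)
    isfinite(l) || continue              # drop this batch entirely
    gs = back(one(l))                    # backward pass only for good batches
    Flux.Optimise.update!(opt, ps, gs)
end
```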
1232: add Flux.skip() r=DhairyaLGandhi a=Moelf per #821

### PR Checklist

- [x] Tests are added
- [ ] Entry in NEWS.md
- [x] Documentation, if applicable
- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).

Co-authored-by: Moelf <jerryling315@gmail.com>
Co-authored-by: Moelf <proton@jling.dev>
Co-authored-by: Jerry Ling <proton@jling.dev>
Can this issue be closed, now that `Flux.skip()` has been merged?
It hasn't quite been solved yet. There is now a manual way for users to change the optimizer behavior in the presence of infinities, but the optimizers are still not robust to infinite loss values in the way Optim, NLopt, IPOPT, etc. are.
Do you think we can get some resources to add a few global optimisation routines, as well as document GalacticOptim.jl and NLopt.jl use with Flux in Optimisers.jl?
Sure, but that's still orthogonal. It would be good for Flux optimizers to have robust handling of Infs which causes a rejection, a pullback to previous parameters, and a change in the step size, or something like what other optimizers do. Or maybe just automatically apply `Flux.skip`.
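A rough sketch of the reject-and-backtrack behaviour being described — this is not an existing Flux API, and the names are illustrative:

```julia
using Flux

# One guarded step: save the parameters, step, and if the new loss is
# non-finite, restore the parameters and shrink the learning rate.
function guarded_step!(opt, ps, loss_fn)
    saved = [copy(p) for p in ps]
    gs = Flux.gradient(loss_fn, ps)
    Flux.Optimise.update!(opt, ps, gs)
    if !isfinite(loss_fn())
        for (p, s) in zip(ps, saved)
            p .= s                # reject the step
        end
        opt.eta /= 2              # back off, as a line search would
    end
end
```

Here `opt.eta` assumes an optimizer with an explicit learning-rate field, such as `Descent` or `ADAM`.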
Seems like that would fit well in a higher-level library like FastAI.jl.
Referring to Mike's comment: Flux's optimizers never see objective functions or values, and Flux doesn't have any optimization routines other than `train!`. GalacticOptim.jl calls packages like NLopt by invoking a complete optimization routine, but for Flux it manually steps Flux's optimizers with `update!`.
Ref SciML/DiffEqFlux.jl#70. As discussed at JuliaCon, nicer strategies can be used so that if you get an infinite loss the training won't just explode: you can discard the point and take another random draw, and things like that. @oxinabox @pkofod probably know more, since I know Optim.jl and other libraries handle this.