additional arguments to loss function? #1730
Comments
There are many ways! First, a little bit of context. The `data` you pass to `train!` is iterated over, and each element is splatted into the loss function: if each batch is a tuple `(x, y, z)`, then `train!` calls `loss(x, y, z)`. Another way, in the case you wish to supply arguments that don't change, is to write your loss function as a closure that captures those constants and only takes the per-batch data as arguments. See #1530 for how data loaders can be made to produce batches with more than two elements, as in the sketch below.
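A minimal sketch of both ideas together, assuming hypothetical data `x, y, z`, trainable `W, b`, and fixed constants `c, d` (the names used in the question below); this follows the Flux 0.12-era API, and the `DataLoader` import path may differ across versions:

```julia
using Flux
using Flux.Data: DataLoader   # newer Flux versions re-export DataLoader from MLUtils

# Hypothetical setup: z is extra per-batch data, W and b are trainable, c and d are fixed.
W, b = rand(2, 3), rand(2)
c, d = 0.1, 0.0
x, y, z = rand(3, 100), rand(2, 100), rand(3, 100)

loss(x, y, z, W, b, c, d) = Flux.Losses.mse(W * (x .+ c .* z) .+ b, y) + d

# A DataLoader can yield 3-tuples; train! splats each batch into the loss,
# so every batch is called as loss_fn(x_batch, y_batch, z_batch).
data = DataLoader((x, y, z), batchsize=32, shuffle=true, partial=true)

# Close over the arguments that never change.
loss_fn(x, y, z) = loss(x, y, z, W, b, c, d)

opt = ADAM(0.01)
for epoch in 1:10
    Flux.Optimise.train!(loss_fn, Flux.params(W, b), data, opt)
end
```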
Thanks a lot! Also, it seems that the parameters being updated have to be arrays. I observed that if a parameter is a plain scalar rather than an array, it does not get updated during training.
Yes, for now all parameters need to be arrays (`Flux.params` tracks arrays by identity). All these restrictions will probably go away in the long term, but for now they are there. The only other restriction I can think of is that your loss function must return a scalar value. If it returns a vector value, then you can still use Flux + its AD system + optimizers to update the model, but you won't be able to use `train!`.

I am closing this issue, since there doesn't seem to be anything actionable here for Flux development.
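For instance, here is a small sketch (not from the original comment) of wrapping a scalar you want trained in a one-element array so that `params` can track and update it, while the loss still returns a scalar:

```julia
using Flux

W = rand(2, 3)
b = [0.5]                     # wrap the scalar in a 1-element array so it can be tracked
x, y = rand(3, 10), rand(2, 10)

# Index into b so the model still uses a scalar bias; mse returns a scalar, as train! requires.
loss(x, y) = Flux.Losses.mse(W * x .+ b[1], y)

ps = Flux.params(W, b)
gs = Flux.gradient(() -> loss(x, y), ps)
gs[b]                         # a 1-element gradient array, not silently ignored
```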
Thanks! I just would like to make sure I'm doing the right thing. Suppose my loss function is the multi-argument one described above. I'm now able to do the following:

```julia
data = DataLoader((x, y), batchsize=32, shuffle=true, partial=true)
```

The above code can run, but I am just wondering if this is actually the correct syntax, whether there are things silently breaking down, like the issue of a scalar being ignored during training, and how I can detect such silent issues in general if there is no warning or error message. For example, the documentation on training uses a plain two-argument `loss(x, y)`.
Just to hammer Kyle's point home: the worry that the gradient computation itself might silently break down will never happen, because that is exactly what an AD framework like Zygote is made to handle. What would happen is that you simply don't get a gradient for a scalar parameter.

Why not allow gradients for scalar-valued parameters, you might ask? That's a great question, and it comes down to design trade-offs. You might have noticed that most ML libraries expect you to use their own custom array/tensor/variable types. This is no accident: they need those types to be able to keep track of parameters for AD. The huge downside, of course, is that it makes them incompatible with external libraries without tedious manual conversion. As a source-to-source AD, Zygote avoids this problem entirely. You can pass in native Julia values like arrays and plain numbers, and Zygote will happily calculate gradients for you. These are what we call "explicit" gradients, and you'll notice both the Flux and Zygote tutorials demonstrate how to work with them.

But what if you want to train a model that has millions of parameters contained in hundreds of arrays? Passing them in as individual arguments is untenable, so there are also "implicit" parameters via `Params`:

```julia
using Zygote   # provides gradient and Params

f(x, a, b) = a * x + b

a = 1
b = 1

# Implicit parameters: tell Zygote which objects to differentiate with respect to.
gs = gradient(() -> f(5, a, b), Params([a, b]))
∇a, ∇b = gs[a], gs[b]
```

It's impossible for this to work: implicit `Params` look gradients up by object identity, and `a` and `b` are both the plain immutable value `1`, so there is no way to tell them apart or to associate a gradient with either one. Now, all of the above rarely comes up in most ML models because all of the params are arrays, so implicit params were considered a reasonable trade-off. However, we are working to make all kinds of parameters work as part of https://github.com/FluxML/Optimisers.jl. Feel free to open a discussion topic there if you have any questions!
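For contrast, here is a small sketch (not part of the original comment) of the "explicit" style mentioned above, where plain scalars are fine because gradients come back positionally instead of being looked up by object identity:

```julia
using Zygote   # Flux's AD engine

f(x, a, b) = a * x + b

# Explicit gradients: differentiate with respect to each argument directly.
# Results are matched to arguments by position, so scalar a and b are no problem.
∇x, ∇a, ∇b = gradient(f, 5.0, 1.0, 1.0)   # (df/dx, df/da, df/db) = (1.0, 5.0, 1.0)
```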
Thanks for the useful comments and information! I will be more careful. The reason I am worried about silent errors is that, for example, I did not get a warning or error message when I passed a scalar to `params`; it was simply ignored during training.
This is good feedback, and you are not the first user to have been bitten by this silent error. I have created #1731 to fix it.
Original issue description:

How do I use `Flux.Optimise.train!` when my loss function has arguments in addition to `x` and `y`? The code below works for a loss function with `x` and `y` being the only arguments, that is, `loss(x, y)`:

```julia
data = DataLoader((x, y), batchsize=2, shuffle=true, partial=true)
opt = ADAM(0.01, (0.9, 0.999))

for epoch = 1:100
    Flux.Optimise.train!(loss, params(W, b), data, opt)
    println(loss(x, y))
end
```
What if my loss function is of the form `loss(x, y, z, W, b, c, d)`, where `x, y, z` are the data for each batch, `W, b` are the parameters I want to train, and `c, d` are other fixed constants for the loss function? How do I write/modify the above training code for this loss function? Thanks!