mutating calls #242
Comments
In theory it's fine (well, there are some rules about things you have to do).
That's great news, though I'm not sure in what sense Zygote doesn't support this. Their Buffer type, for example, mutates in place: https://github.com/FluxML/Zygote.jl/blob/84bf62ea18330389c64d0d918c91d7b897e1a5d8/src/lib/buffer.jl
The Buffer type is special. It's the only thing in Zygote that supports mutation.
While I remember. The two rules of pullbacks for mutating functions
This second rule does seem to pose a problem for mutation support of functions that mutate a value and then don't return it.
And in Zygote, what happens when I naively define this pullback for a mutating function that does return the argument it modifies (and the underlying code always uses the return value)? Currently I just define my rule so that it's not actually mutating in place ...
If you do that, then sometimes Zygote will silently return the wrong answer. I can't tell you offhand which cases those are, though.
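One plausible way a silently wrong answer can arise is sketched below in plain Julia, without Zygote. This is an illustration under assumptions, not a description of Zygote's internals: `square_with_pullback` and `double!` are hypothetical names, and the pullback is written by hand. The closure captures `x` by reference, so an in-place change made after the forward pass alters what the pullback computes.

```julia
# Plain-Julia illustration; `square_with_pullback` and `double!` are hypothetical names.
function square_with_pullback(x)
    y = x .^ 2
    # The closure captures `x` by reference, not by value.
    pullback(dy) = 2 .* x .* dy
    return y, pullback
end

# A later in-place operation that overwrites the captured array.
function double!(x)
    x .*= 2
    return x
end

x = [1.0, 2.0, 3.0]
y, back = square_with_pullback(x)
expected = 2 .* [1.0, 2.0, 3.0]   # gradient of sum(x .^ 2) at the original x

double!(x)                        # mutation happens after the forward pass
stale = back(ones(3))             # evaluated with the mutated x, not the original
```

No error is raised here: `stale` simply differs from `expected`, which is why restoring or caching the overwritten values matters.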
Example of what this looks like (if an AD did support mutation), following the rules posted above:
using ChainRulesCore

function f(x)
    for i in eachindex(x)
        x[i] = x[i]^2
    end
    return x
end

function ChainRulesCore.rrule(::typeof(f), x)
    # record the signs before `f` overwrites `x`, so the square can be inverted
    x_is_negative = x .< 0
    function pullback(dy)
        # need to undo the change to `x` in case it is used in another rule.
        x .= sqrt.(x) .* ifelse.(x_is_negative, -1, 1)
        # if mutated on the forward need to mutate to store the derivative
        dy .= 2 .* x .* dy  # d(x^2)/dx = 2x, evaluated at the restored `x`
        # return zero not dy as we have already accumulated that by mutating dy
        return NoTangent(), ZeroTangent()
    end
    return f(x), pullback
end
So in Enzyme we support mutating calls, aliasing (cf. #350) and activity (cf. #452). All of these problems are somewhat tightly correlated. For me, a motivating example is supporting GPU codes where outputs are mutated and there is no return value. One of the issues @wsmoses and I have been debating is the responsibility of the caller, since Enzyme doesn't use closures to capture the inputs but expects the user to pass in both the shadow and the primal value. So if another part of the program mutates a value, I think we currently expect the user to cache it. Aliasing within the adjoint is solved by caching, but since we can use LLVM alias analysis we can limit the amount we need to cache.
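The shadow-passing convention described above can be sketched by hand in plain Julia. This is not Enzyme's API: `square!` and `square_reverse!` are hypothetical names, and the reverse pass is written manually to show the idea of a kernel that mutates its output, returns nothing, and receives a primal and a shadow for every argument. Zeroing the output shadow after use is an assumption of this sketch for an output that is fully overwritten.

```julia
# Hand-written sketch, not Enzyme's API. `square!` and `square_reverse!` are hypothetical.

# Primal kernel: writes into `out` and returns nothing.
function square!(out, x)
    out .= x .^ 2
    return nothing
end

# Reverse pass: the caller supplies the primal and the shadow of every argument.
function square_reverse!(out, dout, x, dx)
    dx .+= 2 .* x .* dout   # accumulate into the input shadow
    dout .= 0               # `out` was overwritten, so its incoming adjoint is consumed
    return nothing
end

x = [1.0, -2.0, 3.0]
dx = zeros(3)
out = zeros(3)
dout = ones(3)

square!(out, x)
square_reverse!(out, dout, x, dx)
```

Because `x` is not overwritten here, nothing needs to be cached. If another part of the program mutated `x` between the two calls, the caller would have to keep a copy of the original values for the reverse pass.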
Is it possible to somehow define derivatives of in-place mutating functions? E.g. axpy!(a, x, y) updates y to be y = a*x + y, and therefore its derivative also needs to be updated.
Apologies if this is in the documentation and I missed it.
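For the axpy! case specifically, the reverse pass can be worked out by hand. The sketch below is a minimal illustration under assumptions, not a ChainRules or Enzyme rule: `axpy_reverse!` is a hypothetical helper, and `dx` and `dy` are caller-provided adjoint buffers. Since the new y equals a*x + y_old, the adjoint of `a` is dot(x, dy), the adjoint of `x` gains a*dy, and the adjoint of the old `y` equals the adjoint of the new `y`, so `dy` can stay in place.

```julia
using LinearAlgebra

# Hypothetical helper, not a ChainRules or Enzyme API.
function axpy_reverse!(a, x, dx, dy)
    da = dot(x, dy)      # derivative of a*x + y w.r.t. a, contracted with dy
    dx .+= a .* dy       # accumulate the derivative w.r.t. x
    # The derivative w.r.t. the old y is the identity,
    # so `dy` already holds the adjoint of the old `y`.
    return da
end

a = 2.0
x = [1.0, 2.0, 3.0]
dx = zeros(3)
y = [10.0, 20.0, 30.0]
dy = [1.0, 1.0, 1.0]

axpy!(a, x, y)                       # y is overwritten with a*x + y_old
da = axpy_reverse!(a, x, dx, dy)
```

axpy! overwrites only `y`, and the reverse pass never needs the old values of `y`, so no caching or undoing is required in this particular case.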