Wrong results with higher order pullback -- chained if/else in accum
#937
Comments
Huh, interesting. This is a bit odd. This might be because of an incorrect …
I tried commenting out the adjoints for …, but `mwe()` still fails:

```julia
julia> function mwe()
           a = 0
           b = .5
           data = 1.0
           function loss()
               y, pb = Zygote.pullback(data) do x
                   aa = abs2(x - a)
                   bb = abs2(x - b)
                   r = 1 - aa - bb
               end
               dfdx, = pb(data)
               dfdx
           end
           l1 = sum(loss())
           l2, pb = Zygote.pullback() do
               sum(loss())
           end
           @show l1, l2
           @assert l1 == l2
       end
mwe (generic function with 1 method)

julia> mwe()
(l1, l2) = (-3.0, -2.0)
ERROR: AssertionError: l1 == l2
Stacktrace:
 [1] mwe()
   @ Main ./REPL[9]:23
 [2] top-level scope
   @ REPL[10]:1
```
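As a sanity check on which of the two values is right, here is the inner derivative of `mwe()` worked out by hand in plain Julia (no Zygote), with a central finite difference as an independent check. The names `term_aa`, `term_bb`, `r` and `h` are introduced only for this sketch; since `pb` is seeded with `data = 1.0`, `dfdx` is just `r'(1.0)`.

```julia
# Values taken from mwe() above.
a, b, x = 0.0, 0.5, 1.0

# r(x) = 1 - abs2(x - a) - abs2(x - b), so r'(x) = -2(x - a) - 2(x - b).
term_aa = -2 * (x - a)   # contribution of the aa term
term_bb = -2 * (x - b)   # contribution of the bb term
dfdx = term_aa + term_bb

# Central finite difference on the same function, as an independent check.
r(t) = 1 - abs2(t - a) - abs2(t - b)
h = 1e-6
fd = (r(x + h) - r(x - h)) / (2h)

@show term_aa, term_bb, dfdx, fd
```

This gives `dfdx == -3.0`, matching `l1`. The wrong value `l2 = -2.0` equals `term_aa` alone, which is consistent with the `bb` contribution being lost when the two gradient contributions to `x` are accumulated; that is an observation from the numbers, not a confirmed diagnosis.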
I also ran a few experiments with explicit parameters, to check whether global lookups were the cause; that didn't help either.
Might this be related to perturbation confusion (cf. JuliaDiff/ForwardDiff.jl#83)?
The example from the referenced paper (which describes the problem for forward mode) …
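To make "perturbation confusion" concrete, here is a minimal forward-mode sketch in plain Julia. It is a hypothetical illustration, not how ForwardDiff.jl or Zygote work: the `Dual` type and `derivative` function below use a single, untagged infinitesimal, which is exactly what lets nested derivatives mix up their perturbations. It evaluates a commonly used example of this shape, d/dx [x · d/dy (x + y)] at x = 1, whose exact value is 1 because the inner derivative is 1.

```julia
# Naive forward-mode dual numbers with ONE shared, untagged perturbation ε.
# Hypothetical illustration only; ForwardDiff.jl uses tags to keep nested
# perturbations apart.
struct Dual
    val::Float64  # primal value
    eps::Float64  # coefficient of the shared infinitesimal ε
end

Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.eps + b.eps)
Base.:+(a::Dual, b::Real) = Dual(a.val + b, a.eps)
Base.:+(a::Real, b::Dual) = b + a
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.val * b.eps + a.eps * b.val)
Base.:*(a::Dual, b::Real) = Dual(a.val * b, a.eps * b)
Base.:*(a::Real, b::Dual) = b * a

# Derivative of f at x: seed ε with coefficient 1 and read the coefficient back.
# Every call uses the same ε, so an inner call cannot distinguish its own
# perturbation from the one an outer call has already attached to x.
derivative(f, x::Real) = f(Dual(x, 1.0)).eps

# d/dx [ x * d/dy (x + y) ] at x = 1; the exact answer is 1.
# The inner call sees x = Dual(1, 1) and wrongly counts x's ε as part of dy.
confused = derivative(x -> x * derivative(y -> x + y, 1.0), 1.0)
@show confused
```

The untagged scheme returns 2.0 instead of 1.0. Whether Zygote's reverse mode suffers from an analogous mix-up in this issue is the open question raised above; the sketch only shows what the term means.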
The outer gradients are computed incorrectly too (though not in this example, since it has no outer gradient).
I'm going to have to dig deeper into this.
Following up on @mzgubic's simplification, here is a more minimal example, and one without a second derivative at all. The problem appears to be that Zygote is getting confused by the chained if/else in `accum`:

```julia
julia> using Zygote

julia> function mmwe()
           f() = gradient(x -> 13*x + x, 17)[1]
           α = f()
           β, _ = pullback(f)
           @show α, β
           nothing
       end;

julia> mmwe()
(α, β) = (14, 2)

julia> let
           α = Zygote.accum(1,2)
           β, _ = pullback(Zygote.accum,1,2)
           @show α, β
       end;
(α, β) = (3, 2)

julia> Zygote.accum(x, y) =
           x === nothing ? y :
           # y === nothing ? x :  # this won't fix mwe(), it needs accum(::Float64, ::Missing)
           x + y

julia> mmwe()
(α, β) = (14, 14)
```
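In `mmwe()`, `f()` should return 14 because `x` appears twice in `13*x + x`, and the two gradient contributions (13 and 1) are combined by `accum`; under `pullback`, even `accum(1, 2)` came back as 2 instead of 3. The plain-Julia sketch below spells out the intended accumulation semantics, with `nothing` standing for "no gradient". It is hypothetical (`accum_sketch` is not Zygote's definition) and expresses the `nothing` cases through dispatch rather than a chained ternary; whether Zygote differentiates this form correctly has not been checked here.

```julia
# Hypothetical accumulator illustrating the intended semantics of gradient
# accumulation; `nothing` means "no gradient from this path".
# Not Zygote's actual definition of Zygote.accum.
accum_sketch(x, y) = x + y
accum_sketch(::Nothing, y) = y
accum_sketch(x, ::Nothing) = x
accum_sketch(::Nothing, ::Nothing) = nothing  # resolves the (nothing, nothing) ambiguity

# Gradient of x -> 13*x + x: x is used twice, contributing 13 and 1.
contributions = (13, 1)
total = foldl(accum_sketch, contributions; init = nothing)
@show total
```

Folding from `nothing` mirrors starting with no gradient and adding one contribution per use of `x`, which gives the expected 14.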
Here comes a strange one... This is the minimal working example I could find that produces the erroneous result.

The loss function computes the gradients of some function (think of it as a mixture of Euclidean distances) at each point given by the columns of `x`/`data`. I extract the gradients with the `dfdx, ...` statement and return them as the result of the loss function. At a later stage I want to optimize some model over this loss function, so I need the derivative of this spatial gradient with respect to the model parameters. Here the model is just the identity, and the loss does not depend on the parameter `z`. However, when the loss is evaluated inside the pullback, it returns a different result.
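A plain-Julia sketch of the setup described above may help pin down what the correct numbers are. It is a hypothetical reconstruction, not the original code: the centres `CENTRE_A`/`CENTRE_B` mirror `a = 0` and `b = 0.5` from `mwe()`, the spatial gradient is derived by hand instead of with `Zygote.pullback`, and `field`, `spatial_grad` and `column_loss` are illustrative names.

```julia
# Hypothetical reconstruction of the described setup (not the original code).
const CENTRE_A = 0.0
const CENTRE_B = 0.5

# Scalar field at a point p: 1 - |p - a|^2 - |p - b|^2.
field(p) = 1 - sum(abs2, p .- CENTRE_A) - sum(abs2, p .- CENTRE_B)

# Its spatial gradient, derived by hand: -2(p - a) - 2(p - b).
spatial_grad(p) = -2 .* (p .- CENTRE_A) .- 2 .* (p .- CENTRE_B)

# Loss: sum of the spatial gradients at every column of `data`.
# The model is the identity, so the parameter `z` is accepted but never used.
column_loss(z, data::AbstractMatrix) = sum(sum(spatial_grad(col)) for col in eachcol(data))

data = reshape([1.0], 1, 1)   # a single 1-D point, matching mwe()
l = column_loss(0.0, data)

# Finite-difference checks: the hand-derived gradient against the field,
# and the loss against z (it should not change at all).
h = 1e-6
fd_grad = (field([1.0 + h]) - field([1.0 - h])) / (2h)
dl_dz = (column_loss(h, data) - column_loss(-h, data)) / (2h)
@show l, fd_grad, dl_dz
```

With one point at 1.0 the loss is -3.0 and its derivative with respect to `z` is exactly zero, so any other value from the pullback-based version points to a differentiation bug rather than to the model.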
Strangely, this only happens when the `sum`s involve the `abs2` terms and both `aa` and `bb` are subtracted; otherwise the results `l1` and `l2` are the same 😮

Any feedback is welcome :)