I am building an LSTM network using Flux.jl. The network consists of an LSTM layer followed by a Dense layer with a sigmoid activation. I found that the gradients of the LSTM layer estimated by Zygote.jl's gradient function deviate considerably from numerical approximations, while the gradients of the Dense layer are correct. I have made a small reproducible example below. I suspect that this issue is related to #1209.
using Flux

ϵ = 1e-6
x = [rand(2) for i in 1:3]            # sequence of 3 inputs of length 2

m = Flux.Chain(
    Flux.LSTM(2, 3),
    Flux.Dense(3, 1, σ))
m = m |> f64

# AD gradient of the first output w.r.t. all parameters
grads = gradient(() -> m.(x)[1][1], Flux.params(m))

# central-difference estimate for the first entry of Wi
Flux.reset!(m)
m[1].cell.Wi[1] += ϵ
o1 = m.(x)[1][1]
Flux.reset!(m)
m[1].cell.Wi[1] -= 2 * ϵ
o2 = m.(x)[1][1]

display(grads[m[1].cell.Wi][1])       # Zygote gradient
display((o1 - o2) / (2 * ϵ))          # numerical approximation
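For completeness, the same central-difference check can be run over every entry of Wi rather than just the first one. The following is a minimal sketch, assuming the same m, x, and implicit-parameter API as in the snippet above; the helper name numgrad_Wi is hypothetical and only meant for illustration.

using Flux

# Central-difference estimate of d(m.(x)[1][1])/dWi, entry by entry.
function numgrad_Wi(m, x; ϵ = 1e-6)
    W = m[1].cell.Wi
    g = similar(W)
    for i in eachindex(W)
        orig = W[i]
        Flux.reset!(m)
        W[i] = orig + ϵ
        o1 = m.(x)[1][1]
        Flux.reset!(m)
        W[i] = orig - ϵ
        o2 = m.(x)[1][1]
        W[i] = orig                   # restore the parameter
        g[i] = (o1 - o2) / (2 * ϵ)
    end
    return g
end

Flux.reset!(m)
grads = gradient(() -> m.(x)[1][1], Flux.params(m))
num = numgrad_Wi(m, x)
# Expected to be close to zero when the AD gradient is correct;
# on the affected versions this shows a sizable discrepancy for Wi.
display(maximum(abs.(grads[m[1].cell.Wi] .- num)))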