I am building an LSTM network using Flux.jl. The network consists of an LSTM layer followed by a Dense layer with a sigmoid activation. I found that the gradients of the LSTM layer estimated by Zygote.jl's gradient function deviate considerably from numerical approximations, while the gradients of the Dense layer are correct. I have made a small reproducible example below. I suspect that this issue is related to #1209.
using Flux

ϵ = 1e-6
x = [rand(2) for i in 1:3]            # sequence of 3 inputs of length 2

m = Flux.Chain(
    Flux.LSTM(2, 3),
    Flux.Dense(3, 1, σ))
m = m |> f64

# AD gradient of the first output w.r.t. all parameters
grads = gradient(() -> m.(x)[1][1], Flux.params(m))

# central-difference estimate for the first entry of Wi
Flux.reset!(m)
m[1].cell.Wi[1] += ϵ
o1 = m.(x)[1][1]
Flux.reset!(m)
m[1].cell.Wi[1] -= 2 * ϵ
o2 = m.(x)[1][1]

display(grads[m[1].cell.Wi][1])       # Zygote gradient
display((o1 - o2) / (2 * ϵ))          # numerical approximation
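For completeness, the same central-difference check can be run over every entry of Wi rather than just the first one. The following is a minimal sketch, assuming the same m, x, and implicit-parameter API as in the snippet above; the helper name numgrad_Wi is hypothetical and only meant for illustration.

using Flux

# Central-difference estimate of d(m.(x)[1][1])/dWi, entry by entry.
function numgrad_Wi(m, x; ϵ = 1e-6)
    W = m[1].cell.Wi
    g = similar(W)
    for i in eachindex(W)
        orig = W[i]
        Flux.reset!(m)
        W[i] = orig + ϵ
        o1 = m.(x)[1][1]
        Flux.reset!(m)
        W[i] = orig - ϵ
        o2 = m.(x)[1][1]
        W[i] = orig                   # restore the parameter
        g[i] = (o1 - o2) / (2 * ϵ)
    end
    return g
end

Flux.reset!(m)
grads = gradient(() -> m.(x)[1][1], Flux.params(m))
num = numgrad_Wi(m, x)
# Expected to be close to zero when the AD gradient is correct;
# on the affected versions this shows a sizable discrepancy for Wi.
display(maximum(abs.(grads[m[1].cell.Wi] .- num)))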