Issue with CRF loss function #1087
Oops. Could you please add in the stacktrace you see as well? It'd make it easier to spot where the issue is.
```
crf_loss=11.085941941312553
ERROR: LoadError: DimensionMismatch("cannot broadcast array to have fewer dimensions")
```
So /src/lib/array.jl:47 is
You can make the error go away by defining an extra `_droplike` method. (Note also, as an aside, that the gradient has a different element type, which is a common performance bug, ref #1031.)
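For context, a minimal sketch of the pattern that seems to trigger this, assuming the culprit is the `Vector + Transpose` addition in `forward_score`; whether it actually errors depends on the Zygote version:

```julia
using Flux, LinearAlgebra

v = rand(5)       # plain Vector, shape (5,)
r = rand(1, 5)    # row matrix, shape (1, 5)

# The forward pass is fine: `+` tolerates the trailing singleton dimension,
# so `v + transpose(r)` has shape (5, 1).
y = v + transpose(r)

# The pullback then has to map a (5, 1) gradient back onto the (5,) shape of `v`;
# on the Flux 0.10 / Zygote versions in this thread that is (presumably) where the
# DimensionMismatch("cannot broadcast array to have fewer dimensions") surfaced.
g, = gradient(w -> sum(w + transpose(r)), v)
```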
Thank you for responding so quickly :-D Yes, adding `_droplike` does seem to make the issue go away, at least in my short test. Working on a bigger test now. Please let me know if I can be helpful in any way.
@mcabbott sorry, but I don't think the fix works. It does run without complaint (and quickly).
I am going to create some known trivial input for the CRF training and compare the output for Flux 9 and 10. With the same hyperparameters the results should be pretty similar. Will report back.
No smarter ideas; that sounds like the right course.
Good morning Michael. I am pleased to report that when trained with identical starting conditions, the CRF exactly matches the results from Flux 9/Tracker. Unfortunately, I am building a CRF/LSTM model, and when placed on top of another layer the CRF is not training properly. It runs, but does not produce good models. Since I am new to Flux 10, I assume it is my bug, and Tomas Pevny has offered to look at my code. However, is it possible that your suggested fix could affect lower layers in a DNN?
Hello Michael, happy April Fools' Day. Unfortunately, this is not a joke. I now have two versions of the CRF on top of other layers that behave the same way: the CRF works on its own, but the lower layers do not train properly. As in the first example, the weights change, but the loss does not decrease. The second example is the CRF test from TextAnalysis.jl ported to Flux 10. Here is the code. Let me know if you want the full branch of the TextAnalysis port to Flux 10.
Reopening...
@opus111 can you check if this issue has been resolved on master? |
Bump @opus111. I think you said this can be closed?
Yes. I think RNNs work now. Haven't checked running a CRF on top of an RNN yet.
Here is a file that reproduces the problem. This code is copied from the TextAnalysis package and slightly altered for Flux 10. The version in GitHub works with Flux 9.
```julia
#=
This code is copied from the CRF of TextAnalysis.
The current version in GitHub works with Flux 0.9:
https://github.com/JuliaText/TextAnalysis.jl/tree/master/src/CRF
=#
using Flux

log_sum_exp(z) = log_sum_exp(z, maximum(z, dims = 1))
log_sum_exp(z, m) = log.(sum(exp.(z .- m), dims = 1)) .+ m

mutable struct CRF{S}
    W::S   # Transition scores
    n::Int # Number of labels
end

function CRF(n::Integer)
    W = rand(Float32, n + 2, n + 2)
    W[:, n + 1] .= -10000
    W[n + 2, :] .= -10000
    return CRF(W, n)
end

Flux.@functor CRF (W,)

preds_first(c::CRF, y) = c.W[c.n + 1, Flux.onecold(y, 1:length(y))]
preds_last(c::CRF, y) = c.W[Flux.onecold(y, 1:length(y)), c.n + 2]
preds_single(c::CRF, y, y_prev) = c.W[Flux.onecold(y_prev, 1:length(y_prev)), Flux.onecold(y, 1:length(y))]

function forward_score(c::CRF, x, init_α)
    forward_var = log_sum_exp((c.W .+ transpose(x[1])) .+ init_α)
    for i in 2:length(x)
        forward_var = log_sum_exp((c.W .+ transpose(x[i])) .+ transpose(forward_var))
    end
    fs = log_sum_exp(c.W[:, c.n + 2] + transpose(forward_var))
    return fs[1]
end

function score_sequence(c::CRF, x, label_seq)
    score = preds_first(c, label_seq[1]) + Flux.onecold(label_seq[1], x[1])
    for i in 2:length(label_seq)
        score += preds_single(c, label_seq[i], label_seq[i-1]) +
                 Flux.onecold(label_seq[i], x[i])
    end
    return score + preds_last(c, label_seq[end])
end

crf_loss(c::CRF, x, label_seq, init_α) = forward_score(c, x, init_α) - score_sequence(c, x, label_seq)

label_count = 10
seq_length = 5

crf = CRF(label_count - 2)

init_α = fill(-10000.0, label_count)
init_α[label_count - 1] = 0.0

label_seq = [Flux.onehot(i, 1:label_count) for i in 1:seq_length]
# Note: Float64 inputs against the Float32 transition matrix; this is the
# element-type mismatch mentioned above (ref #1031).
x = [rand(label_count) for _ in 1:seq_length]

println("crf_loss=$(crf_loss(crf, x, label_seq, init_α))")
# Differentiate with respect to the CRF parameters (Flux 0.10 implicit-params API).
println("gradient(crf_loss)=$(gradient(() -> crf_loss(crf, x, label_seq, init_α), Flux.params(crf)))")
```
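For reference, here is a rough sketch (my own, not from the thread) of the kind of stacked "CRF on top of another layer" setup discussed above. It reuses the `CRF`, `crf_loss`, `label_seq`, `init_α`, and `seq_length` defined in the script; the layer sizes, optimiser, and epoch count are made-up placeholders, and the training loop follows the Flux 0.10-era implicit-parameter API:

```julia
using Flux

feature_dim = 16
lstm  = LSTM(feature_dim, 32)    # lower layer whose training is in question
dense = Dense(32, label_count)   # projects the hidden state to per-label scores
crf2  = CRF(label_count - 2)

ps  = Flux.params(lstm, dense, crf2)
opt = ADAM(1e-3)

xs = [rand(Float32, feature_dim) for _ in 1:seq_length]

for _ in 1:100
    Flux.reset!(lstm)
    gs = gradient(ps) do
        scores = [dense(lstm(xi)) for xi in xs]   # per-step label scores
        crf_loss(crf2, scores, label_seq, init_α)
    end
    # If the lower layers are not training, their gradients in `gs` should be
    # `nothing` or all zeros; worth checking before blaming the optimiser.
    Flux.Optimise.update!(opt, ps, gs)
end
```

If the lower layers really are not training, inspecting the entries of `gs` for the `lstm` and `dense` parameters should show it directly.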