Type Promotion often Unwieldy and day Ruining #1026
A more minimal, usable version (works with Revise), using Flux:
using Flux

flatten(x) = reshape(x, :, size(x, 4))   # collapse all but the batch dimension
stop_gradient(x) = x
Flux.@nograd stop_gradient

function main()
    # Let's make a model....
    obs, vars = (10, 200)
    proxydata = randn(vars, 1, 1, obs)   # note: Float64 by default
    model = Chain(Conv((5, 1), 1 => 16, relu),
                  Conv((5, 1), 16 => 24, relu),
                  # stop_gradient,
                  flatten,
                  Dense(4608, 32, relu),
                  Dense(32, 32, relu),
                  Dense(32, 1, relu))
    println(model(proxydata))
    mseloss(x1, y1) = sum(model(x1) .- y1)
    Flux.gradient(() -> mseloss(proxydata, randn(1, 10)), Flux.params(model))
end

This is a weird bug, as I've been using Zygote + flatten for a while now on the GPU and haven't run into this at all. It could be CPU-specific (which again is really odd). Something weird I found while minimizing this is that it works w/
Interesting, it seems associated with the CPU? Unfortunately for my use case I must be able to backprop on the CPU... Yes, the error is sporadic, which makes it even harder to debug. I think about 1 in 20 times the code will actually run, making you believe you have solved the problem, only to be met with the undef ref on the second go around... Thank you for cleaning up the code. What I posted was a hot mess from ripping apart a somewhat large but lightweight architecture (I've been around the block) and presuming Flux 0.1.0 worked how the previous versions did... Once this is fixed I have another fundamental bug to report, but this takes serious priority for me and likely many others...
If you insert this function in place of
If you further change the input data to be … And with … With hindsight this ought to have been obvious from the stacktrace, which contains a mixture of AbstractFloat, Float64, and Float32. Would be helpful if Flux could catch that, though.
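For what it's worth, a quick way to spot such a mixture before taking gradients is to compare the element types of the data and the parameters (a rough sketch, reusing the names and the Flux.params API from the example above):

eltype(proxydata)                              # Float64 here, since plain randn defaults to Float64
unique(eltype(p) for p in Flux.params(model))  # Float32, Flux's default parameter precision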
Okay, hang on... So Flux is changing data from Float64 to Float32, then when it compares at the loss function it cannot do the conversion (because it would break the graph). Oof, this reminds me of PyTorch-style juggling of types and tensors. Why does the type conversion happen if there is a reshape, but nowhere else? So the solution may be to degrade the y values from double to single precision? I'll try this, because I can accept single precision for this application, but weird! Awesome debugging.
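For reference, keeping everything in single precision in the earlier example would look roughly like this (a sketch; vars, obs, model, and mseloss as defined above):

# Generate (or convert) data as Float32 to match Flux's default parameter precision
proxydata = randn(Float32, vars, 1, 1, obs)   # or Float32.(existing_data)
ytrain    = randn(Float32, 1, obs)
Flux.gradient(() -> mseloss(proxydata, ytrain), Flux.params(model))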
Everything in Flux is by default Float32, because GPUs like that. But sometimes things get promoted, e.g. FluxML/NNlib.jl#149, and there is some logic to auto-convert things #815 (comment). If you ask me the default should be to error anytime an array of floats has a gradient of different precision, perhaps … It's extra weird that in your example it varies from case to case; that would be worth tracking down.
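For what it's worth, such a guard might look roughly like this (a hypothetical helper sketch, not something Flux currently does):

# Hypothetical check: error when a gradient's precision differs from its array's
function check_gradient_precision(x::AbstractArray{<:AbstractFloat}, dx::AbstractArray{<:AbstractFloat})
    eltype(x) == eltype(dx) ||
        error("gradient eltype $(eltype(dx)) does not match array eltype $(eltype(x)); " *
              "this usually means an accidental Float64 promotion in the model or loss")
    return dx
end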
Okay, I can confirm that changing both the X & Y to Float32s prevents the undef ref error, but the gradient does not update. Any thoughts on why the gradient is all zeros? I do agree that a helpful user message could prevent serious hardship for someone else.
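One way to see whether the gradients are genuinely zero (rather than just not being applied) is to inspect them parameter by parameter; a sketch, reusing the names from the earlier example:

ps = Flux.params(model)
gs = Flux.gradient(() -> mseloss(proxydata, ytrain), ps)
for p in ps
    g = gs[p]
    println(size(p), " => ", g === nothing ? "no gradient" : extrema(g))
end

If the final Dense layer's relu output is clamped at zero for every sample, all upstream gradients will be zero as well, so swapping the last activation for identity is worth a try.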
I'm running into this reshape issue as well; I'm using a reshape in the loss function. A forward pass preserves the Float32s, and the loss function takes Float32s and outputs Float32 as well. The stacktrace also shows a bunch of … For what it's worth, I think my issue happens just in the loss function, where I compute a weighted crossentropy like this:

function weighted_crossentropy(ŷ::AbstractArray{T,5}, y::AbstractArray{T,5}) where {T}
    total_pixels = size(y, 1) * size(y, 2) * size(y, 3)
    # Give every class some weight
    α = 0.05f0
    # Rows are pixels, columns are classes
    classes_per_pixel_y = reshape(y, total_pixels, :)
    classes_per_pixel_ŷ = reshape(ŷ, total_pixels, :)
    # Compute weights relative to area (more area = smaller weight); `area` comes from the enclosing scope
    weights = α .+ area .- sum(classes_per_pixel_y, dims = 1)
    weights_normalized = weights ./ sum(weights)
    # Finally compute the weighted cross entropy
    β = 1.0f-45
    return -sum(classes_per_pixel_y .* log.(classes_per_pixel_ŷ .+ β) .* weights_normalized) / area
end

If I add @mcabbott's
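For reference, a call to that loss might look roughly like this (a sketch with made-up shapes; note that `area` is not defined inside the function, so it is assumed to be available in the enclosing scope, e.g. the number of pixels per volume):

# Hypothetical shapes: 16×16×4 volumes, 3 classes, batch of 2
area = 16 * 16 * 4
ŷ = softmax(rand(Float32, 16, 16, 4, 3, 2); dims = 4)   # predictions, Float32
y = Float32.(rand(16, 16, 4, 3, 2) .> 0.5)              # one-hot-ish targets, Float32
weighted_crossentropy(ŷ, y)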
For completeness, the answer here is:
Ref. JuliaDiff/DiffRules.jl#26 perhaps.
Yes, I can confirm, the issue in my code is not
Thank you for getting further along with this. It's not just reshape, it appears to be type promotion in "unexpected" places. Such as
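To illustrate the kind of silent promotion being discussed (plain Julia, nothing Flux-specific): any Float64 intermediate, such as the inverse of an integer, drags a Float32 computation up to Float64.

julia> inv(5)                 # Float64, even though 5 is an Int
0.2

julia> typeof(1.0f0 * inv(5)) # Float32 * Float64 promotes to Float64
Float64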
Great! With that PR, dividing by an integer seems safe again:

julia> DiffRules.diffrule(:Base, :/, :x, :y)
(:(one(x) / y), :(-((x / y) / y)))
julia> gradient(x -> x/5 * x, Float32(1))[1]
0.4f0
Awesome work @haampie! @mcabbott, do you feel this issue made a large enough stink that it can be closed now? I feel that with integer-division safety back in place, most of the other issues will naturally resolve? I think elementwise exponentiation also had type instability, but I am not in a position to check this right now.
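A quick way to check the elementwise-power case, for whoever gets to it first (a sketch, assuming Zygote as the AD backend as elsewhere in this thread):

using Zygote

g = gradient(x -> sum(x .^ 2), Float32[0.1, 0.2])[1]
eltype(g)   # should be Float32; a Float64 here would point to promotion in the power rule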
Your call, but I'd close it. I don't see obvious problems with:

julia> gradient(x -> sum(identity, exp.(x ./ 2)), [1f-1, 2f-2])
(Float32[0.52563554, 0.5050251],)
So I've blown two days tracking pervasive errors while making basic toy models for a project at work... The errors are UndefRefErrors, somewhere in Zygote on an @adjoint function...
I believe the issue is that reshape in a chain or a model or something explodes the world... Below is a minimum not-working example...