gradient calculation with explicit type cast is broken #810
xref FluxML/Flux.jl#815 I think. Ideally Zygote would handle this without blinking. Ideally (IMO) Flux would simply give you an error on mismatched eltypes, instead of weird silent conversions or slow paths. And ideally you would make sparse …
Yes, if Flux raised an error suggesting there are mismatched eltypes, I would cast my sparse …
I tried to add warnings about this and other type mismatches in FluxML/Flux.jl#1031 but, well, nothing happened. I did not think about anything sparse. I don't see why dense …
Well, what I meant is: thanks for the link to the PR. Is there a reason why you closed it? If it's still relevant, which I think it is, maybe I can try to ping other people to look at that PR and hopefully we can get it merged.
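For context, roughly what such a warning could look like (a sketch only, with a made-up helper name, not the actual FluxML/Flux.jl#1031 change):

# Sketch: warn once when weight and input eltypes disagree, instead of
# silently converting or falling back to a slow generic matmul path.
# warn_eltype_mismatch is a hypothetical helper, not a Flux function.
function warn_eltype_mismatch(W::AbstractArray{T}, x::AbstractArray{S}) where {T,S}
    T === S || @warn "eltype mismatch: weights are $T, input is $S" maxlog=1
    return x
end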
Besides utopian ideas like fixing Flux, this might be a duplicate of #575 -- any broadcast over a sparse array seems to give an error:
julia> using Zygote, SparseArrays
julia> gradient(x -> sum(x), sprand(6, 0.9))[1] # ok
6-element Fill{Float64}: entries equal to 1.0
julia> ans == ones(6)
true
julia> gradient(x -> sum(Float32.(x)), rand(6))[1] == ones(6) # ok
true
julia> gradient(x -> sum(Float32.(x)), sprand(6, 0.9))[1] # as above
ERROR: MethodError: no method matching zero(::Type{Any})
julia> gradient(x -> sum(exp.(x)), sprand(6, 0.9))[1]
ERROR: MethodError: no method matching zero(::Type{Tuple{Float64,Zygote.ZBack{ChainRules.var"#exp_pullback#1289"{Float64}}}})
...
Stacktrace:
[1] _zeros_eltypes at /Applications/Julia-1.5.app/Contents/Resources/julia/share/julia/stdlib/v1.5/SparseArrays/src/higherorderfns.jl:203 [inlined]
[2] _noshapecheck_map(::Zygote.var"#1068#1075", ::SparseVector{Tuple{Float64,Zygote.ZBack{ChainRules.var"#exp_pullback#1289"{Float64}}},Int64}) at /Applications/Julia-1.5.app/Contents/Resources/julia/share/julia/stdlib/v1.5/SparseArrays/src/higherorderfns.jl:159
[3] map(::Zygote.var"#1068#1075", ::SparseVector{Tuple{Float64,Zygote.ZBack{ChainRules.var"#exp_pullback#1289"{Float64}}},Int64}) at /Applications/Julia-1.5.app/Contents/Resources/julia/share/julia/stdlib/v1.5/SparseArrays/src/higherorderfns.jl:142
[4] adjoint at /Users/me/.julia/packages/Zygote/c0awc/src/lib/broadcast.jl:137 [inlined]
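The stacktrace points at the root cause: the generic broadcast adjoint maps over an array of (value, pullback) tuples, and SparseArrays' map needs a zero of that eltype for the implicit zeros, which does not exist. A small illustration, using typeof(sin) as a stand-in for the ZBack closure type:

julia> zero(Tuple{Float64, typeof(sin)})   # what _zeros_eltypes asks for, and what is missing
ERROR: MethodError: no method matching zero(::Type{Tuple{Float64, typeof(sin)}})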
julia> Float32.(sprand(10,0.5)) # respects sparsity
10-element SparseVector{Float32,Int64} with 3 stored entries:
[7 ] = 0.453431
[8 ] = 0.898222
[10] = 0.298508
julia> exp.(sprand(10,0.5)) # all entries nonzero
10-element SparseVector{Float64,Int64} with 10 stored entries:
[1 ] = 1.26619
[2 ] = 1.0
[3 ] = 1.38125
[4 ] = 1.46056
[5 ] = 2.5163
[6 ] = 2.35844
[7 ] = 1.0
[8 ] = 1.99315
[9 ] = 2.48701
[10] = 1.24585
For broadcasting Float32 in particular, it looks like #762 has a special case for that, which fixes …
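Roughly, such a special case pairs the cast with a pullback that passes the cotangent straight through, since the cast is elementwise and linear. A sketch of the idea in @adjoint style (not the actual #762 diff; the real rule may also project the cotangent back onto x's type):

julia> using Zygote

julia> Zygote.@adjoint function Broadcast.broadcasted(T::Type{<:Number}, x::AbstractArray{<:Number})
           # no tangent for the type argument; pass Δ through for x
           T.(x), Δ -> (nothing, Δ)
       end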
I implemented an rrule for this:
julia> using Zygote, Test, ChainRulesCore, Random, SparseArrays
julia> Zygote.refresh()
julia> function ChainRulesCore.rrule(::typeof(Broadcast.broadcasted), T::Type{TT}, x::AbstractSparseArray) where TT
           function broadcasted_cast_sparse(Δ)
               return NoTangent(), NoTangent(), Δ
           end
           T.(x), broadcasted_cast_sparse
       end
julia> s = sprand(Float32, 5, 5, 0.5)
5×5 SparseMatrixCSC{Float32, Int64} with 12 stored entries:
⋅ ⋅ ⋅ 0.321893 0.835169
⋅ 0.459585 0.325418 0.265581 0.0364544
0.678645 ⋅ ⋅ 0.229887 0.73394
⋅ ⋅ ⋅ 0.755401 ⋅
0.953619 0.926159 ⋅ ⋅ ⋅
julia> l1, gs1 = withgradient(s) do s
sum(Float64.(s))
end
(val = 6.521751821041107, grad = (sparse([3, 5, 2, 5, 2, 1, 2, 3, 4, 1, 2, 3], [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5], Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], 5, 5),))
# zero-entries get zero derivatives
julia> gs1[1]
5×5 SparseMatrixCSC{Float32, Int64} with 12 stored entries:
⋅ ⋅ ⋅ 1.0 1.0
⋅ 1.0 1.0 1.0 1.0
1.0 ⋅ ⋅ 1.0 1.0
⋅ ⋅ ⋅ 1.0 ⋅
1.0 1.0 ⋅ ⋅ ⋅
# same matrix but now collected in a dense array gets all 1 derivatives
julia> l2, gs2 = withgradient(collect(s)) do s
sum(Float64.(s))
end
(val = 6.521751821041107, grad = (Float32[1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; … ; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0],))
julia> gs2[1]
5×5 Matrix{Float32}:
1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0
So specifically, is it fine to return a gradient in the restricted manifold spanned by the non-zero entries? Or should we consider a sparse matrix as embedded in the full matrix space?
Yes, that sounds great. The policy in CR is to regard zero entries as structurally zero, and …
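That structural-zero policy is what ProjectTo in ChainRulesCore implements for sparse arrays: a dense cotangent gets projected back onto the stored-entry pattern. A small illustration (assuming a ChainRulesCore version with sparse ProjectTo support):

julia> using ChainRulesCore, SparseArrays

julia> s = sparse([1, 3], [1, 2], [1.0, 2.0], 3, 3);   # two stored entries

julia> ProjectTo(s)(ones(3, 3))   # keeps only the two stored positions; the rest stay structural zeros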
… is not working; it throws the following error.
The gradient computation works without the explicit cast, but the cast is done in Flux.jl's Dense layer in order to hit BLAS: https://github.com/FluxML/Flux.jl/blob/master/src/layers/basic.jl#L138
And this causes the following issue: FluxML/Flux.jl#965
I guess Zygote should be able to handle this case, right?
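For reference, a minimal sketch of the failure mode described here, with a hypothetical weight matrix W standing in for the Dense layer and Float32.(x) mimicking the cast on the linked Flux line:

julia> using Zygote, SparseArrays

julia> W = rand(Float32, 2, 6);           # stand-in for the layer's Float32 weights

julia> loss(x) = sum(W * Float32.(x));    # the explicit cast Flux inserts to hit BLAS

julia> gradient(loss, sprand(6, 0.9))     # fails on the sparse broadcast of Float32, as shown in the comments above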