
Computation of Hessian fails #1312

Open
renatobellotti opened this issue Sep 22, 2022 · 7 comments
Labels
second order zygote over zygote, or otherwise

Comments

@renatobellotti

Hi,

I'm trying to calculate a masked squared loss as part of my loss function. The loss function and its gradient can be computed without problems, but the Hessian fails:

using Zygote
using SparseArrays
using Random
using CUDA


function masked_mean(x::AbstractArray{T, N}, mask::AbstractArray{T, N}) where {T, N}
    n = sum(mask)
    if n > zero(T)
        return sum(x .* mask) ./ n
    else
        return zero(T)
    end
end


function test(x::AbstractArray{T, N}, mask::AbstractArray{T, N}, goal::T) where {T, N}
    return  (masked_mean(x, mask) - goal)^2
end


A = cu(sprand(200, 100, 0.8))
v = cu(rand(100))
mask = cu(round.(rand(200)))

goal = 1.f0


# test(A*v, mask, goal)
# Zygote.gradient(v -> test(A*v, mask, goal), v)
Zygote.hessian(v -> test(A*v, mask, goal), v)

Error message:

MethodError: no method matching test(::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer}, ::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::Float32)
Closest candidates are:
  test(::AbstractArray{T, N}, ::AbstractArray{T, N}, ::T) where {T, N} at In[1]:17

Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0 [inlined]
  [2] _pullback(::Zygote.Context{false}, ::typeof(test), ::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer}, ::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::Float32)
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:9
  [3] _pullback
    @ ./In[1]:31 [inlined]
  [4] _pullback(ctx::Zygote.Context{false}, f::var"#1#2", args::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
  [5] pullback(f::Function, cx::Zygote.Context{false}, args::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:44
  [6] pullback
    @ ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:42 [inlined]
  [7] gradient(f::Function, args::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:96
  [8] (::Zygote.var"#102#103"{var"#1#2"})(x::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:64
  [9] forward_jacobian(f::Zygote.var"#102#103"{var"#1#2"}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, #unused#::Val{12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:29
 [10] forward_jacobian(f::Function, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}; chunk_threshold::Int64)
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:44
 [11] forward_jacobian
    @ ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:43 [inlined]
 [12] hessian_dual
    @ ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:64 [inlined]
 [13] hessian(f::Function, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:62
 [14] top-level scope
    @ In[1]:31
 [15] eval
    @ ./boot.jl:373 [inlined]
 [16] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1196

If I interpret the message correctly, Zygote changes the element type of the first argument of test(). Is this expected, or is it a bug?

@ToucheSir
Member

It is to be expected. Zygote.hessian does forward over reverse mode AD using ForwardDiff, which itself works by passing Dual numbers through functions. If you relax the restriction on x and mask both having eltype T, then you should no longer get this error.
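For concreteness, a minimal sketch (assuming ForwardDiff is installed) of what a Dual number is, and why a shared `where {T}` constraint across arguments rejects the inputs that `Zygote.hessian` sends through:

```julia
using ForwardDiff

# A Dual carries a value plus derivative "partials"; forward-mode AD
# pushes these through your function in place of plain Float32/Float64.
d = ForwardDiff.Dual(2.0, 1.0)
@show d * d   # value 4.0, partial 4.0 (the derivative of x^2 at x = 2)

# A method that requires both arguments to share one eltype T rejects
# a mixed Dual/Float call, which is exactly the MethodError above:
f(x::AbstractArray{T}, m::AbstractArray{T}) where {T} = sum(x .* m)
# f([d], [1.0])   # MethodError: no method matching f(...)
```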

@mcabbott mcabbott added the second order zygote over zygote, or otherwise label Sep 23, 2022
@renatobellotti
Author

renatobellotti commented Sep 23, 2022

Thanks for the hint. I gave each argument its own element-type parameter, and the error message changes. Here is the updated code snippet:

using Zygote
using SparseArrays
using Random
using CUDA


function masked_mean(x::AbstractArray{T, N}, mask::AbstractArray{S, N}) where {T, S, N}
    n = sum(mask)
    if n > zero(T)
        return sum(x .* mask) ./ n
    else
        return zero(T)
    end
end


function test(x::AbstractArray{T, N}, mask::AbstractArray{S, N}, goal::R) where {T, S, R, N}
    return  (masked_mean(x, mask) - goal)^2
end


A = cu(sprand(200, 100, 0.8))
v = cu(rand(100))
mask = cu(round.(rand(200)))

goal = 1.f0


# test(A*v, mask, goal)
# Zygote.gradient(v -> test(A*v, mask, goal), v)
Zygote.hessian(v -> test(A*v, mask, goal), v)

Then Zygote's own code triggers a scalar-indexing performance warning:

Warning: Performing scalar indexing on task Task (runnable) @0x00002ba13b278010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore <some path>/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:90

And finally the error message:

InvalidIRError: compiling kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{ForwardDiff.Dual{Nothing, Float32, 12}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to Float32)
Stacktrace:
 [1] convert
   @ ./number.jl:7
 [2] setindex!
   @ ~/.julia/packages/CUDA/DfvRa/src/device/array.jl:194
 [3] setindex!
   @ ~/.julia/packages/CUDA/DfvRa/src/device/array.jl:207
 [4] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl

Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{ForwardDiff.Dual{Nothing, Float32, 12}}}, Int64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/4yHI4/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:416 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/utils.jl:68
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:354
  [7] #224
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:347 [inlined]
  [8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{ForwardDiff.Dual{Nothing, Float32, 12}}}, Int64}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:76
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:346
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/cache.jl:90
 [11] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{ForwardDiff.Dual{Nothing, Float32, 12}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:299
 [12] cufunction
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:293 [inlined]
 [13] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:102 [inlined]
 [14] #launch_heuristic#248
    @ ~/.julia/packages/CUDA/DfvRa/src/gpuarrays.jl:17 [inlined]
 [15] _copyto!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:63 [inlined]
 [16] materialize!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:41 [inlined]
 [17] materialize!
    @ ./broadcast.jl:868 [inlined]
 [18] (::Zygote.var"#960#961"{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}})(Δ::ForwardDiff.Dual{Nothing, Float32, 12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/broadcast.jl:275
 [19] (::Zygote.var"#3755#back#962"{Zygote.var"#960#961"{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})(Δ::ForwardDiff.Dual{Nothing, Float32, 12})
    @ Zygote ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67
 [20] Pullback
    @ ./In[3]:8 [inlined]
 [21] (::typeof(∂(masked_mean)))(Δ::ForwardDiff.Dual{Nothing, Float32, 12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [22] Pullback
    @ ./In[3]:18 [inlined]
 [23] Pullback
    @ ./In[3]:31 [inlined]
 [24] (::typeof(∂(#3)))(Δ::ForwardDiff.Dual{Nothing, Float32, 12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [25] (::Zygote.var"#60#61"{typeof(∂(#3))})(Δ::ForwardDiff.Dual{Nothing, Float32, 12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:45
 [26] gradient(f::Function, args::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:97
 [27] (::Zygote.var"#102#103"{var"#3#4"})(x::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:64
 [28] forward_jacobian(f::Zygote.var"#102#103"{var"#3#4"}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, #unused#::Val{12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:29
 [29] forward_jacobian(f::Function, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}; chunk_threshold::Int64)
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:44
 [30] forward_jacobian
    @ ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:43 [inlined]
 [31] hessian_dual
    @ ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:64 [inlined]
 [32] hessian(f::Function, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:62
 [33] top-level scope
    @ In[3]:31
 [34] eval
    @ ./boot.jl:373 [inlined]
 [35] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1196

@ToucheSir
Member

Setting CUDA.allowscalar(false) should give you a better stacktrace. In general, second derivatives of non-scalar functions on GPU arrays are pretty shaky with Zygote (and most other Julia reverse-mode ADs, unfortunately), so you may have to define some rules to help things along.
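A short sketch of that suggestion (assuming a CUDA-capable machine): disabling scalar indexing turns the warning above into a hard error, so the offending call appears at the top of the stacktrace instead of being buried under a slow CPU fallback.

```julia
using CUDA

# Make scalar indexing of GPU arrays throw instead of warn, so the
# first offending getindex/setindex! call fails loudly.
CUDA.allowscalar(false)

x = cu(rand(Float32, 4))
# x[1]      # now throws ErrorException instead of silently running on CPU
sum(x)      # GPU-side reductions and broadcasts are unaffected
```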

@FelixBenning

It is to be expected. Zygote.hessian does forward over reverse mode AD using ForwardDiff, which itself works by passing Dual numbers through functions. If you relax the restriction on x and mask both having eltype T, then you should no longer get this error.

So: I am working with kernels of type ::IncrementalGRF.Kernels.SquaredExponential{Float64, 1} and have implemented a call method for them, so that I can write k(x). Of course the eltype of x should match the kernel's type parameter; otherwise I cannot guarantee that the call implementation actually works. It feels wrong (more like a workaround) to drop this restriction because of an implementation detail of Zygote. How can I know whether my call implementation even works with Dual{...}?

hessian(k, [1.])
ERROR: MethodError: no method matching (::IncrementalGRF.Kernels.SquaredExponential{Float64, 1})(::Vector{ForwardDiff.Dual{Nothing, Float64, 1}})
Closest candidates are:
  (::IsotropicKernel{T, N})(::AbstractVector{T}) where {T, N} at ~/code/incrementalGP/src/kernels.jl:15

@ToucheSir
Member

Dual is an implementation detail of ForwardDiff, not Zygote. There is a pure Zygote hessian_reverse function, but it is generally much slower (reverse over reverse, longer compile times, etc) and buggier. Given that ForwardDiff is more or less the only mature forward mode AD in the Julia ecosystem, you'll find many reverse mode ADs using it internally for various features.
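If staying pure-Zygote matters despite those caveats, the reverse-over-reverse path can be called directly. A small CPU sketch with a toy function (not the kernel from the comment above):

```julia
using Zygote

f(v) = sum(abs2, v) / 2            # toy scalar-valued function
v = [1.0, 2.0, 3.0]

# Reverse-over-reverse: no Dual numbers involved, so strict eltype
# constraints survive, at the cost of speed and robustness.
H = Zygote.hessian_reverse(f, v)
# For this f the Hessian is the 3x3 identity matrix.
```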

Barring that, adding a dispatch for Dual{T} for your callable kernel would be one way to keep the constraint while still supporting ForwardDiff. The downside is that it requires a dep on ForwardDiff, so if that's a big deal you could bump JuliaDiff/ForwardDiff.jl#518.
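One hedged sketch of that extra dispatch. The `SquaredExponential` internals below are invented for illustration (the real package's fields are not shown in this thread); the point is only the second method, which admits Dual numbers whose value type matches the kernel's `T` while leaving the strict method intact:

```julia
using ForwardDiff: Dual

# Hypothetical stand-in for IncrementalGRF.Kernels.SquaredExponential;
# the lengthscale field is an assumption, not the package's actual layout.
struct SquaredExponential{T,N}
    lengthscale::T
end

_sqexp(k, x) = exp(-sum(abs2, x) / (2 * k.lengthscale^2))

# Strict method, as in the original package: eltype must match T.
(k::SquaredExponential{T,N})(x::AbstractVector{T}) where {T,N} = _sqexp(k, x)

# Extra dispatch: also accept Duals whose value type is T, so
# ForwardDiff (and hence Zygote.hessian) can pass through.
(k::SquaredExponential{T,N})(x::AbstractVector{<:Dual{<:Any,T}}) where {T,N} =
    _sqexp(k, x)
```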

@renatobellotti
Author

Thanks for the answers. My takeaway from this thread is that Hessian computation in Julia is not stable yet. For my purposes it is not really needed, so I leave it at that.

@ToucheSir
Member

Kind of. If you must use Zygote, then everything said previously applies. If your code is amenable to working with other ADs, then libraries like ForwardDiff, ReverseDiff and https://github.com/JuliaDiff/SparseDiffTools.jl/ all have more battle-tested ways of calculating Hessians.
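As one such alternative, a CPU sketch with ForwardDiff's own `hessian` on a toy loss shaped like the one in this issue (a squared deviation of a mean from a target):

```julia
using ForwardDiff

# Toy loss resembling test() above: squared distance of a mean from 1.0.
f(v) = (sum(v) / length(v) - 1.0)^2
v = rand(3)

# Dense Hessian via nested forward mode; no Zygote involved.
H = ForwardDiff.hessian(f, v)
@show size(H)   # (3, 3); H is symmetric
```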
