
Computation of Hessian fails #1312

Open
renatobellotti opened this issue Sep 22, 2022 · 7 comments
Labels
second order zygote over zygote, or otherwise

Comments

@renatobellotti

Hi,

I'm trying to calculate a masked squared loss as part of my loss function. The loss function and its gradient can be computed without problems, but the Hessian fails:

using Zygote
using SparseArrays
using Random
using CUDA


function masked_mean(x::AbstractArray{T, N}, mask::AbstractArray{T, N}) where {T, N}
    n = sum(mask)
    if n > zero(T)
        return sum(x .* mask) ./ n
    else
        return zero(T)
    end
end


function test(x::AbstractArray{T, N}, mask::AbstractArray{T, N}, goal::T) where {T, N}
    return  (masked_mean(x, mask) - goal)^2
end


A = cu(sprand(200, 100, 0.8))
v = cu(rand(100))
mask = cu(round.(rand(200)))

goal = 1.f0


# test(A*v, mask, goal)
# Zygote.gradient(v -> test(A*v, mask, goal), v)
Zygote.hessian(v -> test(A*v, mask, goal), v)

Error message:

MethodError: no method matching test(::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer}, ::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::Float32)
Closest candidates are:
  test(::AbstractArray{T, N}, ::AbstractArray{T, N}, ::T) where {T, N} at In[1]:17

Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0 [inlined]
  [2] _pullback(::Zygote.Context{false}, ::typeof(test), ::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer}, ::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::Float32)
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:9
  [3] _pullback
    @ ./In[1]:31 [inlined]
  [4] _pullback(ctx::Zygote.Context{false}, f::var"#1#2", args::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
  [5] pullback(f::Function, cx::Zygote.Context{false}, args::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:44
  [6] pullback
    @ ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:42 [inlined]
  [7] gradient(f::Function, args::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:96
  [8] (::Zygote.var"#102#103"{var"#1#2"})(x::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:64
  [9] forward_jacobian(f::Zygote.var"#102#103"{var"#1#2"}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, #unused#::Val{12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:29
 [10] forward_jacobian(f::Function, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}; chunk_threshold::Int64)
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:44
 [11] forward_jacobian
    @ ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:43 [inlined]
 [12] hessian_dual
    @ ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:64 [inlined]
 [13] hessian(f::Function, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:62
 [14] top-level scope
    @ In[1]:31
 [15] eval
    @ ./boot.jl:373 [inlined]
 [16] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1196

If I interpret the message correctly, Zygote changes the element type of the first argument of test(). Is this expected, or is it a bug?

@ToucheSir
Member

It is to be expected. Zygote.hessian does forward over reverse mode AD using ForwardDiff, which itself works by passing Dual numbers through functions. If you relax the restriction on x and mask both having eltype T, then you should no longer get this error.
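For concreteness, a minimal sketch (assuming ForwardDiff is installed) of what a Dual number is, and why a shared `where {T}` constraint across arguments rejects the inputs that `Zygote.hessian` sends through:

```julia
using ForwardDiff

# A Dual carries a value plus derivative "partials"; forward-mode AD
# pushes these through your function in place of plain Float32/Float64.
d = ForwardDiff.Dual(2.0, 1.0)
@show d * d   # value 4.0, partial 4.0 (the derivative of x^2 at x = 2)

# A method that requires both arguments to share one eltype T rejects
# a mixed Dual/Float call, which is exactly the MethodError above:
f(x::AbstractArray{T}, m::AbstractArray{T}) where {T} = sum(x .* m)
# f([d], [1.0])   # MethodError: no method matching f(...)
```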

@mcabbott mcabbott added the second order zygote over zygote, or otherwise label Sep 23, 2022
@renatobellotti
Author

renatobellotti commented Sep 23, 2022

Thanks for the hint. I gave each argument its own element-type parameter, and the error message changes. Here is the updated code snippet:

using Zygote
using SparseArrays
using Random
using CUDA


function masked_mean(x::AbstractArray{T, N}, mask::AbstractArray{S, N}) where {T, S, N}
    n = sum(mask)
    if n > zero(T)
        return sum(x .* mask) ./ n
    else
        return zero(T)
    end
end


function test(x::AbstractArray{T, N}, mask::AbstractArray{S, N}, goal::R) where {T, S, R, N}
    return  (masked_mean(x, mask) - goal)^2
end


A = cu(sprand(200, 100, 0.8))
v = cu(rand(100))
mask = cu(round.(rand(200)))

goal = 1.f0


# test(A*v, mask, goal)
# Zygote.gradient(v -> test(A*v, mask, goal), v)
Zygote.hessian(v -> test(A*v, mask, goal), v)

Then Zygote's own code triggers a scalar-indexing performance warning:

Warning: Performing scalar indexing on task Task (runnable) @0x00002ba13b278010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore <some path>/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:90

And finally the error message:

InvalidIRError: compiling kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{ForwardDiff.Dual{Nothing, Float32, 12}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to Float32)
Stacktrace:
 [1] convert
   @ ./number.jl:7
 [2] setindex!
   @ ~/.julia/packages/CUDA/DfvRa/src/device/array.jl:194
 [3] setindex!
   @ ~/.julia/packages/CUDA/DfvRa/src/device/array.jl:207
 [4] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl

Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{ForwardDiff.Dual{Nothing, Float32, 12}}}, Int64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/4yHI4/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:416 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/utils.jl:68
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:354
  [7] #224
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:347 [inlined]
  [8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{ForwardDiff.Dual{Nothing, Float32, 12}}}, Int64}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:76
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:346
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/cache.jl:90
 [11] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{ForwardDiff.Dual{Nothing, Float32, 12}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:299
 [12] cufunction
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:293 [inlined]
 [13] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:102 [inlined]
 [14] #launch_heuristic#248
    @ ~/.julia/packages/CUDA/DfvRa/src/gpuarrays.jl:17 [inlined]
 [15] _copyto!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:63 [inlined]
 [16] materialize!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:41 [inlined]
 [17] materialize!
    @ ./broadcast.jl:868 [inlined]
 [18] (::Zygote.var"#960#961"{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}})(Δ::ForwardDiff.Dual{Nothing, Float32, 12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/broadcast.jl:275
 [19] (::Zygote.var"#3755#back#962"{Zygote.var"#960#961"{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})(Δ::ForwardDiff.Dual{Nothing, Float32, 12})
    @ Zygote ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67
 [20] Pullback
    @ ./In[3]:8 [inlined]
 [21] (::typeof(∂(masked_mean)))(Δ::ForwardDiff.Dual{Nothing, Float32, 12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [22] Pullback
    @ ./In[3]:18 [inlined]
 [23] Pullback
    @ ./In[3]:31 [inlined]
 [24] (::typeof(∂(#3)))(Δ::ForwardDiff.Dual{Nothing, Float32, 12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [25] (::Zygote.var"#60#61"{typeof(∂(#3))})(Δ::ForwardDiff.Dual{Nothing, Float32, 12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:45
 [26] gradient(f::Function, args::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:97
 [27] (::Zygote.var"#102#103"{var"#3#4"})(x::CuArray{ForwardDiff.Dual{Nothing, Float32, 12}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:64
 [28] forward_jacobian(f::Zygote.var"#102#103"{var"#3#4"}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, #unused#::Val{12})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:29
 [29] forward_jacobian(f::Function, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}; chunk_threshold::Int64)
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:44
 [30] forward_jacobian
    @ ~/.julia/packages/Zygote/dABKa/src/lib/forward.jl:43 [inlined]
 [31] hessian_dual
    @ ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:64 [inlined]
 [32] hessian(f::Function, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/grad.jl:62
 [33] top-level scope
    @ In[3]:31
 [34] eval
    @ ./boot.jl:373 [inlined]
 [35] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1196

@ToucheSir
Member

Setting CUDA.allowscalar(false) should give you a better stacktrace. In general, second derivatives of non-scalar functions on GPU arrays are pretty shaky with Zygote (and most other Julia reverse-mode ADs, unfortunately), so you may have to define some rules to help things along.
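A short sketch of that suggestion (assuming a CUDA-capable machine): disabling scalar indexing turns the warning above into a hard error, so the offending call appears at the top of the stacktrace instead of being buried under a slow CPU fallback.

```julia
using CUDA

# Make scalar indexing of GPU arrays throw instead of warn, so the
# first offending getindex/setindex! call fails loudly.
CUDA.allowscalar(false)

x = cu(rand(Float32, 4))
# x[1]      # now throws ErrorException instead of silently running on CPU
sum(x)      # GPU-side reductions and broadcasts are unaffected
```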

@FelixBenning

It is to be expected. Zygote.hessian does forward over reverse mode AD using ForwardDiff, which itself works by passing Dual numbers through functions. If you relax the restriction on x and mask both having eltype T, then you should no longer get this error.

So: I am working with kernels of type ::IncrementalGRF.Kernels.SquaredExponential{Float64, 1} and have implemented a call method for them, so that I can write k(x). Of course the eltype of x should match the kernel's type parameter; otherwise I cannot guarantee that the call implementation actually works. It feels wrong (more like a workaround) to drop this restriction because of an implementation detail of Zygote. How can I know whether my call implementation even works with Dual{...}?

hessian(k, [1.])
ERROR: MethodError: no method matching (::IncrementalGRF.Kernels.SquaredExponential{Float64, 1})(::Vector{ForwardDiff.Dual{Nothing, Float64, 1}})
Closest candidates are:
  (::IsotropicKernel{T, N})(::AbstractVector{T}) where {T, N} at ~/code/incrementalGP/src/kernels.jl:15

@ToucheSir
Member

Dual is an implementation detail of ForwardDiff, not Zygote. There is a pure Zygote hessian_reverse function, but it is generally much slower (reverse over reverse, longer compile times, etc) and buggier. Given that ForwardDiff is more or less the only mature forward mode AD in the Julia ecosystem, you'll find many reverse mode ADs using it internally for various features.
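If staying pure-Zygote matters despite those caveats, the reverse-over-reverse path can be called directly. A small CPU sketch with a toy function (not the kernel from the comment above):

```julia
using Zygote

f(v) = sum(abs2, v) / 2            # toy scalar-valued function
v = [1.0, 2.0, 3.0]

# Reverse-over-reverse: no Dual numbers involved, so strict eltype
# constraints survive, at the cost of speed and robustness.
H = Zygote.hessian_reverse(f, v)
# For this f the Hessian is the 3x3 identity matrix.
```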

Barring that, adding a dispatch for Dual{T} for your callable kernel would be one way to keep the constraint while still supporting ForwardDiff. The downside is that it requires a dep on ForwardDiff, so if that's a big deal you could bump JuliaDiff/ForwardDiff.jl#518.
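One hedged sketch of that extra dispatch. The `SquaredExponential` internals below are invented for illustration (the real package's fields are not shown in this thread); the point is only the second method, which admits Dual numbers whose value type matches the kernel's `T` while leaving the strict method intact:

```julia
using ForwardDiff: Dual

# Hypothetical stand-in for IncrementalGRF.Kernels.SquaredExponential;
# the lengthscale field is an assumption, not the package's actual layout.
struct SquaredExponential{T,N}
    lengthscale::T
end

_sqexp(k, x) = exp(-sum(abs2, x) / (2 * k.lengthscale^2))

# Strict method, as in the original package: eltype must match T.
(k::SquaredExponential{T,N})(x::AbstractVector{T}) where {T,N} = _sqexp(k, x)

# Extra dispatch: also accept Duals whose value type is T, so
# ForwardDiff (and hence Zygote.hessian) can pass through.
(k::SquaredExponential{T,N})(x::AbstractVector{<:Dual{<:Any,T}}) where {T,N} =
    _sqexp(k, x)
```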

@renatobellotti
Author

Thanks for the answers. My takeaway from this thread is that Hessian computation in Julia is not stable yet. For my purposes it is not really needed, so I leave it at that.

@ToucheSir
Member

Kind of. If you must use Zygote, then everything said previously applies. If your code is amenable to working with other ADs, then libraries like ForwardDiff, ReverseDiff and https://github.com/JuliaDiff/SparseDiffTools.jl/ all have more battle-tested ways of calculating Hessians.
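As one such alternative, a CPU sketch with ForwardDiff's own `hessian` on a toy loss shaped like the one in this issue (a squared deviation of a mean from a target):

```julia
using ForwardDiff

# Toy loss resembling test() above: squared distance of a mean from 1.0.
f(v) = (sum(v) / length(v) - 1.0)^2
v = rand(3)

# Dense Hessian via nested forward mode; no Zygote involved.
H = ForwardDiff.hessian(f, v)
@show size(H)   # (3, 3); H is symmetric
```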
