Batchnorm fails on GPU #385

Closed
avik-pal opened this issue Sep 5, 2018 · 12 comments

Comments

@avik-pal
Member

avik-pal commented Sep 5, 2018

When trying to run BatchNorm on GPUs, the following error occurs:

ERROR: LoadError: GPU compilation failed, try inspecting generated code with any of the @device_code_... macros
InvalidIRError: compiling #19(CuArrays.CuKernelState, CuDeviceArray{Float32,4,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,NTuple{4,Base.OneTo{Int64}},getfield(Base.Broadcast, Symbol("##26#28")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(+),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(*),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(/),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(-),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}}}}}}},typeof(identity)},NTuple{5,Base.Broadcast.Extruded{CuDeviceArray{Float32,4,CUDAnative.AS.Global},NTuple{4,Bool},NTuple{4,Int64}}}}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (jl_box_float32)
Stacktrace:
 [1] #5 at broadcast.jl:329
 [2] #27 at /home/avik-pal/.julia/packages/Flux/jbpWo/src/tracker/array.jl:404 (repeats 2 times)
 [3] #26 at /home/avik-pal/.julia/packages/Flux/jbpWo/src/tracker/array.jl:393
 [4] _broadcast_getindex_evalf at broadcast.jl:574
 [5] _broadcast_getindex at broadcast.jl:547
 [6] getindex at broadcast.jl:507
 [7] macro expansion at /home/avik-pal/.julia/packages/GPUArrays/3E1qk/src/abstract_gpu_interface.jl:69
 [8] #19 at /home/avik-pal/.julia/packages/GPUArrays/3E1qk/src/broadcast.jl:15
Reason: unsupported call to the Julia runtime (jl_invoke)
Stacktrace:
 [1] #5 at broadcast.jl:329
 [2] #27 at /home/avik-pal/.julia/packages/Flux/jbpWo/src/tracker/array.jl:404 (repeats 2 times)
 [3] #26 at /home/avik-pal/.julia/packages/Flux/jbpWo/src/tracker/array.jl:393
 [4] _broadcast_getindex_evalf at broadcast.jl:574
 [5] _broadcast_getindex at broadcast.jl:547
 [6] getindex at broadcast.jl:507
 [7] macro expansion at /home/avik-pal/.julia/packages/GPUArrays/3E1qk/src/abstract_gpu_interface.jl:69
 [8] #19 at /home/avik-pal/.julia/packages/GPUArrays/3E1qk/src/broadcast.jl:15
Stacktrace:
 [1] check_ir at /home/avik-pal/.julia/packages/CUDAnative/FzmMm/src/validation.jl:131 [inlined]
 [2] macro expansion at ./logging.jl:322 [inlined]
 [3] #compile_function#69(::Bool, ::Function, ::CUDAnative.CompilerContext) at /home/avik-pal/.julia/packages/CUDAnative/FzmMm/src/compiler.jl:651
 [4] compile_function at /home/avik-pal/.julia/packages/CUDAnative/FzmMm/src/compiler.jl:651 [inlined]
 [5] #cufunction#70(::Base.Iterators.Pairs{Symbol,getfield(GPUArrays, Symbol("##19#20")),Tuple{Symbol},NamedTuple{(:inner_f,),Tuple{getfield(GPUArrays, Symbol("##19#20"))}}}, ::Function, ::CUDAdrv.CuDevice, ::Any, ::Any) at /home/avik-pal/.julia/packages/CUDAnative/FzmMm/src/compiler.jl:714
 [6] (::getfield(CUDAnative, Symbol("#kw##cufunction")))(::NamedTuple{(:inner_f,),Tuple{getfield(GPUArrays, Symbol("##19#20"))}}, ::typeof(cufunction), ::CUDAdrv.CuDevice, ::Function, ::Type) at ./none:0
 [7] macro expansion at /home/avik-pal/.julia/packages/CUDAnative/FzmMm/src/execution.jl:219 [inlined]
 [8] _cuda(::CUDAnative.KernelWrapper{getfield(GPUArrays, Symbol("##19#20"))}, ::getfield(GPUArrays, Symbol("##19#20")), ::Tuple{}, ::NamedTuple{(:blocks, :threads),Tuple{Tuple{Int64},Tuple{Int64}}}, ::CuArrays.CuKernelState, ::CuDeviceArray{Float32,4,CUDAnative.AS.Global}, ::Base.Broadcast.Broadcasted{Nothing,NTuple{4,Base.OneTo{Int64}},getfield(Base.Broadcast, Symbol("##26#28")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(+),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(*),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(/),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(-),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}}}}}}},typeof(identity)},NTuple{5,Base.Broadcast.Extruded{CuDeviceArray{Float32,4,CUDAnative.AS.Global},NTuple{4,Bool},NTuple{4,Int64}}}}) at /home/avik-pal/.julia/packages/CUDAnative/FzmMm/src/execution.jl:177
 [9] _gpu_call(::Function, ::CuArray{Float32,4}, ::Tuple{CuArray{Float32,4},Base.Broadcast.Broadcasted{Nothing,NTuple{4,Base.OneTo{Int64}},getfield(Base.Broadcast, Symbol("##26#28")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(+),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(*),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(/),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(-),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}}}}}}},typeof(identity)},NTuple{5,Base.Broadcast.Extruded{CuArray{Float32,4},NTuple{4,Bool},NTuple{4,Int64}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at ./gcutils.jl:87
 [10] gpu_call(::Function, ::CuArray{Float32,4}, ::Tuple{CuArray{Float32,4},Base.Broadcast.Broadcasted{Nothing,NTuple{4,Base.OneTo{Int64}},getfield(Base.Broadcast, Symbol("##26#28")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(+),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(*),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(/),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(-),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}}}}}}},typeof(identity)},NTuple{5,Base.Broadcast.Extruded{CuArray{Float32,4},NTuple{4,Bool},NTuple{4,Int64}}}}}, ::Int64) at /home/avik-pal/.julia/packages/GPUArrays/3E1qk/src/abstract_gpu_interface.jl:151
 [11] gpu_call at /home/avik-pal/.julia/packages/GPUArrays/3E1qk/src/abstract_gpu_interface.jl:128 [inlined]
 [12] copyto! at /home/avik-pal/.julia/packages/GPUArrays/3E1qk/src/broadcast.jl:14 [inlined]
 [13] copyto! at ./broadcast.jl:768 [inlined]
 [14] copy at ./broadcast.jl:744 [inlined]
 [15] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,getfield(Base.Broadcast, Symbol("##26#28")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(+),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(*),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(/),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(-),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}}}}}}},typeof(identity)},NTuple{5,CuArray{Float32,4}}}) at ./broadcast.jl:724
 [16] broadcast(::getfield(Base.Broadcast, Symbol("##26#28")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(+),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(*),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(/),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(-),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}}}}}}},typeof(identity)}, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::Vararg{CuArray{Float32,4},N} where N) at ./broadcast.jl:702
 [17] ∇broadcast(::getfield(Base.Broadcast, Symbol("##26#28")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(+),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(*),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##27#29")){typeof(/),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##27#29")){typeof(-),getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}}}}}}},typeof(identity)}, ::TrackedArray{…,CuArray{Float32,4}}, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::TrackedArray{…,CuArray{Float32,4}}) at /home/avik-pal/.julia/packages/Flux/jbpWo/src/tracker/array.jl:350
 [18] materialize(::Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(identity),Tuple{Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(*),Tuple{TrackedArray{…,CuArray{Float32,4}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(/),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(-),Tuple{CuArray{Float32,4},CuArray{Float32,4}}},CuArray{Float32,4}}}}},TrackedArray{…,CuArray{Float32,4}}}}}}) at /home/avik-pal/.julia/packages/Flux/jbpWo/src/tracker/array.jl:381
 [19] (::BatchNorm{typeof(identity),TrackedArray{…,CuArray{Float32,1}},CuArray{Float32,1},Float64})(::CuArray{Float32,4}) at /home/avik-pal/.julia/packages/Flux/jbpWo/src/layers/normalise.jl:143
 [20] macro expansion at /home/avik-pal/DeepLearningBenchmarks/src/flux/algorithm_benchmarks_gpu.jl:6 [inlined]
 [21] ##core#567(::BatchNorm{typeof(identity),TrackedArray{…,CuArray{Float32,1}},CuArray{Float32,1},Float64}, ::CuArray{Float32,4}) at /home/avik-pal/.julia/packages/BenchmarkTools/vesay/src/execution.jl:293
 [22] ##sample#568(::BenchmarkTools.Parameters) at /home/avik-pal/.julia/packages/BenchmarkTools/vesay/src/execution.jl:299
 [23] #_run#13(::Bool, ::String, ::Base.Iterators.Pairs{Symbol,Integer,NTuple{4,Symbol},NamedTuple{(:samples, :evals, :gctrial, :gcsample),Tuple{Int64,Int64,Bool,Bool}}}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#566")}, ::BenchmarkTools.Parameters) at /home/avik-pal/.julia/packages/BenchmarkTools/vesay/src/execution.jl:327
 [24] (::getfield(Base, Symbol("#inner#2")){Base.Iterators.Pairs{Symbol,Integer,NTuple{5,Symbol},NamedTuple{(:verbose, :samples, :evals, :gctrial, :gcsample),Tuple{Bool,Int64,Int64,Bool,Bool}}},typeof(BenchmarkTools._run),Tuple{BenchmarkTools.Benchmark{Symbol("##benchmark#566")},BenchmarkTools.Parameters}})() at ./none:0
 [25] #invokelatest#1 at ./essentials.jl:690 [inlined]
 [26] #invokelatest at ./none:0 [inlined]
 [27] #run_result#16 at /home/avik-pal/.julia/packages/BenchmarkTools/vesay/src/execution.jl:32 [inlined]
 [28] #run_result at ./none:0 [inlined]
 [29] #run#18(::Base.Iterators.Pairs{Symbol,Integer,NTuple{5,Symbol},NamedTuple{(:verbose, :samples, :evals, :gctrial, :gcsample),Tuple{Bool,Int64,Int64,Bool,Bool}}}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#566")}, ::BenchmarkTools.Parameters) at /home/avik-pal/.julia/packages/BenchmarkTools/vesay/src/execution.jl:46
 [30] #run at ./none:0 [inlined] (repeats 2 times)
 [31] #warmup#21 at /home/avik-pal/.julia/packages/BenchmarkTools/vesay/src/execution.jl:79 [inlined]
 [32] warmup(::BenchmarkTools.Benchmark{Symbol("##benchmark#566")}) at /home/avik-pal/.julia/packages/BenchmarkTools/vesay/src/execution.jl:79
 [33] macro expansion at /home/avik-pal/.julia/packages/BenchmarkTools/vesay/src/execution.jl:387 [inlined]
 [34] run_benchmarks() at /home/avik-pal/DeepLearningBenchmarks/src/flux/algorithm_benchmarks_gpu.jl:5
 [35] top-level scope at none:0
 [36] include at ./boot.jl:317 [inlined]
 [37] include_relative(::Module, ::String) at ./loading.jl:1038
 [38] include(::Module, ::String) at ./sysimg.jl:29
 [39] include(::String) at ./client.jl:388
 [40] top-level scope at none:0
in expression starting at /home/avik-pal/DeepLearningBenchmarks/src/flux/algorithm_benchmarks_gpu.jl:48
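
For reference, a minimal reproducer in the spirit of the failing benchmark (hypothetical; the original algorithm_benchmarks_gpu.jl script is not shown here):

using Flux, CuArrays

# Hypothetical reproducer: a BatchNorm layer applied to a 4-d CuArray,
# matching the CuDeviceArray{Float32,4} that appears in the trace.
bn = BatchNorm(8) |> gpu
x  = gpu(rand(Float32, 10, 10, 8, 1))   # width × height × channels × batch
bn(x)                                   # the fused tracked broadcast fails to compile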
@terasakisatoshi

Same problem for me.

I read the source code of BatchNorm and checked the types of its fields:

using Flux
using CuArrays

bn=BatchNorm(3) |> gpu

@show typeof(bn.β)
@show typeof(bn.γ)
@show typeof(bn.λ)
@show typeof(bn.μ)
@show typeof(bn.σ)
@show typeof(bn.ϵ)
@show typeof(bn.momentum)

The result is:

typeof(bn.β) = TrackedArray{…,CuArray{Float32,1}}
typeof(bn.γ) = TrackedArray{…,CuArray{Float32,1}}
typeof(bn.λ) = typeof(identity)
typeof(bn.μ) = CuArray{Float32,1}
typeof(bn.σ) = CuArray{Float32,1}
typeof(bn.ϵ) = Float64
typeof(bn.momentum) = Float64

I thought there was some inconsistency between Float32 and Float64 (or CuArray{Float32,1}), and that some operations mixing them caused the error.

I tried to set ϵ and momentum to Float32 like this:

bn = BatchNorm(3, ϵ=Float32(1e-7), momentum=Float32(0.1)) |> gpu
@show typeof(bn.ϵ) # should be Float32
@show typeof(bn.momentum) # should be Float32

but this prescription has no effect and the same error occurs.

@omi-key

omi-key commented Oct 21, 2018

In the function (BN::BatchNorm)(x), the error occurs here:

  let λ = BN.λ
    λ.(reshape(γ, affine_shape...) .* ((x .- μ) ./ σ) .+ reshape(β, affine_shape...)) #error
  end

changing this to

 let λ = BN.λ
   #λ.(reshape(γ, affine_shape...) .* ((x .- μ) ./ σ) .+ reshape(β, affine_shape...))
   foo = reshape(γ, affine_shape...) .* ((x .- μ) ./ σ)
   bar = reshape(β, affine_shape...) 
   λ.(foo .+ bar)
 end

works for me, but I don't understand why.

Though ϵ (in the arguments) and momentum may be Float64, they are converted to the type T (= eltype(x)), so I don't think they matter.
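
For reference, a sketch of that promotion (an assumption about the shape of the code, not the exact Flux source):

# Hypothetical helper showing the eltype promotion described above:
function promoted_constants(x::AbstractArray, ϵ::Real, momentum::Real)
    T = eltype(x)                          # e.g. Float32 for a CuArray{Float32}
    return convert(T, ϵ), convert(T, momentum)
end

promoted_constants(rand(Float32, 4), 1e-5, 0.1)   # -> (1.0f-5, 0.1f0)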

@yashbhalgat

Has anybody found a solution to this?


@avik-pal
Member Author

avik-pal commented Dec 17, 2018

Closing this, as it is fixed on master.

@staticfloat
Contributor

It looks like it might be some kind of inference failure, as breaking up the computation like that fixes things. @avik-pal I do not think this is fixed on master; using the latest master of Flux still breaks with the above error using the following MWE:

using CuArrays, Flux
μ = gpu(randn(Float32, 1, 1, 8, 1)); σ = μ; β = param(μ); γ = param(μ);
x = gpu(randn(Float32, 10, 10, 8, 1))
γ .* ((x .- μ) ./ σ) .+ β

It appears that breaking up any part of the computation makes the error go away:

temp = γ .* ((x .- μ) ./ σ)
temp .+ β

@maleadt do you have any idea why this would be happening? I tried using @device_code_typed and friends to figure out where the boxes are coming from per @vchuravy's advice, but unfortunately I haven't had much luck, since the error prevents the printing of most of those macros. @device_code_llvm does work, but there's an awful lot of code.
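
For anyone following along, the inspection macros wrap the failing call directly; a sketch using the CUDAnative macros of that era (assuming the MWE above):

using CUDAnative, CuArrays, Flux

μ = gpu(randn(Float32, 1, 1, 8, 1)); σ = μ; β = param(μ); γ = param(μ);
x = gpu(randn(Float32, 10, 10, 8, 1))

# Dump the generated LLVM IR for every kernel this broadcast launches:
@device_code_llvm γ .* ((x .- μ) ./ σ) .+ β
# @device_code_typed and @device_code_warntype take the same form.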

@ranjanan
Contributor

I ran into this problem too, and I'm looking forward to the fix.

@xukai92

xukai92 commented Oct 25, 2019

BatchNorm is still buggy with the current master branches of Flux and CuArrays. See the MWE below:

julia> using Flux;
julia> x = rand(10, 5) |> gpu;
julia> f = BatchNorm(10) |> gpu;
julia> f(x)
ERROR: ArgumentError: cannot take the CPU address of a CuArrays.CuArray{Float32,1}
Stacktrace:
 [1] cconvert(::Type{Ptr{Nothing}}, ::CuArrays.CuArray{Float32,1}) at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/CuArrays/nhhwE/src/array.jl:152
 [2] macro expansion at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/CuArrays/nhhwE/src/dnn/error.jl:17 [inlined]
 [3] cudnnBatchNormalizationForwardInference(::Ptr{Nothing}, ::CuArrays.CUDNN.cudnnBatchNormMode_t, ::Base.RefValue{Float32}, ::Base.RefValue{Float32}, ::CuArrays.CUDNN.TensorDesc, ::CuArrays.CuArray{Float32,4}, ::CuArrays.CUDNN.TensorDesc, ::CuArrays.CuArray{Float32,4}, ::CuArrays.CUDNN.TensorDesc, ::CuArrays.CuArray{Float32,1}, ::CuArrays.CuArray{Float32,1}, ::CuArrays.CuArray{Float32,1}, ::CuArrays.CuArray{Float32,1}, ::Float32) at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/CuArrays/nhhwE/src/dnn/libcudnn.jl:1016
 [4] #cudnnBNForward!#120(::Nothing, ::Int64, ::Int64, ::Float32, ::Bool, ::typeof(CuArrays.CUDNN.cudnnBNForward!), ::CuArrays.CuArray{Float32,4}, ::CuArrays.CuArray{Float32,1}, ::CuArrays.CuArray{Float32,1}, ::CuArrays.CuArray{Float32,4}, ::CuArrays.CuArray{Float32,1}, ::CuArrays.CuArray{Float32,1}, ::Float32) at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/CuArrays/nhhwE/src/dnn/batchnorm.jl:62
 [5] #cudnnBNForward! at ./none:0 [inlined]
 [6] #batchnorm#119 at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/CuArrays/nhhwE/src/dnn/batchnorm.jl:26 [inlined]
 [7] #batchnorm at ./none:0 [inlined]
 [8] #batchnorm#118 at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/CuArrays/nhhwE/src/dnn/batchnorm.jl:17 [inlined]
 [9] #batchnorm at ./none:0 [inlined]
 [10] (::BatchNorm{typeof(identity),CuArrays.CuArray{Float32,1},CuArrays.CuArray{Float32,1},Float32})(::CuArrays.CuArray{Float32,2}, ::Nothing) at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/Flux/E2wBe/src/cuda/cudnn.jl:4 (repeats 2 times)
 [11] top-level scope at REPL[4]:1

I looked into the functions in the stack trace but didn't find anything obvious. I'm happy to dig into this, since I need to use it. Any ideas? @MikeInnes @maleadt
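
For reference, frame [1] is the generic guard in CuArrays that refuses to hand device memory to a ccall expecting a host pointer; a minimal illustration (an assumption, not code from this thread):

using CuArrays

a = cu(rand(Float32, 4))
# Converting a device array to a plain host Ptr is deliberately disallowed;
# this is the same guard the cuDNN wrapper trips above:
Base.cconvert(Ptr{Nothing}, a)   # ArgumentError: cannot take the CPU address of a CuArray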

@xukai92

xukai92 commented Oct 25, 2019

PR in CuArrays to fix this: JuliaGPU/CuArrays.jl#464

@afternone

Another BatchNorm error on GPU:

julia> using Flux

julia> m = Chain(
                Dense(28^2, 64),
                BatchNorm(64, relu),
                Dense(64, 10),
                BatchNorm(10),
                softmax)
Chain(Dense(784, 64), BatchNorm(64, λ = relu), Dense(64, 10), BatchNorm(10), softmax)

julia> gpu(m)(gpu(rand(28^2, 10)))
ERROR: CUDNNError(code CUDNN_STATUS_BAD_PARAM, CUDNN_STATUS_BAD_PARAM)
Stacktrace:
 [1] cudnnBatchNormalizationForwardInference(::Ptr{Nothing}, ::CuArrays.CUDNN.cudnnBatchNormMode_t, ::Base.RefValue{Float32}, ::Base.RefValue{Float32}, ::CuArrays.CUDNN.TensorDesc, ::CuArrays.CuArray{Float32,4,CuArrays.CuArray{Float32,2,Nothing}}, ::CuArrays.CUDNN.TensorDesc, ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArrays.CuArray{Float32,1,Nothing}, ::CuArrays.CuArray{Float32,1,Nothing}, ::CuArrays.CuArray{Float32,1,Nothing}, ::CuArrays.CuArray{Float32,1,Nothing}, ::Float32) at C:\Users\shi\.julia\packages\CuArrays\ZYCpV\src\dnn\error.jl:19
 [2] #cudnnBNForward!#344(::Nothing, ::Int64, ::Int64, ::Float32, ::Bool, ::typeof(CuArrays.CUDNN.cudnnBNForward!), ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,1,Nothing}, ::CuArrays.CuArray{Float32,1,Nothing}, ::CuArrays.CuArray{Float32,4,CuArrays.CuArray{Float32,2,Nothing}}, ::CuArrays.CuArray{Float32,1,Nothing}, ::CuArrays.CuArray{Float32,1,Nothing}, ::Float32) at C:\Users\shi\.julia\packages\CuArrays\ZYCpV\src\dnn\batchnorm.jl:62
 [3] #cudnnBNForward! at .\none:0 [inlined]
 [4] #batchnorm#343 at C:\Users\shi\.julia\packages\CuArrays\ZYCpV\src\dnn\batchnorm.jl:26 [inlined]
 [5] #batchnorm at .\none:0 [inlined]
 [6] #batchnorm#342(::Nothing, ::Int64, ::Int64, ::Float32, ::Bool, ::typeof(CuArrays.CUDNN.batchnorm), ::CuArrays.CuArray{Float32,1,Nothing}, ::CuArrays.CuArray{Float32,1,Nothing}, ::CuArrays.CuArray{Float32,2,Nothing}, ::CuArrays.CuArray{Float32,1,Nothing}, ::CuArrays.CuArray{Float32,1,Nothing}, ::Float32) at C:\Users\shi\.julia\packages\CuArrays\ZYCpV\src\dnn\batchnorm.jl:17
 [7] BatchNorm at .\none:0 [inlined]
 [8] BatchNorm at C:\Users\shi\.julia\packages\Flux\oX9Pi\src\cuda\cudnn.jl:4 [inlined]
 [9] applychain at C:\Users\shi\.julia\packages\Flux\oX9Pi\src\layers\basic.jl:30 [inlined] (repeats 2 times)
 [10] (::Chain{Tuple{Dense{typeof(identity),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}},BatchNorm{typeof(relu),CuArrays.CuArray{Float32,1,Nothing},CuArrays.CuArray{Float32,1,Nothing},Float32},Dense{typeof(identity),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}},BatchNorm{typeof(identity),CuArrays.CuArray{Float32,1,Nothing},CuArrays.CuArray{Float32,1,Nothing},Float32},typeof(softmax)}})(::CuArrays.CuArray{Float32,2,Nothing}) at C:\Users\shi\.julia\packages\Flux\oX9Pi\src\layers\basic.jl:32
 [11] top-level scope at REPL[5]:1
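
For reference, the CuArray{Float32,4,CuArray{Float32,2,Nothing}} in the trace shows the 2-d input being reshaped to 4-d before the cuDNN call; a sketch of that reshape (an assumption, not the exact CuArrays source):

using CuArrays

x  = cu(rand(Float32, 64, 10))                    # features × batch, as Dense produces
x4 = reshape(x, 1, 1, size(x, 1), size(x, 2))     # 1×1×64×10, the layout cuDNN expects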

@CarloLucibello
Member

The problem is fixed in the latest release (v0.11):

julia> using Flux

julia> m = Chain(
                Dense(28^2, 64),
                BatchNorm(64, relu),
                Dense(64, 10),
                BatchNorm(10),
                softmax) |> gpu

Chain(Dense(784, 64), BatchNorm(64, λ = relu), Dense(64, 10), BatchNorm(10), softmax)

julia> 

julia> m(gpu(rand(Float32, 28^2,10)))
10×10 Array{Float32,2}:
 0.0152502  0.0234013  0.0166635  …  0.0192587  0.0273978  0.0432131
 0.127888   0.223488   0.17404       0.130349   0.163127   0.181742
 0.0216491  0.0193224  0.0131356     0.0190182  0.0254123  0.0227821
 0.0560169  0.0569448  0.0406374     0.0340359  0.0550951  0.0430422
 0.141302   0.154263   0.18579       0.240547   0.137368   0.166967
 0.116726   0.0991394  0.14152      0.0637739  0.0638257  0.0526835
 0.095143   0.136513   0.0597864     0.114565   0.234256   0.0559822
 0.161634   0.132457   0.137533      0.166024   0.101083   0.101018
 0.0762043  0.0652032  0.127047      0.0997341  0.0523317  0.17962
 0.188187   0.0892672  0.103847      0.112695   0.140104   0.152949

julia> m(gpu(rand(Float64, 28^2,10)))
10×10 Array{Float32,2}:
 0.035228   0.0292133  0.0406775  …  0.0225865  0.0204292  0.0347961
 0.185115   0.2619     0.105639      0.297724   0.0867104  0.169069
 0.031076   0.0267011  0.0238993     0.026509   0.0134919  0.0347641
 0.070163   0.0277017  0.0405046     0.0581834  0.0222371  0.0426887
 0.277579   0.135808   0.195828      0.118733   0.225566   0.169406
 0.0576468  0.0849154  0.140865     0.0701037  0.0580264  0.0819516
 0.112284   0.0472886  0.0543785     0.0754303  0.074338   0.100449
 0.0742676  0.0656631  0.126945      0.104323   0.228286   0.103572
 0.0760779  0.235139   0.157998      0.146371   0.15624    0.075284
 0.0805631  0.0856701  0.113265      0.0800346  0.114675   0.18802

@terasakisatoshi

Thank you!!
