This repository has been archived by the owner on Mar 12, 2021. It is now read-only.
mapreduce (sum, prod, etc.) fail in some cases when given a dims argument. #583
Closed
Description
Opened on Feb 3, 2020
Describe the bug
mapreduce(f, op, A...; dims = dims) and friends (sum(f, A; dims = dims), prod(f, A; dims = dims), ...) fail for many (but not all) functions f when a dims ≠ : argument is given.
To Reproduce
The Minimal Working Example (MWE) for this bug:
julia> x = cu(rand(3,3))
3×3 CuArray{Float32,2,Nothing}:
0.849469 0.625782 0.38785
0.877458 0.295448 0.0183218
0.285424 0.496025 0.0742507
julia> sum(abs, x, dims=1) #Okay also when f = abs2
1×3 CuArray{Float32,2,Nothing}:
2.01235 1.41725 0.480422
julia> sum(cos, x) #This is fine when dims = :
7.8284926f0
julia> sum(cos, x, dims = 1) #Fails for f ∈ (sin, sqrt, ...) as well...
┌ Warning: calls to Base intrinsics might be GPU incompatible
│ exception =
│ You called cos(x::T) where T<:Union{Float32, Float64} in Base.Math at special/trig.jl:100, maybe you intended to call cos(x::Float32) in CUDAnative at /home/troels/.julia/packages/CUDAnative/KWTMt/src/device/cuda/math.jl:6 instead?
│ Stacktrace:
│ [1] cos at special/trig.jl:100
│ [2] mapreducedim_kernel_parallel at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:20
└ @ CUDAnative ~/.julia/packages/CUDAnative/KWTMt/src/compiler/irgen.jl:111
┌ Warning: calls to Base intrinsics might be GPU incompatible
│ exception =
│ You called cos(x::T) where T<:Union{Float32, Float64} in Base.Math at special/trig.jl:100, maybe you intended to call cos(x::Float32) in CUDAnative at /home/troels/.julia/packages/CUDAnative/KWTMt/src/device/cuda/math.jl:6 instead?
│ Stacktrace:
│ [1] cos at special/trig.jl:100
│ [2] mapreducedim_kernel_parallel at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:20
└ @ CUDAnative ~/.julia/packages/CUDAnative/KWTMt/src/compiler/irgen.jl:111
ERROR: LLVM error: Cannot select: 0x690cc50: i64,glue = sube Constant:i64<0>, 0x690cb80, 0x690cbe8:1
0x6909ec8: i64 = Constant<0>
0x690cb80: i64 = add 0x6909d28, 0x690cb18
0x6909d28: i64 = add 0x690a4e0, 0x690a208
0x690a4e0: i64 = mul 0x6909d90, 0x690a0d0
0x6909d90: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %13
0x690bce0: i64 = Register %13
0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
0x6909df8: i64 = Register %0
0x690a680: i64 = Constant<4503599627370495>
0x6909cc0: i64 = Constant<4503599627370496>
0x690a208: i64 = mulhu 0x690be80, 0x690a0d0
0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
0x690a750: i64 = Register %14
0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
0x6909df8: i64 = Register %0
0x690a680: i64 = Constant<4503599627370495>
0x6909cc0: i64 = Constant<4503599627370496>
0x690cb18: i64 = select 0x690cab0, Constant:i64<1>, 0x690a7b8
0x690cab0: i1 = setcc 0x690c978, 0x690c910, setult:ch
0x690c978: i64 = add 0x690a5b0, 0x690c910
0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
0x690a750: i64 = Register %14
0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
0x6909df8: i64 = Register %0
0x690a680: i64 = Constant<4503599627370495>
0x6909cc0: i64 = Constant<4503599627370496>
0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
0x690c088: i64 = Register %15
0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
0x6909df8: i64 = Register %0
0x690a680: i64 = Constant<4503599627370495>
0x6909cc0: i64 = Constant<4503599627370496>
0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
0x690c088: i64 = Register %15
0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
0x6909df8: i64 = Register %0
0x690a680: i64 = Constant<4503599627370495>
0x6909cc0: i64 = Constant<4503599627370496>
0x690ab60: i64 = Constant<1>
0x690a7b8: i64 = zero_extend 0x690c9e0
0x690c9e0: i1 = setcc 0x690c978, 0x690a5b0, setult:ch
0x690c978: i64 = add 0x690a5b0, 0x690c910
0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
0x690a750: i64 = Register %14
0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
0x690a680: i64 = Constant<4503599627370495>
0x6909cc0: i64 = Constant<4503599627370496>
0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
0x690c088: i64 = Register %15
0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
0x690a680: i64 = Constant<4503599627370495>
0x6909cc0: i64 = Constant<4503599627370496>
0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
0x690a750: i64 = Register %14
0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
0x6909df8: i64 = Register %0
0x690a680: i64 = Constant<4503599627370495>
0x6909cc0: i64 = Constant<4503599627370496>
0x690cbe8: i64,glue = subc Constant:i64<0>, 0x690c978
0x6909ec8: i64 = Constant<0>
0x690c978: i64 = add 0x690a5b0, 0x690c910
0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
0x690a750: i64 = Register %14
0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
0x6909df8: i64 = Register %0
0x690a680: i64 = Constant<4503599627370495>
0x6909cc0: i64 = Constant<4503599627370496>
0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
0x690c088: i64 = Register %15
0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
0x6909df8: i64 = Register %0
0x690a680: i64 = Constant<4503599627370495>
0x6909cc0: i64 = Constant<4503599627370496>
In function: julia_paynehanek_18536
Stacktrace:
[1] handle_error(::Cstring) at /home/troels/.julia/packages/LLVM/DAnFH/src/core/context.jl:103
[2] macro expansion at /home/troels/.julia/packages/LLVM/DAnFH/src/base.jl:18 [inlined]
[3] LLVMTargetMachineEmitToMemoryBuffer at /home/troels/.julia/packages/LLVM/DAnFH/lib/6.0/libLLVM_h.jl:2726 [inlined]
[4] emit(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType) at /home/troels/.julia/packages/LLVM/DAnFH/src/targetmachine.jl:42
[5] mcgen(::CUDAnative.CompilerJob, ::LLVM.Module, ::LLVM.Function) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/mcgen.jl:87
[6] macro expansion at /home/troels/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
[7] macro expansion at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:209 [inlined]
[8] macro expansion at /home/troels/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
[9] #codegen#154(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:206
[10] #codegen at ./none:0 [inlined]
[11] #compile#153(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:52
[12] #compile at ./none:0 [inlined]
[13] #compile#152 at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:33 [inlined]
[14] #compile at ./none:0 [inlined] (repeats 2 times)
[15] macro expansion at /home/troels/.julia/packages/CUDAnative/KWTMt/src/execution.jl:393 [inlined]
[16] #cufunction#198(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::typeof(CuArrays.mapreducedim_kernel_parallel), ::Type{Tuple{typeof(cos),typeof(Base.add_sum),CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},Int64,Int64}}) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/execution.jl:360
[17] cufunction(::Function, ::Type) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/execution.jl:360
[18] macro expansion at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:61 [inlined]
[19] macro expansion at ./gcutils.jl:91 [inlined]
[20] _mapreducedim!(::Function, ::Function, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,2,Nothing}) at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:58
[21] mapreducedim!(::Function, ::Function, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,2,Nothing}) at ./reducedim.jl:274
[22] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::CuArray{Float32,2,Nothing}, ::Int64) at ./reducedim.jl:317
[23] mapreduce_impl at /home/troels/.julia/packages/GPUArrays/dhirJ/src/host/mapreduce.jl:78 [inlined]
[24] #mapreduce#29 at /home/troels/.julia/packages/GPUArrays/dhirJ/src/host/mapreduce.jl:64 [inlined]
[25] #mapreduce at ./none:0 [inlined]
[26] _sum at ./reducedim.jl:679 [inlined]
[27] #sum#588 at ./reducedim.jl:653 [inlined]
[28] (::Base.var"#kw##sum")(::NamedTuple{(:dims,),Tuple{Int64}}, ::typeof(sum), ::Function, ::CuArray{Float32,2,Nothing}) at ./none:0
[29] top-level scope at REPL[4]:1
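For comparison, here is a minimal CPU-only sketch of what the failing call should return. It assumes the GPU path is meant to agree with Base's behavior on a plain Array; the names `expected` and `two_step` are only illustrative and do not appear in the report above.

```julia
# CPU reference for the reduction in the MWE; no GPU packages are required.
x = rand(Float32, 3, 3)

# Apply `cos` to every element while reducing along the first dimension.
expected = sum(cos, x, dims = 1)

# Equivalent two-step form: map with broadcasting first, then reduce.
two_step = sum(cos.(x), dims = 1)

# Both forms give a 1×3 Float32 result that agrees up to rounding.
println(size(expected))
println(expected ≈ two_step)
```

Copying the same `x` to the GPU and comparing against `expected` gives a quick check for whether a given function `f` is affected.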
Environment details
Details on Julia:
julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Julia packages:
(v1.3) pkg> st CuArrays
Status `~/.julia/environments/v1.3/Project.toml`
[79e6a3ab] Adapt v1.0.0
[fa961155] CEnum v0.2.0
[3895d2a7] CUDAapi v2.1.0 #master (https://github.com/JuliaGPU/CUDAapi.jl.git)
[c5f51814] CUDAdrv v5.0.1 #master (https://github.com/JuliaGPU/CUDAdrv.jl.git)
[be33ccc6] CUDAnative v2.9.1 #master (https://github.com/JuliaGPU/CUDAnative.jl.git)
[3a865a2d] CuArrays v1.7.0 #master (https://github.com/JuliaGPU/CuArrays.jl.git)
[864edb3b] DataStructures v0.17.9
[0c68f7d7] GPUArrays v2.0.1 #master (https://github.com/JuliaGPU/GPUArrays.jl.git)
[1914dd2f] MacroTools v0.5.3
[872c559c] NNlib v0.6.4
[189a3867] Reexport v0.2.0
CUDA: toolkit and driver version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105