This repository has been archived by the owner on Mar 12, 2021. It is now read-only.

mapreduce (sum, prod, etc.) fail in some cases when given a dims argument. #583

Closed

Description

Describe the bug
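mapreduce(f, op, A...; dims = dims) and the reductions built on it (sum(f, A; dims = dims), prod(f, A; dims = dims), ...) fail for many, but not all, functions f when dims is anything other than the default (:). For example, f = abs and f = abs2 work, while f = cos, sin, or sqrt fail during GPU compilation.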

To Reproduce
The Minimal Working Example (MWE) for this bug:

julia> x = cu(rand(3,3))
3×3 CuArray{Float32,2,Nothing}:
 0.849469  0.625782  0.38785  
 0.877458  0.295448  0.0183218
 0.285424  0.496025  0.0742507

julia> sum(abs, x, dims=1) #Okay also when f = abs2
1×3 CuArray{Float32,2,Nothing}:
 2.01235  1.41725  0.480422

julia> sum(cos, x) #This is fine when dims = :
7.8284926f0

julia> sum(cos, x, dims = 1) #Fails for f ∈ (sin, sqrt, ...) as well...
┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception =
│    You called cos(x::T) where T<:Union{Float32, Float64} in Base.Math at special/trig.jl:100, maybe you intended to call cos(x::Float32) in CUDAnative at /home/troels/.julia/packages/CUDAnative/KWTMt/src/device/cuda/math.jl:6 instead?
│    Stacktrace:
│     [1] cos at special/trig.jl:100
│     [2] mapreducedim_kernel_parallel at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:20
└ @ CUDAnative ~/.julia/packages/CUDAnative/KWTMt/src/compiler/irgen.jl:111
┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception =
│    You called cos(x::T) where T<:Union{Float32, Float64} in Base.Math at special/trig.jl:100, maybe you intended to call cos(x::Float32) in CUDAnative at /home/troels/.julia/packages/CUDAnative/KWTMt/src/device/cuda/math.jl:6 instead?
│    Stacktrace:
│     [1] cos at special/trig.jl:100
│     [2] mapreducedim_kernel_parallel at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:20
└ @ CUDAnative ~/.julia/packages/CUDAnative/KWTMt/src/compiler/irgen.jl:111
ERROR: LLVM error: Cannot select: 0x690cc50: i64,glue = sube Constant:i64<0>, 0x690cb80, 0x690cbe8:1
  0x6909ec8: i64 = Constant<0>
  0x690cb80: i64 = add 0x6909d28, 0x690cb18
    0x6909d28: i64 = add 0x690a4e0, 0x690a208
      0x690a4e0: i64 = mul 0x6909d90, 0x690a0d0
        0x6909d90: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %13
          0x690bce0: i64 = Register %13
        0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
          0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
            0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
              0x6909df8: i64 = Register %0
            0x690a680: i64 = Constant<4503599627370495>
          0x6909cc0: i64 = Constant<4503599627370496>
      0x690a208: i64 = mulhu 0x690be80, 0x690a0d0
        0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
          0x690a750: i64 = Register %14
        0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
          0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
            0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
              0x6909df8: i64 = Register %0
            0x690a680: i64 = Constant<4503599627370495>
          0x6909cc0: i64 = Constant<4503599627370496>
    0x690cb18: i64 = select 0x690cab0, Constant:i64<1>, 0x690a7b8
      0x690cab0: i1 = setcc 0x690c978, 0x690c910, setult:ch
        0x690c978: i64 = add 0x690a5b0, 0x690c910
          0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
            0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
              0x690a750: i64 = Register %14
            0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
              0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
                0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
                  0x6909df8: i64 = Register %0
                0x690a680: i64 = Constant<4503599627370495>
              0x6909cc0: i64 = Constant<4503599627370496>
          0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
            0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
              0x690c088: i64 = Register %15
            0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
              0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
                0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
                  0x6909df8: i64 = Register %0
                0x690a680: i64 = Constant<4503599627370495>
              0x6909cc0: i64 = Constant<4503599627370496>
        0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
          0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
            0x690c088: i64 = Register %15
          0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
            0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
              0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
                0x6909df8: i64 = Register %0
              0x690a680: i64 = Constant<4503599627370495>
            0x6909cc0: i64 = Constant<4503599627370496>
      0x690ab60: i64 = Constant<1>
      0x690a7b8: i64 = zero_extend 0x690c9e0
        0x690c9e0: i1 = setcc 0x690c978, 0x690a5b0, setult:ch
          0x690c978: i64 = add 0x690a5b0, 0x690c910
            0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
              0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
                0x690a750: i64 = Register %14
              0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
                0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
                  0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
                    0x6909df8: i64 = Register %0
                  0x690a680: i64 = Constant<4503599627370495>
                0x6909cc0: i64 = Constant<4503599627370496>
            0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
              0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
                0x690c088: i64 = Register %15
              0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
                0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
                  0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
                    0x6909df8: i64 = Register %0
                  0x690a680: i64 = Constant<4503599627370495>
                0x6909cc0: i64 = Constant<4503599627370496>
          0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
            0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
              0x690a750: i64 = Register %14
            0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
              0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
                0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
                  0x6909df8: i64 = Register %0
                0x690a680: i64 = Constant<4503599627370495>
              0x6909cc0: i64 = Constant<4503599627370496>
  0x690cbe8: i64,glue = subc Constant:i64<0>, 0x690c978
    0x6909ec8: i64 = Constant<0>
    0x690c978: i64 = add 0x690a5b0, 0x690c910
      0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
        0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
          0x690a750: i64 = Register %14
        0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
          0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
            0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
              0x6909df8: i64 = Register %0
            0x690a680: i64 = Constant<4503599627370495>
          0x6909cc0: i64 = Constant<4503599627370496>
      0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
        0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
          0x690c088: i64 = Register %15
        0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
          0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
            0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
              0x6909df8: i64 = Register %0
            0x690a680: i64 = Constant<4503599627370495>
          0x6909cc0: i64 = Constant<4503599627370496>
In function: julia_paynehanek_18536
Stacktrace:
 [1] handle_error(::Cstring) at /home/troels/.julia/packages/LLVM/DAnFH/src/core/context.jl:103
 [2] macro expansion at /home/troels/.julia/packages/LLVM/DAnFH/src/base.jl:18 [inlined]
 [3] LLVMTargetMachineEmitToMemoryBuffer at /home/troels/.julia/packages/LLVM/DAnFH/lib/6.0/libLLVM_h.jl:2726 [inlined]
 [4] emit(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType) at /home/troels/.julia/packages/LLVM/DAnFH/src/targetmachine.jl:42
 [5] mcgen(::CUDAnative.CompilerJob, ::LLVM.Module, ::LLVM.Function) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/mcgen.jl:87
 [6] macro expansion at /home/troels/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [7] macro expansion at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:209 [inlined]
 [8] macro expansion at /home/troels/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [9] #codegen#154(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:206
 [10] #codegen at ./none:0 [inlined]
 [11] #compile#153(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:52
 [12] #compile at ./none:0 [inlined]
 [13] #compile#152 at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:33 [inlined]
 [14] #compile at ./none:0 [inlined] (repeats 2 times)
 [15] macro expansion at /home/troels/.julia/packages/CUDAnative/KWTMt/src/execution.jl:393 [inlined]
 [16] #cufunction#198(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::typeof(CuArrays.mapreducedim_kernel_parallel), ::Type{Tuple{typeof(cos),typeof(Base.add_sum),CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},Int64,Int64}}) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/execution.jl:360
 [17] cufunction(::Function, ::Type) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/execution.jl:360
 [18] macro expansion at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:61 [inlined]
 [19] macro expansion at ./gcutils.jl:91 [inlined]
 [20] _mapreducedim!(::Function, ::Function, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,2,Nothing}) at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:58
 [21] mapreducedim!(::Function, ::Function, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,2,Nothing}) at ./reducedim.jl:274
 [22] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::CuArray{Float32,2,Nothing}, ::Int64) at ./reducedim.jl:317
 [23] mapreduce_impl at /home/troels/.julia/packages/GPUArrays/dhirJ/src/host/mapreduce.jl:78 [inlined]
 [24] #mapreduce#29 at /home/troels/.julia/packages/GPUArrays/dhirJ/src/host/mapreduce.jl:64 [inlined]
 [25] #mapreduce at ./none:0 [inlined]
 [26] _sum at ./reducedim.jl:679 [inlined]
 [27] #sum#588 at ./reducedim.jl:653 [inlined]
 [28] (::Base.var"#kw##sum")(::NamedTuple{(:dims,),Tuple{Int64}}, ::typeof(sum), ::Function, ::CuArray{Float32,2,Nothing}) at ./none:0
 [29] top-level scope at REPL[4]:1
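
A possible workaround, not verified on the setup above: the warning suggests that Base's cos, rather than CUDAnative.cos, is being compiled into mapreducedim_kernel_parallel, and the LLVM error is raised in julia_paynehanek, which appears to belong to Base's trig range reduction. Two things that might sidestep this are passing the CUDAnative function explicitly (sum(CUDAnative.cos, x, dims = 1), as the warning itself hints) or applying f by broadcasting before reducing (sum(cos.(x), dims = 1)). Whether either one avoids the failing kernel on these package versions is an assumption. The sketch below uses a plain CPU Array in place of the CuArray and only shows that the broadcast-then-reduce form computes the same result as the fused call:

```julia
# CPU-only sketch: a plain Array stands in for the CuArray from the report.
x = rand(Float32, 3, 3)

# Form used in the report: map and reduce fused into a single call.
fused = sum(cos, x; dims = 1)

# Alternative: materialize cos.(x) first, then reduce the result.
# This allocates a temporary array the same size as x.
unfused = sum(cos.(x); dims = 1)

# Both forms yield the same 1×3 row of column sums (up to rounding).
@assert size(fused) == (1, 3)
@assert fused ≈ unfused
```

The cost of the broadcast form is one extra temporary array, so it is a stopgap rather than a replacement for a working fused reduction.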

Environment details
Details on Julia:

julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Julia packages:

(v1.3) pkg> st CuArrays
    Status `~/.julia/environments/v1.3/Project.toml`
  [79e6a3ab] Adapt v1.0.0
  [fa961155] CEnum v0.2.0
  [3895d2a7] CUDAapi v2.1.0 #master (https://github.com/JuliaGPU/CUDAapi.jl.git)
  [c5f51814] CUDAdrv v5.0.1 #master (https://github.com/JuliaGPU/CUDAdrv.jl.git)
  [be33ccc6] CUDAnative v2.9.1 #master (https://github.com/JuliaGPU/CUDAnative.jl.git)
  [3a865a2d] CuArrays v1.7.0 #master (https://github.com/JuliaGPU/CuArrays.jl.git)
  [864edb3b] DataStructures v0.17.9
  [0c68f7d7] GPUArrays v2.0.1 #master (https://github.com/JuliaGPU/GPUArrays.jl.git)
  [1914dd2f] MacroTools v0.5.3
  [872c559c] NNlib v0.6.4
  [189a3867] Reexport v0.2.0

CUDA: toolkit and driver version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105