Add support for using CuVector{Int} and CuVector{UnitRange} as indices into another CuArray #1222

Open

Description

As discussed on the forum, I'd like to use a vector of ranges as indices to apply an operation over segments of another array, but this does not seem to be supported at the moment.

Something like this:

using CUDA
CUDA.allowscalar(false)

# example: sum only a part of an array
rangesum(x, r::UnitRange) = sum(x[r])

# broadcast over the ranges
rangesum(x, rr::AbstractVector{<:UnitRange}) = map(r -> rangesum(x, r), rr)

x = collect(1:100) |> cu
r = [1:10, 33:37, 50:80]

# this works fine if r is on the CPU
rangesum(x, r) # results in a Vector{Int}

# but it fails if r is on the GPU
rangesum(x, cu(r))

The last line throws:

ERROR: InvalidIRError: compiling kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceVector{Int64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, var"#1#2"{CuDeviceVector{Int64, 1}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{UnitRange{Int64}, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to print_to_string(xs...) in Base at strings/io.jl:124)
Stacktrace:
  [1] string
    @ ./strings/io.jl:174
  [2] throw_checksize_error
    @ ./multidimensional.jl:881
  [3] _unsafe_getindex
    @ ./multidimensional.jl:845
  [4] _getindex
    @ ./multidimensional.jl:832
  [5] getindex
    @ ./abstractarray.jl:1170
  [6] rangesum
    @ ./REPL[3]:2
  [7] #1
    @ ./REPL[4]:2
  [8] _broadcast_getindex_evalf
    @ ./broadcast.jl:648
  [9] _broadcast_getindex
    @ ./broadcast.jl:621
 [10] getindex
    @ ./broadcast.jl:575
 [11] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:59
Reason: unsupported call through a literal pointer (call to )
Stacktrace:
  [1] Array
    @ ./boot.jl:448
  [2] Array
    @ ./boot.jl:457
  [3] similar
    @ ./abstractarray.jl:750
  [4] similar
    @ ./abstractarray.jl:740
  [5] _unsafe_getindex
    @ ./multidimensional.jl:844
  [6] _getindex
    @ ./multidimensional.jl:832
  [7] getindex
    @ ./abstractarray.jl:1170
  [8] rangesum
    @ ./REPL[3]:2
  [9] #1
    @ ./REPL[4]:2
 [10] _broadcast_getindex_evalf
    @ ./broadcast.jl:648
 [11] _broadcast_getindex
    @ ./broadcast.jl:621
 [12] getindex
    @ ./broadcast.jl:575
 [13] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:59
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceVector{Int64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, var"#1#2"{CuDeviceVector{Int64, 1}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{UnitRange{Int64}, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/9rK1I/src/validation.jl:111
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/9rK1I/src/driver.jl:333 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/SSeq1/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/9rK1I/src/driver.jl:331 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/9rK1I/src/utils.jl:62
  [6] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Xt3hr/src/compiler/execution.jl:326
  [7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/9rK1I/src/cache.jl:89
  [8] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Int64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, var"#1#2"{CuDeviceVector{Int64, 1}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{UnitRange{Int64}, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/Xt3hr/src/compiler/execution.jl:297
  [9] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Int64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, var"#1#2"{CuDeviceVector{Int64, 1}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{UnitRange{Int64}, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}})
    @ CUDA ~/.julia/packages/CUDA/Xt3hr/src/compiler/execution.jl:291
 [10] macro expansion
    @ ~/.julia/packages/CUDA/Xt3hr/src/compiler/execution.jl:102 [inlined]
 [11] #launch_heuristic#234
    @ ~/.julia/packages/CUDA/Xt3hr/src/gpuarrays.jl:17 [inlined]
 [12] copyto!
    @ ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:65 [inlined]
 [13] copyto!
    @ ./broadcast.jl:936 [inlined]
 [14] copy
    @ ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:47 [inlined]
 [15] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, var"#1#2"{CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}}, Tuple{CuArray{UnitRange{Int64}, 1, CUDA.Mem.DeviceBuffer}}})
    @ Base.Broadcast ./broadcast.jl:883
 [16] map(::Function, ::CuArray{UnitRange{Int64}, 1, CUDA.Mem.DeviceBuffer})
    @ GPUArrays ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:90
 [17] rangesum(x::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, rr::CuArray{UnitRange{Int64}, 1, CUDA.Mem.DeviceBuffer})
    @ Main ./REPL[4]:2
 [18] top-level scope
    @ REPL[8]:2
 [19] top-level scope
    @ ~/.julia/packages/CUDA/Xt3hr/src/initialization.jl:52

I would like to perform this entire operation on the GPU as part of a larger computation.
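Judging by the trace, the failure comes from `x[r]` inside the broadcast kernel: `_unsafe_getindex` tries to allocate a regular `Array` (the `similar`/`Array` frames) and to build an error message string, neither of which is possible in device code. Until this kind of indexing is supported, a minimal sketch of a workaround is below, assuming the ranges can be passed to the GPU as separate start/stop index vectors; the names `segment_sums` and `segment_sums_kernel!` are purely illustrative, not CUDA.jl API.

using CUDA

# Hypothetical kernel: each thread serially sums one segment of x.
function segment_sums_kernel!(out, x, starts, stops)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        s = zero(eltype(out))
        for j in starts[i]:stops[i]
            s += x[j]
        end
        out[i] = s
    end
    return nothing
end

# Hypothetical host wrapper: split the ranges into start/stop vectors and launch.
function segment_sums(x::CuVector, rr::Vector{<:UnitRange})
    starts = CuArray(first.(rr))
    stops  = CuArray(last.(rr))
    out    = CUDA.zeros(eltype(x), length(rr))
    threads = 256
    blocks  = cld(length(rr), threads)
    @cuda threads=threads blocks=blocks segment_sums_kernel!(out, x, starts, stops)
    return out
end

# reusing x and r from above:
segment_sums(x, r)  # 3-element CuArray{Int64}: [55, 175, 2015]

Since each thread walks its segment serially, this is only a sketch; long or very uneven segments would want a per-segment reduction instead of a loop.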
