
Performance regression with StaticArray broadcast #30124

Closed
@mohamed82008

Description


EDIT by @maleadt: see #30124 (comment) for a non-CUDAnative-specific reproducer.

The following code works on Julia 1.0.2 and fails on Julia commit 8a4f20b (2018-11-19 01:50 UTC). The issue first surfaced in JuliaGPU/CUDAnative.jl#291, and @maleadt bisected the cause to commit 9e98386. If the multiplication line is replaced with the commented line below it, the kernel compiles fine.

using CUDAnative, CuArrays, StaticArrays

Ks = [SMatrix{1, 1, Float64}(rand(1,1))] |> CuArrays.CuArray
fs = [SVector{1, Float64}(rand(1))] |> CuArrays.CuArray

function kernel(fs::AbstractVector{TV}, Ks) where {N, T, TV<:SVector{N,T}}
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    m = 2.0
    if i == 1
        fs[i] = m * (Ks[i] * fs[i])                    # fails: scalar * SVector
        #fs[i] = SVector{1,T}((m,)) .* (Ks[i] * fs[i]) # works
    end
    return
end

@cuda blocks=1 threads=1 kernel(fs, Ks)

#=
ERROR: InvalidIRError: compiling kernel(CuDeviceArray{SArray{Tuple{3},Float64,1,3},1,CUDAnative.AS.Global}, CuDeviceArray{SArray{Tuple{3,3},Float64,2,9},1,CUDAnative.AS.Global}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_invoke)
Stacktrace:
 [1] Type at broadcast.jl:141
 [2] result_style at broadcast.jl:397
 [3] combine_styles at broadcast.jl:390
 [4] broadcasted at broadcast.jl:1162
 [5] broadcast at broadcast.jl:702
 [6] * at /home/mohd/.julia/packages/StaticArrays/WmJnA/src/linalg.jl:25
 [7] kernel at REPL[18]:5
Stacktrace:
 [1] check_ir(::CUDAnative.CompilerContext, ::LLVM.Module) at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/compiler/validation.jl:123
 [2] #compile#80(::Bool, ::Function, ::CUDAnative.CompilerContext) at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/compiler/driver.jl:74
 [3] compile at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/compiler/driver.jl:49 [inlined]
 [4] #compile#79(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CUDAdrv.CuDevice, ::Any, ::Any) at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/compiler/driver.jl:28
 [5] compile at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/compiler/driver.jl:16 [inlined]
 [6] macro expansion at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/execution.jl:255 [inlined]
 [7] #cufunction#91(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(cufunction), ::typeof(kernel), ::Type{Tuple{CuDeviceArray{SArray{Tuple{3},Float64,1,3},1,CUDAnative.AS.Global},CuDeviceArray{SArray{Tuple{3,3},Float64,2,9},1,CUDAnative.AS.Global}}}) at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/execution.jl:230
 [8] cufunction(::Function, ::Type) at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/execution.jl:230
 [9] top-level scope at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/execution.jl:198
 [10] top-level scope at gcutils.jl:87
 [11] top-level scope at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/execution.jl:195
=#
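The failing call is the scalar-times-SVector product, which StaticArrays lowers to a broadcast (frame `[6] * at .../linalg.jl:25` in the trace above). As a hypothetical CPU-only sketch of that operation (my own names and values, not the reproducer from the linked comment), the suspect path can be exercised without CUDAnative:

```julia
using StaticArrays

# The kernel's failing line boils down to: scalar * (SMatrix * SVector).
# Per the stack trace, the scalar multiplication dispatches through the
# broadcast machinery (combine_styles / result_style).
K = SMatrix{1,1,Float64}(2.0)
f = SVector{1,Float64}(3.0)

suspect(m, K, f) = m * (K * f)   # goes through StaticArrays' broadcast path

suspect(2.0, K, f)

# On a regressed build, inspecting inference here should surface the
# dynamic dispatch that shows up as `jl_invoke` in the kernel's IR:
# @code_warntype suspect(2.0, K, f)
```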

My versioninfo() is:

Julia Version 1.1.0-DEV.681
Commit 8a4f20b887 (2018-11-19 01:50 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Good luck!

Metadata

Labels: broadcast (Applying a function over a collection), performance (Must go faster), regression (Regression in behavior compared to a previous version)
