
Overlay methods disabling IR interpreter breaks const-prop of GPU-incompatible code #384

Closed
@Tuebel

Description


Describe the bug

When a broadcast compares a CuArray against an irrational number (e.g. π), kernel compilation throws an InvalidIRError.
This happens on Julia 1.9 but not on Julia 1.8.
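For context (my own reading of the stack trace below, not something stated in the original report): on the CPU, `x < π` for a `Float32` dispatches to `<(x::AbstractFloat, y::AbstractIrrational)` in `irrationals.jl`, which rounds π to `Float32` internally via `BigFloat`/MPFR. Those MPFR `ccall`s are what show up as "call through a literal pointer" in the kernel IR when they are not constant-folded away. A minimal CPU-only sketch of the comparison path:

```julia
# CPU-only sketch: comparing a Float32 against the Irrational π.
# Base dispatches to <(::AbstractFloat, ::AbstractIrrational) in
# irrationals.jl, which rounds π to Float32 using BigFloat/MPFR.
x = 3.0f0
x < π            # true  (3.0f0 is below π)
3.2f0 < π        # false (3.2f0 is above π ≈ 3.14159)

# The rounding the comparison relies on:
Float32(π)       # 3.1415927f0
```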

To reproduce

The Minimal Working Example (MWE) for this bug:

julia> using CUDA

julia> A = 4 * CUDA.rand(3,3)
3×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 2.82133   2.05078   3.8729
 3.14324   0.218199  3.05251
 0.227112  3.68884   3.91693

julia> A .< π
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to mpfr_signbit)

However, explicit conversion works:

julia> A .< Float32.(CUDA.fill(π,3,3))
3×3 CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}:
 0  1  1
 1  1  0
 1  0  0
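A lighter workaround (my suggestion, not verified against this CUDA.jl version) is to convert the scalar once on the host instead of materializing a filled matrix, i.e. `A .< Float32(π)`. Sketched on a plain CPU array with the same broadcast semantics:

```julia
# Sketch with a CPU array; the same pattern should apply to a CuArray.
# Converting π to Float32 on the host keeps the Irrational (and its
# BigFloat-based comparison path) out of the broadcast kernel entirely.
A = 4 .* rand(Float32, 3, 3)
mask = A .< Float32(π)   # Float32(π) == 3.1415927f0, computed once on the host
```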

I am using a temporary environment in which only CUDA is installed:

(jl_bFolVV) pkg> status
Status `/tmp/jl_bFolVV/Project.toml`
  [052768ef] CUDA v3.12.1

Expected behavior

I get the expected result without errors in Julia 1.8:

julia> A = 4 * CUDA.rand(3,3)
3×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.459628  1.38646  3.23488
 3.10404   1.11073  3.11335
 1.95598   3.8854   2.79495

julia> A .< π
3×3 CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}:
 1  1  0
 1  1  1
 1  0  1

Version info

Details on Julia:

julia> versioninfo()
Julia Version 1.9.0-beta2
Commit 7daffeecb8c (2022-12-29 07:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 3900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 1 on 24 virtual cores
Environment:
  LD_LIBRARY_PATH = /opt/ros/noetic/lib

Details on CUDA:

julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 515.86.1, for CUDA 11.7
CUDA driver 11.7

Libraries: 
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+515.86.1
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.9.0-beta2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 1.470 GiB / 10.000 GiB available)

Additional context
Full Stacktrace:

  [1] signbit
    @ ./mpfr.jl:811
  [2] _cpynansgn
    @ ./mpfr.jl:338
  [3] Float32
    @ ./mpfr.jl:344
  [4] Float32
    @ ./mpfr.jl:346
  [5] #888
    @ ./irrationals.jl:70
  [6] #setprecision#25
    @ ./mpfr.jl:964
  [7] setprecision
    @ ./mpfr.jl:960
  [8] Type
    @ ./irrationals.jl:69
  [9] <
    @ ./irrationals.jl:96
 [10] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
 [11] _broadcast_getindex
    @ ./broadcast.jl:656
 [12] getindex
    @ ./broadcast.jl:610
 [13] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to ijl_rethrow)
Stacktrace:
 [1] rethrow
   @ ./error.jl:61
 [2] #setprecision#25
   @ ./mpfr.jl:966
 [3] setprecision
   @ ./mpfr.jl:960
 [4] Type
   @ ./irrationals.jl:69
 [5] <
   @ ./irrationals.jl:96
 [6] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [7] _broadcast_getindex
   @ ./broadcast.jl:656
 [8] getindex
   @ ./broadcast.jl:610
 [9] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to ijl_excstack_state)
Stacktrace:
 [1] #setprecision#25
   @ ./mpfr.jl:963
 [2] setprecision
   @ ./mpfr.jl:960
 [3] Type
   @ ./irrationals.jl:69
 [4] <
   @ ./irrationals.jl:96
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to julia.except_enter)
Stacktrace:
 [1] #setprecision#25
   @ ./mpfr.jl:963
 [2] setprecision
   @ ./mpfr.jl:960
 [3] Type
   @ ./irrationals.jl:69
 [4] <
   @ ./irrationals.jl:96
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to ijl_pop_handler)
Stacktrace:
 [1] #setprecision#25
   @ ./mpfr.jl:964
 [2] setprecision
   @ ./mpfr.jl:960
 [3] Type
   @ ./irrationals.jl:69
 [4] <
   @ ./irrationals.jl:96
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_custom_get_size)
Stacktrace:
  [1] _
    @ ./mpfr.jl:112
  [2] #BigFloat#1
    @ ./irrationals.jl:209
  [3] BigFloat (repeats 2 times)
    @ ./irrationals.jl:208
  [4] #888
    @ ./irrationals.jl:70
  [5] #setprecision#25
    @ ./mpfr.jl:964
  [6] setprecision
    @ ./mpfr.jl:960
  [7] Type
    @ ./irrationals.jl:69
  [8] <
    @ ./irrationals.jl:96
  [9] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
 [10] _broadcast_getindex
    @ ./broadcast.jl:656
 [11] getindex
    @ ./broadcast.jl:610
 [12] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to ijl_alloc_string)
Stacktrace:
  [1] _string_n
    @ ./strings/string.jl:90
  [2] _
    @ ./mpfr.jl:115
  [3] #BigFloat#1
    @ ./irrationals.jl:209
  [4] BigFloat (repeats 2 times)
    @ ./irrationals.jl:208
  [5] #888
    @ ./irrationals.jl:70
  [6] #setprecision#25
    @ ./mpfr.jl:964
  [7] setprecision
    @ ./mpfr.jl:960
  [8] Type
    @ ./irrationals.jl:69
  [9] <
    @ ./irrationals.jl:96
 [10] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
 [11] _broadcast_getindex
    @ ./broadcast.jl:656
 [12] getindex
    @ ./broadcast.jl:610
 [13] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_const_pi)
Stacktrace:
  [1] #BigFloat#1
    @ ./irrationals.jl:210
  [2] BigFloat (repeats 2 times)
    @ ./irrationals.jl:208
  [3] #888
    @ ./irrationals.jl:70
  [4] #setprecision#25
    @ ./mpfr.jl:964
  [5] setprecision
    @ ./mpfr.jl:960
  [6] Type
    @ ./irrationals.jl:69
  [7] <
    @ ./irrationals.jl:96
  [8] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
  [9] _broadcast_getindex
    @ ./broadcast.jl:656
 [10] getindex
    @ ./broadcast.jl:610
 [11] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_get_flt)
Stacktrace:
  [1] Float32
    @ ./mpfr.jl:344
  [2] Float32
    @ ./mpfr.jl:346
  [3] #888
    @ ./irrationals.jl:70
  [4] #setprecision#25
    @ ./mpfr.jl:964
  [5] setprecision
    @ ./mpfr.jl:960
  [6] Type
    @ ./irrationals.jl:69
  [7] <
    @ ./irrationals.jl:96
  [8] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
  [9] _broadcast_getindex
    @ ./broadcast.jl:656
 [10] getindex
    @ ./broadcast.jl:610
 [11] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:416 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/utils.jl:83
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
    @ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:354
  [7] #224
    @ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:347 [inlined]
  [8] LLVM.ThreadSafeContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}})
    @ LLVM ~/.julia/packages/LLVM/9gCXO/src/executionengine/ts_module.jl:14
  [9] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:74
 [10] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:346
 [11] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/cache.jl:90
 [12] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:299
 [13] cufunction
    @ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:292 [inlined]
 [14] macro expansion
    @ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:102 [inlined]
 [15] #launch_heuristic#248
    @ ~/.julia/packages/CUDA/Ey3w2/src/gpuarrays.jl:17 [inlined]
 [16] _copyto!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:63 [inlined]
 [17] copyto!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:46 [inlined]
 [18] copy
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:37 [inlined]
 [19] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(<), Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Irrational{:π}}})
    @ Base.Broadcast ./broadcast.jl:873
 [20] top-level scope
    @ REPL[40]:1
 [21] top-level scope
    @ ~/.julia/packages/CUDA/Ey3w2/src/initialization.jl:52
