Describe the bug
Broadcasting a comparison between a CuArray and an irrational constant (e.g. `A .< π`) throws an InvalidIRError during kernel compilation.
This does not happen on Julia 1.8, only on Julia 1.9.
To reproduce
Minimal working example (MWE):
julia> A = 4 * CUDA.rand(3,3)
3×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
2.82133 2.05078 3.8729
3.14324 0.218199 3.05251
0.227112 3.68884 3.91693
julia> A .< π
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to mpfr_signbit)
The stack trace shows that the comparison dispatches to `<(::AbstractFloat, ::Irrational)` (irrationals.jl:96), which rounds π through BigFloat/MPFR, and those libmpfr calls cannot be compiled to GPU code. Explicitly converting to Float32 works:
julia> A .< Float32.(CUDA.fill(π,3,3))
3×3 CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}:
0 1 1
1 1 0
1 0 0
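A lighter workaround (a sketch, not verified on this exact setup) is to convert π to a concrete Float32 scalar on the host before broadcasting, which also avoids allocating a whole matrix of π values:

```julia
using CUDA

A = 4 * CUDA.rand(3, 3)

# Converting the Irrational to Float32 on the host means the broadcast
# kernel only receives isbits arguments and never touches MPFR:
A .< Float32(π)
```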
I am using a temporary environment where only CUDA.jl is installed:
(jl_bFolVV) pkg> status
Status `/tmp/jl_bFolVV/Project.toml`
[052768ef] CUDA v3.12.1
Expected behavior
On Julia 1.8 I get the expected result without errors:
julia> A = 4 * CUDA.rand(3,3)
3×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
0.459628 1.38646 3.23488
3.10404 1.11073 3.11335
1.95598 3.8854 2.79495
julia> A .< π
3×3 CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}:
1 1 0
1 1 1
1 0 1
Version info
Details on Julia:
julia> versioninfo()
Julia Version 1.9.0-beta2
Commit 7daffeecb8c (2022-12-29 07:45 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 24 × AMD Ryzen 9 3900X 12-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
Threads: 1 on 24 virtual cores
Environment:
LD_LIBRARY_PATH = /opt/ros/noetic/lib
Details on CUDA:
julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 515.86.1, for CUDA 11.7
CUDA driver 11.7
Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+515.86.1
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.9.0-beta2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: NVIDIA GeForce RTX 3080 (sm_86, 1.470 GiB / 10.000 GiB available)
Additional context
Full stack trace:
[1] signbit
@ ./mpfr.jl:811
[2] _cpynansgn
@ ./mpfr.jl:338
[3] Float32
@ ./mpfr.jl:344
[4] Float32
@ ./mpfr.jl:346
[5] #888
@ ./irrationals.jl:70
[6] #setprecision#25
@ ./mpfr.jl:964
[7] setprecision
@ ./mpfr.jl:960
[8] Type
@ ./irrationals.jl:69
[9] <
@ ./irrationals.jl:96
[10] _broadcast_getindex_evalf
@ ./broadcast.jl:683
[11] _broadcast_getindex
@ ./broadcast.jl:656
[12] getindex
@ ./broadcast.jl:610
[13] broadcast_kernel
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to ijl_rethrow)
Stacktrace:
[1] rethrow
@ ./error.jl:61
[2] #setprecision#25
@ ./mpfr.jl:966
[3] setprecision
@ ./mpfr.jl:960
[4] Type
@ ./irrationals.jl:69
[5] <
@ ./irrationals.jl:96
[6] _broadcast_getindex_evalf
@ ./broadcast.jl:683
[7] _broadcast_getindex
@ ./broadcast.jl:656
[8] getindex
@ ./broadcast.jl:610
[9] broadcast_kernel
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to ijl_excstack_state)
Stacktrace:
[1] #setprecision#25
@ ./mpfr.jl:963
[2] setprecision
@ ./mpfr.jl:960
[3] Type
@ ./irrationals.jl:69
[4] <
@ ./irrationals.jl:96
[5] _broadcast_getindex_evalf
@ ./broadcast.jl:683
[6] _broadcast_getindex
@ ./broadcast.jl:656
[7] getindex
@ ./broadcast.jl:610
[8] broadcast_kernel
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to julia.except_enter)
Stacktrace:
[1] #setprecision#25
@ ./mpfr.jl:963
[2] setprecision
@ ./mpfr.jl:960
[3] Type
@ ./irrationals.jl:69
[4] <
@ ./irrationals.jl:96
[5] _broadcast_getindex_evalf
@ ./broadcast.jl:683
[6] _broadcast_getindex
@ ./broadcast.jl:656
[7] getindex
@ ./broadcast.jl:610
[8] broadcast_kernel
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to ijl_pop_handler)
Stacktrace:
[1] #setprecision#25
@ ./mpfr.jl:964
[2] setprecision
@ ./mpfr.jl:960
[3] Type
@ ./irrationals.jl:69
[4] <
@ ./irrationals.jl:96
[5] _broadcast_getindex_evalf
@ ./broadcast.jl:683
[6] _broadcast_getindex
@ ./broadcast.jl:656
[7] getindex
@ ./broadcast.jl:610
[8] broadcast_kernel
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_custom_get_size)
Stacktrace:
[1] _
@ ./mpfr.jl:112
[2] #BigFloat#1
@ ./irrationals.jl:209
[3] BigFloat (repeats 2 times)
@ ./irrationals.jl:208
[4] #888
@ ./irrationals.jl:70
[5] #setprecision#25
@ ./mpfr.jl:964
[6] setprecision
@ ./mpfr.jl:960
[7] Type
@ ./irrationals.jl:69
[8] <
@ ./irrationals.jl:96
[9] _broadcast_getindex_evalf
@ ./broadcast.jl:683
[10] _broadcast_getindex
@ ./broadcast.jl:656
[11] getindex
@ ./broadcast.jl:610
[12] broadcast_kernel
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to ijl_alloc_string)
Stacktrace:
[1] _string_n
@ ./strings/string.jl:90
[2] _
@ ./mpfr.jl:115
[3] #BigFloat#1
@ ./irrationals.jl:209
[4] BigFloat (repeats 2 times)
@ ./irrationals.jl:208
[5] #888
@ ./irrationals.jl:70
[6] #setprecision#25
@ ./mpfr.jl:964
[7] setprecision
@ ./mpfr.jl:960
[8] Type
@ ./irrationals.jl:69
[9] <
@ ./irrationals.jl:96
[10] _broadcast_getindex_evalf
@ ./broadcast.jl:683
[11] _broadcast_getindex
@ ./broadcast.jl:656
[12] getindex
@ ./broadcast.jl:610
[13] broadcast_kernel
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_const_pi)
Stacktrace:
[1] #BigFloat#1
@ ./irrationals.jl:210
[2] BigFloat (repeats 2 times)
@ ./irrationals.jl:208
[3] #888
@ ./irrationals.jl:70
[4] #setprecision#25
@ ./mpfr.jl:964
[5] setprecision
@ ./mpfr.jl:960
[6] Type
@ ./irrationals.jl:69
[7] <
@ ./irrationals.jl:96
[8] _broadcast_getindex_evalf
@ ./broadcast.jl:683
[9] _broadcast_getindex
@ ./broadcast.jl:656
[10] getindex
@ ./broadcast.jl:610
[11] broadcast_kernel
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_get_flt)
Stacktrace:
[1] Float32
@ ./mpfr.jl:344
[2] Float32
@ ./mpfr.jl:346
[3] #888
@ ./irrationals.jl:70
[4] #setprecision#25
@ ./mpfr.jl:964
[5] setprecision
@ ./mpfr.jl:960
[6] Type
@ ./irrationals.jl:69
[7] <
@ ./irrationals.jl:96
[8] _broadcast_getindex_evalf
@ ./broadcast.jl:683
[9] _broadcast_getindex
@ ./broadcast.jl:656
[10] getindex
@ ./broadcast.jl:610
[11] broadcast_kernel
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}, args::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/validation.jl:141
[2] macro expansion
@ ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:418 [inlined]
[3] macro expansion
@ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
[4] macro expansion
@ ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:416 [inlined]
[5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/utils.jl:83
[6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
@ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:354
[7] #224
@ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:347 [inlined]
[8] LLVM.ThreadSafeContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}})
@ LLVM ~/.julia/packages/LLVM/9gCXO/src/executionengine/ts_module.jl:14
[9] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}})
@ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:74
[10] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:346
[11] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/cache.jl:90
[12] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:299
[13] cufunction
@ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:292 [inlined]
[14] macro expansion
@ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:102 [inlined]
[15] #launch_heuristic#248
@ ~/.julia/packages/CUDA/Ey3w2/src/gpuarrays.jl:17 [inlined]
[16] _copyto!
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:63 [inlined]
[17] copyto!
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:46 [inlined]
[18] copy
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:37 [inlined]
[19] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(<), Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Irrational{:π}}})
@ Base.Broadcast ./broadcast.jl:873
[20] top-level scope
@ REPL[40]:1
[21] top-level scope
@ ~/.julia/packages/CUDA/Ey3w2/src/initialization.jl:52