Description
Basic arithmetic on transposed sparse `Float16` matrices, including addition, throws an exception during kernel execution (`InvalidIRError`). The error occurs when computing expressions such as `A + B'`, even when both operands have matching element types and identical positions of non-zero elements. It happens specifically when one side remains in its original storage order while the other is transposed, and it is specific to `Float16` elements: using `Float32` or `Float64` instead causes no such issue.

`CUSPARSE.CuSparseMatrixCSR{T}` is generally usable for `T` being `Float32` or `Float64`, including for the operation above, so the diverging behavior of the `Float16` specialization qualifies as a bug.
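For comparison, the equivalent host-side computation appears to work at `Float16` (a minimal sketch, not part of the original report), which points at the CUSPARSE broadcast kernel rather than `Float16` sparse arithmetic in general:

```julia
using SparseArrays

T = Float16
Ah = sparse(T[0 1 0; 1 0 0; 0 0 1])
Bh = sparse(T[0 1 0; 1 0 0; 0 0 1])

# The mixed-order sum succeeds on the CPU; the GPU equivalent throws.
Ah + Bh'
```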
Example
The following code:
```julia
using SparseArrays
using CUDA

T = Float16
A = CUSPARSE.CuSparseMatrixCSR{T}(sparse(T[0 1 0; 1 0 0; 0 0 1]))
B = CUSPARSE.CuSparseMatrixCSR{T}(sparse(T[0 1 0; 1 0 0; 0 0 1]))

CUDA.@allowscalar begin
    @show A + B
    @show A' + B'
    @show A + B'
end
```
throws an error while evaluating the last `@show` expression. The crucial messages are:
```
A + B = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
A' + B' = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
ERROR: a exception was thrown during kernel execution.
(...)
ERROR: LoadError: InvalidIRError: compiling MethodInstance for CUDA.CUSPARSE.sparse_to_dense_broadcast_kernel(::Type{CUDA.CUSPARSE.CuSparseMatrixCSR{Float16, Int32}}, ::typeof(+), ::CuDeviceMatrix{Float16, 1}, ::CUDA.CUSPARSE.CuSparseDeviceMatrixCSR{Float16, Int32, 1}, ::LinearAlgebra.Adjoint{Float16, CUDA.CUSPARSE.CuSparseDeviceMatrixCSR{Float16, Int32, 1}}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.new_gc_frame)
Stacktrace:
 [1] ntuple
   @ ./ntuple.jl:49
 [2] sparse_to_dense_broadcast_kernel
   @ ~/.julia/packages/CUDA/htRwP/lib/cusparse/broadcast.jl:429
```
Computing `A' + B` yields the same error. Setting `T = Float32` or `T = Float64` makes the snippet execute properly.

Similar behavior can be observed when the same matrix is used on both sides, i.e. when computing `A + A'`. This breaks more common operations, such as `LinearAlgebra.issymmetric`; see the sketch below.
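For completeness, a short sketch of the single-matrix case, reusing `A` from the example above; per the report, both calls hit the same failure:

```julia
using LinearAlgebra

CUDA.@allowscalar begin
    A + A'          # same InvalidIRError as A + B'
    issymmetric(A)  # reported to break via the same code path
end
```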
Expected behavior
The expected output, analogous to the other element types, would be:

```
A + B = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
A' + B' = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
A + B' = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
```
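Until this is fixed, one possible workaround (a sketch assuming the host-side `SparseMatrixCSC` originals are still at hand; not from the report) is to perform the mixed-order addition on the CPU, where `Float16` works, and upload the result:

```julia
using SparseArrays
using CUDA

T = Float16
Ah = sparse(T[0 1 0; 1 0 0; 0 0 1])  # host-side originals
Bh = sparse(T[0 1 0; 1 0 0; 0 0 1])

# Add on the CPU, then upload; C holds the values expected above.
C = CUSPARSE.CuSparseMatrixCSR{T}(sparse(Ah + Bh'))
```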
Version info
Details on Julia:

```
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
```
Details on CUDA:

```
CUDA runtime 12.3, artifact installation
CUDA driver 12.2
NVIDIA driver 535.104.12

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+535.104.12

Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

2 devices:
  0: NVIDIA GeForce RTX 4090 (sm_89, 23.643 GiB / 23.988 GiB available)
  1: NVIDIA GeForce RTX 4090 (sm_89, 23.642 GiB / 23.988 GiB available)
```