
CUSPARSE: Float16 sparse arrays fail on arithmetic of opposite transpositions #2284

Description

Basic arithmetic on transposed sparse Float16 matrices, including addition, throws an exception during kernel execution (InvalidIRError). This happens when computing expressions such as A + B', even when the element types and the positions of the non-zero elements match.

The failure occurs specifically when one operand remains in its original layout while the other is transposed. It is also specific to Float16 elements: using Float32 or Float64 instead causes no such issues.

CUSPARSE.CuSparseMatrixCSR{T} is generally usable for T = Float32 or Float64, including for the operation above, so the divergent behavior of the Float16 specialization qualifies as a bug.

Example

The following code throws an error while evaluating the last @show expression:

using SparseArrays
using CUDA

T = Float16
A = CUSPARSE.CuSparseMatrixCSR{T}(sparse(T[0 1 0; 1 0 0; 0 0 1]))
B = CUSPARSE.CuSparseMatrixCSR{T}(sparse(T[0 1 0; 1 0 0; 0 0 1]))

CUDA.@allowscalar begin
    @show A + B    # works
    @show A' + B'  # works
    @show A + B'   # throws InvalidIRError
end

The crucial messages are:

A + B = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
A' + B' = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
ERROR: a exception was thrown during kernel execution.
(...)
ERROR: LoadError: InvalidIRError: compiling MethodInstance for CUDA.CUSPARSE.sparse_to_dense_broadcast_kernel(::Type{CUDA.CUSPARSE.CuSparseMatrixCSR{Float16, Int32}}, ::typeof(+), ::CuDeviceMatrix{Float16, 1}, ::CUDA.CUSPARSE.CuSparseDeviceMatrixCSR{Float16, Int32, 1}, ::LinearAlgebra.Adjoint{Float16, CUDA.CUSPARSE.CuSparseDeviceMatrixCSR{Float16, Int32, 1}}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.new_gc_frame)
Stacktrace:
 [1] ntuple
   @ ./ntuple.jl:49
 [2] sparse_to_dense_broadcast_kernel
   @ ~/.julia/packages/CUDA/htRwP/lib/cusparse/broadcast.jl:429

Computing A' + B yields the same error. Setting T = Float32 or T = Float64 makes the snippet execute properly.
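Consistent with that, a possible workaround is to widen to Float32 before the mixed-transposition addition and narrow the dense result back afterwards. This is a hedged, untested sketch: the CuSparseMatrixCSR{Float32} element-type conversion used here is an assumption, not something verified in this report.

using SparseArrays, CUDA

# Hedged workaround sketch (untested): perform the failing mixed-transposition
# sum in Float32, where the kernel compiles, then narrow the dense result back
# to Float16. The CuSparseMatrixCSR{Float32} widening conversion is assumed.
function add_mixed_f16(A, B)
    A32 = CUSPARSE.CuSparseMatrixCSR{Float32}(A)  # assumed widening conversion
    B32 = CUSPARSE.CuSparseMatrixCSR{Float32}(B)
    C32 = A32 + B32'       # per the report, this works for Float32
    return Float16.(C32)   # narrow the dense result back to Float16
end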

Similar behavior can be observed when the same matrix is used on both sides, e.g. when computing A + A'. This breaks more common operations, such as LinearAlgebra.issymmetric, as illustrated below.
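For instance, the following snippet, built from the same matrix as above, would be expected to return true but instead fails in the same way:

using LinearAlgebra, SparseArrays, CUDA

T = Float16
A = CUSPARSE.CuSparseMatrixCSR{T}(sparse(T[0 1 0; 1 0 0; 0 0 1]))

# A is symmetric, so this should return true; with Float16 it instead hits
# the same InvalidIRError as A + A'.
CUDA.@allowscalar issymmetric(A)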

Expected behavior

The expected output, analogous to the behavior for the other element types, would be:

A + B = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
A' + B' = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
A + B' = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]

Version info

Details on Julia:

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)

Details on CUDA:

CUDA runtime 12.3, artifact installation
CUDA driver 12.2
NVIDIA driver 535.104.12

CUDA libraries: 
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+535.104.12

Julia packages: 
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

2 devices:
  0: NVIDIA GeForce RTX 4090 (sm_89, 23.643 GiB / 23.988 GiB available)
  1: NVIDIA GeForce RTX 4090 (sm_89, 23.642 GiB / 23.988 GiB available)
