Description
Describe the bug
The similar function does not return the correct array type when applied to reshaped or adjointed CuSparseMatrix-based arrays (checked for CSR and CSC). Instead, it returns a nonsparse CPU array with the appropriate size and eltype. As a result, methods that rely on similar, such as batched_mul, incorrectly deduce the output matrix type and fall back to generic scalar indexing operations.
To reproduce
The Minimal Working Example (MWE) for this bug:
using CUDA, CUDA.CUSPARSE
a = CuSparseMatrixCSR(CuArray(ones(3,3)))
a = reshape(a, size(a)..., 1)  # 3x3x1 reshaped CuSparseMatrixCSR{Float64, Int32}
z = similar(a)                 # 3x3x1 Array{Float64,3}, i.e. a dense CPU array
Note that batched_mul(a, b) works if b is the wrapped CuSparseMatrix, but not if a is. This is because batched_mul uses its first argument as the input to similar.
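To make it easier to confirm the behavior for both wrappers mentioned in the description, the check can be sketched as follows. This is only an illustrative script, assuming a working CUDA.jl installation with a GPU; the variable names reshaped and adjointed are mine, and the printed types will depend on the installed CUDA.jl version.

```julia
using CUDA, CUDA.CUSPARSE
using LinearAlgebra  # provides the Adjoint wrapper created by a'

a = CuSparseMatrixCSR(CuArray(ones(3, 3)))

# The two kinds of wrappers mentioned in the report.
reshaped  = reshape(a, size(a)..., 1)
adjointed = a'

for (label, w) in (("reshape", reshaped), ("adjoint", adjointed))
    z = similar(w)
    # Print the concrete type and whether the result stayed on the GPU.
    println(label, ": ", typeof(z), " (CuArray? ", z isa CuArray, ")")
end
```

If similar behaved as expected, the "CuArray?" flag would be true for both wrappers.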
Manifest.toml
[[CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CompilerSupportLibraries_jll", "ExprTools", "GPUArrays", "GPUCompiler", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "SpecialFunctions", "TimerOutputs"]
git-tree-sha1 = "20bbe9217f06a0b44f261a0f058b77b09f53829f"
repo-rev = "master"
repo-url = "https://github.com/JuliaGPU/CUDA.jl"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "3.4.1"
[[GPUArrays]]
deps = ["Adapt", "LinearAlgebra", "Printf", "Random", "Serialization", "Statistics"]
git-tree-sha1 = "69faa5f1c5706ca9ca067604acf797ee3a8ec6f6"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "8.1.1"
[[GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "2535b71c1031b6dbca5f22529dbfbe6725749749"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.13.3"
[[LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "46092047ca4edc10720ecab437c42283cd7c44f3"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "4.6.0"
[[Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "84918055d15b3114ede17ac6a7182f68870c16f7"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "3.3.1"
Expected behavior
For consistent use in all GPU functions (e.g. batched_mul), similar() on a wrapped CuSparseMatrix should return a nonsparse CuArray, not a CPU array.
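As a rough illustration of the expected result, here is a minimal sketch of a stand-in for similar that always allocates a dense GPU array. The helper name dense_gpu_similar is hypothetical and not part of CUDA.jl; it only uses eltype, size, and the undef CuArray constructor, and it is not a proposal for how the fix should be implemented inside the package.

```julia
using CUDA, CUDA.CUSPARSE

# Hypothetical helper (not part of CUDA.jl): allocate an uninitialized,
# dense CuArray with the same eltype and size as `x`, whatever wrapper `x` is.
dense_gpu_similar(x::AbstractArray) = CuArray{eltype(x)}(undef, size(x))

a = CuSparseMatrixCSR(CuArray(ones(3, 3)))
w = reshape(a, size(a)..., 1)

z = dense_gpu_similar(w)
@assert z isa CuArray{Float64,3}  # dense and on the GPU
@assert size(z) == size(w)        # same shape as the wrapped input
```

A proper fix would presumably make similar itself dispatch on the wrapped sparse types, so that callers such as batched_mul get this result without a separate helper.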
Version info
Details on Julia:
julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, ivybridge)
Details on CUDA:
error while getting version - needs admin privileges, which I do not have.
running on an NVIDIA Tesla K20Xm
CUDA toolkit 10.1, artifact installation
NVIDIA driver 418.96.0, for CUDA 10.1
Libraries:
- CUBLAS: 10.2.1
- CURAND: 10.1.1
- CUFFT: 10.1.1
- CUSOLVER: 10.2.0
- CUSPARSE: 10.3.0
- CUPTI: 12.0.0
- NVML: 10.0.0+418.96
ERROR: could not load library "d:\Users\username\.julia\artifacts\29e0f0db94f0a4aa539ffc8593dd07f0f209965c\bin\cudnn_ops_infer64_8.dll"
Access is denied.