Calling similar(...) on AbstractCuSparseArray with dims #1184

Open · wants to merge 36 commits into master
Conversation

birkmichael (Contributor)

Calling similar(var, dims) or similar(var, type, dims) on a variable var with typeof(var) <: AbstractCuSparseArray will return a CuArray of eltype eltype(var) with dimensions given by dims. Previously, calling similar(reshape(var, dims)) would result in Base's abstractarray.jl and reshapedarray.jl expanding the call to similar(reshape(var, dims), eltype, dims), which was not handled in this code and resulted in a call to Array(...) rather than CuArray(...), forcing use of the CPU and scalar indexing.

Calling similar() with a dims argument on a variable var of a type <: AbstractCuSparseArray will return a CuArray of eltype eltype(var) with dimensions given by dims.
calling similar on AbstractCuSparseArray with dims
@maleadt (Member)

maleadt commented Oct 4, 2021

Thanks for the contribution! I'm not sure I fully understand the fix though, shouldn't similar on a sparse array yield another sparse array, not a CuArray?

julia> similar(sprand(2,2,0.5))
2×2 SparseMatrixCSC{Float64, Int64} with 3 stored entries:
 6.90942e-310  6.90965e-310
 6.90941e-310   ⋅ 

julia> similar(cu(sprand(2,2,0.5)))
2×2 CuSparseMatrixCSC{Float32, Int32} with 2 stored entries:
  ⋅    ⋅ 
 0.0  1.875

Furthermore, a wrapped array isn't a subtype of the original array:

julia> reshape(cu(sprand(2,2,0.2)), 2, 2, 1) isa CUSPARSE.AbstractCuSparseArray
false

I've also marked your PR as draft so that it doesn't kick off unnecessary CI. It would need some tests for that to be useful anyway.

@maleadt maleadt marked this pull request as draft October 4, 2021 16:00
@maleadt maleadt added bugfix This gets something working again. cuda array Stuff about CuArray. needs tests Tests are requested. labels Oct 4, 2021
lib/cusparse/array.jl (outdated review thread, resolved)
@birkmichael (Contributor, Author)

birkmichael commented Oct 4, 2021

> Furthermore, a wrapped array isn't a subtype of the original array:

When similar is called on a reshaped array (CPU or GPU), parent is called to unwrap it, and then similar is called on the unwrapped array. This is good behavior for GPU arrays as well. The problem is that when the call is made on the unwrapped array, eltype and dims are added as arguments, a form that is currently not handled by CUSPARSE, so similar falls back to the Base method and produces an Array rather than a CuArray.
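The dispatch chain being described can be sketched as a toy (the Python class names below are hypothetical stand-ins, not the actual Julia types): the reshaped wrapper unwraps to its parent and re-calls similar with explicit (eltype, dims), so the sparse GPU type must handle that 3-argument form itself, or an inherited generic fallback allocates the wrong kind of array.

```python
# Toy sketch of the similar() dispatch problem. Class names are
# hypothetical stand-ins: CPUArray ~ Base fallback, GPUSparse ~
# AbstractCuSparseArray, Reshaped ~ Base.ReshapedArray.

class CPUArray:
    def similar(self, eltype, dims):
        # generic fallback: allocates a plain CPU array
        return ("cpu-dense", eltype, dims)

class GPUSparse(CPUArray):
    def similar(self, eltype=None, dims=None):
        if dims is None:
            return ("gpu-sparse", eltype)      # the no-dims form was handled
        # before the fix, this (eltype, dims) form fell through to
        # CPUArray.similar; the fix handles it here, staying on the GPU:
        return ("gpu-dense", eltype, dims)

class Reshaped:
    def __init__(self, parent, dims):
        self.parent, self.dims = parent, dims
    def similar(self):
        # unwrap via parent, then forward with explicit eltype and dims
        return self.parent.similar("Float32", self.dims)

out = Reshaped(GPUSparse(), (2, 2, 1)).similar()
# out[0] == "gpu-dense": the allocation stays on the GPU
```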

> shouldn't similar on a sparse array yield another sparse array, not a CuArray?

You're right. I thought I was mimicking Julia's regular sparse behavior, but I just realized how and why similar(CPUSparseVar, dims::Dims{N>=3}) returns a CPU Array. I will fix my branch and resubmit.

Added more robust handling of applying similar on a CuSparseMatrixCSC/CSR, following the logic used in SparseArrays.jl
robust handling of similar CSC/CSR
two CSR methods were originally added as extensions to SparseArrays.jl, but SparseArrays.jl does not support CSR
added missing Base. prefix to several instances of similar
used fill instead of CUDA.fill in matrix initialization
@birkmichael birkmichael marked this pull request as ready for review October 4, 2021 21:30
@codecov (bot)
codecov bot commented Oct 4, 2021

Codecov Report

Attention: Patch coverage is 86.30137% with 10 lines in your changes missing coverage. Please review.

Project coverage is 80.50%. Comparing base (dae1e18) to head (3b64e56).
Report is 1125 commits behind head on master.

Files with missing lines Patch % Lines
lib/cusparse/array.jl 83.33% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1184      +/-   ##
==========================================
+ Coverage   80.48%   80.50%   +0.02%     
==========================================
  Files         119      119              
  Lines        8393     8454      +61     
==========================================
+ Hits         6755     6806      +51     
- Misses       1638     1648      +10     


unclear what is supported in CUDA.jl for BSR, COO, so they remain untested
CUSPARSE Base.similar tests completed for CSC and CSR; no tests for BSR, COO, as it is unclear to what extent these are supported in CUDA.jl
splatted 1D tuple dims when creating vector
@birkmichael (Contributor, Author)

@maleadt So it looks like the tests failed, but not because of me. Some error in the comparison Array(dx[:])==x[:] in tests/cusparse.jl.

@maleadt (Member)

maleadt commented Oct 7, 2021

> So it looks like the tests failed, but not because of me. Some error in the comparison Array(dx[:])==x[:] in tests/cusparse.jl.

CI on master is green, the failure is in tests/cusparse and this PR is changing stuff with sparse arrays, so...
What makes you say the failure is not related?

Added a method for colon indexing a CuSparseMatrix, and a conversion from CuSparseMatrix to CuSparseVector. Fixed the show method of vectors to match SparseArrays. CuSparseVector.dims is a 2-tuple instead of an integer, probably as a holdover from the Base.show method that was written to fit both vectors and matrices and needed to be able to run size(A)[2]. That could probably be changed now, but I don't want to break anyone's code.
@birkmichael (Contributor, Author)

@maleadt It looks like there were a bunch of tests that indexed a CuSparseMatrix and compared the result to a CPU sparse matrix, à la Array(d_x[inds]) == x[inds]. All of these worked on master because indexing with a single colon or with ranges/vectors just turned the CuSparseMatrix into a dense CPU array and then indexed into that (even before Array() was called). Now that similar is fixed to return the correct type of array, the lack of specialized methods for indexing CuSparseMatrix variables is causing errors. I'm thinking of solving all of these bugs by converting to a sparse CPU array, letting SparseArrays do its magic, and then converting back to a CuSparseMatrix. It looks like these operations demand scalar indexing anyway, so it's not much of a performance loss. What do you think?

@maleadt (Member)

maleadt commented Oct 12, 2021

Copying an entire sparse array to the CPU when you only want to access a single element seems excessive... Scalar iteration is slow because it performs a memcpy for every element, that doesn't mean we should make it even slower by memcpying the entire array for each access.

@birkmichael (Contributor, Author)

birkmichael commented Oct 12, 2021

It's not for a single element. Specifically, what exists now (and returns the right type):

  1. indexing with a single integer and a colon (both arrangements - row slicing and column slicing)
  2. indexing with two colons
  3. indexing with two integers
  4. indexing with a single colon - added in my PR, converts to a CPU sparse array, uses their algorithm to convert to a sparse vector (with scalar iteration), and then converts to a CuSparseVector.

What I'm talking about is indexing where one or both of the indices is a vector or range. On master, if one is a vector/range, you get a CPU array. If both are, you get an error (which is why the test failed). The simplest fix is converting to the CPU and back. Admittedly, this is probably not the most efficient way to add those features, but I'm not sure there is a good way to do this kind of slicing without scalar iteration anyway, which is why I am proposing this method.
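For reference, the single-column cases above all reduce to the same CSC bookkeeping. A minimal CPU sketch (plain Python lists standing in for GPU arrays; the helper name is hypothetical): CSC stores the nonzeros of column j in the half-open range colptr[j]:colptr[j+1] of rowval/nzval, so a single-column slice is two pointer lookups and a contiguous copy.

```python
# CPU sketch of CSC single-column slicing, mirroring the colPtr
# arithmetic used in the PR. Plain lists stand in for GPU arrays;
# csc_column is a hypothetical helper, not a CUDA.jl function.

def csc_column(colptr, rowval, nzval, j):
    """Return (rows, vals) of column j (0-based) of a CSC matrix."""
    r1, r2 = colptr[j], colptr[j + 1]
    return rowval[r1:r2], nzval[r1:r2]

# 3x3 example: nonzeros at (0,0)=1, (2,0)=2, (1,1)=3, (2,2)=4
colptr = [0, 2, 3, 4]
rowval = [0, 2, 1, 2]
nzval  = [1.0, 2.0, 3.0, 4.0]

rows, vals = csc_column(colptr, rowval, nzval, 0)
# column 0 has entries at rows 0 and 2 with values 1.0 and 2.0
```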

@birkmichael (Contributor, Author)

All tests passed except those that fail master. Added a bunch of new tests (tests/cusparse/arrays.jl), so the coverage will probably improve. Anyway, now you can index and slice CuCSC and CuCSR matrices freely and they will stay the same type, rather than drop to the CPU. Any type of AbstractCuSparseMatrix can be turned into a CuSparseVector now. Everything is done using (almost only) vectorised methods without going through the CPU.

@@ -200,6 +200,59 @@ Base.similar(Mat::CuSparseMatrixCSC) = CuSparseMatrixCSC(copy(Mat.colPtr), copy(
Base.similar(Mat::CuSparseMatrixCSR) = CuSparseMatrixCSR(copy(Mat.rowPtr), copy(Mat.colVal), similar(nonzeros(Mat)), Mat.dims)
Base.similar(Mat::CuSparseMatrixBSR) = CuSparseMatrixBSR(copy(Mat.rowPtr), copy(Mat.colVal), similar(nonzeros(Mat)), Mat.blockDim, Mat.dir, nnz(Mat), Mat.dims)

## similar for CSC,CSR with dims, eltype arguments, adapted from SparseArrays.jl. NOTE: calling similar() on a wrapped CuSparseMatrixBSR/COO will result in a CuArray.
maleadt (Member):

Maybe wrap the NOTE on a new line.

But also: why does similar still result in a dense vector? #1184 (comment)

birkmichael (Contributor, Author):

It shouldn't - I spent a lot of time on that. I'll check.

@maleadt left a comment (Member):

Thanks! I don't have the time for an in-depth review right now, so some quick comments.


# Handles cases where CSC/CSR are reshaped to more than two dimensions, and all cases above for COO/BSR.
Base.similar(a::AbstractCuSparseArray, ::Type{T}, dims::Base.Dims{N}) where {T,N} =
CuArray{T,N}(undef, dims)
maleadt (Member):

indent?

Comment from above also applies.

r1 = convert(Int, x.colPtr[j])
r2 = convert(Int, x.colPtr[j+1]) - 1
CuSparseVector(rowvals(x)[r1:r2], nonzeros(x)[r1:r2], size(x, 1))
Base.getindex(A::CuSparseMatrixCSC,I::AbstractVector,J::AbstractVector) = CuSparseMatrixCSC(getindex(CuSparseMatrixCSR(getindex(A,:,J)),I,:))
maleadt (Member):

Maybe wrap these two methods for legibility.

birkmichael (Contributor, Author):

How do you mean? Write a method for Base.getindex(A::CuSparseMatrixCSC/CSR, I::AbstractVector, J::AbstractVector) that calls either of these two methods, depending on type?

maleadt (Member):

No, just line wrapping :-) i.e. adding a newline after the =

newcolptr = cumsum(diffVec)

indices = map((x,y)->range(x,stop=y),colptr[I_sorted],colptr[I_sorted.+1].-1)
@CUDA.allowscalar indices = vcat(indices...) #TODO: figure out how to avoid this step
maleadt (Member):

CUDA module prefix shouldn't be needed.

But the scalar indexing is not good. Where does it come from?

birkmichael (Contributor, Author):

What I'm trying to do here is this: get an array indices, where indices[i] contains a vector of the indices of all elements in nzvals that are in the I_sorted[i] row (or column, depending on CSC or CSR) of the input matrix. After that, indexing becomes simple.

However, the map here returns a CuArray of ranges, and any indexing I do with that will cause a bug, so I need to turn the ranges into vectors first. If I had a method to get a CPU array of CuVectors from the map, instead of a CuArray of ranges, that would be unnecessary. Applying collect.(indices) gives me an "inline variables only" type of error because it tries to instantiate a CuArray of arrays of (sometimes) different sizes. Applying vcat(indices...) gives a scalar indexing error.

The other, more elegant method of solving this, is actually doing what I did in conversions.jl that you commented on - transforming colptr to a CPU array, in which case I can just get an array of arrays using a comprehension or a map.
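The index-gathering step described above can be sketched on the CPU with plain Python lists (the helper name is hypothetical): for each selected column i, take the range colptr[i]:colptr[i+1] into nzval/rowval and concatenate, which is the role the vcat(indices...) call plays, minus the GPU scalar-indexing problem.

```python
# CPU sketch of gathering nonzero positions for a set of selected
# columns of a CSC matrix. gather_nz_positions is a hypothetical
# helper; lists stand in for GPU arrays.
from itertools import chain

def gather_nz_positions(colptr, selected):
    """Concatenated positions into nzval/rowval for the selected columns."""
    ranges = (range(colptr[i], colptr[i + 1]) for i in selected)
    return list(chain.from_iterable(ranges))

colptr = [0, 2, 3, 4]                  # 3-column CSC layout
pos = gather_nz_positions(colptr, [0, 2])
# columns 0 and 2 contribute nzval positions [0, 1, 3]
```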

maleadt (Member):

> the map here returns a CuArray of ranges, and any indexing I do with that will cause a bug, so I need to turn the ranges into vectors first.

Could you elaborate? And conversions to CPU data is extremely expensive and to be avoided if possible, so I'd rather look into fixing the bug you're encountering.

@@ -446,6 +535,7 @@ Base.copy(Mat::CuSparseMatrixCOO) = copyto!(similar(Mat), Mat)

# input/output


maleadt (Member):

Some needless whitespace changes?

@@ -483,7 +573,6 @@ for (gpu, cpu) in [CuSparseMatrixCSC => SparseMatrixCSC,
end
end


maleadt (Member):

Some needless whitespace changes?

@@ -0,0 +1,143 @@
using CUDA.CUSPARSE
using SparseArrays
using Test
maleadt (Member):

You shouldn't need to import Test -- run select tests by doing julia -L test/setup.jl test/cusparse/array.jl

I = 1:m
nzval = nonzeros(Mat)
nzind = rowvals(Mat)
colptr = Array(getcolptr(Mat)) #TODO: figure out how to avoid this step
maleadt (Member):

This seems bad. Is there no API call for such conversions? Or maybe try doing the conversion on the GPU, to avoid synchronizing during the copy to/from the CPU?

birkmichael (Contributor, Author):

See my comment re: scalar indexing above - there I try doing it on the GPU and end up with scalar indexing. Do you mean API calls to the NVIDIA CUDA library? I had a look now and couldn't find one, but I will give it a bit more effort later.
