Addition and multiplication over cuarray and cusparse #1120
Codecov Report
@@              Coverage Diff               @@
##           tb/sanitize    #1120     +/-   ##
================================================
+ Coverage      66.64%   78.42%   +11.78%
================================================
  Files            118      119        +1
  Lines           7668     8110      +442
================================================
+ Hits            5110     6360     +1250
+ Misses          2558     1750      -808
@maleadt Is the CI working?
Yes? Why wouldn't it be? compute-sanitizer seems to find an issue with some conversion:
I take it this also fixes #528?
Sure! It fixes #528.
It seems to fail on v1.6 (debug)?
Right, so there's something wrong with this PR or some of the kernels it calls. See the error log I posted.
It seems like the error occurs while freeing a buffer?
No, that's unrelated. The error code only triggers there because it's the first synchronous call. The real issue is spotted by compute-sanitizer, and is the error I linked above.
I still cannot find where
It's the name of a kernel, probably an internal one in CUSPARSE. We don't currently have host backtraces (disabled due to a compute-sanitizer bug), so we can't report the Julia function that called the kernel. Anyway, this is why I was looking into upgrading the CUDA toolkit used for these debug runs in #950, since the issue may be in CUDA itself (i.e., not us calling that kernel incorrectly).
COO + CSR or CSR + COO triggers the bug, so I temporarily disabled those tests.
function Base.:(+)(A::CuSparseMatrixCSR, B::CuSparseMatrix)
    # Convert B to CSR so both operands share the same format
    csrB = CuSparseMatrixCSR(B)
    # geam computes alpha*A + beta*B; here alpha = beta = 1
    return geam(one(eltype(A)), A, one(eltype(A)), csrB, 'O')
end
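For reference, `geam` in this method computes C = α·A + β·B for two CSR matrices of the same size, with α = β = `one(eltype(A))`; the trailing `'O'` appears to be the index-base argument (one-based). The block below is a minimal CPU sketch of that operation that merges the sorted column indices of each row. It works on plain Julia vectors (row pointers, column indices, values) with 1-based indexing. `csr_geam` is a hypothetical helper for illustration, not the CUDA.jl or CUSPARSE implementation, and it assumes column indices within each row are sorted.

```julia
# Hypothetical CPU reference for C = alpha*A + beta*B with A and B in CSR form.
# Assumes 1-based indexing and sorted column indices within each row.
function csr_geam(alpha, rowptrA, colA, valA, beta, rowptrB, colB, valB)
    m = length(rowptrA) - 1
    @assert length(rowptrB) - 1 == m "A and B must have the same number of rows"
    T = promote_type(typeof(alpha), eltype(valA), typeof(beta), eltype(valB))
    rowptrC = Vector{Int}(undef, m + 1)
    colC = Int[]
    valC = T[]
    rowptrC[1] = 1
    for i in 1:m
        p, pend = rowptrA[i], rowptrA[i+1] - 1
        q, qend = rowptrB[i], rowptrB[i+1] - 1
        # Merge the two sorted column lists of row i
        while p <= pend || q <= qend
            if q > qend || (p <= pend && colA[p] < colB[q])
                # Entry only in A (or B's row is exhausted)
                push!(colC, colA[p]); push!(valC, alpha * valA[p]); p += 1
            elseif p > pend || colB[q] < colA[p]
                # Entry only in B (or A's row is exhausted)
                push!(colC, colB[q]); push!(valC, beta * valB[q]); q += 1
            else
                # Same column in both rows: combine the two entries
                push!(colC, colA[p]); push!(valC, alpha * valA[p] + beta * valB[q])
                p += 1; q += 1
            end
        end
        rowptrC[i+1] = length(colC) + 1
    end
    return rowptrC, colC, valC
end
```

For example, with A = [1 0 2; 0 3 0] and B = [0 4 5; 6 0 0] in CSR form and α = β = 1, it returns the CSR arrays of [1 4 7; 6 3 0]. A reference like this can be handy for checking a GPU result on small inputs.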
Maybe the bug is triggered by the CuSparseMatrixCSR(B) constructor when B is a CuSparseMatrixCOO.
It seems to occur in cusparseXcoo2csr.
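cusparseXcoo2csr only compresses the COO row-index array (sorted by row) into a CSR row-pointer array; the column indices and values are reused unchanged. To cross-check that step on a small matrix, a CPU version of the compression can be sketched as follows. `coo2csr_rowptr` is a hypothetical helper, not part of CUDA.jl, and it assumes 1-based indices and row-sorted input.

```julia
# Hypothetical CPU reference for the COO -> CSR row-pointer compression.
# Assumes the COO row indices are 1-based and sorted by row.
function coo2csr_rowptr(rowind::AbstractVector{<:Integer}, m::Integer)
    @assert issorted(rowind) "COO row indices must be sorted by row"
    @assert all(r -> 1 <= r <= m, rowind) "row index out of range"
    counts = zeros(Int, m)
    for r in rowind
        counts[r] += 1                        # nonzeros per row
    end
    rowptr = Vector{Int}(undef, m + 1)
    rowptr[1] = 1
    for i in 1:m
        rowptr[i+1] = rowptr[i] + counts[i]   # running (prefix) sum, 1-based
    end
    return rowptr
end
```

Comparing its output against the row pointers of the converted matrix, copied back to the host, would show whether the conversion result itself is wrong or whether the problem only shows up as an out-of-bounds access inside the kernel.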
It's ready for review.
Rebased on top of the PR with CI fixes.
Closes #1113