-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-17721][MLlib][ML] Fix for multiplying transposed SparseMatrix with SparseVector #15296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
k += 1 | ||
i += 1 | ||
} | ||
else if (xIndices(k) < Acols(i)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: pull the elses up onto the previous line. Does this affect the non-transposed case below too? I believe your change is indeed correct and the tests demonstrate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elses on previous line, done. I must admit I haven't gone through the non-transpose case in as much detail, but that does not seem affected by this bug. Should I perhaps add a test that shows the non-transpose case is/remains unaffected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be ideal if it's not already covered, just to be sure. It ought to be a straightforward modification of a bit of test code you just added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check
ok to test |
Test build #66126 has finished for PR 15296 at commit
|
LGTM |
…with SparseVector ## What changes were proposed in this pull request? * changes the implementation of gemv with transposed SparseMatrix and SparseVector both in mllib-local and mllib (identical) * adds a test that was failing before this change, but succeeds with these changes. The problem in the previous implementation was that it only increments `i`, that is enumerating the columns of a row in the SparseMatrix, when the row-index of the vector matches the column-index of the SparseMatrix. In cases where a particular row of the SparseMatrix has non-zero values at column-indices lower than corresponding non-zero row-indices of the SparseVector, the non-zero values of the SparseVector are enumerated without ever matching the column-index at index `i` and the remaining column-indices i+1,...,indEnd-1 are never attempted. The test cases in this PR illustrate this issue. ## How was this patch tested? I have run the specific `gemv` tests in both mllib-local and mllib. I am currently still running `./dev/run-tests`. ## ___ As per instructions, I hereby state that this is my original work and that I license the work to the project (Apache Spark) under the project's open source license. Mentioning dbtsai, viirya and brkyvz whom I can see have worked/authored on these parts before. Author: Bjarne Fruergaard <bwahlgreen@gmail.com> Closes #15296 from bwahlgreen/bugfix-spark-17721. (cherry picked from commit 29396e7) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@bwahlgreen Actually since mllib-local was not in 1.6 or 1.5, would you mind sending a backport PR against branch-1.6? Thank you! |
@@ -638,12 +638,16 @@ private[spark] object BLAS extends Serializable { | |||
val indEnd = Arows(rowCounter + 1) | |||
var sum = 0.0 | |||
var k = 0 | |||
while (k < xNnz && i < indEnd) { | |||
while (i < indEnd && k < xNnz) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: an unnecessary change?
LGTM although it was merged. |
…atrix with SparseVector Backport PR of changes relevant to mllib only, but otherwise identical to #15296 jkbradley Author: Bjarne Fruergaard <bwahlgreen@gmail.com> Closes #15311 from bwahlgreen/bugfix-spark-17721-1.6.
…atrix with SparseVector Backport PR of changes relevant to mllib only, but otherwise identical to #15296 jkbradley Author: Bjarne Fruergaard <bwahlgreen@gmail.com> Closes #15311 from bwahlgreen/bugfix-spark-17721-1.6. (cherry picked from commit 376545e) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…atrix with SparseVector Backport PR of changes relevant to mllib only, but otherwise identical to apache#15296 jkbradley Author: Bjarne Fruergaard <bwahlgreen@gmail.com> Closes apache#15311 from bwahlgreen/bugfix-spark-17721-1.6. (cherry picked from commit 376545e)
What changes were proposed in this pull request?
The problem in the previous implementation was that it only increments
i
, that is enumerating the columns of a row in the SparseMatrix, when the row-index of the vector matches the column-index of the SparseMatrix. In cases where a particular row of the SparseMatrix has non-zero values at column-indices lower than corresponding non-zero row-indices of the SparseVector, the non-zero values of the SparseVector are enumerated without ever matching the column-index at indexi
and the remaining column-indices i+1,...,indEnd-1 are never attempted. The test cases in this PR illustrate this issue.How was this patch tested?
I have run the specific
gemv
tests in both mllib-local and mllib. I am currently still running./dev/run-tests
.___
As per instructions, I hereby state that this is my original work and that I license the work to the project (Apache Spark) under the project's open source license.
Mentioning @dbtsai, @viirya and @brkyvz whom I can see have worked/authored on these parts before.