Skip to content

[SPARK-17721][MLlib][ML] Fix for multiplying transposed SparseMatrix with SparseVector #15296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

bwahlgreen
Copy link
Contributor

What changes were proposed in this pull request?

  • changes the implementation of gemv with transposed SparseMatrix and SparseVector both in mllib-local and mllib (identical)
  • adds a test that was failing before this change, but succeeds with these changes.

The problem in the previous implementation was that it only increments i, that is enumerating the columns of a row in the SparseMatrix, when the row-index of the vector matches the column-index of the SparseMatrix. In cases where a particular row of the SparseMatrix has non-zero values at column-indices lower than corresponding non-zero row-indices of the SparseVector, the non-zero values of the SparseVector are enumerated without ever matching the column-index at index i and the remaining column-indices i+1,...,indEnd-1 are never attempted. The test cases in this PR illustrate this issue.

How was this patch tested?

I have run the specific gemv tests in both mllib-local and mllib. I am currently still running ./dev/run-tests.

___

As per instructions, I hereby state that this is my original work and that I license the work to the project (Apache Spark) under the project's open source license.

Mentioning @dbtsai, @viirya and @brkyvz whom I can see have worked/authored on these parts before.

k += 1
i += 1
}
else if (xIndices(k) < Acols(i)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: pull the elses up onto the previous line. Does this affect the non-transposed case below too? I believe your change is indeed correct and the tests demonstrate that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elses on previous line, done. I must admit I haven't gone through the non-transpose case in as much detail, but that does not seem affected by this bug. Should I perhaps add a test that shows the non-transpose case is/remains unaffected?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be ideal if it's not already covered, just to be sure. It ought to be a straightforward modification of a bit of test code you just added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check

@bwahlgreen bwahlgreen changed the title [SPARK-17721][MLlib][ML] Fix for multiplying transposed SparseMatrix with SparseVector [WIP] [SPARK-17721][MLlib][ML] Fix for multiplying transposed SparseMatrix with SparseVector Sep 29, 2016
@jkbradley
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Sep 29, 2016

Test build #66126 has finished for PR 15296 at commit cdc3642.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

LGTM
I'll merge this with master and plan to backport it to branches for 2.0, 1.6, and 1.5.
Thanks!

asfgit pushed a commit that referenced this pull request Sep 29, 2016
…with SparseVector

## What changes were proposed in this pull request?

* changes the implementation of gemv with transposed SparseMatrix and SparseVector both in mllib-local and mllib (identical)
* adds a test that was failing before this change, but succeeds with these changes.

The problem in the previous implementation was that it only increments `i`, that is enumerating the columns of a row in the SparseMatrix, when the row-index of the vector matches the column-index of the SparseMatrix. In cases where a particular row of the SparseMatrix has non-zero values at column-indices lower than corresponding non-zero row-indices of the SparseVector, the non-zero values of the SparseVector are enumerated without ever matching the column-index at index `i` and the remaining column-indices i+1,...,indEnd-1 are never attempted. The test cases in this PR illustrate this issue.

## How was this patch tested?

I have run the specific `gemv` tests in both mllib-local and mllib. I am currently still running `./dev/run-tests`.

## ___
As per instructions, I hereby state that this is my original work and that I license the work to the project (Apache Spark) under the project's open source license.

Mentioning dbtsai, viirya and brkyvz whom I can see have worked/authored on these parts before.

Author: Bjarne Fruergaard <bwahlgreen@gmail.com>

Closes #15296 from bwahlgreen/bugfix-spark-17721.

(cherry picked from commit 29396e7)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@jkbradley
Copy link
Member

@bwahlgreen Actually since mllib-local was not in 1.6 or 1.5, would you mind sending a backport PR against branch-1.6? Thank you!

@asfgit asfgit closed this in 29396e7 Sep 29, 2016
@@ -638,12 +638,16 @@ private[spark] object BLAS extends Serializable {
val indEnd = Arows(rowCounter + 1)
var sum = 0.0
var k = 0
while (k < xNnz && i < indEnd) {
while (i < indEnd && k < xNnz) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: an unnecessary change?

@viirya
Copy link
Member

viirya commented Sep 30, 2016

LGTM although it was merged.

asfgit pushed a commit that referenced this pull request Oct 2, 2016
…atrix with SparseVector

Backport PR of changes relevant to mllib only, but otherwise identical to #15296

jkbradley

Author: Bjarne Fruergaard <bwahlgreen@gmail.com>

Closes #15311 from bwahlgreen/bugfix-spark-17721-1.6.
asfgit pushed a commit that referenced this pull request Oct 2, 2016
…atrix with SparseVector

Backport PR of changes relevant to mllib only, but otherwise identical to #15296

jkbradley

Author: Bjarne Fruergaard <bwahlgreen@gmail.com>

Closes #15311 from bwahlgreen/bugfix-spark-17721-1.6.

(cherry picked from commit 376545e)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Oct 8, 2016
…atrix with SparseVector

Backport PR of changes relevant to mllib only, but otherwise identical to apache#15296

jkbradley

Author: Bjarne Fruergaard <bwahlgreen@gmail.com>

Closes apache#15311 from bwahlgreen/bugfix-spark-17721-1.6.

(cherry picked from commit 376545e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants