Commit 4db30b0

[SPARK-5507] Added documentation for BlockMatrix
1 parent 4d4cc76 commit 4db30b0

1 file changed, 75 additions and 0 deletions

docs/mllib-data-types.md

@@ -296,6 +296,81 @@ backed by an RDD of its entries.
The underlying RDDs of a distributed matrix must be deterministic, because we cache the matrix size.
In general the use of non-deterministic RDDs can lead to errors.

### BlockMatrix

A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s. A `MatrixBlock` is a
tuple of `((Int, Int), Matrix)`: the `(Int, Int)` is the index of the block, and the `Matrix` is
the sub-matrix at that index, of size `rowsPerBlock` x `colsPerBlock` (blocks along the last block
row or block column may be smaller).
`BlockMatrix` supports methods such as `.add` and `.multiply` with another `BlockMatrix`.
`BlockMatrix` also has a helper function, `.validate`, which can be used to check whether the
`BlockMatrix` is set up properly.

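To make the block index concrete, here is a minimal plain-Java sketch (no Spark dependency) of how a global entry `(i, j)` could be mapped to a block index and an offset inside that block. It assumes blocks are laid out on a regular grid and located by integer division, and that blocks on the last block row or column hold whatever remains. The class `BlockIndexSketch` and its method names are hypothetical and are not part of the MLlib API.

```java
/** Hypothetical helper for illustration only; not part of Spark MLlib. */
class BlockIndexSketch {

    /** Index of the block (row-wise or column-wise) that contains a global index. */
    static int blockIndex(long globalIndex, int perBlock) {
        return (int) (globalIndex / perBlock);
    }

    /** Position of a global index inside its block. */
    static int offsetInBlock(long globalIndex, int perBlock) {
        return (int) (globalIndex % perBlock);
    }

    /** Number of blocks needed to cover a dimension: ceil(dim / perBlock). */
    static int numBlocks(long dim, int perBlock) {
        return (int) ((dim + perBlock - 1) / perBlock);
    }

    /** Size of a given block along one dimension; the last block may be smaller (assumption). */
    static int blockSize(long dim, int perBlock, int blockIdx) {
        return (int) Math.min(perBlock, dim - (long) blockIdx * perBlock);
    }

    public static void main(String[] args) {
        int rowsPerBlock = 1024;
        int colsPerBlock = 1024;
        long i = 2050L;
        long j = 10L;

        // Block index ((Int, Int) in the MatrixBlock tuple) for entry (i, j).
        System.out.println("block = (" + blockIndex(i, rowsPerBlock) + ", "
                + blockIndex(j, colsPerBlock) + ")");
        // Position of the entry inside that block's sub-matrix.
        System.out.println("offset = (" + offsetInBlock(i, rowsPerBlock) + ", "
                + offsetInBlock(j, colsPerBlock) + ")");
        // Grid shape for a hypothetical 2500 x 3000 matrix.
        System.out.println("grid = " + numBlocks(2500L, rowsPerBlock) + " x "
                + numBlocks(3000L, colsPerBlock));
    }
}
```

Smaller blocks give a finer grid (more, lighter blocks), while larger blocks give fewer, heavier ones; the right trade-off depends on the matrix and the cluster.
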
<div class="codetabs">
<div data-lang="scala" markdown="1">

A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` using `.toBlockMatrix()`.
By default, `.toBlockMatrix()` creates blocks of size 1024 x 1024. Users may change the block size
by supplying the values through `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.

{% highlight scala %}
import org.apache.spark.mllib.linalg.SingularValueDecomposition
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
import org.apache.spark.rdd.RDD

val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
// Create a CoordinateMatrix from an RDD[MatrixEntry].
val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
// Transform the CoordinateMatrix to a BlockMatrix.
val matA: BlockMatrix = coordMat.toBlockMatrix().cache()

// Validate whether the BlockMatrix is set up properly. Throws an exception when it is not valid;
// nothing happens if it is valid.
matA.validate()

// Calculate A^T A.
val AtransposeA = matA.transpose.multiply(matA)

// Compute the SVD of 2 * A (k = 20, computeU = false, rCond = 1e-9).
val A2 = matA.add(matA)
val svd = A2.toIndexedRowMatrix().computeSVD(20, false, 1e-9)
{% endhighlight %}
</div>

<div data-lang="java" markdown="1">

A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` using `.toBlockMatrix()`.
By default, `.toBlockMatrix()` creates blocks of size 1024 x 1024. Users may change the block size
by supplying the values through `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.

{% highlight java %}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.SingularValueDecomposition;
import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
import org.apache.spark.mllib.linalg.distributed.MatrixEntry;

JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) matrix entries
// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
// Transform the CoordinateMatrix to a BlockMatrix.
BlockMatrix matA = coordMat.toBlockMatrix().cache();

// Validate whether the BlockMatrix is set up properly. Throws an exception when it is not valid;
// nothing happens if it is valid.
matA.validate();

// Calculate A^T A.
BlockMatrix AtransposeA = matA.transpose().multiply(matA);

// Compute the SVD of 2 * A (k = 20, computeU = false, rCond = 1e-9).
BlockMatrix A2 = matA.add(matA);
SingularValueDecomposition<IndexedRowMatrix, Matrix> svd =
  A2.toIndexedRowMatrix().computeSVD(20, false, 1e-9);
{% endhighlight %}
</div>
</div>

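As a rough illustration of what a block-wise product such as `matA.transpose.multiply(matA)` computes, the following plain-Java sketch multiplies two small matrices block by block on a single machine, using C(i, j) = sum over k of A(i, k) B(k, j) at the block level. It is not how Spark implements `.multiply` (which operates on blocks distributed across an RDD); the class `BlockMultiplySketch` and all of its methods are hypothetical names introduced only for this example. The sketch assumes the column block size of the left matrix equals the row block size of the right matrix.

```java
import java.util.Arrays;

/** Hypothetical single-machine illustration of block-wise multiplication; not Spark code. */
class BlockMultiplySketch {

    /** Splits a dense matrix into a grid of blocks; edge blocks may be smaller. */
    static double[][][][] toBlocks(double[][] m, int rowsPerBlock, int colsPerBlock) {
        int nRows = m.length;
        int nCols = m[0].length;
        int rowBlocks = (nRows + rowsPerBlock - 1) / rowsPerBlock;
        int colBlocks = (nCols + colsPerBlock - 1) / colsPerBlock;
        double[][][][] blocks = new double[rowBlocks][colBlocks][][];
        for (int bi = 0; bi < rowBlocks; bi++) {
            for (int bj = 0; bj < colBlocks; bj++) {
                int r = Math.min(rowsPerBlock, nRows - bi * rowsPerBlock);
                int c = Math.min(colsPerBlock, nCols - bj * colsPerBlock);
                blocks[bi][bj] = new double[r][c];
                for (int i = 0; i < r; i++) {
                    System.arraycopy(m[bi * rowsPerBlock + i], bj * colsPerBlock,
                            blocks[bi][bj][i], 0, c);
                }
            }
        }
        return blocks;
    }

    /** Reassembles a dense matrix from a block grid shaped like the output of toBlocks. */
    static double[][] toDense(double[][][][] blocks) {
        int rowsPerBlock = blocks[0][0].length;
        int colsPerBlock = blocks[0][0][0].length;
        int nRows = 0;
        for (double[][][] blockRow : blocks) nRows += blockRow[0].length;
        int nCols = 0;
        for (double[][] block : blocks[0]) nCols += block[0].length;
        double[][] dense = new double[nRows][nCols];
        for (int bi = 0; bi < blocks.length; bi++) {
            for (int bj = 0; bj < blocks[bi].length; bj++) {
                double[][] block = blocks[bi][bj];
                for (int i = 0; i < block.length; i++) {
                    System.arraycopy(block[i], 0, dense[bi * rowsPerBlock + i],
                            bj * colsPerBlock, block[i].length);
                }
            }
        }
        return dense;
    }

    /** Dense kernel: acc += a * b. */
    static void multiplyAccumulate(double[][] a, double[][] b, double[][] acc) {
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b[0].length; j++) {
                for (int x = 0; x < b.length; x++) {
                    acc[i][j] += a[i][x] * b[x][j];
                }
            }
        }
    }

    /** Block-level product: C(bi, bj) = sum over k of A(bi, k) * B(k, bj). */
    static double[][][][] multiply(double[][][][] a, double[][][][] b) {
        if (a[0].length != b.length) {
            throw new IllegalArgumentException("inner block counts differ");
        }
        double[][][][] c = new double[a.length][b[0].length][][];
        for (int bi = 0; bi < a.length; bi++) {
            for (int bj = 0; bj < b[0].length; bj++) {
                double[][] acc = new double[a[bi][0].length][b[0][bj][0].length];
                for (int k = 0; k < b.length; k++) {
                    multiplyAccumulate(a[bi][k], b[k][bj], acc);
                }
                c[bi][bj] = acc;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        // Dense transpose of a, so that the product below is A^T A.
        double[][] at = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) at[j][i] = a[i][j];
        }
        double[][][][] product = multiply(toBlocks(at, 2, 2), toBlocks(a, 2, 2));
        System.out.println(Arrays.deepToString(toDense(product)));
    }
}
```

The block-level loop has the same shape as the scalar triple loop, which is why the column block size of the left operand has to line up with the row block size of the right operand.
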
### RowMatrix

A `RowMatrix` is a row-oriented distributed matrix without meaningful row indices, backed by an RDD
