@@ -296,6 +296,81 @@ backed by an RDD of its entries.
The underlying RDDs of a distributed matrix must be deterministic, because we cache the matrix size.
In general the use of non-deterministic RDDs can lead to errors.

+ ### BlockMatrix
+
+ A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, where a `MatrixBlock` is
+ a tuple of `((Int, Int), Matrix)`: the `(Int, Int)` is the index of the block, and the `Matrix` is
+ the sub-matrix at that index, of size `rowsPerBlock` x `colsPerBlock`.
+ `BlockMatrix` supports methods such as `.add` and `.multiply` with another `BlockMatrix`.
+ `BlockMatrix` also has a helper function, `.validate`, which can be used to check whether the
+ `BlockMatrix` is set up properly.
+
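
To make the block indexing concrete, here is a minimal sketch in plain Java (no Spark dependency). It assumes that the entry at global position `(i, j)` falls into block `(i / rowsPerBlock, j / colsPerBlock)` and that blocks on the last block row or column may be smaller than `rowsPerBlock` x `colsPerBlock`. The class `BlockLayoutSketch` and its methods are hypothetical helpers used only for illustration; they are not part of the MLlib API.

```java
// Illustrative sketch only: BlockLayoutSketch is a hypothetical helper,
// not part of Spark. It assumes entry (i, j) belongs to block
// (i / rowsPerBlock, j / colsPerBlock).
public class BlockLayoutSketch {

    /** Number of blocks needed to cover `dim` rows (or columns) with blocks of size `perBlock`. */
    static long numBlocks(long dim, int perBlock) {
        return (dim + perBlock - 1) / perBlock; // ceiling division
    }

    /** Index of the block that holds global row (or column) index `i`. */
    static int blockIndex(long i, int perBlock) {
        return (int) (i / perBlock);
    }

    /** Position of global index `i` inside its block. */
    static int offsetInBlock(long i, int perBlock) {
        return (int) (i % perBlock);
    }

    public static void main(String[] args) {
        int rowsPerBlock = 1024;
        int colsPerBlock = 1024;
        long numRows = 2500;
        long numCols = 1500;

        // Size of the block grid that covers the whole matrix.
        System.out.println("block grid: " + numBlocks(numRows, rowsPerBlock)
            + " x " + numBlocks(numCols, colsPerBlock));

        // Locate one entry: which block it lands in, and where inside that block.
        long i = 2047;
        long j = 1024;
        System.out.println("entry (" + i + ", " + j + ") -> block ("
            + blockIndex(i, rowsPerBlock) + ", " + blockIndex(j, colsPerBlock)
            + "), local position (" + offsetInBlock(i, rowsPerBlock)
            + ", " + offsetInBlock(j, colsPerBlock) + ")");
    }
}
```

Under these assumptions, smaller blocks mean more `MatrixBlock` records in the backing RDD, while larger blocks mean fewer but heavier records.
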
+ <div class="codetabs">
+ <div data-lang="scala" markdown="1">
+
+ A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
+ most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `.toBlockMatrix()`.
+ By default, `.toBlockMatrix()` creates blocks of size 1024 x 1024. Users may change the block size
+ by supplying the values through `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+
+ {% highlight scala %}
+ import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
+ import org.apache.spark.rdd.RDD
+
+ val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
+ // Create a CoordinateMatrix from an RDD[MatrixEntry].
+ val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
+ // Transform the CoordinateMatrix to a BlockMatrix.
+ val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
+
+ // Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
+ // Nothing happens if it is valid.
+ matA.validate()
+
+ // Calculate A^T A.
+ val AtransposeA = matA.transpose.multiply(matA)
+
+ // Compute the SVD of 2 * A.
+ val A2 = matA.add(matA)
+ val svd = A2.toIndexedRowMatrix().computeSVD(20, false, 1e-9)
+ {% endhighlight %}
+ </div>
+
+ <div data-lang="java" markdown="1">
+
+ A [`BlockMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html) can be
+ most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `.toBlockMatrix()`.
+ By default, `.toBlockMatrix()` creates blocks of size 1024 x 1024. Users may change the block size
+ by supplying the values through `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+
+ {% highlight java %}
+ import org.apache.spark.api.java.JavaRDD;
+ import org.apache.spark.mllib.linalg.Matrix;
+ import org.apache.spark.mllib.linalg.SingularValueDecomposition;
+ import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
+ import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
+ import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
+ import org.apache.spark.mllib.linalg.distributed.MatrixEntry;
+
+ JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) matrix entries
+ // Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
+ CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
+ // Transform the CoordinateMatrix to a BlockMatrix.
+ BlockMatrix matA = coordMat.toBlockMatrix().cache();
+
+ // Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
+ // Nothing happens if it is valid.
+ matA.validate();
+
+ // Calculate A^T A.
+ BlockMatrix AtransposeA = matA.transpose().multiply(matA);
+
+ // Compute the SVD of 2 * A.
+ BlockMatrix A2 = matA.add(matA);
+ SingularValueDecomposition<IndexedRowMatrix, Matrix> svd =
+   A2.toIndexedRowMatrix().computeSVD(20, false, 1e-9);
+ {% endhighlight %}
+ </div>
+ </div>
+
### RowMatrix

A `RowMatrix` is a row-oriented distributed matrix without meaningful row indices, backed by an RDD