☕Implement of Parallel Matrix Multiplication Methods Using FOX Algorithm on Peking University's High-performance Computing System
- 规约计算 (Reduction)
- 拥有者计算原则 (Owner Computing Rule)
- 流水并行(Pipeline Parallelism):
- 在一个进程上,矩阵计算被划分为P个阶段 (P Supercomputing Steps in a Process)
- 数据并行 (Data Parallelism):
- 在每个进程上同时计算局部的矩阵乘积 (Local Matrix Multiplications are computing on every processess at the same Computing Step)
-
Mathematical Modeling of Matrix Multiplication
-
Time Complexity
-
Storage Complexity
-
Example Implementation in C Language
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
for (k = 0; k < n; k++)
C(i,j) = C(i,j) + A(i,k)*B(k,j);
- Basic Flow
- Matrix
's Dimension is
, and Matirx
's Dimension is a
.
- Compute Matrix
in parallel.
- Let
is the number of processors, and
be an integer such that it devides
and
.
- Create a Cartesian topology with process mesh
, and
,
.
- Denote
,
,
.
- Distribute
and
by blocks on p processess such that
is
block and
is
block, stored on process
.
- Details
-
Partitions of Matrices A, B and C. (Index syntax in Mathematical form: start from 1)
-
Data Distribution on the 2-D Cartesian Topology Processes Mesh (Index syntax in Mathematical formulars: start from 1)