
Commit 1cd12dc

Merge pull request #13 from Xewar313/sparse-example
Add sparse matrix usage to example
2 parents: c267492 + 4777d58

File tree

6 files changed: +244 −8 lines changed


README.md

Lines changed: 20 additions & 8 deletions
@@ -71,7 +71,7 @@ The distributed-ranges library provides data-structures, algorithms and views de
Algorithms and data structures are designed to free the user from the need to worry about the technical details of their parallelism. An example would be the definition of a distributed vector in the memory of multiple nodes connected using MPI.

```cpp
-dr::mhp::distributed_vector<double> dv(N);
+dr::mp::distributed_vector<double> dv(N);
```

Such a vector, containing N elements, is automatically distributed among all the nodes involved in the calculation, with individual nodes storing an equal (if possible) amount of data.
@@ -82,12 +82,12 @@ In this way, many of the technical details related to the parallel execution of
### Namespaces

The general namespace used in the library is `dr::`.
-For program using a single node with shared memory available for multiple CPUs and one or more GPUs, data structures and algorithms from `dr::shp::` namespace are provided.
-For distributed memory model, use the `dr::mhp::` namespace.
+For a program using a single node, with shared memory available to multiple CPUs and one or more GPUs, data structures and algorithms from the `dr::sp::` namespace are provided.
+For the distributed memory model, use the `dr::mp::` namespace.

### Data structures

-Content of distributes-ranges' data structures is distributed over available nodes. For example, segments of `dr::mhp::distributed_vector` are located in memory of different nodes (mpi processes). Still, global view of the `distributed_vector` is uniform, with contiguous indices.
+The content of distributed-ranges' data structures is distributed over the available nodes. For example, segments of `dr::mp::distributed_vector` are located in the memory of different nodes (MPI processes). Still, the global view of the `distributed_vector` is uniform, with contiguous indices.

<!-- TODO: some pictures here -->

#### Halo concept
@@ -98,7 +98,7 @@ To support this situation, the concept of halo was introduced. A halo is an area
### Algorithms

-Following algorithms are included in distributed-ranges, both in mhp and shp versions:
+The following algorithms are included in distributed-ranges, in both mp and sp versions:

```cpp
copy()
```
@@ -151,16 +151,28 @@ The example shows the distributed nature of dr data structures. The distributed_

[./src/example4.cpp](src/example4.cpp)

-This example illustrates adding two distributed,multidimensional arrays. Each array has two dimensions and is initialized by an `std::array`. The arrays are populated with sequential values using a distributed version of iota called `mhp::iota`. A for_each loop is the main part of the code, computing the sum on a specified number of nodes. It takes a lambda copy function along with two input arrays (a and b) and an output array (c) as parameters. The result is printed on a node 0.
+This example illustrates adding two distributed, multidimensional arrays. Each array has two dimensions and is initialized from an `std::array`. The arrays are populated with sequential values using a distributed version of iota called `mp::iota`. A for_each loop is the main part of the code, computing the sum on a specified number of nodes. It takes a lambda copy function along with two input arrays (a and b) and an output array (c) as parameters. The result is printed on node 0.

### Example 5

[./src/example5.cpp](src/example5.cpp)

-Example 5 outlines a method for calculating a 2D 5-point stencil with distributed multidimensional arrays, specifically utilizing `dr::mhp::distributed_mdarray`. Initially, it involves setting up key parameters like the radius for element exchange between nodes through `dr::mhp::halo`, and defining the start and end points of the array slice. The example's core is the `mhp::stencil_for_each` function, which applies a lambda function to two subsets of the array, designated as input and output. The `mdspan_stencil_op` lambda function conducts a simple calculation that involves adding together the values of an element and its adjacent elements and subsequently calculating their average. The `mhp::halo().exchange()` enables values to be shared across distinct nodes, making this process feasible. Ultimately, the outcomes of the calculation are neatly displayed on node 0 using mdspan(), resulting in a clear indication of the modifications made to the 2D array. This example is a practical demonstration of executing stencil operations on distributed arrays.
+Example 5 outlines a method for calculating a 2D 5-point stencil on distributed multidimensional arrays, specifically utilizing `dr::mp::distributed_mdarray`. Initially, it sets up key parameters such as the radius for element exchange between nodes through `dr::mp::halo`, and defines the start and end points of the array slice. The example's core is the `mp::stencil_for_each` function, which applies a lambda function to two subsets of the array, designated as input and output. The `mdspan_stencil_op` lambda performs a simple calculation: it adds together the values of an element and its adjacent elements and then computes their average. The `mp::halo().exchange()` call enables values to be shared across distinct nodes, making this process feasible. Finally, the outcome of the calculation is displayed on node 0 using mdspan(), giving a clear indication of the modifications made to the 2D array. This example is a practical demonstration of executing stencil operations on distributed arrays.

### Example 6

[./src/example6.cpp](src/example6.cpp)

-This example's code demonstrates a 2D pattern search in a distributed, multidimensional array (`mhp::distributed_mdarray<float, 2>`). It initializes a 2D array, populates it with `mhp::iota`, converts it to binary values using `mhp::transform` and defines a pattern of 2x2. A lambda function is used to scan the array and mark occurrences of the pattern in a separate array. The process is similar to the one demonstrated in example5.
+This example demonstrates a 2D pattern search in a distributed, multidimensional array (`mp::distributed_mdarray<float, 2>`). It initializes a 2D array, populates it with `mp::iota`, converts it to binary values using `mp::transform`, and defines a 2x2 pattern. A lambda function scans the array and marks occurrences of the pattern in a separate array. The process is similar to the one demonstrated in example 5.
+
+### Example 7
+
+[./src/example7.cpp](src/example7.cpp)
+
+This example showcases the usage of `mp::distributed_sparse_matrix`. It reads data from the `resources/example.mtx` file on the root node and distributes it among all nodes. The root node initializes a vector and broadcasts it to every other node. After that, the `mp::gemv` operation is performed and the result is returned to an `std::vector<double>` on the root. Finally, the root prints the multiplied vector and the result.
+
+### Example 8
+
+[./src/example8.cpp](src/example8.cpp)
+
+Example 8 is exactly the same as example 7; the only difference is the initialization of the matrix data. Here the matrix is generated in code, has a different shape, and uses random values. Additionally, the matrix data is printed together with the vector and the result.

resources/example.mtx

Lines changed: 51 additions & 0 deletions
```
%%MatrixMarket matrix coordinate real general
25 25 49
1 1 1.000e+00
2 2 2.000e+00
3 3 3.000e+00
4 4 4.000e+00
5 5 5.000e+00
6 6 6.000e+00
7 7 7.000e+00
8 8 8.000e+00
9 9 9.000e+00
10 10 1.000e+01
11 11 2.000e+01
12 12 3.000e+01
13 13 4.000e+01
14 14 5.000e+01
15 15 6.000e+01
16 16 7.000e+01
17 17 8.000e+01
18 18 8.000e+01
19 19 9.000e+01
20 20 1.000e+02
21 21 2.000e+02
22 22 2.000e+02
23 23 3.000e+02
24 24 4.000e+02
25 25 5.000e+02
1 2 1.000e+00
2 3 2.000e+00
3 4 3.000e+00
4 5 4.000e+00
5 6 5.000e+00
6 7 6.000e+00
7 8 7.000e+00
8 9 8.000e+00
9 10 9.000e+00
10 11 1.000e+01
11 12 2.000e+01
12 13 3.000e+01
13 14 4.000e+01
14 15 5.000e+01
15 16 6.000e+01
16 17 7.000e+01
17 18 8.000e+01
18 19 9.000e+01
19 20 1.000e+01
20 21 2.000e+01
21 22 3.000e+01
22 23 4.000e+01
23 24 5.000e+01
24 25 6.000e+01
```

scripts/build_run.sh

Lines changed: 2 additions & 0 deletions
@@ -16,3 +16,5 @@ mpirun -n 3 ./build/src/example3
mpirun -n 3 ./build/src/example4
mpirun -n 3 ./build/src/example5
mpirun -n 3 ./build/src/example6
+mpirun -n 3 ./build/src/example7
+mpirun -n 3 ./build/src/example8

src/CMakeLists.txt

Lines changed: 10 additions & 0 deletions
@@ -41,3 +41,13 @@ add_executable(example6 example6.cpp)
target_compile_definitions(example6 INTERFACE DR_FORMAT)
target_link_libraries(example6 DR::mpi fmt::fmt)
+
+add_executable(example7 example7.cpp)
+
+target_compile_definitions(example7 INTERFACE DR_FORMAT)
+target_link_libraries(example7 DR::mpi fmt::fmt)
+
+add_executable(example8 example8.cpp)
+
+target_compile_definitions(example8 INTERFACE DR_FORMAT)
+target_link_libraries(example8 DR::mpi fmt::fmt)

src/example7.cpp

Lines changed: 68 additions & 0 deletions
```cpp
// SPDX-FileCopyrightText: Intel Corporation
//
// SPDX-License-Identifier: BSD-3-Clause

#include <dr/mp.hpp>
#include <fmt/core.h>
#include <numeric> // std::iota
#include <ranges>
#include <vector>

/* Sparse band matrix vector multiplication */
int main() {
  dr::mp::init(sycl::default_selector_v);
  using I = long;
  using V = double;

  dr::views::csr_matrix_view<V, I> local_data;
  auto root = 0;
  if (root == dr::mp::rank()) {
    // x x 0 0 ... 0
    // 0 x x 0 ... 0
    // .............
    // 0 ... 0 0 x x
    auto source = "./resources/example.mtx";
    local_data = dr::read_csr<double, long>(source);
  }

  dr::mp::distributed_sparse_matrix<
      V, I, dr::mp::MpiBackend,
      dr::mp::csr_eq_distribution<V, I, dr::mp::MpiBackend>>
      matrix(local_data, root);

  dr::mp::broadcasted_vector<double> broadcasted_b;
  std::vector<double> b;
  if (root == dr::mp::rank()) {
    b.resize(matrix.shape().second);
    std::iota(b.begin(), b.end(), 1);

    broadcasted_b.broadcast_data(matrix.shape().second, 0, b,
                                 dr::mp::default_comm());
  } else {
    broadcasted_b.broadcast_data(matrix.shape().second, 0,
                                 std::ranges::empty_view<V>(),
                                 dr::mp::default_comm());
  }
  std::vector<double> res(matrix.shape().first);
  gemv(root, res, matrix, broadcasted_b);

  if (root == dr::mp::rank()) {
    fmt::print("Matrix imported from {}\n", "./resources/example.mtx");
    fmt::print("Input: ");
    for (auto x : b) {
      fmt::print("{} ", x);
    }
    fmt::print("\n");
    fmt::print("Matrix vector multiplication res: ");
    for (auto x : res) {
      fmt::print("{} ", x);
    }
    fmt::print("\n");
  }

  if (root == dr::mp::default_comm().rank()) {
    dr::__detail::destroy_csr_matrix_view(local_data, std::allocator<V>{});
  }

  dr::mp::finalize();

  return 0;
}
```

src/example8.cpp

Lines changed: 93 additions & 0 deletions
```cpp
// SPDX-FileCopyrightText: Intel Corporation
//
// SPDX-License-Identifier: BSD-3-Clause

#include <dr/mp.hpp>
#include <fmt/core.h>
#include <algorithm> // std::max
#include <numeric>   // std::iota
#include <random>
#include <ranges>
#include <vector>

/* Sparse band matrix vector multiplication */
int main() {
  dr::mp::init(sycl::default_selector_v);
  using I = long;
  using V = double;
  dr::views::csr_matrix_view<V, I> local_data;
  auto root = 0;
  if (root == dr::mp::rank()) {
    auto size = 10;
    auto nnz = 20;
    auto colInd = new I[nnz];
    auto rowInd = new I[size + 1];
    auto values = new V[nnz];
    std::uniform_real_distribution<double> unif(0, 1);
    std::default_random_engine re;
    // x x 0 0 ... 0
    // x x 0 0 ... 0
    // x 0 x 0 ... 0
    // x 0 0 x ... 0
    // .............
    // x ... 0 0 0 x
    for (auto i = 0; i <= size; i++) {
      rowInd[i] = i * 2; // two elements per row
    }
    for (auto i = 0; i < nnz; i++) {
      colInd[i] =
          (i % 2) * (std::max(i / 2, 1)); // column 0 and the diagonal (with an
                                          // additional entry in the first row)
      values[i] = unif(re);
    }

    local_data = dr::views::csr_matrix_view<V, I>(values, rowInd, colInd,
                                                  {size, size}, nnz, root);
  }

  dr::mp::distributed_sparse_matrix<
      V, I, dr::mp::MpiBackend,
      dr::mp::csr_eq_distribution<V, I, dr::mp::MpiBackend>>
      matrix(local_data, root);

  dr::mp::broadcasted_vector<double> broadcasted_b;
  std::vector<double> b;
  if (root == dr::mp::rank()) {
    b.resize(matrix.shape().second);
    std::iota(b.begin(), b.end(), 1);

    broadcasted_b.broadcast_data(matrix.shape().second, 0, b,
                                 dr::mp::default_comm());
  } else {
    broadcasted_b.broadcast_data(matrix.shape().second, 0,
                                 std::ranges::empty_view<V>(),
                                 dr::mp::default_comm());
  }

  std::vector<double> res(matrix.shape().first);
  gemv(root, res, matrix, broadcasted_b);

  if (root == dr::mp::rank()) {
    fmt::print("Matrix of shape {} x {}, with {} non-zero entries:\n",
               matrix.shape().first, matrix.shape().second, matrix.size());
    for (auto [i, v] : matrix) {
      auto [n, m] = i;
      fmt::print("Matrix entry <{}, {}, {}>\n", n, m, v);
    }
    fmt::print("Input: ");
    for (auto x : b) {
      fmt::print("{} ", x);
    }
    fmt::print("\n");
    fmt::print("Matrix vector multiplication res: ");
    for (auto x : res) {
      fmt::print("{} ", x);
    }
    fmt::print("\n");
  }

  if (root == dr::mp::default_comm().rank()) {
    dr::__detail::destroy_csr_matrix_view(local_data, std::allocator<double>{});
  }
  dr::mp::finalize();

  return 0;
}
```
