Computes the matrix-vector product sqrt(M)·v using a recursive algorithm.

For that, it requires a functor that takes an input real* array and an output real* array (both in device memory if compiled in CUDA mode, or host memory otherwise) as:

```c++
void operator()(real* in_v, real* out_Mv) override;
```

The functor can inherit ```lanczos::MatrixDot``` (see example.cu).

This function must fill the output array with the result of the product M·v, i.e. out_Mv = M·in_v.

If M has size NxN and the cost of one M·v product is O(M), the total cost of the algorithm is O(m·M), where m is the number of Lanczos iterations and m << N.

If M·v performs a dense matrix-vector product, the cost of the algorithm would be O(m·N^2).
This is a header-only library, although a shared library can be compiled instead.

## Usage:
See example.cu for a usage example that can be compiled to work in either GPU or CPU mode.

See example.cpp for a CPU-only example.
Let us go through the remaining one, a GPU-only example.
Create the module:
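A minimal sketch of this step, assuming the solver class is called ```lanczos::Solver``` and is default-constructible; the header name below is also an assumption (see example.cu for the actual code):

```c++
#include"LanczosAlgorithm.h" //Assumed header name.

//Create the solver module; it will later be given the matrix functor and the target vector.
lanczos::Solver solver;
```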
Write a functor that computes the product between the original matrix and a given vector:

```c++
struct DiagonalMatrix: public lanczos::MatrixDot{
  int size;
  DiagonalMatrix(int size): size(size){}

  void operator()(real* v, real* Mv) override{
    //An example diagonal matrix
    for(int i=0; i<size; i++){
      Mv[i] = 2*v[i];
    }
  }
};
```
Provide the solver with an instance of the functor and the target vector:

```c++
int size = 10;
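//The rest of this block is a sketch rather than the library's verified code: the names
//lanczos::real, solver.run and its argument order are assumptions, and in CUDA mode the
//vectors below would have to live in device memory instead. Requires #include <vector>.
std::vector<lanczos::real> v(size, 1.0);   //target vector v
std::vector<lanczos::real> result(size);   //will hold sqrt(M)·v
DiagonalMatrix dot(size);                  //the functor defined above
lanczos::real tolerance = 1e-6;            //requested accuracy
solver.run(dot, result.data(), v.data(), tolerance, size);
```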

## Compilation:

This library requires lapacke and cblas (which can be replaced by MKL). In GPU mode, CUDA is also needed.

Note, however, that the heavy lifting of this solver comes from the matrix-vector multiplication, which must be provided by the user. The main benefit of the CUDA mode is not increased performance of the internal library code, but the fact that the input/output arrays live in the GPU (saving potential memory copies).

## Optional macros:
**CUDA_ENABLED**: Compiles a GPU-enabled shared library; the solver expects input/output arrays to be in the GPU and most of the computations will be carried out in the GPU. Requires a working CUDA environment.

**DOUBLE_PRECISION**: The library is compiled in single precision by default. This macro switches to double precision, making ```lanczos::real``` a typedef to double.

**USE_MKL**: Includes mkl.h instead of lapacke and cblas. You will have to modify the compilation flags accordingly.

**SHARED_LIBRARY_COMPILATION**: The Makefile uses this macro to compile a shared library. By default, this library is header-only.
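As a quick illustration of the ```DOUBLE_PRECISION``` behavior described above (a sketch; the header name below is an assumption):

```c++
//Compile with -DDOUBLE_PRECISION so the header sees the macro.
#include"LanczosAlgorithm.h" //Assumed header name.
#include<type_traits>

static_assert(std::is_same<lanczos::real, double>::value,
              "with DOUBLE_PRECISION, lanczos::real is double");
```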

See the Makefile for further instructions.
## Python interface

See python/example.py for more information.
The root folder's Makefile will try to compile the python library as well. It expects pybind11 to be placed under the extern/ folder. Pybind11 is included as a submodule, so make sure to clone this repository with --recursive.
Note that the python wrapper can only be compiled in CPU mode.

## References:

[1] Krylov subspace methods for computing hydrodynamic interactions in Brownian dynamics simulations. J. Chem. Phys. 137, 064106 (2012); doi: 10.1063/1.4742347

## Some notes:
From what I have seen, this algorithm converges to an error of ~1e-3 in a few steps (<5), and from that point a lot of iterations are needed to lower the error further. It usually achieves machine precision in under 50 iterations.
If the matrix does not have a square root (e.g. it is not positive definite or not symmetric), this will usually show up as a NaN in the current error estimation. In that case an exception is thrown.
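A minimal sketch of guarding against that case; the exception type is an assumption (the library only states that an exception is thrown), and ```solver.run```/```DiagonalMatrix``` follow the earlier sketch:

```c++
//Requires #include <iostream>.
try{
  //Attempt sqrt(M)·v; throws if the iteration runs into an invalid matrix.
  solver.run(dot, result.data(), v.data(), tolerance, size);
}catch(const std::exception &e){
  std::cerr<<"sqrt(M)·v failed, is M symmetric positive definite? "<<e.what()<<std::endl;
}
```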