CRS Arrays of Sparse Matrices #44

Open
JabirSS opened this issue Jan 8, 2019 · 16 comments

Comments

@JabirSS

JabirSS commented Jan 8, 2019

Dear @martinjrobins,
Thank you for your continuous development and support of Aboria.

I have been solving large systems of equations using Aboria, and recently I found myself needing a linear solver other than Eigen, since Eigen lacks an algebraic multigrid (AMG) method, which is much faster for my problem.
To do this, I created the sparse operator, then used Eigen to assemble the matrix and obtain the compressed sparse row (CRS) arrays I needed to pass to the external linear solver library (amgcl).
However, I noticed that the assemble call is quite slow (about 2x slower than actually solving the linear system) and runs in serial.

I would really appreciate any suggestions you may have to improve this. Is there currently any other way to get the CRS arrays directly from Aboria without going through Eigen?

@martinjrobins
Collaborator

Which assemble function are you using? There are two:

 template <typename MatrixType> void assemble(const MatrixType &matrix) const {

and

 template <typename Triplet>
  void assemble(std::vector<Triplet> &triplets, const size_t startI = 0,
                const size_t startJ = 0) const {

The second fills a vector of triplets, as needed to construct a sparse matrix in Eigen. The first assumes you are passing in a dense Eigen matrix. You could probably adapt the second to instead produce the CRS format required by amgcl?
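
For reference, once you have the triplets you can also get the CRS arrays cheaply by building a row-major Eigen sparse matrix first. A minimal sketch, assuming you already have an std::vector<Eigen::Triplet<double>> tripletList and a matrix dimension N (the variable names here are illustrative, and the exact interface amgcl expects should be checked against its documentation):

// Build a row-major (i.e. CRS-stored) sparse matrix from the triplets.
Eigen::SparseMatrix<double, Eigen::RowMajor> A(N, N);
A.setFromTriplets(tripletList.begin(), tripletList.end());
A.makeCompressed();

// Raw CRS arrays: row pointers, column indices, values.
const int *ptr = A.outerIndexPtr(); // length N + 1
const int *col = A.innerIndexPtr(); // length A.nonZeros()
const double *val = A.valuePtr();   // length A.nonZeros()
// ptr/col/val are the three arrays a CRS-based solver such as amgcl consumes.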

@JabirSS
Author

JabirSS commented Jan 8, 2019

I am using the first assemble function.
Eigen actually does a pretty good job of generating the CRS arrays, so I can live with that, but the assemble function runs in serial and is currently the slowest part of the entire routine. Adding #pragma omp parallel for didn't seem to help with either the first or the second function. Is it possible to get them to work in parallel?

@martinjrobins
Collaborator

martinjrobins commented Jan 8, 2019 via email

@JabirSS
Author

JabirSS commented Jan 8, 2019

Sorry but I am unable to get the second function to compile. Here's what I did:

auto myOperator = create_sparse_operator(nodes, nodes, radius, function);
Eigen::SparseMatrix<double> Matrix(N, N);

typedef Eigen::Triplet<double> T;
std::vector<T> tripletList;
tripletList.reserve(estimate);

myOperator.assemble(tripletList);
Matrix.setFromTriplets(tripletList.begin(), tripletList.end());

The compiler throws the following error for the assemble line:

template argument deduction/substitution failed:
‘std::vector<Eigen::Triplet<double> >’ is not derived from ‘Eigen::SparseMatrix<double, _Options, _StorageIndex>’


It seems like it's still calling the wrong overload?

@ghost

ghost commented Jan 8, 2019

Sorry, I was looking at the code for the Kernel class rather than the MatrixReplacement class. You might actually have been using setFromTriplets already, since MatrixReplacement uses setFromTriplets when it's given a SparseMatrix, but this is still worth trying to see where the time cost comes from.

try this:

myOperator.get_first_kernel().assemble(tripletList);

@martinjrobins
Collaborator

^^^^^^^^^^^
that was me (not @pints-bot), wrong log-in!

@JabirSS
Author

JabirSS commented Jan 9, 2019

Unfortunately myOperator.get_first_kernel().assemble(tripletList); gives the same result in terms of performance as it also seems to run in serial.

@martinjrobins
Collaborator

Yes, this runs in serial. It's tricky: unless you know how many non-zero values there are going to be in each row of the sparse matrix, it is difficult to construct it in parallel. Do you absolutely need to construct the matrix? Is there any way you can use your multigrid solver in a matrix-free fashion?

@martinjrobins
Collaborator

What you could do is construct each row of the sparse matrix (i.e. a sequence of triplets) in parallel, and then have a serial portion at the end that concatenates them. If you could rewrite the second function above to do this, then I'd (very) happily merge it into Aboria!

@JabirSS
Author

JabirSS commented Jan 9, 2019 via email

@JabirSS
Author

JabirSS commented Jan 11, 2019

Unfortunately the library I'm using doesn't support matrix-free operations yet.
Honestly, I couldn't figure out exactly how the function in question works, so I wrote one specific to my needs. It takes a reference to the vector of triplets and fills it in as follows:


void createMatrix(std::vector<Eigen::Triplet<double>> &tripletList)
{
    typedef Eigen::Triplet<double> T;
    // one triplet vector per row, so each row can be filled independently
    std::vector<std::vector<T>> rowVals(nodes.size());
    int nnz = 0; // number of non-zero elements
    int element_i;
#pragma omp parallel for private(element_i) reduction(+ : nnz)
    for (element_i = 0; element_i < static_cast<int>(nodes.size()); ++element_i) {
        auto i = nodes[element_i];
        for (auto j = euclidean_search(nodes.get_query(), get<position>(i), radius);
             j != false; ++j) {
            // column index of the neighbour within the nodes container
            const int element_j = &get<position>(*j) - get<position>(nodes).data();
            const double result = someFunction(i, j); // user-specific kernel evaluation
            rowVals[element_i].push_back(T(element_i, element_j, result));
            ++nnz;
        }
    }
    // serial concatenation of the per-row triplet vectors
    tripletList.reserve(nnz);
    for (int row = 0; row < static_cast<int>(nodes.size()); ++row)
        for (std::size_t col = 0; col < rowVals[row].size(); ++col)
            tripletList.push_back(rowVals[row][col]);
}

I'm not sure if this is useful to you at all, since it is far more specific than the function in Kernels.h, but I thought you might be interested.
This function runs about 3x faster than the original one. Better performance could probably be achieved if the last part, where the triplets are concatenated, were also parallelized, but I found its cost negligible compared to the rest of the routine.
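
For completeness, a minimal sketch of how that final concatenation could be parallelized, using an exclusive prefix sum of the per-row counts so that each thread writes into a disjoint range (this assumes the same rowVals layout as above, needs <algorithm> for std::copy, and is untested):

// Exclusive prefix sum of per-row triplet counts (serial, but cheap).
std::vector<std::size_t> offset(rowVals.size() + 1, 0);
for (std::size_t row = 0; row < rowVals.size(); ++row)
    offset[row + 1] = offset[row] + rowVals[row].size();

// Each row now has a known destination range, so the copy can run in parallel.
tripletList.resize(offset.back());
#pragma omp parallel for
for (int row = 0; row < static_cast<int>(rowVals.size()); ++row)
    std::copy(rowVals[row].begin(), rowVals[row].end(),
              tripletList.begin() + offset[row]);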

@martinjrobins
Collaborator

Great that you were able to speed it up. This is useful, and along the lines of what I was thinking. Before putting it into Aboria, it would have to be able to handle matrix-valued kernels (i.e. a kernel function returning a matrix rather than a scalar). If you wish to do this, you are more than welcome, of course (I can explain more about how this works), but I'd understand if you wish to move on with your own simulation!

@JabirSS
Author

JabirSS commented Jan 11, 2019

Thank you, it would be great if you could explain the matrix-valued kernels or refer me to some reading material, and I'll gladly write the code up for addition to Aboria.
Also, how should one give credit to Aboria in a publication? Would citing the SoftwareX paper do?

@martinjrobins
Collaborator

Unfortunately this isn't really documented anywhere. I was originally using the matrix-valued kernels for boundary element simulations of Stokes flow. This is a kernel-based method that uses Green's functions for the kernel, in this case the Green's function for Stokes flow, which returns a 2x2 matrix, where 2 is the number of spatial dimensions.

Below is the current code for the assemble function that returns a vector of Triplets. It also takes two arguments, startI and startJ, which are the starting indices of the matrix block that this kernel refers to (see https://martinjrobins.github.io/Aboria/aboria/evaluating_and_solving_kernel_op.html#aboria.evaluating_and_solving_kernel_op.block_operators).

The KernelSparse class defines a type Block (unrelated to the block operators mentioned in the previous paragraph), which is the type returned by the kernel function (i.e. the type returned by m_dx_function(dx, ai, bj)). For my Stokes simulations Block is Eigen::Matrix<double, 2, 2>; for your simulation Block would simply be double. The class also defines two constants, BlockRows and BlockCols, which are 2 and 2 for my Stokes simulation and 1 and 1 for yours.
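
For illustration only, a matrix-valued kernel function might look roughly like this (a sketch; the lambda name is made up, but the (dx, ai, bj) argument pattern matches m_dx_function above):

// Hypothetical matrix-valued kernel: returns a 2x2 block per particle pair,
// so Block = Eigen::Matrix<double, 2, 2> and BlockRows = BlockCols = 2.
auto stokes_like_kernel = [](const auto &dx, const auto &ai, const auto &bj) {
    Eigen::Matrix<double, 2, 2> block;
    block.setZero(); // ... fill from the Green's function evaluated at dx ...
    return block;
};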

To assemble the matrix, the code does an outer loop down the rows:

for (size_t i = 0; i < na; ++i) {

an inner loop over each non-zero block in that row:

 for (auto pairj =
               euclidean_search(b.get_query(), get<position>(ai), radius);
           pairj != false; ++pairj) {

and then a doubly nested inner loop over each element of that block:

 for (size_t ii = 0; ii < BlockRows; ++ii) {
          for (size_t jj = 0; jj < BlockCols; ++jj) {

Does this make sense? You just have to use i, j, ii, jj, startI, and startJ to calculate the particular (i, j) index of each triplet added, which is (i * BlockRows + ii + startI, j * BlockCols + jj + startJ).

template <typename Triplet>
void assemble(std::vector<Triplet> &triplets, const size_t startI = 0, const size_t startJ = 0) const {

    const RowElements &a = this->m_row_elements;
    const ColElements &b = this->m_col_elements;

    const size_t na = a.size();

    // sparse a x b block
    // std::cout << "sparse a x b block" << std::endl;
    for (size_t i = 0; i < na; ++i) {
      const_row_reference ai = a[i];
      const double radius = m_radius_function(ai);
      for (auto pairj =
               euclidean_search(b.get_query(), get<position>(ai), radius);
           pairj != false; ++pairj) {
        const_col_reference bj = *pairj;
        const_position_reference dx = pairj.dx();
        const size_t j = &get<position>(bj) - get<position>(b).data();
        const Block element = static_cast<Block>(m_dx_function(dx, ai, bj));
        for (size_t ii = 0; ii < BlockRows; ++ii) {
          for (size_t jj = 0; jj < BlockCols; ++jj) {
            triplets.push_back(Triplet(i * BlockRows + ii + startI,
                                       j * BlockCols + jj + startJ,
                                       element(ii, jj)));
          }
        }
      }
    }
  }
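
A parallelized per-row variant along the lines discussed above might look roughly like the following. This is an untested sketch only: the name assemble_parallel is made up, it assumes OpenMP, and it reuses the member names from the code above. Each row of blocks gets its own triplet buffer, so the neighbour searches and kernel evaluations run in parallel and only the final concatenation is serial.

template <typename Triplet>
void assemble_parallel(std::vector<Triplet> &triplets, const size_t startI = 0,
                       const size_t startJ = 0) const {

    const RowElements &a = this->m_row_elements;
    const ColElements &b = this->m_col_elements;
    const size_t na = a.size();

    // one triplet buffer per row of blocks, filled independently in parallel
    std::vector<std::vector<Triplet>> row_triplets(na);

#pragma omp parallel for
    for (int i = 0; i < static_cast<int>(na); ++i) {
      const_row_reference ai = a[i];
      const double radius = m_radius_function(ai);
      for (auto pairj =
               euclidean_search(b.get_query(), get<position>(ai), radius);
           pairj != false; ++pairj) {
        const_col_reference bj = *pairj;
        const_position_reference dx = pairj.dx();
        const size_t j = &get<position>(bj) - get<position>(b).data();
        const Block element = static_cast<Block>(m_dx_function(dx, ai, bj));
        for (size_t ii = 0; ii < BlockRows; ++ii) {
          for (size_t jj = 0; jj < BlockCols; ++jj) {
            row_triplets[i].push_back(Triplet(i * BlockRows + ii + startI,
                                              j * BlockCols + jj + startJ,
                                              element(ii, jj)));
          }
        }
      }
    }

    // serial concatenation of the per-row buffers
    size_t total = 0;
    for (const auto &row : row_triplets)
      total += row.size();
    triplets.reserve(triplets.size() + total);
    for (const auto &row : row_triplets)
      triplets.insert(triplets.end(), row.begin(), row.end());
}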

@martinjrobins
Collaborator

Also, how should one give credit to Aboria in a publication? Would citing the SoftwareX paper do?

Yup, citing the SoftwareX paper would do it, thanks! I should put this in a CITATION.md file or something... If you want to send me the details of the publication (when it's published), then I could start a list of publications using Aboria as well.

@JabirSS
Author

JabirSS commented Jan 13, 2019

Thank you for the explanation. Yes, that makes perfect sense! I'll let you know how it goes.
The papers are either still under review or being written at the moment, but I will forward the details to you once they're approved and available online.
