Closed
Description
For large matrices, the matrix square root is quite slow due to the cost of the Schur decomposition and of the (quasi)-triangular matrix square root. To speed up the latter part, the blocked Schur algorithm was developed:
- Deadman E., Higham N.J., Ralha R. (2013) Blocked Schur Algorithms for Computing the Matrix Square Root. In: Applied Parallel and Scientific Computing. PARA 2012. Lecture Notes in Computer Science, vol 7782. https://www.maths.manchester.ac.uk/~higham/papers/dhr13.pdf
Fig. 1 of the paper shows the substantial speed-ups from the blocked algorithm over the point algorithm (what we currently do).
A quick local test implementation shows that for a 1000x1000 matrix, the Schur decomposition takes 780ms, and the quasi-triangular square root takes an additional 600ms for the point method or 90ms for the blocked method. For a 2000x2000 matrix, the Schur decomposition takes 4.5s, and the quasi-triangular square roots takes an additional 9s for the point method or 400ms for the blocked method.
Metadata
Metadata
Assignees
Labels
No labels