Add the recursive blocked Schur algorithm for matrix square root #40239


Merged
merged 18 commits into JuliaLang:master on Jul 1, 2021

Conversation

sethaxen
Contributor

@sethaxen sethaxen commented Mar 27, 2021

Fixes JuliaLang/LinearAlgebra.jl#829 by implementing the recursive version of the blocked Schur algorithm for the matrix square root. The speed-ups come from greater use of Level 3 BLAS routines.
It also adds a recursive quasitriangular Sylvester solver, which for large matrices is much faster than LAPACK.trsyl!. sylvester should probably call this function, but that could be a future PR.
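The core recursion can be sketched as follows. This is a simplified illustration only, not the PR's actual `sqrt_quasitriu` (which also handles real quasi-triangular matrices and works in place); it assumes a complex upper triangular input. The idea: split the matrix into four blocks, take square roots of the two diagonal blocks recursively, and recover the off-diagonal block from a Sylvester equation, so most of the work lands in Level 3 BLAS.

```julia
using LinearAlgebra

# Illustrative sketch of the recursive blocked square root; the function
# name `sqrt_recursive` is hypothetical, not from the PR.
function sqrt_recursive(T::UpperTriangular; blockwidth=4)
    n = size(T, 1)
    n <= blockwidth && return sqrt(T)  # small blocks: fall back to the point algorithm
    k = n ÷ 2
    # Split T = [T11 T12; 0 T22] and recurse on the diagonal blocks.
    R11 = sqrt_recursive(UpperTriangular(Matrix(T)[1:k, 1:k]); blockwidth)
    R22 = sqrt_recursive(UpperTriangular(Matrix(T)[k+1:n, k+1:n]); blockwidth)
    # The off-diagonal block satisfies R11*X + X*R22 = T12. `sylvester`
    # solves A*X + X*B + C = 0, hence the sign flip on T12.
    R12 = sylvester(Matrix(R11), Matrix(R22), -Matrix(T)[1:k, k+1:n])
    return UpperTriangular([Matrix(R11) R12; zeros(eltype(R12), n - k, k) Matrix(R22)])
end
```

Each level of the recursion replaces triangular back-substitution with a dense Sylvester solve on half-size blocks, which is where the Level 3 BLAS speed-up comes from.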

Benchmark

This benchmark compares the "point" algorithm (what we currently do) with the recursive algorithm (this PR). For a 4000x4000 upper triangular matrix, the recursive algorithm is nearly 2 orders of magnitude faster. This is a greater improvement than in the paper, where they saw up to a 6x speed-up; it seems this recursive implementation is faster than the paper's.

using LinearAlgebra, BenchmarkTools, Random

ns =  vcat(1, 250:250:4000)

rng = MersenneTwister(42)
time_complex = map(ns) do n
    A = UpperTriangular(rand(rng, ComplexF64, n, n))
    if n <= 1000
        t = @belapsed sqrt($A)
    else
        t = @elapsed sqrt(A)
    end
    @show n, t
    t
end

time_real = map(ns) do n
    A = UpperTriangular(rand(rng, n, n))^2
    if n <= 1000
        t = @belapsed sqrt($A)
    else
        t = @elapsed sqrt(A)
    end
    @show n, t
    t
end
using Plots
# NB: the `*_point` and `*_recur` arrays plotted below come from running the
# benchmark above twice: once on master (point) and once on this PR (recursive).
kwargs = (label=["point" "recursive"], ylabel="time (s)", xlabel="n", legend=:topleft, yscale=:log10, linewidth=2)
plot(ns, [time_complex_point time_complex_recur]; kwargs...)
plot(ns, [time_real_point time_real_recur]; kwargs...)

[plot: sqrt(::UpperTriangular{ComplexF64}), point vs. recursive timings]

[plot: sqrt(::UpperTriangular{Float64}), point vs. recursive timings]

@sethaxen
Contributor Author

sethaxen commented Mar 27, 2021

The recursive version is currently applied for matrices with size greater than (64, 64). These plots show that the point version is still faster at size (65, 65), so unless we can speed up the recursive version, it should probably only be used for matrices with size greater than or equal to (512, 512).

This benchmark includes n in [64, 65, 128] (the dashed line marks n=65):

[plot: sqrt(::UpperTriangular{ComplexF64}), point/recursive time ratio]

[plot: sqrt(::UpperTriangular{Float64}), point/recursive time ratio]

@dkarrasch dkarrasch added the `linear algebra` and `performance` labels Mar 28, 2021
@sethaxen sethaxen closed this Apr 4, 2021
@sethaxen sethaxen reopened this Apr 4, 2021
@oscardssmith
Member

Does this have a noticeable effect on accuracy? If so, in which direction?

@sethaxen
Contributor Author

sethaxen commented Apr 4, 2021

Does this have a noticeable effect on accuracy? If so, in which direction?

Good question. I'm not certain how to determine the answer. The existing test suite passes, and in Section 3 of the original paper the authors worked out that the blocked algorithms satisfy the same error bounds as the original point algorithm.

Comment on lines 2562 to 2569
try
    _, scale = LAPACK.trsyl!('N', 'N', A, B, C)
    rmul!(C, -inv(scale))
catch e
    if !(catcherr && e isa LAPACKException)
        throw(e)
    end
end
Contributor Author

For just about any large real triangular matrix A created as A=UpperTriangular(randn(n, n))^2, sqrt(A)^2 fails to be approximately equal to A (the top-right entries explode). This is true for the point algorithm in 1.6 and on master as well. In the blocked version, LAPACK.trsyl! will throw a LAPACKException for these matrices, I guess indicating that it could not solve Sylvester's equation. But if that exception is caught, the result seems to be equivalent to what the point algorithm would have returned. Hence the try/catch.

Contributor

Maybe add a comment to justify ignoring the LAPACKException? Should it emit a warning when convergence fails?

Contributor Author

I'll add a comment. Re a warning: perhaps? I'm not really certain what the conventions are in the stdlib regarding warnings.

Member

While writing the tests for triangular.jl I realized how poorly conditioned UpperTriangular(randn(n, n)) is. It's an easy way to generate a triangular test matrix, but it's often not representative of real-world triangular matrices, which typically result from a matrix factorization. E.g.

julia> mean(cond(lu(randn(100, 100)).U) for i in 1:10000)
2309.908474551785

julia> mean(cond(triu(randn(100, 100))) for i in 1:10000)
2.915729043528958e19

so I generally think it would be better to construct triangular test matrices from an LU.

A separate question is the handling of non-zero info values in LAPACK (or almost any other mathematical function defined in C/Fortran). I'm not sure we made the right decision when we made all of these throw instead of returning the exit status in some form. However, that is a very big issue to tackle, so your current solution is probably fine (although I generally find try/catch pretty crude).
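A sketch of the suggested construction (illustrative only; the seed and size are arbitrary):

```julia
using LinearAlgebra, Random

rng = MersenneTwister(0)
n = 100
# Take the triangular test matrix from an LU factorization, as suggested,
# instead of from the raw upper triangle of a random matrix.
U_lu = UpperTriangular(lu(randn(rng, n, n)).U)
U_raw = UpperTriangular(randn(rng, n, n))
cond(U_lu), cond(U_raw)  # cond(U_lu) is typically many orders of magnitude smaller
```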

@oscardssmith
Member

That all sounds good. It might be a good idea to run this on a few random matrices of various structure and compare with BigFloat to confirm.

@sethaxen
Contributor Author

sethaxen commented Apr 4, 2021

That all sounds good. It might be a good idea to run this on a few random matrices of various structure and compare with BigFloat to confirm.

Are you thinking something like this?

n = 256
A = rand(ComplexF64, n, n)^2
T = schur(A).T
Tbig = Complex{BigFloat}.(T)
@test LinearAlgebra.sqrt_quasitriu(T) ≈ LinearAlgebra.sqrt_quasitriu(Tbig)

We unfortunately cannot check the real quasi-triangular case this way because schur and LAPACK.trsyl! are not implemented for BigFloat eltypes. So I think we could only test real and complex upper-triangular matrices this way.

@sethaxen
Contributor Author

sethaxen commented Apr 9, 2021

For completeness, here's a benchmark of the blocked recursive Sylvester solver used here vs LAPACK.trsyl!:

using LinearAlgebra, BenchmarkTools, Random, Plots

ns =  vcat(1, 64, 65, 128, 256, 512, 768, 1024)

rng = MersenneTwister(42)
time_complex = map(ns) do n
    A = schur(rand(rng, ComplexF64, n, n)).T
    B = schur(rand(rng, ComplexF64, n, n)).T
    C = rand(rng, ComplexF64, n, n)
    told = @belapsed $(LAPACK.trsyl!)('N', 'N', $A, $B, C) setup=(C=copy($C)) samples=100
    tnew = @belapsed $(LinearAlgebra._sylvester_quasitriu!)($A, $B, C) setup=(C=copy($C)) samples=100
    @show n, told, tnew
    told, tnew
end

time_real = map(ns) do n
    A = schur(rand(rng, n, n)).T
    B = schur(rand(rng, n, n)).T
    C = rand(rng, n, n)
    told = @belapsed $(LAPACK.trsyl!)('N', 'N', $A, $B, C) setup=(C=copy($C)) samples=100
    tnew = @belapsed $(LinearAlgebra._sylvester_quasitriu!)($A, $B, C) setup=(C=copy($C)) samples=100
    @show n, told, tnew
    told, tnew
end
kwargs = (label=["trsyl!" "recursive"], ylabel="time (s)", xlabel="n", legend=:topleft, yscale=:log10, linewidth=2)
plot(ns, [first.(time_complex) last.(time_complex)]; kwargs...)

[plot: complex Sylvester solve, trsyl! vs. recursive timings]

plot(ns, [first.(time_real) last.(time_real)]; kwargs...)

[plot: real Sylvester solve, trsyl! vs. recursive timings]

@StefanKarpinski
Member

@dkarrasch, you would be able to review this by any chance?

@sethaxen
Contributor Author

At @RalphAS's suggestion, I verified the correctness of the blocked square root and blocked Sylvester against unblocked BigFloat versions using GenericSchur.jl's Schur decomposition and upper triangular Sylvester solver:

using GenericSchur, LinearAlgebra, Test
n = 65

@testset for (T, Tbig) in ((ComplexF64, Complex{BigFloat}),)
    Abig = rand(Tbig, n, n)
    schurAbig = GenericSchur.gschur(Abig)
    sqrtAbig = schurAbig.Z * LinearAlgebra.sqrt_quasitriu(schurAbig.T, blockwidth=Inf) * schurAbig.Z'

    A = T.(Abig)
    schurA = schur(A)
    sqrtA = schurA.Z * LinearAlgebra.sqrt_quasitriu(schurA.T; blockwidth=16) * schurA.Z'

    @test sqrtA ≈ sqrtAbig
end

@testset for (T, Tbig) in ((ComplexF64, Complex{BigFloat}),)
    Abig = GenericSchur.gschur(rand(Tbig, n, n)).T
    Bbig = GenericSchur.gschur(rand(Tbig, n, n)).T
    Cbig = rand(Tbig, n, n)
    Xbig, scale = GenericSchur.trsylvester!(Abig, -Bbig, -copy(Cbig))
    rmul!(Xbig, inv(scale))

    A = T.(Abig)
    B = T.(Bbig)
    C = T.(Cbig)
    X = LinearAlgebra._sylvester_quasitriu!(A, B, copy(C); blockwidth=16)

    @test X ≈ Xbig
end

These tests pass.

@sethaxen
Contributor Author

sethaxen commented May 3, 2021

@dkarrasch can you review this?

@dkarrasch
Member

@dkarrasch can you review this?

Sorry, I don't think I'm competent to review here.

sethaxen and others added 2 commits May 3, 2021 16:58
Co-authored-by: Mathieu Besançon <mathieu.besancon@gmail.com>
@StefanKarpinski
Member

@andreasnoack, would you by any chance be able to review this? Or have a better idea who might?

Member

@andreasnoack andreasnoack left a comment

This looks good to me provided that all branches are exercised by the tests.

@sethaxen
Contributor Author

I think one thing that still needs to be resolved here is how to handle #40239 (comment). The size threshold at which the blocked version is used in the paper (n=64) is quite a bit too low in this case. I was wondering if there was room for improvement so our cutoff would be similar to the paper's. If not, we should probably increase the threshold to something like n=256 or n=512.

@oscardssmith
Member

oscardssmith commented Jun 16, 2021

IMO, just change the cutoff for now. If we can lower it later, great, but that shouldn't block the already implemented improvements.

@sethaxen
Contributor Author

Alright, I chose a cutoff from the benchmark (256 for Real eltypes and 512 for Complex).
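For illustration, an eltype-dependent cutoff could be expressed by dispatch like this (the helper name `sqrt_blockwidth` is hypothetical, not the PR's actual code; the thresholds mirror those chosen from the benchmarks above):

```julia
# Hypothetical helper: the blocked recursion only pays off above these sizes,
# per the benchmarks, so pick the cutoff by element type.
sqrt_blockwidth(::Type{<:Real}) = 256
sqrt_blockwidth(::Type{<:Complex}) = 512
```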

@oscardssmith oscardssmith added the `merge me` label (PR is reviewed. Merge when all tests are passing) Jun 22, 2021
@dkarrasch dkarrasch merged commit 1810952 into JuliaLang:master Jul 1, 2021
@oscardssmith oscardssmith removed the `merge me` label Jul 1, 2021
@sethaxen sethaxen deleted the sqrtblock branch July 1, 2021 19:34
johanmon pushed a commit to johanmon/julia that referenced this pull request Jul 5, 2021
…iaLang#40239)

Co-authored-by: Mathieu Besançon <mathieu.besancon@gmail.com>
Successfully merging this pull request may close these issues.

Use blocked Schur algorithm for matrix square root
6 participants