See https://discourse.julialang.org/t/mul-matrix-matrix-uppertriangular-is-really-slow/69368. CC @Luapulu
mul! doesn’t use BLAS when the second matrix is an AbstractTriangular:
julia> using LinearAlgebra, BenchmarkTools
julia> A = rand(1000, 1000); B = rand(1000, 1000); dest = similar(A);
julia> @btime mul!($dest, $A, $B); # BLAS (fast)
50.609 ms (0 allocations: 0 bytes)
julia> @btime mul!($dest, UpperTriangular($A), $B); # BLAS (fast)
32.929 ms (0 allocations: 0 bytes)
julia> @btime mul!($dest, $A, UpperTriangular($B)); # generic (slow)
914.817 ms (6 allocations: 336 bytes)
However, the * methods all use BLAS:
julia> @btime $A * $B; # BLAS (fast)
50.560 ms (2 allocations: 7.63 MiB)
julia> @btime UpperTriangular($A) * $B; # BLAS (fast)
33.183 ms (2 allocations: 7.63 MiB)
julia> @btime $A * UpperTriangular($B); # BLAS (fast)
27.190 ms (2 allocations: 7.63 MiB)
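Until mul! has a fast path for this case, a possible workaround is to copy A into dest and then multiply in place from the right with rmul!, which for dense Float64 matrices should reach the BLAS triangular routine, as far as I can tell. This is only a sketch: mul_tri_right! is a hypothetical helper name, not part of LinearAlgebra, and it assumes dest does not share memory with the parent array of B. I have not benchmarked it.

```julia
using LinearAlgebra

# Hypothetical helper (not part of LinearAlgebra): computes dest = A * B for a
# triangular B without going through the generic mul! fallback.
# Assumption: dest does not share memory with B's parent array.
function mul_tri_right!(dest::StridedMatrix, A::StridedMatrix,
                        B::LinearAlgebra.AbstractTriangular)
    copyto!(dest, A)   # dest now holds a copy of A
    rmul!(dest, B)     # in place: dest = dest * B
    return dest
end

A = rand(100, 100); B = rand(100, 100); dest = similar(A)
mul_tri_right!(dest, A, UpperTriangular(B))
```

The result should match A * UpperTriangular(B), and the only extra cost over a direct BLAS call is the copy of A into dest.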