Closed
Description
On the current master branch, mul! of small matrices can be about 10x slower than on 1.3, and it also allocates.
julia> versioninfo()
Julia Version 1.4.0-DEV.556
Commit 4800158ef5* (2019-12-03 21:18 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.0.0)
  CPU: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
test code:
using LinearAlgebra
using BenchmarkTools
ndim = 3
m1 = rand(ComplexF64,ndim,ndim);
m2 = rand(ComplexF64,ndim,ndim);
ou = rand(ComplexF64,ndim,ndim);
@btime mul!($ou, $m1, $m2);
With the release-1.3 version,
33.471 ns (0 allocations: 0 bytes)
on master:
394.428 ns (1 allocation: 16 bytes)
Although the difference is only nanoseconds, when mul! sits in the innermost (hot) loop it can easily translate into a large performance degradation if most of the computation consists of such small matrix products. Larger matrices (e.g. ndim = 30) appear unaffected (dispatched to BLAS?): 2.732 μs (0 allocations: 0 bytes) on 1.3 and 2.774 μs (0 allocations: 0 bytes) on master.
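To see at which sizes the allocation shows up, something like the following rough sketch might help. It uses only Base and LinearAlgebra; the helper name mul_allocs and the chosen sizes are my own for illustration and are not part of any library. It reports the bytes allocated by a single mul! call after a warm-up call, so compilation is not counted.

```julia
using LinearAlgebra

# Hypothetical helper (not a library function): bytes allocated by one
# mul! call on ndim x ndim ComplexF64 matrices, measured after a warm-up.
function mul_allocs(ndim)
    a = rand(ComplexF64, ndim, ndim)
    b = rand(ComplexF64, ndim, ndim)
    c = similar(a)
    mul!(c, a, b)                    # warm-up call, excludes compilation
    return @allocated mul!(c, a, b)  # allocation of the second call only
end

# Sizes are arbitrary picks between the small and the large case above.
for n in (2, 3, 4, 8, 30)
    println("ndim = ", n, ": ", mul_allocs(n), " bytes allocated")
end
```

Running this on both release-1.3 and master should show whether the 16-byte allocation is limited to the smallest sizes or persists until the larger (possibly BLAS) path takes over.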