This example from Discourse shows a slowdown when broadcasting a moderately complicated expression directly, compared to broadcasting a function containing the same expression:
using BenchmarkTools

arrayfun!(C, A, B) = @. C = A^2 + B^2 + A * B + A / B - A * B - A / B + A * B + A / B - A * B - A / B
scalarfun(A::Real, B::Real) = A^2 + B^2 + A * B + A / B - A * B - A / B + A * B + A / B - A * B - A / B
let N = 151
    A, B, C1, C2 = (rand(N, N, N) .+ 1 for _ in 1:4)
    @btime arrayfun!($C1, $A, $B)
    @btime $C2 .= scalarfun.($A, $B)
    C1 ≈ C2
end
# 17.306 ms (11 allocations: 352 bytes)
# 5.900 ms (0 allocations: 0 bytes)
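For reference, here is a rough sketch of how differently the two forms are represented before materialization, using the internal Base.Broadcast.broadcasted helper (the tiny arrays and the simplified a^2 + b^2 body are only for illustration, not taken from the benchmark above): the @.-style expression builds a nested tree of Broadcasted nodes, while the scalarfun-style call is a single node.

using Base.Broadcast: broadcasted

a, b = rand(2, 2) .+ 1, rand(2, 2) .+ 1
# roughly what @. a^2 + b^2 builds before materialize: a tree of Broadcasted nodes
nested = broadcasted(+, broadcasted(^, a, 2), broadcasted(^, b, 2))
# roughly what scalarfun.(a, b) builds: one node holding the whole kernel
flat = broadcasted((x, y) -> x^2 + y^2, a, b)
typeof(nested)   # Broadcasted wrapping two inner Broadcasted arguments
typeof(flat)     # a single Broadcasted over a and b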
The effect seems fairly robust; it's not particular to 3D arrays, nor to A^2. Replacing @. with explicit dots (.+, .* etc.) helps a bit (which, according to #29120, removes the n-ary +; here n<=4):
arrayfun!(C, A, B) = C .= A.^2 .+ B.^2 .+ A .* B .+ A ./ B .- A .* B .- A ./ B .+ A .* B .+ A ./ B .- A .* B .- A ./ B
# 17.345 ms (0 allocations: 0 bytes)
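One way to compare what the two spellings actually lower to is Meta.@lower, which works purely syntactically and does not need the arrays to exist; this is only a diagnostic sketch (with a shortened expression), and the lowered output is not reproduced here:

# how the @. spelling lowers:
Meta.@lower @. C = A * B + A / B + B^2
# how the explicitly dotted spelling lowers (cf. #29120):
Meta.@lower C .= A .* B .+ A ./ B .+ B .^ 2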
Simpler expressions also show the slowdown, but without the allocations:
arrayfun!(C, A, B) = @. C = A^2 + B^2 + A * B + A / B
scalarfun(A::Real, B::Real) = A^2 + B^2 + A * B + A / B
# 3.148 ms (0 allocations: 0 bytes)
# 971.000 μs (0 allocations: 0 bytes)
Even simpler expressions like arrayfun!(C, A, B) = @. C = A^2 + B^2
show no slowdown at all.
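For completeness, a sketch of the same benchmark harness for that simplest case; the names arrayfun2! and scalarfun2 are introduced here only to avoid redefining the methods above, and the timings are intentionally not filled in:

arrayfun2!(C, A, B) = @. C = A^2 + B^2
scalarfun2(A::Real, B::Real) = A^2 + B^2
let N = 151
    A, B, C1, C2 = (rand(N, N, N) .+ 1 for _ in 1:4)
    @btime arrayfun2!($C1, $A, $B)
    @btime $C2 .= scalarfun2.($A, $B)
    C1 ≈ C2
end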