I was surprised by this slowdown when writing back into x, instead of into another array y:
julia> f23(x) = ifelse(x>0, x^2, x^3);
julia> x = randn(Float32, 100, 100); y = similar(x);
julia> @btime $y .= f23.($x);
1.958 μs (0 allocations: 0 bytes)
julia> @btime $x .= f23.($x);
6.567 μs (0 allocations: 0 bytes)
julia> @btime f23.($x); # allocating; mean time 3.010 μs still faster than x .= case
2.236 μs (2 allocations: 39.11 KiB)
This is 1.7.0-rc2, but the results are similar on 1.5 and master, and on other computers. I don't think it's a benchmarking artefact, since it persists with evals=1 setup=(x=...; y=...). I don't think it's a hardware limit either, since neither @turbo $x .= f23.($x) with LoopVectorization.jl nor @.. $x = f23($x) with FastBroadcast.jl shows this difference.
For comparison, it seems that map! is never fast here, although map is:
julia> @btime map!(f23, $y, $x);
8.055 μs (0 allocations: 0 bytes)
julia> @btime map!(f23, $x, $x);
8.055 μs (0 allocations: 0 bytes)
julia> @btime map(f23, $x);
1.917 μs (2 allocations: 39.11 KiB)