See https://discourse.julialang.org/t/why-is-a-multi-argument-inplace-map-much-faster-in-this-case-than-a-broadcast/91525/6. The following seems broadly reproducible across a range of platforms:
julia> A = rand(10, 1000); B = copy(A); C = zero(A);
julia> using BenchmarkTools
julia> @btime map!(+, $C, $A, $B);
8.947 μs (0 allocations: 0 bytes)
julia> @btime $C .= $A .+ $B;
11.281 μs (0 allocations: 0 bytes)
julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 64 × AMD EPYC 7742 64-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
Threads: 1 on 64 virtual cores
Environment:
LD_LIBRARY_PATH = /home/user/lib:/lib::/home/user/.local/lib
JULIA_DEPOT_PATH = /scratch/user/julia
JULIA_REVISE_POLL = 1
JULIA_NUM_PRECOMPILE_TASKS = 1
JULIA_EDITOR = vi
This difference goes away for nearly square matrices, and is minimal for tall matrices. On some platforms, broadcasting performs better for the tall and square cases. However, map! seems to consistently do better for the wide case (far more columns than rows, as in the 10 × 1000 example above).
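
To compare the wide, square, and tall cases in one run, something like the following sketch may help. It assumes BenchmarkTools is installed. Only the 10 × 1000 shape comes from the timings above; the 100 × 100 and 1000 × 10 shapes are illustrative choices with the same element count, and the names shapes and results are placeholders. Timings will vary by machine, so no expected numbers are given.

```julia
using BenchmarkTools

# Wide shape is from the report above; the square and tall shapes are
# assumed here for illustration (same number of elements in each case).
shapes = [(10, 1000), (100, 100), (1000, 10)]

results = map(shapes) do (m, n)
    A = rand(m, n); B = copy(A); C = zero(A)
    # @belapsed returns the minimum observed time in seconds.
    t_map = @belapsed map!(+, $C, $A, $B)
    t_bc = @belapsed $C .= $A .+ $B
    (; m, n, t_map, t_bc)
end

# Report both timings in microseconds for each shape.
for r in results
    println(r.m, "×", r.n, ": map! ", round(r.t_map * 1e6; digits = 2),
            " μs, broadcast ", round(r.t_bc * 1e6; digits = 2), " μs")
end
```

Keeping the element count fixed across shapes means any difference between the rows of output should come from the array layout rather than from the amount of work.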