Inferior performance of inplace broadcasting over `map!` for multiple wide matrices

See https://discourse.julialang.org/t/why-is-a-multi-argument-inplace-map-much-faster-in-this-case-than-a-broadcast/91525/6. The following seems broadly reproducible across a range of platforms:
```julia
julia> A = rand(10, 1000); B = copy(A); C = zero(A);

julia> using BenchmarkTools

julia> @btime map!(+, $C, $A, $B);
  8.947 μs (0 allocations: 0 bytes)

julia> @btime $C .= $A .+ $B;
  11.281 μs (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD EPYC 7742 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = /home/user/lib:/lib::/home/user/.local/lib
  JULIA_DEPOT_PATH = /scratch/user/julia
  JULIA_REVISE_POLL = 1
  JULIA_NUM_PRECOMPILE_TASKS = 1
  JULIA_EDITOR = vi
```

This difference goes away for nearly square matrices, and is minimal for tall matrices. On some platforms, broadcasting performs better for the tall and square cases. However, `map!` seems to consistently do better for the wide case.
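To make the shape comparison easier to repeat, here is a rough sketch that times both approaches for a wide, a square, and a tall matrix with the same number of elements. The sizes `(100, 100)` and `(1000, 10)` are assumptions chosen for illustration, and `min_time_ns` and `compare_shapes` are hypothetical helpers, not part of Base or BenchmarkTools. The hand-rolled timer is cruder than `@btime`, so treat the numbers as indicative only.

```julia
# Hypothetical helper: minimum wall time (in ns) of `f()` over `reps` runs.
function min_time_ns(f; reps = 1000)
    f()  # warm-up call so that compilation is not timed
    best = typemax(UInt64)
    for _ in 1:reps
        t0 = time_ns()
        f()
        best = min(best, time_ns() - t0)
    end
    return best
end

# Hypothetical helper: time `map!` and in-place broadcasting for one shape.
function compare_shapes(dims)
    A = rand(dims...); B = copy(A)
    C_map = zero(A); C_bc = zero(A)
    t_map = min_time_ns(() -> map!(+, C_map, A, B))
    t_bc  = min_time_ns(() -> (C_bc .= A .+ B))
    # `same` checks that both approaches produce identical results.
    return (; dims, t_map, t_bc, same = C_map == C_bc)
end

# Wide, square, and tall cases (the last two sizes are assumed).
results = [compare_shapes(d) for d in ((10, 1000), (100, 100), (1000, 10))]
for r in results
    println(r.dims, ": map! ", r.t_map, " ns, broadcast ", r.t_bc, " ns")
end
```

For numbers comparable to those above, the two closures can be replaced by the `@btime` calls from the original report.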

Inferior performance of inplace broadcasting over `map!` for multiple wide matrices #47873

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Inferior performance of inplace broadcasting over map! for multiple wide matrices #47873

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Inferior performance of inplace broadcasting over `map!` for multiple wide matrices #47873