Inferior performance of inplace broadcasting over map! for multiple wide matrices #47873

Open
@jishnub

Description

As discussed in https://discourse.julialang.org/t/why-is-a-multi-argument-inplace-map-much-faster-in-this-case-than-a-broadcast/91525/6, in-place broadcasting is slower than a multi-argument map! for wide matrices. This seems broadly reproducible across a range of platforms:

julia> A = rand(10, 1000); B = copy(A); C = zero(A);

julia> using BenchmarkTools

julia> @btime map!(+, $C, $A, $B);
  8.947 μs (0 allocations: 0 bytes)

julia> @btime $C .= $A .+ $B;
  11.281 μs (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD EPYC 7742 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = /home/user/lib:/lib::/home/user/.local/lib
  JULIA_DEPOT_PATH = /scratch/user/julia
  JULIA_REVISE_POLL = 1
  JULIA_NUM_PRECOMPILE_TASKS = 1
  JULIA_EDITOR = vi

This difference goes away for nearly square matrices and is minimal for tall matrices. On some platforms, broadcasting is even faster in the tall and square cases. For the wide case, however, map! seems to be consistently faster.
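To make the shape dependence easier to check on other machines, here is a minimal sketch that times both in-place forms for a wide, a square, and a tall matrix with the same number of elements. The helpers `mintime` and `compare` are hypothetical names introduced only for this illustration, and the timing uses Base's `@elapsed` rather than `@btime` so that it runs without extra packages; BenchmarkTools will likely give more reliable numbers.

```julia
# Hypothetical helper: minimum wall time (in seconds) over `n` calls of `f()`.
function mintime(f; n = 200)
    f()  # warm-up call so that compilation is not included in the timing
    best = Inf
    for _ in 1:n
        best = min(best, @elapsed(f()))
    end
    return best
end

# Hypothetical helper: time map! and in-place broadcasting for one shape.
function compare(rows, cols)
    A = rand(rows, cols); B = copy(A); C = zero(A)
    t_map = mintime(() -> map!(+, C, A, B))
    t_bc  = mintime(() -> (C .= A .+ B))
    return (; rows, cols, t_map, t_bc)
end

# Wide, square, and tall cases; each has 10_000 elements, only the shape differs.
results = [compare(r, c) for (r, c) in ((10, 1000), (100, 100), (1000, 10))]

for r in results
    println(r.rows, " x ", r.cols, ": map! ", r.t_map, " s, broadcast ", r.t_bc, " s")
end
```

No expected timings are given here, since they depend on the platform and Julia version; the point is only to compare the two columns across the three shapes.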

Labels: arrays, broadcast (Applying a function over a collection), performance (Must go faster)