broadcast!(f, x, x) slower, prevents SIMD? #43153

Open
@mcabbott

I was surprised by this slowdown when writing back into x, instead of into another array y:

julia> f23(x) = ifelse(x>0, x^2, x^3);

julia> x = randn(Float32, 100, 100); y = similar(x);

julia> @btime $y .= f23.($x);
  1.958 μs (0 allocations: 0 bytes)

julia> @btime $x .= f23.($x);
  6.567 μs (0 allocations: 0 bytes)

julia> @btime f23.($x);  # allocating; mean time 3.010 μs still faster than x .= case
  2.236 μs (2 allocations: 39.11 KiB)

This is on Julia 1.7.0-rc2, but the results are similar on 1.5 and master, and on other computers. I don't think it's a benchmarking artefact, since it persists with evals=1 setup=(x=...; y=...). I also don't think it's a hardware limit, since neither @turbo $x .= f23.($x) with LoopVectorization.jl nor @.. $x = f23($x) with FastBroadcast.jl shows this difference.
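To rule out a benchmarking artefact along the lines described above, the comparison can be rerun with fresh arrays for every sample and a single evaluation per sample. This is only a sketch: the array size matches the report, but the variable names b_out and b_in are illustrative and not from the original post, and the timings will vary by machine.

```julia
using BenchmarkTools  # third-party package, the same one used for @btime in the report

f23(x) = ifelse(x > 0, x^2, x^3)

# setup runs before every sample, so each sample sees freshly generated arrays;
# evals=1 means each sample times exactly one evaluation of the expression.
b_out = @benchmark y .= f23.(x) setup=(x = randn(Float32, 100, 100); y = similar(x)) evals=1
b_in  = @benchmark x .= f23.(x) setup=(x = randn(Float32, 100, 100)) evals=1

# Minimum time per sample, in nanoseconds (b_out, b_in are illustrative names)
println("out-of-place: ", minimum(b_out).time, " ns")
println("in-place:     ", minimum(b_in).time, " ns")
```

Variables created in setup are local to the benchmark, so no $ interpolation is needed here.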

For comparison, it seems that map! is never fast here, although map is:

julia> @btime map!(f23, $y, $x);
  8.055 μs (0 allocations: 0 bytes)

julia> @btime map!(f23, $x, $x);
  8.055 μs (0 allocations: 0 bytes)

julia> @btime map(f23, $x);
  1.917 μs (2 allocations: 39.11 KiB)
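One possible way to separate the cost of f23 itself from the broadcast and map! machinery is to time a hand-written loop against the same arrays. The sketch below is an assumption for diagnostic purposes, not something from the original report: loop23! is a hypothetical helper, and whether the loop actually vectorizes depends on the Julia version and hardware.

```julia
f23(x) = ifelse(x > 0, x^2, x^3)

# Hypothetical helper (not part of Base): an explicit loop that computes the
# same result as dest .= f23.(src). @simd only permits vectorization, it does
# not guarantee it.
function loop23!(dest, src)
    @inbounds @simd for i in eachindex(dest, src)
        dest[i] = f23(src[i])
    end
    return dest
end

x = randn(Float32, 100, 100); y = similar(x)

loop23!(y, x)    # out-of-place: writes into y
x2 = copy(x)
loop23!(x2, x2)  # in-place: source and destination are the same array
```

Both calls can then be timed with @btime loop23!($y, $x) and @btime loop23!($x, $x) for comparison with the broadcast and map! numbers above.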

Labels: broadcast (Applying a function over a collection), compiler:simd (instruction-level vectorization), performance (Must go faster)
