Broadcasting is much slower than a for loop #28126

@YingboMa

Here is a minimal working example.

julia> using BenchmarkTools

julia> function foo(a::Vector{T}, b::Vector{T}, c::Vector{T}, d::Vector{T}, e::Vector{T}) where T
           @. a = b + 0.1 * (0.2c + 0.3d + 0.4e)
           nothing
       end
foo (generic function with 1 method)

julia> function goo(a::Vector{T}, b::Vector{T}, c::Vector{T}, d::Vector{T}, e::Vector{T}) where T
           @assert length(a) == length(b) == length(c) == length(d) == length(e)
           @inbounds for i in eachindex(a)
               a[i] = b[i] + 0.1 * (0.2c[i] + 0.3d[i] + 0.4e[i])
           end
           nothing
       end
goo (generic function with 1 method)

julia> a,b,c,d,e=(rand(1000) for i in 1:5)
Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##9#10"))}(getfield(Main, Symbol("##9#10"))(), 1:5)

julia> @btime foo($a,$b,$c,$d,$e)
  1.277 μs (0 allocations: 0 bytes)

julia> @btime goo($a,$b,$c,$d,$e)
  345.568 ns (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 0.7.0-beta2.12
Commit a878341 (2018-07-15 15:57 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Environment:
  JULIA_PKG3_PRECOMPILE = 1
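
Before reading too much into the timings, it may help to confirm that the two functions compute the same values, so that the benchmark compares equivalent work. The following is a minimal sketch of such a check, not part of the original report. It reuses `foo` and `goo` as defined above; the names `a_broadcast`, `a_loop`, and `results_match` are hypothetical, and the check assumes `Float64` inputs as in the example.

```julia
# Broadcast version, copied from the report above.
function foo(a::Vector{T}, b::Vector{T}, c::Vector{T}, d::Vector{T}, e::Vector{T}) where T
    @. a = b + 0.1 * (0.2c + 0.3d + 0.4e)
    nothing
end

# Explicit loop version, copied from the report above.
function goo(a::Vector{T}, b::Vector{T}, c::Vector{T}, d::Vector{T}, e::Vector{T}) where T
    @assert length(a) == length(b) == length(c) == length(d) == length(e)
    @inbounds for i in eachindex(a)
        a[i] = b[i] + 0.1 * (0.2c[i] + 0.3d[i] + 0.4e[i])
    end
    nothing
end

# Shared inputs; separate output buffers so the results can be compared.
# (a_broadcast, a_loop, results_match are hypothetical names for this sketch.)
b, c, d, e = (rand(1000) for _ in 1:4)
a_broadcast = similar(b)
a_loop = similar(b)

foo(a_broadcast, b, c, d, e)
goo(a_loop, b, c, d, e)

# Approximate comparison, in case the two code paths round differently.
results_match = a_broadcast ≈ a_loop
println("results match: ", results_match)
```

If the outputs agree, the gap in the timings reflects how the two forms are compiled rather than a difference in the arithmetic being done.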

Labels: broadcast (applying a function over a collection), compiler:simd (instruction-level vectorization), performance (must go faster), regression (regression in behavior compared to a previous version)