SROA for array allocation #48808

Open
@oxinabox

Description

Consider the situation where you have a function that does two things, and returns both outputs, but then in the outer function you only use one of them.
We would like the optimizer to remove the work that is being done for the part you don't use.
It can do that on nightly, but not when the unused work is a subpart of a function that is being called (not even an inlined one). It only works if the function being eliminated is called directly.

Consider the following code:

using BenchmarkTools
# Effect inference can't seem to work out that this doesn't throw.
# It does work out that it is effect_free and terminates_globally.
Base.@assume_effects :nothrow foo(x::Int) = [x]
bar(x) = foo(x), 2x

function bar_last_manual(x)
    _ = foo(x)
    return 2x
end
@ballocated bar_last_manual(1)
@code_warntype optimize=true bar_last_manual(1)


function bar_last_inlined(x)
    _, ret = @inline bar(x)
    return ret
end
@ballocated bar_last_inlined(1)
@code_warntype optimize=true bar_last_inlined(1)

We can see that bar_last_manual behaves as expected: the allocation is eliminated, leaving just the multiplication.

julia> @ballocated bar_last_manual(1)
0

julia> @code_warntype optimize=true bar_last_manual(1)
MethodInstance for bar_last_manual(::Int64)
  from bar_last_manual(x) @ Main REPL[18]:1
Arguments
  #self#::Core.Const(bar_last_manual)
  x::Int64
Body::Int64
1 ─ %1 = Base.mul_int(2, x)::Int64
└──      return %1

On the other hand, if we do it via an inlined call to bar (which in turn calls foo), the allocation is not eliminated:

julia> @ballocated bar_last_inlined(1)
64

julia> @code_warntype optimize=true bar_last_inlined(1)
MethodInstance for bar_last_inlined(::Int64)
  from bar_last_inlined(x) @ Main REPL[21]:1
Arguments
  #self#::Core.Const(bar_last_inlined)
  x::Int64
Locals
  val::Tuple{Vector{Int64}, Int64}
  @_4::Int64
  ret::Int64
Body::Int64
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Int64}, svec(Any, Int64), 0, :(:ccall), Vector{Int64}, 1, 1))::Vector{Int64}
└──      goto #3 if not true
2 ─      Base.arrayset(false, %1, x, 1)
3 ┄      goto #4
4 ─      goto #5
5 ─ %6 = Base.mul_int(2, x)::Int64
└──      goto #6
6 ─      return %6

This is sad.
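
The effect-inference comment in the snippet above can be checked with something like the following. This is only a sketch: `foo_plain` is a hypothetical unannotated copy of `foo`, and `Base.infer_effects` and the `Core.Compiler.is_*` predicates are internal, so their availability and printed output may differ between Julia versions.

```julia
# Hypothetical unannotated copy of `foo`, so that @assume_effects
# does not override what inference concludes on its own.
foo_plain(x::Int) = [x]

# Internal reflection helper; the printed representation varies by version.
effects = Base.infer_effects(foo_plain, (Int,))
println(effects)

# Query individual effect bits through internal Core.Compiler predicates.
println("nothrow:     ", Core.Compiler.is_nothrow(effects))
println("effect_free: ", Core.Compiler.is_effect_free(effects))
println("terminates:  ", Core.Compiler.is_terminates(effects))
```

Comparing the `nothrow` result here against the annotated `foo` shows whether the `:nothrow` assumption is still needed on a given build.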

Solving this would make batched forward-mode in Diffractor much easier, because it would mean we could effectively run just the pushforward part of the frules when the primal output is not used.

Labels: compiler:optimizer (optimization passes, mostly in base/compiler/ssair/), feature (new feature / enhancement requests)
