Description
The version used to run this example is:
Julia Version 1.11.3
Commit d63aded (2025-01-21 19:42 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin24.0.0)
CPU: 8 × Apple M1 Pro
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 6 virtual cores)
The issue is the following: consider the function
using LinearAlgebra
using BenchmarkTools
M = [1.0 2.0;
3.0 4.0]
n = size(M,1)
I_n = Matrix{Float64}(I, n, n)
A_temp = zeros(n, n)
diff_temp = zeros(n, n)
function f_standard(args, M, I_n, A_temp, diff)
# Copy the scalar inputs into the preallocated matrix A_temp
for i in eachindex(A_temp)
A_temp[i] = args[i]
end
# Compute the difference A*M - I
mul!(diff, A_temp, M)
@. diff -= I_n
return mapreduce(x -> x^2, +, diff)
end
As expected, no allocations.
@btime f_standard($(rand(4)), $(M), $(I_n), $(A_temp), $(diff_temp))
21.314 ns (0 allocations: 0 bytes)
However, consider the following modification, which passes some of the arguments as kwargs and changes the first argument into Varargs:
function f_varargs_kwargs(args...; M, I_n, A_temp, diff)
# Copy the scalar inputs into the preallocated matrix A_temp
for i in eachindex(A_temp)
A_temp[i] = args[i]
end
# Compute the difference A*M - I
mul!(diff, A_temp, M)
@. diff -= I_n
return mapreduce(x -> x^2, +, diff)
end
This version does allocate:
julia> @btime f_varargs_kwargs($(rand()), $(rand()), $(rand()), $(rand()); M = $(M), I_n = $(I_n), A_temp = $(A_temp), diff = $(diff_temp))
36.374 ns (5 allocations: 80 bytes)
However, the compiler seems to be specializing over all arguments:
m = @which f_varargs_kwargs(rand(), rand(), rand(), rand(); M = M, I_n = I_n, A_temp = A_temp, diff = diff_temp)
m.specializations
gives:
svec(MethodInstance for Core.kwcall(::@NamedTuple{M::Matrix{Float64}, I_n::Matrix{Float64}, A_temp::Matrix{Float64}, diff::Matrix{Float64}}, ::typeof(f_varargs_kwargs), ::Float64, ::Vararg{Float64}), MethodInstance for Core.kwcall(::@NamedTuple{M::Matrix{Float64}, I_n::Matrix{Float64}, A_temp::Matrix{Float64}, diff::Matrix{Float64}}, ::typeof(f_varargs_kwargs), ::Float64, ::Float64, ::Float64, ::Float64), nothing, nothing, nothing, nothing, nothing)
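Note that the first entry in this list has the signature `::Float64, ::Vararg{Float64}`, so it is not specialized on the number of positional arguments. The Performance Tips section of the Julia manual describes declaring the Vararg length as a method parameter as a way to force specialization. A minimal sketch of that variant is below. The name `f_varargs_kwargs_N` is hypothetical, and whether this removes the allocations in this case is untested.

```julia
using LinearAlgebra

# Hypothetical variant (not part of the original report): same body as
# f_varargs_kwargs, but the Vararg length N is a method parameter.
function f_varargs_kwargs_N(args::Vararg{Any,N}; M, I_n, A_temp, diff) where {N}
    # Copy the scalar inputs into the preallocated matrix A_temp
    for i in eachindex(A_temp)
        A_temp[i] = args[i]
    end
    # Compute the difference A*M - I in place
    mul!(diff, A_temp, M)
    @. diff -= I_n
    return mapreduce(x -> x^2, +, diff)
end
```

It can be benchmarked with the same `@btime` call as above, replacing only the function name, to see whether the allocation count changes.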
Also, weirdly, the problem seems to come from the line
@. diff -= I_n
since commenting it out, as in
function f_varargs_kwargs_2(args...; M, I_n, A_temp, diff)
# Copy the scalar inputs into the preallocated matrix A_temp
for i in eachindex(A_temp)
A_temp[i] = args[i]
end
# Compute the difference A*M - I
mul!(diff, A_temp, M)
#@. diff -= I_n
return mapreduce(x -> x^2, +, diff)
end
@btime f_varargs_kwargs_2($(rand()), $(rand()), $(rand()), $(rand()); M = $(M), I_n = $(I_n), A_temp = $(A_temp), diff = $(diff_temp))
gives no allocations:
12.929 ns (0 allocations: 0 bytes)
I posted this on the Julia Discourse, and people there were also confused, so I decided to open an issue here about fusing failing. @code_warntype and @code_lowered also suggest full specialization in the allocating case.
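Since the fully positional `f_standard` does not allocate, a possible interim workaround is a function barrier: keep the varargs/kwargs signature as a thin wrapper that forwards to the positional kernel. This is only a sketch. The wrapper name `f_wrapper` is hypothetical, and whether it avoids the allocations would need to be measured.

```julia
using LinearAlgebra

# Positional kernel, identical to f_standard above
function f_standard(args, M, I_n, A_temp, diff)
    # Copy the scalar inputs into the preallocated matrix A_temp
    for i in eachindex(A_temp)
        A_temp[i] = args[i]
    end
    # Compute the difference A*M - I in place
    mul!(diff, A_temp, M)
    @. diff -= I_n
    return mapreduce(x -> x^2, +, diff)
end

# Hypothetical thin wrapper: it only forwards the argument tuple and the
# keyword values to the positional kernel and does no work itself.
f_wrapper(args...; M, I_n, A_temp, diff) = f_standard(args, M, I_n, A_temp, diff)
```

The wrapper is called exactly like `f_varargs_kwargs`, so the same `@btime` invocation can be reused for comparison.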
Thanks in advance.