On the latest master (as of 3e7cf64), autovectorization fails when using AllocArrays.jl v0.1.1 (latest) for an array copy:
using AllocArrays
function mycopy(dest, src, iter)
    # precondition: dest and src do not alias
    # precondition: the iterators of dest and src are equal
    for i in iter
        @inbounds dest[i] = src[i]
    end
    return dest
end
b = BumperAllocator(2^30);
arr = rand(Float32, 50000000);
arr2 = similar(arr);
a = AllocArray(arr);
# this is vectorized
mycopy(arr2, arr, eachindex(arr2));
# this is not
with_allocator(b) do
    c = similar(a)
    mycopy(c, a, eachindex(c))
end;
We can observe the effect on performance:
julia> @benchmark mycopy(arr2, arr, eachindex(arr2))
BenchmarkTools.Trial: 339 samples with 1 evaluation per sample.
Range (min … max): 14.181 ms … 16.030 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 14.609 ms ┊ GC (median): 0.00%
Time (mean ± σ): 14.704 ms ± 349.402 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁ ▁▁ ▂▄▅▄█▄▂▄▁▁ ▁ ▁
▄▄█▆▆██▇██████████▇▄█▇▆▄▆█▄▅▆▃▆▃▅▄▄▄▃▃▃▄▁▃▁▁▃▄▁▃▁▁▁▁▁▃▁▁▁▁▁▃ ▄
14.2 ms Histogram: frequency by time 16 ms <
Memory estimate: 16 bytes, allocs estimate: 1.
julia> @benchmark with_allocator(() -> mycopy(c, a, eachindex(c)), b) setup=(reset!(b);c=similar(a);)
BenchmarkTools.Trial: 213 samples with 1 evaluation per sample.
Range (min … max): 20.352 ms … 55.386 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 22.032 ms ┊ GC (median): 0.00%
Time (mean ± σ): 22.426 ms ± 2.889 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▆█▂▂▆▅▇▃
▄▅█████████▆▄▆▄▃▅▁▁▃▁▃▁▁▁▁▁▁▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▃ ▃
20.4 ms Histogram: frequency by time 33.6 ms <
Memory estimate: 304 bytes, allocs estimate: 12.
Here is the LLVM IR for the base case: https://pastebin.com/gCbp1uEX
and for the AllocArrays case: https://pastebin.com/bKYmswnd
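For anyone who wants to check this locally without reading the full pastebins, here is a minimal sketch that searches the optimized LLVM IR for Float32 vector types. The helper name `has_simd` is hypothetical (it is not part of AllocArrays.jl or Base), and a regex match on the IR is only a rough heuristic, not a definitive test for vectorization.

```julia
using InteractiveUtils  # provides code_llvm

function mycopy(dest, src, iter)
    for i in iter
        @inbounds dest[i] = src[i]
    end
    return dest
end

# Hypothetical helper: dump the optimized IR for `f` with argument types
# `types` and report whether it mentions a vector type such as `<8 x float>`.
function has_simd(f, types)
    io = IOBuffer()
    code_llvm(io, f, types; debuginfo=:none)
    ir = String(take!(io))
    return occursin(r"<\d+ x float>", ir)
end

arr = rand(Float32, 1024)
arr2 = similar(arr)

# Base-array case from the report
base_types = Tuple{typeof(arr2), typeof(arr), typeof(eachindex(arr2))}
println("Vector{Float32} case mentions SIMD types: ", has_simd(mycopy, base_types))
```

The same helper should work for the AllocArrays case by passing `Tuple{typeof(c), typeof(a), typeof(eachindex(c))}` with `c` and `a` constructed as in the reproduction above.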