-
Yes, you are working in the global scope. See the first point of https://docs.julialang.org/en/v1/manual/performance-tips/. Try this:

```julia
function f(n::Int)
    m = Array{Float64}(undef, n, n)
    for i in 1:n
        for j in 1:n
            @inbounds m[i, j] = log(1 + abs(sin(i) * cos(j)))
        end
    end
    return m
end

n = 16000
@time m = f(n)
println(m[div(n, 2), div(n, 2)])
```

On my laptop this is much faster than the global-scope version.
NOTE: The first time you run it, it will take longer because Julia is compiling the function.

This variation:

```julia
function f1(n::Int)
    m = Array{Float64}(undef, n, n)
    sines = sin.(1:n)
    cosines = cos.(1:n)
    @inbounds for j in 1:n
        for i in 1:n
            m[i, j] = log(1.0 + abs(sines[i] * cosines[j]))
        end
    end
    return m
end
```

brings the time down further. This is actually faster mainly because of the reduced number of calls to sin and cos.
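For reference, a minimal way to time the two versions against each other (a sketch; it assumes f and f1 are defined as above, and the numbers will of course vary by machine):

```julia
using BenchmarkTools  # assumed available; install with: ] add BenchmarkTools

n = 16000
@btime f($n)   # sin/cos evaluated n^2 times
@btime f1($n)  # sin/cos evaluated only n times each
```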
-
This vectorized implementation is actually also OK, although it uses more memory:

```julia
function f(n::Int)
    return log.(1.0 .+ abs.(sin.(1:n) * cos.((1:n)')))
end

n = 16000
@time m = f(n)
println(m[div(n, 2), div(n, 2)])
```
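Note that `sin.(1:n) * cos.((1:n)')` materializes the outer product as an intermediate n×n matrix before the outer broadcast runs. A variant that avoids the intermediate (a sketch; the name f2 is mine, not from the thread) precomputes the two vectors and lets broadcasting form the outer product, so abs, +, and log all fuse into a single pass with one allocation:

```julia
function f2(n::Int)
    s = sin.(1:n)   # n sine evaluations
    c = cos.(1:n)   # n cosine evaluations
    # s .* c' broadcasts the outer product; the surrounding dotted calls
    # fuse into the same loop, allocating only the final n-by-n result
    return log.(1.0 .+ abs.(s .* c'))
end
```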
-
I asked ChatGPT why the two have such different speed:

There are a few reasons why the first function you provided (which uses broadcasting in Julia) is faster than the second (which uses nested loops):

- Vectorized computations: In the first function, you are making use of Julia's broadcasting capabilities (the `.` operator). Julia's broadcasting mechanism is well optimized and allows operations on arrays without explicit loops. Broadcasting effectively transforms the operations into highly efficient looped code behind the scenes, and it leverages the memory layout and cache efficiently.
- Memory allocation: The first function directly computes the result without any preallocation, whereas in the second function you first allocate an n x n array and then fill it in using a nested loop. This preallocate-then-assign pattern can be slower than directly computing and placing values, especially when n is large.
- Inherent parallelism: Broadcasting in Julia is designed to benefit from inherent parallelism. Under the hood, the compiler may opt to parallelize certain broadcasting operations if it detects that it can provide a speed-up, which can result in faster computations.
- Function call overhead: In the loop-based version, you're making multiple function calls (sin, cos, abs, and log) in each iteration of the nested loops. Even though these function calls are fast, the overhead adds up when you have n^2 iterations. In the broadcasting version the operations are more "batched", which can reduce this overhead.

That being said, it's worth noting that modern CPUs are pretty good at executing tight loops, so the difference may not always be substantial, especially for small values of n. But as n grows, the benefits of the first function's approach should become more pronounced.
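To make the fusion point concrete (my illustration, not part of the quoted answer): a fully dotted chain lowers to a single loop over the output, so no temporaries are created for the inner operations.

```julia
s = sin.(1:4)
c = cos.(1:4)
m = Matrix{Float64}(undef, 4, 4)
# One fused loop fills m in place; abs.(...) and 1.0 .+ ... create no temporaries
m .= log.(1.0 .+ abs.(s .* c'))
```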
-
For comparison, your example in Python/NumPy:

```python
import numpy as np
import time

def f(n):
    m = np.zeros([n, n])
    # Index vectors, as in the vectorized approach (note: the loop
    # variables below shadow these names)
    i = np.arange(n) + 1
    j = np.arange(n) + 1
    for i in np.arange(n) + 1:
        for j in np.arange(n) + 1:
            m[i-1, j-1] = np.log(1 + np.abs(np.sin(i) * np.cos(j)))
    return m

n = 16000
start = time.time()
m = f(n)
print(m[n//2 - 1, n//2 - 1])
end = time.time()
print(end - start)
```
-
Thank you. Timings improve with Julia inside a function (not global scope). Of course, defining the vectors (as I did for numpy) is cheating for the purposes of this test, and it would also speed up the Fortran and C++ versions.
-
I was also finding this function to be very slow. After some research, I was directed to LoopVectorization's @avx macro. I changed the code to use @avx on the loop nest, and the timings improved substantially (see the sketch below).
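A sketch of the @avx version, reusing the loop from earlier in the thread (note that recent LoopVectorization releases have renamed @avx to @turbo):

```julia
using LoopVectorization

function f_avx(n::Int)
    m = Matrix{Float64}(undef, n, n)
    @avx for j in 1:n
        for i in 1:n
            m[i, j] = log(1.0 + abs(sin(i) * cos(j)))
        end
    end
    return m
end
```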
-
This seems like weird behavior to me: why is it so much slower when using variables?
-
The following function gives way faster performance (on Apple hardware; for Intel, look at MKL.jl or IntelVectorMath.jl):

```julia
function test_vec_(v_n::Vector{Float64}, m::Array{Float64,2})
    vsin = similar(v_n)
    vcos = similar(v_n)
    AppleAccelerate.sin!(vsin, v_n)   # vectorized sine over the whole vector
    AppleAccelerate.cos!(vcos, v_n)   # vectorized cosine
    m .= 1.0 .+ abs.(vsin .* vcos')   # fused broadcast over the outer product
    AppleAccelerate.log!(m, m)        # vectorized log, in place
    nothing
end
```

The math functions are not vectorized efficiently in Julia by default.
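A possible way to call it (a sketch; the setup and names here are mine, not from the original comment):

```julia
using AppleAccelerate

n = 16000
v_n = collect(Float64, 1:n)        # the points 1.0, 2.0, ..., 16000.0
m = Matrix{Float64}(undef, n, n)
test_vec_(v_n, m)
println(m[div(n, 2), div(n, 2)])
```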
-
Note that we can call the Fortran from Julia.

FORTRAN:

```fortran
module ffunc
contains
  double precision function f(n)
    double precision, dimension(:,:), allocatable :: m
    integer :: n
    allocate(m(n,n))
    do i=1,n
       do j=1,n
          m(i,j) = log(1+abs(sin(i*1d0)*cos(j*1d0)))
       enddo
    enddo
    f = m(n/2,n/2)
    deallocate(m)
  end function f
end module ffunc
```

SHARED OBJECT:

```
$ gfortran -o ffunc.so ffunc.f90 -shared -fPIC
```

JULIA:

```julia
m0 = Int32[16000]
result = @ccall "./ffunc.so".__ffunc_MOD_f(m0::Ptr{Int32})::Float64
println(result)
```
-
Possible vectorization with AVX instructions:

```julia
using BenchmarkTools
using LoopVectorization

n = 16000
mr = rand(n, n)
mi = ones(n)

function test_sin(n, m::Matrix{Float64})
    for j in axes(m, 2)
        for i in axes(m, 1)
            m[i, j] = log(1.0 + abs(sin(i) * cos(j)))
        end
    end
end

function test_sin_avx(n, m::Matrix{Float64})
    @avx for j in axes(m, 2)
        for i in axes(m, 1)
            m[i, j] = log(1.0 + abs(sin(i) * cos(j)))
        end
    end
end

function test_sin_simd(n, m::Matrix{Float64})
    @simd for j in axes(m, 2)
        for i in axes(m, 1)
            m[i, j] = log(1.0 + abs(sin(i) * cos(j)))
        end
    end
end

#AppleAccelerate.sin!(mr, mr)
@btime test_sin(n, mr)
@btime test_sin_avx(n, mr)
@btime test_sin_simd(n, mr)
```

Results:

```
julia> @btime test_sin(n, mr)
  4.799 s (0 allocations: 0 bytes)

julia> @btime test_sin_avx(n, mr)
  1.248 s (0 allocations: 0 bytes)

julia> @btime test_sin_simd(n, mr)
  4.856 s (0 allocations: 0 bytes)
```
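One caveat (my note, not the original poster's): when passing non-constant globals to @btime, BenchmarkTools recommends interpolating them with `$`, so the measurement excludes the overhead of untyped global access:

```julia
@btime test_sin_avx($n, $mr)
```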
-
These results are unexpected. Am I doing something wrong in Julia?
Results are in order of fastest to slowest (Fortran and C++ compiled with -O3):