Found this out this week. Baseline on current master:
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 32.997 μs (0.00% GC)
median time: 34.839 μs (0.00% GC)
mean time: 36.525 μs (0.00% GC)
maximum time: 95.380 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 10
(a bit slower than http://www.juliarobotics.org/RigidBodyDynamics.jl/latest/benchmarks.html for some reason, by the way)
After `using LinearAlgebra; BLAS.set_num_threads(1)`:
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 21.929 μs (0.00% GC)
median time: 22.224 μs (0.00% GC)
mean time: 23.062 μs (0.00% GC)
maximum time: 40.309 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 10
So multiple BLAS threads aren't doing us any favors (with the default OpenBLAS anyway), at least for fewer-than-Atlas degrees of freedom.
Not sure how to act on this, since this is a global setting. At least `BLAS.set_num_threads` is fast now (about 4 ns, no allocations), so it could be called before and after `dynamics_solve!`. But is that thread safe? And how do we get the number of BLAS threads before calling `BLAS.set_num_threads(1)` so that it can be reset to the original value? Maybe it should just be a performance tip?
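For reference, here is a minimal sketch of the save-and-restore idea. It assumes a Julia version that provides `BLAS.get_num_threads` (1.6 or later, as far as I know). The helper name `with_blas_threads` is hypothetical and not part of RigidBodyDynamics.jl, and the matrix product is only a stand-in for `dynamics_solve!`.

```julia
using LinearAlgebra

# Hypothetical helper: run `f` with a temporary BLAS thread count,
# then restore whatever value was set before the call.
function with_blas_threads(f, n::Integer)
    old = BLAS.get_num_threads()  # assumption: available in this Julia version
    BLAS.set_num_threads(n)
    try
        return f()
    finally
        # Restore the original setting even if `f` throws.
        BLAS.set_num_threads(old)
    end
end

# Stand-in workload; in practice this would wrap the dynamics call.
A = rand(50, 50)
result = with_blas_threads(1) do
    A * A
end
```

This does not answer the thread-safety question: the BLAS thread count is a process-wide setting, so two Julia tasks using a wrapper like this concurrently could still interfere with each other's save and restore. That may be an argument for the performance-tip route.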