BLAS.set_num_threads(1) drastically improves dynamics! performance for Atlas benchmark #500

@tkoolen

Found this out this week. Baseline on current master:

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     32.997 μs (0.00% GC)
  median time:      34.839 μs (0.00% GC)
  mean time:        36.525 μs (0.00% GC)
  maximum time:     95.380 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

(By the way, for some reason this is a bit slower than the numbers listed at http://www.juliarobotics.org/RigidBodyDynamics.jl/latest/benchmarks.html.)

After running using LinearAlgebra; BLAS.set_num_threads(1), the same benchmark gives:

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     21.929 μs (0.00% GC)
  median time:      22.224 μs (0.00% GC)
  mean time:        23.062 μs (0.00% GC)
  maximum time:     40.309 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

The median time drops from 34.8 μs to 22.2 μs, roughly 36% faster. So multiple BLAS threads aren't doing us any favors (at least with the default OpenBLAS), at least for mechanisms with as many degrees of freedom as Atlas or fewer.
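To check whether this holds on a particular machine, one option is to time a small dense factorization and solve under different BLAS thread counts. The sketch below is a generic stand-in, not the RigidBodyDynamics.jl Atlas benchmark: time_small_solves is a hypothetical helper, and n = 30 is an arbitrary small size chosen for illustration, not Atlas's actual number of degrees of freedom. It assumes Julia 1.6 or newer, where BLAS.get_num_threads is available.

```julia
using LinearAlgebra

# Hypothetical helper (not part of RigidBodyDynamics.jl): time `iters`
# Cholesky factorizations + solves of a small n×n symmetric positive definite
# system while BLAS is limited to `nthreads` threads.
# n = 30 is an arbitrary illustrative size, not Atlas's degree-of-freedom count.
# Assumes Julia >= 1.6 for BLAS.get_num_threads.
function time_small_solves(nthreads::Integer; n::Integer = 30, iters::Integer = 10_000)
    M = rand(n, n)
    A = M * M' + n * I   # symmetric positive definite by construction
    b = rand(n)
    old = BLAS.get_num_threads()
    BLAS.set_num_threads(nthreads)
    try
        # Warm-up call so compilation time is not included in the measurement.
        cholesky(Symmetric(A)) \ b
        return @elapsed for _ in 1:iters
            F = cholesky!(Symmetric(copy(A)))
            F \ b
        end
    finally
        # The BLAS thread count is a global setting, so always restore it.
        BLAS.set_num_threads(old)
    end
end

ndefault = BLAS.get_num_threads()
t_one = time_small_solves(1)
t_default = time_small_solves(ndefault)
println("1 BLAS thread: $(t_one) s; $(ndefault) BLAS threads: $(t_default) s")
```

Absolute numbers will depend on the CPU, the BLAS build, and the problem size; the sketch is only meant for comparing the two thread settings on the same machine.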

Not sure how to act on this, since it's a global setting. At least BLAS.set_num_threads is fast now (about 4 ns, no allocations), so it could be called before and after dynamics_solve!. But is that thread-safe? And how do we get the current number of BLAS threads before calling BLAS.set_num_threads(1), so that it can be reset to the original value afterwards? Maybe this should just be a performance tip in the docs?
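If the thread count is toggled around the solve, saving and restoring the previous value can be wrapped in try/finally so it is restored even if the solve throws. This is a minimal sketch assuming Julia 1.6 or newer, where BLAS.get_num_threads exists for reading the current value; with_single_blas_thread is a hypothetical helper, not part of RigidBodyDynamics.jl, and the small dense solve merely stands in for dynamics_solve!. Because the BLAS thread count is process-global, this pattern is not thread-safe: two tasks running it concurrently can overwrite each other's saved value.

```julia
using LinearAlgebra

# Hypothetical helper (not part of RigidBodyDynamics.jl): run `f()` with BLAS
# limited to one thread, then restore the previous thread count.
# Assumes Julia >= 1.6 for BLAS.get_num_threads.
# NOT thread-safe: the BLAS thread count is global to the process.
function with_single_blas_thread(f)
    nthreads = BLAS.get_num_threads()
    BLAS.set_num_threads(1)
    try
        return f()
    finally
        # Runs even if f() throws, so the global setting is always restored.
        BLAS.set_num_threads(nthreads)
    end
end

# Usage sketch: a small dense solve standing in for dynamics_solve!.
A = rand(30, 30) + 30I
b = rand(30)
x = with_single_blas_thread() do
    A \ b
end
```

The do-block form keeps the call site readable; the alternative of simply documenting BLAS.set_num_threads(1) as a performance tip avoids the global-state problem entirely.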