Description
It is possible to set the number of threads used by OpenBLAS via `openblas_set_num_threads`. For the "custom thread" solution this works quite well: regardless of what the application does, OpenBLAS uses the number of threads that was set.
However, this is not the case for OpenMP. An application might want OpenBLAS to use 4 of 16 threads while it uses OpenMP itself to schedule other work, or it might want to run 4 OpenBLAS operations in parallel, each using 4 threads (whether the latter is possible at all is up to the runtime, but the first use case should be). Another use case: OpenBLAS should use only 4 threads (e.g. for performance reasons, typical matrix sizes, ...) but the application wants to use OpenMP with all 16 threads at other times, i.e. not in parallel with OpenBLAS.
Now OpenBLAS does something nasty: it takes the maximum number of OpenMP threads and sets its own maximum thread count to that value. So it is impossible for OpenBLAS to use fewer threads than OpenMP does.
In code the problem is two-fold:
- An application might set the number of threads for OpenBLAS to use, but that also changes the number of OpenMP threads: https://github.com/xianyi/OpenBLAS/blob/ff16329cb780396521e14de7e2ebd673fbab674a/driver/others/blas_server_omp.c#L93 This is questionable. If anything, the number of OpenMP threads should only be increased, never decreased. And even that is debatable: it could instead be declared an error for OpenBLAS to use more threads than are available in OpenMP, or the lower of the two values could be used.
- When an application sets the OpenBLAS threads and then increases the OpenMP threads, OpenBLAS reacts to that, although it should not. The function `num_cpu_avail`, which should only query, performs a modification: https://github.com/xianyi/OpenBLAS/blob/develop/common_thread.h#L154-L156
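
To make the two points above concrete, here is a toy model in plain C. It is not OpenBLAS code: the `thread_state` struct and all `model_*` / `make_state` names are hypothetical, and the functions only mimic the behaviour as described in this report (setting the OpenBLAS count also sets the OpenMP count, and the query re-syncs OpenBLAS to OpenMP).

```c
#include <assert.h>

/* Toy model only; every name here is hypothetical and not part of OpenBLAS. */
typedef struct {
    int blas_threads; /* stands in for OpenBLAS's internal thread count */
    int omp_threads;  /* stands in for the OpenMP maximum thread count */
} thread_state;

thread_state make_state(int blas_threads, int omp_threads) {
    assert(blas_threads >= 1 && omp_threads >= 1);
    thread_state s = { blas_threads, omp_threads };
    return s;
}

/* Point 1 as reported: setting the OpenBLAS thread count
 * also overwrites the OpenMP thread count. */
thread_state model_set_blas_threads(thread_state s, int n) {
    s.blas_threads = n;
    s.omp_threads = n;
    return s;
}

/* The application changing the OpenMP thread count on its own. */
thread_state model_set_omp_threads(thread_state s, int n) {
    s.omp_threads = n;
    return s;
}

/* Point 2 as reported: the "query" re-syncs the OpenBLAS count
 * to whatever OpenMP currently reports. */
thread_state model_num_cpu_avail(thread_state s) {
    if (s.blas_threads != s.omp_threads) {
        s.blas_threads = s.omp_threads;
    }
    return s;
}
```

In this model, starting from 16/16, asking OpenBLAS for 4 threads drags OpenMP down to 4 as well, and raising OpenMP back to 16 makes the next query push OpenBLAS back up to 16, so the requested limit of 4 is lost either way.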
So as a first fix I'd suggest making `num_cpu_avail` return the lesser of `blas_cpu_number` and `openmp_nthreads` instead of setting anything.
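
A minimal sketch of what I mean, written as a standalone function so it compiles without OpenBLAS or OpenMP. The name `num_cpu_avail_sketch` and the idea of passing both values as parameters are assumptions for illustration only; in the real function `blas_cpu_number` is internal state and `openmp_nthreads` would presumably come from the OpenMP runtime.

```c
#include <assert.h>

/* Hypothetical, query-only variant of num_cpu_avail:
 * returns the smaller of the two limits and modifies nothing.
 * Both values are passed in here only to keep the sketch self-contained. */
int num_cpu_avail_sketch(int blas_cpu_number, int openmp_nthreads) {
    assert(blas_cpu_number >= 1 && openmp_nthreads >= 1);
    return blas_cpu_number < openmp_nthreads ? blas_cpu_number
                                             : openmp_nthreads;
}
```

With this, an application that asked OpenBLAS for 4 threads and runs OpenMP with 16 would get 4 inside OpenBLAS, and one that asked for 16 while OpenMP only offers 4 would also get 4, without either setting being overwritten.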