Description
Similar to #2689, I'm having an issue where computing the principal_component quality metric hangs at 0% on Linux when run as part of a script. As in that issue, it only seems to happen when multiple parallel computations run in the same script; if I kill the hung process and re-run it, the PCA gets computed no problem and everything runs smoothly from there.
Unlike that issue, prepending MKL_THREADING_LAYER=TBB to the call to my script didn't help (at least, not when the job is submitted through SLURM).
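In case SLURM's environment handling is the problem (the variable may not make it into the job the way I'm passing it), one thing I plan to try is setting it from inside the script itself. A minimal sketch, assuming MKL reads the variable only once at load time, so it must be set before anything imports numpy/scipy:

import os

# Must run before any import that loads numpy/scipy (and hence MKL),
# since MKL reads this variable only when it is first loaded.
os.environ["MKL_THREADING_LAYER"] = "TBB"

import numpy as np  # deliberately imported after setting the variable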
Attached is my conda env export; you can see that BLAS / MKL is listed, but when I followed ChatGPT's advice to check whether numpy / scipy are actually built against it, MKL didn't come up:
import numpy as np
import scipy
print("NumPy configuration:")
np.__config__.show()
print("\nSciPy configuration:")
scipy.__config__.show()
The output was:
blas:
  detection method: pkgconfig
  found: true
  include directory: /usr/local/include
  lib directory: /usr/local/lib
  name: openblas64
  openblas configuration: USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
    NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= HASWELL MAX_THREADS=2
  pc file directory: /usr/local/lib/pkgconfig
  version: 0.3.23.dev
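Worth noting that np.__config__.show() only reports the build-time configuration. To see which BLAS/threading libraries are actually loaded at runtime (e.g., an MKL pulled in by another package), I could also check with threadpoolctl (already installed as a scikit-learn dependency):

from threadpoolctl import threadpool_info

# Lists every BLAS/OpenMP runtime currently loaded in this process,
# including the shared library path and its active thread count.
for lib in threadpool_info():
    print(lib["user_api"], lib["internal_api"], lib["filepath"], lib["num_threads"])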
I'm going to start debugging by cloning my conda env and forcing the clone not to use MKL with conda install nomkl numpy scipy scikit-learn numexpr (again, ht ChatGPT). If that doesn't work, I guess I could copy #2689 and try switching to joblib in certain parts of the code, or try capping BLAS threads around the PCA step as sketched below. Other suggestions and ideas welcome :) thanks!
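Here's the thread-capping sketch, using threadpoolctl's threadpool_limits context manager; run_pca_step() is just a hypothetical placeholder for however the principal_component metric gets invoked in my script:

from threadpoolctl import threadpool_limits

# Force every loaded BLAS to single-threaded mode inside the block, which
# avoids oversubscription/deadlocks when the library also spawns its own
# worker processes.
with threadpool_limits(limits=1, user_api="blas"):
    run_pca_step()  # placeholder for the principal_component computation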