
Hanging while computing principal_components #3331

Open
@jonahpearl

Description


Similar to #2689, I'm running into a problem where computing the principal_components quality metric hangs at 0% on Linux when run as part of a script. As in that issue, it seems to require multiple parallel computations to trigger; if I kill the hung process and re-run it, the PCA gets computed without a problem and everything runs smoothly from there.
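For context, the hang happens on a call roughly like the sketch below (not my exact script; I'm assuming the SortingAnalyzer API here, and the generated recording, extension chain, and n_jobs value are just placeholders):

import spikeinterface.full as si

# placeholder data; in my actual script the recording and sorting are loaded from disk
recording, sorting = si.generate_ground_truth_recording(durations=[60])

analyzer = si.create_sorting_analyzer(sorting, recording)
analyzer.compute("random_spikes")
analyzer.compute("waveforms")

# this is the step that hangs at 0% on the first run
analyzer.compute("principal_components", n_jobs=8)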

Unlike in that issue, prefixing the call to my script with MKL_THREADING_LAYER=TBB didn't help (at least, not when passed through SLURM).
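One thing I might try next is setting the variable from inside the script itself, before numpy or anything MKL-backed gets imported, roughly like this (just a sketch; I haven't verified it changes anything):

import os

# has to happen before numpy / scipy / sklearn are imported,
# otherwise the threading layer has already been chosen
os.environ["MKL_THREADING_LAYER"] = "TBB"

import numpy as np
import spikeinterface.full as si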

Attached is my conda env export. You can see that BLAS / MKL is there, but when I followed ChatGPT's advice and checked whether it was actually being used by numpy / scipy, nothing came up:

import numpy as np
import scipy

# print the build-time BLAS/LAPACK configuration for each library
print("NumPy configuration:")
np.__config__.show()
print("\nSciPy configuration:")
scipy.__config__.show()

output was:

  blas:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas64
    openblas configuration: USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= HASWELL MAX_THREADS=2
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.23.dev
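(A possibly more telling check would be the runtime view from threadpoolctl, which lists the BLAS/OpenMP libraries actually loaded in the process rather than the build-time config; sketch below, assuming threadpoolctl is installed:)

from pprint import pprint

import numpy as np  # importing numpy loads its BLAS backend
from threadpoolctl import threadpool_info

# each entry reports the loaded shared library, its API (blas / openmp),
# and how many threads it is configured to use
pprint(threadpool_info())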

I'm going to start debugging by cloning my conda env and forcing the clone not to use MKL with conda install nomkl numpy scipy scikit-learn numexpr (again, ht ChatGPT). If that doesn't work, I guess I could follow #2689 and try switching to joblib in certain parts of the code... other suggestions and ideas welcome :) thanks!
