
PML UCX error #11669

Open
@jonathandenman

Description

Hello,

I am trying to run a simulation with OpenFOAM-5.x in parallel on Skylake, Haswell, and EPYC128 nodes. I am using the openmpi/4.1.4 module on my school's cluster, which is managed by SLURM. The issue is not with my OpenFOAM program but with the Open MPI module. I want to optimize the speed of my CFD runs, so I would like to use UCX. UCX appears to be the first PML that this Open MPI 4.1.4 build tries by default, because with the baseline command "mpirun -np 8 pimpleFoam -parallel (executable)" I get the following error message:

No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      cn536
  Framework: pml
--------------------------------------------------------------------------
[cn536:1868530] PMIX ERROR: UNREACHABLE in file ../../../../../../../opal/mca/pmix/pmix3x/pmix/src/server/pmix_server.c at line 2198

When I add the parameter "--mca pml ^ucx" to my mpirun command (the full command is "mpirun -np 8 --mca pml ^ucx pimpleFoam -parallel"), this error disappears, but the simulation does not run as fast as it could, because the MPI processes are using something other than UCX. I am not an expert in MPI, and I am not sure whether this is the right place to ask. I would like to know how to get openmpi/4.1.4 running with the optimal settings for large, highly parallelized programs.
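For reference, here is a minimal sketch of how the UCX side of this could be inspected. It assumes the openmpi/4.1.4 module puts `ompi_info` on the PATH and that `ucx_info` is available if UCX is installed on the node. The helper name `pml_debug_cmd` is hypothetical and only prints a command. `pml_base_verbose` is the standard MCA verbosity parameter for the pml framework, which may show why a component was rejected.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an mpirun command that forces one
# PML and enables selection logging for the pml framework.
pml_debug_cmd() {
    pml="$1"      # PML component name, e.g. ucx or ob1
    nprocs="$2"   # number of MPI ranks
    echo "mpirun -np $nprocs --mca pml $pml --mca pml_base_verbose 10 pimpleFoam -parallel"
}

# List the pml components this Open MPI build was compiled with.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i "MCA pml" || true
else
    echo "ompi_info not on PATH (is the openmpi/4.1.4 module loaded?)"
fi

# Show the UCX library version and build configuration, if UCX is present.
if command -v ucx_info >/dev/null 2>&1; then
    ucx_info -v
else
    echo "ucx_info not on PATH (UCX may not be installed on this node)"
fi

# Print the command to try next with UCX forced and verbose selection output.
pml_debug_cmd ucx 8
```

If `ompi_info` lists no `ucx` entry under `MCA pml`, this build was likely compiled without UCX support, which would be a question for the cluster administrators rather than a runtime setting.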
