Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
Open MPI repo revision: v3.1.0rc3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Configure command line: '--with-knem=/cluster/software/hpcx/2.0/knem' '--with-mxm=/cluster/software/hpcx/2.0/mxm' '--with-hcoll=/cluster/software/hpcx/2.0/hcoll' '--with-ucx=/cluster/software/hpcx/2.0/ucx' '--with-platform=contrib/platform/mellanox/optimized' '--with-pmix=/usr' '--with-hwloc=/usr' '--with-libevent=/usr'
C compiler: gcc
C compiler version: 7.2.0
pmix-1.2.3
hwloc-libs-1.11.2
libevent-2.0.21
Please describe the system on which you are running
- Operating system/version: CentOS Linux release 7.4.1708 (Core)
- Computer hardware: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
- Network type: Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
Details of the problem
mpirun seems to have problems starting workers. I run mpirun ls from within a SLURM allocation:
shell$ mpirun ls
[c11-1:139811] PMIX ERROR: BAD-PARAM in file src/dstore/pmix_esh.c at line 1185
[c11-1:139811] [[5046,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 172
[c11-1:139811] [[5046,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 550
--------------------------------------------------------------------------
An internal error has occurred in ORTE:
[[5046,0],0] FORCE-TERMINATE AT (null):1 - error base/plm_base_launch_support.c(551)
This is something that should be reported to the developers.
--------------------------------------------------------------------------
The same command works with v3.0.1. Note that I'm compiling against a locally installed PMIx 1.2.3. Could that be the cause of the problem?