Closed
Description
Background information
ompi/group: MPI group operations fail in multithreaded apps
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
OMPI master
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
$ git submodule status
fefaed5 3rd-party/openpmix (v1.1.3-2832-gfefaed5)
477894f4720d822b15cab56eee7665107832921c 3rd-party/prrte (dev-30928-g477894f)
Please describe the system on which you are running
- Operating system/version: RHEL8
- Computer hardware: ppc64le
- Network type: IB
Details of the problem
Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.
Build OMPI with UCX:
git clone --recursive https://github.com/openucx/ucx.git
cd ucx
./autogen.sh
./contrib/configure-release-mt --without-java --prefix=$shared_dir/ucx-install --with-cuda=/usr/local/cuda
make -j40 install
git clone --recursive https://github.com/open-mpi/ompi.git ompi
cd ompi
./autogen.pl
./configure --disable-man-pages --enable-mca-no-build=btl-uct --enable-mpi1-compatibility --prefix $shared_dir/install \
    --with-cuda=/usr/local/cuda --with-ucx=$shared_dir/ucx-install
make -j40 install
Minimal test to recreate the issue:
https://raw.githubusercontent.com/AboorvaDevarajan/mpi-tests/main/group_mt.c
mpicc group_mt.c -o group_mt
mpirun -np 80 -host host1:40,host2:40 ./group_mt
Output:
not ident : 3
not ident : 3
not ident : 3
Here is a probable fix that resolves the issue:
#8547