
Commit d570ae1

HCOLL crash at ompi 5.0.5
1 parent 1114231 commit d570ae1

1 file changed (+8, -0 lines)


install_openmpi.md

Lines changed: 8 additions & 0 deletions
@@ -85,3 +85,11 @@ For fortran support, may use --enable-mpi-fortran=all
- `--with-cuda-libdir` must point to the directory containing libcuda.so
- `./autogen.pl --force`
- `./configure --prefix=... --with-cuda=/usr/local/cuda --with-cuda-libdir=/usr/local/cuda/targets/x86_64-linux/lib/stubs --disable-dependency-tracking --disable-silent-rules --enable-shared --enable-fast-install --with-devel-headers --with-hwloc=internal --with-platform=contrib/platform/optimized --with-knem=... --with-hcoll=... --with-ucx=... --enable-mpi-compatibility CC=gcc CXX=g++ F77=gfortran F90=gfortran`
## For version 5.0.* + UCX + CUDA + Mellanox
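- If you see a warning that UCP does not support MPI_THREAD_MULTIPLE:
  - When building UCX, add the configure option `--enable-mt`.
  - In the application code, request MPI_THREAD_FUNNELED through `MPI_Init_thread()` rather than MPI_THREAD_MULTIPLE (this is enough when only the main thread makes MPI calls; plain `MPI_Init()` takes no thread-level argument).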
- A crash occurs in `MPI_Allreduce()`, inside `mca_coll_hcoll_allreduce()`.
  - No fix has been found yet.
  - Workaround: add `--mca coll ^hcoll` to the mpirun command line to exclude the hcoll collective component.
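
A sketch of the UCX rebuild with thread support, assuming the working directory is a UCX git checkout. `UCX_PREFIX` and `CUDA_HOME` are placeholder variables, not from the original notes; note that UCX spells the option `--enable-mt`.

```shell
#!/bin/sh
# Sketch: rebuild UCX with thread support in UCP/UCT so that it can serve
# MPI_THREAD_MULTIPLE. Assumes we are inside a UCX source checkout.
set -e

# Placeholder locations (assumptions); override them from the environment.
UCX_PREFIX="${UCX_PREFIX:-$HOME/opt/ucx}"
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

./autogen.sh          # generate ./configure (needed for a git checkout)
./contrib/configure-release \
    --prefix="$UCX_PREFIX" \
    --with-cuda="$CUDA_HOME" \
    --enable-mt       # enable multi-thread support
make -j"$(nproc)"
make install
```

Afterwards, Open MPI would be rebuilt with `--with-ucx` pointing at the same prefix so that it links against the thread-enabled UCX.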

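If hcoll has to stay off for every run, the same exclusion can be set once in the environment instead of on each command line: Open MPI reads an MCA parameter `<name>` from the environment variable `OMPI_MCA_<name>`. A minimal sketch; `./my_app` and the process count are placeholders.

```shell
#!/bin/sh
# Exclude the hcoll collective component for Open MPI jobs launched from this
# shell; equivalent to passing `--mca coll ^hcoll` to mpirun.
export OMPI_MCA_coll='^hcoll'

# Launch as usual (hypothetical program and process count):
#   mpirun -np 4 ./my_app
```

Running `unset OMPI_MCA_coll` undoes it, for example to re-test hcoll after upgrading Open MPI or HCOLL.
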