This issue tracks current problems with user-level thread (ULT) and partitioned communication (PartCom) support in Open MPI (OMPI).
ULT Support:
- Compilation fails on current master @ HEAD when configured with --with-threads={qthreads,argobots}, e.g.:
    ../../opal/mca/threads/qthreads/threads_qthreads.h:29:10: fatal error: qthread.h: No such file or directory
       29 | #include "qthread.h"
  configure does not seem to add -I/PATH_TO_ULT_LIB to the compiler flags.
- OMPI hangs when configured with --with-threads=argobots, Argobots uses multiple execution streams, and processes are mapped with --map-by ppr:1:node, so that UCX and libevent are used. This requires the changes from https://github.com/shintaro-iwasaki/libevent/commits/2.0.22-abt in libevent; we need to merge that branch into libevent soon. It is still unclear why the same setup works with Qthreads.
- MPI_Init_thread hangs on current master @ HEAD when configured with --with-threads={qthreads,argobots}. This may be an issue in instance/instance.h introduced when the sessions-related changes were merged.
- UCX assertions fail on POWER9 and ARM: discussed this with Howard. An assert in src/ucp/core/ucp_worker.c fails. I'll add more information once the latest OMPI version compiles.
Partitioned Communication:
- Unequal numbers of send and receive partitions lead to a deadlock. A backtrace suggests that the receiving rank waits for the last receive partition. MPI-Partix reproduces this when DEFAULT_RECV_SEND_PARTITION_RATIO=1 is changed to another value, such as 2.
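With unequal partition counts, a single receive partition can depend on several send partitions, so it can only complete once all of them have been marked ready. The helper below is a hypothetical sketch (not part of Open MPI or MPI-Partix) of that mapping. It assumes, as MPI 4.0 partitioned communication permits, that both sides use different partition counts over the same total amount of data, and it works in units of elements.

```c
#include <assert.h>

/* Inclusive range of send partitions that overlap one receive partition.
 * Hypothetical type, for illustration only. */
typedef struct {
    long first; /* first send partition contributing data */
    long last;  /* last send partition contributing data */
} send_range;

/* Hypothetical helper: recv_count and send_count are the number of
 * elements per receive and send partition, respectively. */
static send_range send_parts_for_recv_part(long recv_part, long recv_count,
                                           long send_count)
{
    assert(recv_count > 0 && send_count > 0);
    long begin = recv_part * recv_count; /* index of first element in recv_part */
    long end = begin + recv_count - 1;   /* index of last element in recv_part */
    send_range r = { begin / send_count, end / send_count };
    return r;
}
```

For 8 send partitions of 4 elements and 4 receive partitions of 8 elements, send_parts_for_recv_part(3, 8, 4) returns {6, 7}: the last receive partition depends on the last two MPI_Pready calls on the sender. One hypothesis worth checking is whether the receive-side completion logic accounts for more than one send partition per receive partition.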
General improvements:
- Add ULT coverage to CI testing
Reproducer for all of the above:
git clone https://github.com/sandialabs/MPI-Partix.git
cd MPI-Partix && mkdir build && cd build
export PATH=$OMPI_INSTALL_PATH/bin:$PATH
cmake .. -DCMAKE_CXX_COMPILER=mpicxx -DQthreads_ROOT=$QTHREADS_INSTALL_PATH -DPartix_ENABLE_QTHREADS=ON
make
mpirun -np 2 --map-by ppr:1:node ./bench1