
List of issues with OMPI master+ULT+PartCom #10459

Open
@janciesko


This issue tracks current problems with ULT (user-level threading) and partitioned communication (PartCom) support in OMPI.

ULT Support:

  • Compilation fails on current master (HEAD) when configured with --with-threads={qthreads,argobots}:
    ../../opal/mca/threads/qthreads/threads_qthreads.h:29:10: fatal error: qthread.h: No such file or directory
       29 | #include "qthread.h"
    Configure does not seem to add -I/PATH_TO_ULT_LIB to the compiler flags.
  • OMPI hangs when it is configured with --with-threads=argobots, Argobots uses multiple execution streams, and the job runs with --map-by ppr:1:node (so that UCX and libevent are used). libevent needs the changes from https://github.com/shintaro-iwasaki/libevent/commits/2.0.22-abt, and we need to merge that branch into libevent soon. We should also check why Qthreads works in the same setup.
  • MPI_Init_thread hangs on current master (HEAD) when configured with --with-threads={qthreads,argobots}. This may be an issue in instance/instance.h introduced when the sessions-related code changes were merged.
  • UCX assertions fail on POWER9 and ARM. I discussed this with Howard: an assert in src/ucp/core/ucp_worker.c fails. I'll add more information once the latest OMPI version compiles.
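Until configure adds the ULT include path itself, one possible workaround is to pass the Qthreads or Argobots install prefix through the standard autoconf CPPFLAGS and LDFLAGS variables. This is an untested assumption, not a confirmed fix. The sketch below only prints the resulting configure line. The helper name `ult_configure_cmd` is hypothetical, and it assumes the usual `include`/`lib` layout under the prefix.

```shell
#!/bin/sh
# Hypothetical helper: print a configure line that passes the ULT library's
# include and lib directories explicitly. Assumes the common
# $PREFIX/include and $PREFIX/lib layout (an assumption, not verified).
ult_configure_cmd() {
    backend=$1   # qthreads or argobots
    prefix=$2    # ULT install prefix
    case "$backend" in
        qthreads|argobots) ;;
        *) echo "unsupported backend: $backend" >&2; return 1 ;;
    esac
    printf './configure --with-threads=%s CPPFLAGS=-I%s/include LDFLAGS=-L%s/lib\n' \
        "$backend" "$prefix" "$prefix"
}

# Print the line for Qthreads, using the same variable as the reproducer.
ult_configure_cmd qthreads "${QTHREADS_INSTALL_PATH:-/opt/qthreads}"
```

Running the printed line from the top of the OMPI source tree would show whether the missing -I flag is the only problem. One way to do that is `eval "$(ult_configure_cmd qthreads "$QTHREADS_INSTALL_PATH")"`.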

Partitioned Communication:

  • Unequal numbers of send and receive partitions lead to a deadlock. A backtrace suggests that the receiving rank is waiting for the last receive partition. MPI-Partix reproduces this when DEFAULT_RECV_SEND_PARTITION_RATIO is changed from 1 to another value, such as 2.
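MPI-4.0 partitioned communication allows the sender and receiver to use different numbers of partitions, as long as the total message size matches. A receive partition is therefore complete only once every send partition that overlaps it has arrived.

The sketch below prints that overlap. It is illustrative arithmetic only, not OMPI or MPI-Partix code. The following are assumptions:

- The ratio means receive partitions per send partition.
- The element count divides evenly among the partitions.
- The helper name `partition_map` is hypothetical.

With a ratio of 2, the last receive partition depends only on the last send partition. This is consistent with, though not proof of, the backtrace described above.

```shell
#!/bin/sh
# Illustrative only: N elements split into S send partitions and
# R = S * RATIO receive partitions. For each receive partition, print the
# range of send partitions it overlaps. Assumes N is divisible by S and R.
partition_map() {
    n=$1; s=$2; ratio=$3
    r=$((s * ratio))
    j=0
    while [ "$j" -lt "$r" ]; do
        lo=$((j * n / r))                # first element of receive partition j
        hi=$(( (j + 1) * n / r - 1 ))    # last element of receive partition j
        first=$((lo * s / n))            # send partition holding element lo
        last=$((hi * s / n))             # send partition holding element hi
        echo "recv $j <- send $first..$last"
        j=$((j + 1))
    done
}

# 16 elements, 4 send partitions, ratio 2 -> 8 receive partitions.
partition_map 16 4 2
```

With ratio 1, each receive partition maps to exactly one send partition with the same index.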

General improvements:

  • Add ULT coverage to CI testing

Reproducers for all of the above:

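git clone https://github.com/sandialabs/MPI-Partix.git
cd MPI-Partix && mkdir -p build && cd build
# Prepend to PATH (instead of overwriting it) so mpicxx/mpirun come from the OMPI install
export PATH=$OMPI_INSTALL_PATH/bin:$PATH
cmake .. -DCMAKE_CXX_COMPILER=mpicxx -DQthreads_ROOT=$QTHREADS_INSTALL_PATH -DPartix_ENABLE_QTHREADS=ON
make
mpirun -np 2 --map-by ppr:1:node ./bench1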
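Several of the failures above are hangs rather than crashes, so CI coverage for ULT builds would likely need a time limit on each run. Below is a minimal sketch that assumes GNU coreutils `timeout` is available. `timeout` exits with status 124 when the limit is hit. The helper name `run_with_limit` is hypothetical.

```shell
#!/bin/sh
# Hypothetical CI helper: run a command under a time limit and report a hang
# separately from an ordinary failure. GNU coreutils `timeout` returns 124
# when it kills the command for exceeding the limit.
run_with_limit() {
    limit=$1; shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' exceeded ${limit}s" >&2
    elif [ "$status" -ne 0 ]; then
        echo "FAIL: '$*' exited with status $status" >&2
    fi
    return "$status"
}
```

In a CI job, the reproducer would then run as `run_with_limit 300 mpirun -np 2 --map-by ppr:1:node ./bench1`. The 300-second limit is an arbitrary example value.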
