
osc/rdma: segfault in MPI_Compare_and_swap with flat MPI #9146

@s417-lama

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

The current master branch: 65ca64f

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From a git clone, as follows:

$ git clone https://github.com/open-mpi/ompi.git
$ cd ompi/
$ git submodule update --init --recursive
$ ./autogen.pl
$ mkdir build
$ cd build/
$ ../configure --prefix=<install_path> --with-ucx=<path_to_ucx> --disable-man-pages
$ make -j
$ make install

UCX v1.10.1 was built from a tarball.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

$ git submodule status
 256b1f5dec15386990b57c7fc4c7ecd67a6f1e27 3rd-party/openpmix (v1.1.3-3014-g256b1f5)
 53e80245ad007550aee18c3fd176e030a173a16b 3rd-party/prrte (dev-31257-g53e8024)

Please describe the system on which you are running

  • Operating system/version: Red Hat Enterprise Linux 7 (3.10.0-957.21.3.el7.x86_64)
  • Computer hardware: Intel Xeon Platinum 8280 (Cascade Lake)
  • Network type: Intel Omni-Path

Details of the problem

Running a program that calls MPI_Compare_and_swap() in the "flat MPI" model (multiple nodes, with multiple processes on each node) causes a segfault with the rdma osc component.

The segfault does not occur on a single node, or on multiple nodes with one process per node.

Minimal code example that reproduces the segfault:

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <mpi.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  uint64_t* lock;

  MPI_Win win;
  MPI_Win_allocate(sizeof(uint64_t), 1, MPI_INFO_NULL, MPI_COMM_WORLD, &lock, &win);
  MPI_Win_lock_all(0, win);

  /* Each process initializes its own window memory to 0. */
  *lock = 0;
  /* Synchronize the private and public copies of the window before the barrier. */
  MPI_Win_sync(win);

  MPI_Barrier(MPI_COMM_WORLD);

  /* Every process tries to swap 0 -> 1 on rank 0's lock; result receives the old value. */
  const uint64_t one = 1;
  const uint64_t zero = 0;
  uint64_t result;
  MPI_Compare_and_swap(&one, &zero, &result, MPI_UINT64_T, 0, 0, win);
  MPI_Win_flush(0, win);

  printf("%" PRIu64 "\n", result);

  MPI_Barrier(MPI_COMM_WORLD);

  MPI_Win_unlock_all(win);
  MPI_Finalize();
  return 0;
}

This program first initializes lock to 0 on every process, and then all processes issue MPI_Compare_and_swap() on lock at rank 0.
The expected behavior is that exactly one process gets result = 0.

Running the above program with 4 processes on 2 nodes (2 processes per node):

$ mpirun --mca osc rdma -n 4 -N 2 ./a.out

Output:

[cx0001:24799:0:24799] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x30)
[cx0001:24800:0:24800] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x30)
==== backtrace (tid:  24800) ====
 0 0x00000000000587b5 ucs_debug_print_backtrace()  <HOME>/ucx-1.10.1/build/src/ucs/../../../src/ucs/debug/debug.c:656
 1 0x00000000000b9e05 mca_btl_ofi_afop()  ???:0
 2 0x000000000023f176 ompi_osc_rdma_lock_all_atomic()  ???:0
 3 0x00000000000f81c6 MPI_Win_lock_all()  ???:0
 4 0x00000000004009f1 main()  test_cas.c:13
 5 0x00000000000223d5 __libc_start_main()  ???:0
 6 0x00000000004008e9 _start()  ???:0
=================================

a.out:24800 terminated with signal 11 at PC=2b59ed535e05 SP=7fffcfd3bb00.  Backtrace:
<ompi_install_path>/lib/libopen-pal.so.0(mca_btl_ofi_afop+0x105)[0x2b59ed535e05]
<ompi_install_path>/lib/libmpi.so.0(ompi_osc_rdma_lock_all_atomic+0x326)[0x2b59ecb77176]
<ompi_install_path>/lib/libmpi.so.0(PMPI_Win_lock_all+0x96)[0x2b59eca301c6]
./a.out[0x4009f1]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b59ed0d13d5]
./a.out[0x4008e9]
==== backtrace (tid:  24799) ====
 0 0x00000000000587b5 ucs_debug_print_backtrace()  <HOME>/ucx-1.10.1/build/src/ucs/../../../src/ucs/debug/debug.c:656
 1 0x00000000000b9e05 mca_btl_ofi_afop()  ???:0
 2 0x000000000023f176 ompi_osc_rdma_lock_all_atomic()  ???:0
 3 0x00000000000f81c6 MPI_Win_lock_all()  ???:0
 4 0x00000000004009f1 main()  test_cas.c:13
 5 0x00000000000223d5 __libc_start_main()  ???:0
 6 0x00000000004008e9 _start()  ???:0
=================================

a.out:24799 terminated with signal 11 at PC=2b3c2c582e05 SP=7ffe75a10190.  Backtrace:
<ompi_install_path>/lib/libopen-pal.so.0(mca_btl_ofi_afop+0x105)[0x2b3c2c582e05]
<ompi_install_path>/lib/libmpi.so.0(ompi_osc_rdma_lock_all_atomic+0x326)[0x2b3c2bbc4176]
<ompi_install_path>/lib/libmpi.so.0(PMPI_Win_lock_all+0x96)[0x2b3c2ba7d1c6]
./a.out[0x4009f1]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b3c2c11e3d5]
./a.out[0x4008e9]

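Running with -n 4 -N 1 (one process per node) or -n 4 -N 4 (a single node) does not cause a segfault.

Note that, according to the backtrace, the crash happens inside MPI_Win_lock_all() (test_cas.c:13, via ompi_osc_rdma_lock_all_atomic() and mca_btl_ofi_afop()), before MPI_Compare_and_swap() is reached.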
