Description
Background information
On certain systems, __mmap and mmap point to the same function.
For example:
```
$ objdump -T /lib/x86_64-linux-gnu/libc-2.27.so | grep mmap
000000000011b9d0  w   DF .text  00000000000000e0  GLIBC_2.2.5   mmap64
000000000011b9d0  g   DF .text  00000000000000e0  GLIBC_PRIVATE __mmap
000000000011b9d0  w   DF .text  00000000000000e0  GLIBC_2.2.5   mmap
```
This causes intercept_mmap to recurse infinitely when HAVE___MMAP is defined, because intercept_mmap calls __mmap directly:
```c
#ifdef HAVE___MMAP
    /* the darwin syscall returns an int not a long so call the underlying __mmap function */
    result = __mmap (start, length, prot, flags, fd, offset);
#else
    result = (void *)(intptr_t) memory_patcher_syscall (SYS_mmap, start, length, prot, flags, fd, offset);
#endif
```

However, memory_patcher_component.c includes a comment saying that intercept_mmap calls __mmap in order to support Darwin/Apple's internal mmap function, and no other code appears to use HAVE___MMAP. Since configure already sets opal_memory_patcher_happy=no for macOS/Darwin, changing opal/mca/memory/patcher/configure.m4 so that HAVE___MMAP is never defined should have no negative impact and would also prevent the infinite recursion.
Alternatively, the relevant lines in configure.m4 and in memory_patcher_component.c could simply be removed rather than commented out.
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
master, top of tree: 3c45542
Note that the issue started appearing at: #6531
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone https://github.com/open-mpi/ompi.git
Please describe the system on which you are running
- Operating system/version: Ubuntu bionic (4.15.0-51-generic #55-Ubuntu SMP Wed May 15 14:27:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux)
- libc version: 2.26 and later, starting with commit fa872e1b6210e81e60d6029429f0a083b8eab26e
- libfabric: https://github.com/ofiwg/libfabric, master, commit 90c41980a5f4b32cdfeb45bb257d171a8b126e67
- Computer hardware: x86
- Network type: libfabric with a sockets provider
Details of the problem
To reproduce this problem, build on a system with libc-2.27 and libfabric, then run osu_latency on two nodes using ofi with a sockets provider:

```
`which mpirun` --get-stack-traces -H ${HOST1},${HOST2} -x LD_LIBRARY_PATH -x PATH --mca btl ^openib,tcp,vader,sockets,ucx --mca mtl ^openib,tcp,vader,sockets,ucx --mca pml ^openib,tcp,vader,sockets,ucx --mca mtl_ofi_provider_include sockets --map-by ppr:1:node --bind-to none `which osu_latency`
```

The problem presents as a segmentation fault.
A gdb backtrace (bt) shows infinite recursion like the following:

```
#104728 0x00007f878c5be7ae in _intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:130
#104729 0x00007f878c5be81e in intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:144
#104730 0x00007f878c5be7ae in _intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:130
#104731 0x00007f878c5be81e in intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:144
#104732 0x00007f878c5be7ae in _intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:130
#104733 0x00007f878c5be81e in intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:144
#104734 0x00007f878c5be7ae in _intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:130
```