
mmap infinite recurse in opal/mca/memory/patcher #6853

@hkuno

Description


Background information

On certain systems, __mmap and mmap point to the same function.

For example:

$ objdump -T  /lib/x86_64-linux-gnu/libc-2.27.so | grep mmap
000000000011b9d0  w   DF .text  00000000000000e0  GLIBC_2.2.5 mmap64
000000000011b9d0 g    DF .text  00000000000000e0  GLIBC_PRIVATE __mmap
000000000011b9d0  w   DF .text  00000000000000e0  GLIBC_2.2.5 mmap

This causes intercept_mmap to recurse infinitely when HAVE___MMAP is defined, because intercept_mmap calls __mmap directly:

#ifdef HAVE___MMAP
    /* the darwin syscall returns an int not a long so call the underlying __mmap function */
    result = __mmap (start, length, prot, flags, fd, offset);
#else
    result = (void*)(intptr_t) memory_patcher_syscall(SYS_mmap, start, length, prot, flags, fd, offset);
#endif

However, the comment in memory_patcher_component.c indicates that intercept_mmap calls __mmap only to support Darwin/Apple's internal mmap function, and no other code appears to use HAVE___MMAP. Since configure already sets opal_memory_patcher_happy=no on macOS/Darwin, modifying opal/mca/memory/patcher/configure.m4 so that it no longer defines HAVE___MMAP should have no negative impact and would prevent the infinite recursion.

Alternatively, the relevant lines in configure.m4 and memory_patcher_component.c could simply be removed rather than commented out.

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

master, top of tree: 3c45542
Note that the issue started appearing at: #6531

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone https://github.com/open-mpi/ompi.git

Please describe the system on which you are running


Details of the problem

To reproduce this problem, build on a system with libc-2.27 and libfabric then run osu_latency on two nodes using ofi with a sockets provider:

`which mpirun` --get-stack-traces  -H ${HOST1},${HOST2} -x LD_LIBRARY_PATH -x PATH --mca btl ^openib,tcp,vader,sockets,ucx --mca mtl ^openib,tcp,vader,sockets,ucx --mca pml ^openib,tcp,vader,sockets,ucx --mca mtl_ofi_provider_include sockets --map-by ppr:1:node --bind-to none `which osu_latency`

The problem presents as a segmentation fault.
Running the job under gdb and issuing bt shows infinite recursion like the following:

#104728 0x00007f878c5be7ae in _intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:130
#104729 0x00007f878c5be81e in intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:144
#104730 0x00007f878c5be7ae in _intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:130
#104731 0x00007f878c5be81e in intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:144
#104732 0x00007f878c5be7ae in _intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:130
#104733 0x00007f878c5be81e in intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:144
#104734 0x00007f878c5be7ae in _intercept_mmap (start=0x0, length=143360, prot=3, flags=34, fd=-1, offset=0) at memory_patcher_component.c:130
