Segmentation fault in simple program using MPI_Comm_accept()/connect() #4153

Closed
@awlauria

Updated, new info 9/1/17

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

3.0.0rc4

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone

Please describe the system on which you are running

Red Hat 7.3
Power and X86

Details of the problem

I have a simple test that intermittently hits a segmentation fault in all v3.0 release candidates.
The test seems to pass on the master branch, though it's not apparent where the bug got 'fixed'. I attached a sample test case that fails when run on a single node with two tasks. It may not fail every time, but if you run it in a loop you will hit the segmentation fault after some runs. The location of the segmentation fault changes from run to run, so the stack trace below is just one example.

I can only get it to crash using an optimized build.

Running under valgrind shows no heap corruption, even on failing runs, so the problem appears to be stack-related unless valgrind is missing something.

#0  0x000010000050eb18 in raise () from /lib64/libc.so.6
#1  0x0000100000510c9c in abort () from /lib64/libc.so.6
#2  0x0000100000555784 in __libc_message () from /lib64/libc.so.6
#3  0x000010000055f800 in malloc_consolidate () from /lib64/libc.so.6
#4  0x0000100000561be4 in _int_malloc () from /lib64/libc.so.6
#5  0x00001000005645ec in malloc () from /lib64/libc.so.6
#6  0x000010000059fc58 in __alloc_dir () from /lib64/libc.so.6
#7  0x000010000059fdcc in __opendirat () from /lib64/libc.so.6
#8  0x000010000059fe30 in opendir () from /lib64/libc.so.6
#9  0x00001000001f1004 in opal_os_dirpath_is_empty () from /smpi_dev/awlauria/ompi/exports/lib/libopen-pal.so.40
#10 0x00001000000a8d10 in orte_session_dir_cleanup () from /smpi_dev/awlauria/ompi/exports/lib/libopen-rte.so.40
#11 0x00000000100018f4 in ?? ()
#12 0x0000000010001060 in main (argc=-1, argv=0x0) at allgather_inter.c:43

sample run:

`mpirun -np 2 ./simple_test`
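
Since the failure is intermittent, the sample run has to be repeated to reproduce it. A minimal sketch of such a loop follows; `run_until_failure` is a hypothetical helper name, not part of Open MPI or of the attached test case, and the run count is arbitrary:

```shell
# Hypothetical helper: rerun a command until it exits non-zero
# or the given number of runs is reached.
# Usage: run_until_failure MAX_RUNS COMMAND [ARGS...]
run_until_failure() {
    max_runs=$1
    shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        "$@"                      # run the command exactly as given
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "run $i failed with exit status $status"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max_runs runs"
    return 0
}
```

Assuming the attached test was built as `./simple_test`, it could be invoked as `run_until_failure 100 mpirun -np 2 ./simple_test`. The helper stops at the first failing run, so a core file or debugger session can be tied to that specific run.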

simple_test.zip
