Defect: some mpi implementations crash on shutdown when coarrays with allocatable components are used #762

Closed
@everythingfunctional

Description

  • I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.

System information including:

  • OpenCoarrays Version: 2.10.0
  • Fortran Compiler: gfortran 11.3 on macOS, 12.0.1 on Linux, and 12.1.0 on Windows
  • C compiler used for building lib: gcc versions same as above
  • Installation method: various
  • All flags & options passed to the installer: only those selecting which MPI to use
  • Output of uname -a:
      macOS: Darwin Brads-MBP.tx.rr.com 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64
      Windows: Windows 10 installation in Parallels on the same Mac
      Linux: Linux pop-os 5.17.15-76051715-generic #202206141358~1655919116~22.04~1db9e34 SMP PREEMPT Wed Jun 22 19 x86_64 x86_64 x86_64 GNU/Linux
  • MPI library being used: Open MPI and MPICH 3.2 on macOS and Linux, Intel oneAPI MPI on Windows and Linux
  • Machine architecture and number of physical cores: 8-core i9 on Mac, 4-core Intel i5 on Linux
  • Version of CMake: 3.22.1 on Mac, 3.20.0-rc3 on Windows, and 3.22.1 on Linux

To help us debug your issue please explain:

What you were trying to do (and why)

Compile and execute the following program.

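program hello_coarrays
    implicit none
    ! Derived type with an allocatable component
    type :: array_type
        integer, allocatable :: values(:)
    end type
    ! Coarray of that type: one instance per image
    type(array_type) :: array[*]
    ! Each image allocates its own component and fills it with its image index
    allocate(array%values(2), source=0)
    array%values = this_image()
    print *, array%values
end program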

What happened (include command output, screenshots, logs, etc.)

Homebrew install on macOS

[Brads-MacBook-Pro:~/tmp/hello_coarrays] which caf
/Users/brad/Repositories/github/sourceryinstitute/OpenCoarrays/prerequisites/installations//opencoarrays/2.10.0/bin/caf
[Brads-MacBook-Pro:~/tmp/hello_coarrays] caf --version

OpenCoarrays Coarray Fortran Compiler Wrapper (caf version 2.10.0-11-gdfde1b9)
Copyright (C) 2015-2022 Sourcery Institute
Copyright (C) 2015-2022 Archaeologic Inc.

OpenCoarrays comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of OpenCoarrays under the terms of the
BSD 3-Clause License.  For more information about these matters, see
the file named LICENSE that is distributed with OpenCoarrays.

[Brads-MacBook-Pro:~/tmp/hello_coarrays] caf hello_coarrays.f90 -o hello_coarrays
ld: warning: dylib (/usr/local/Cellar/gcc/11.3.0_2/lib/gcc/11/libgfortran.dylib) was built for newer macOS version (12.4) than being linked (12.3)
ld: warning: dylib (/usr/local/Cellar/gcc/11.3.0_2/lib/gcc/11/libquadmath.dylib) was built for newer macOS version (12.4) than being linked (12.3)
[Brads-MacBook-Pro:~/tmp/hello_coarrays] cafrun -n 4 ./hello_coarrays
           1           1
           1           1
           1           1
           1           1

Compiled from source and linked to MPICH compiled from source on macOS

[Brads-MacBook-Pro:~/tmp/hello_coarrays] /usr/local/bin/caf --version

OpenCoarrays Coarray Fortran Compiler Wrapper (caf version 2.10.0)
Copyright (C) 2015-2022 Sourcery Institute
Copyright (C) 2015-2022 Archaeologic Inc.

OpenCoarrays comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of OpenCoarrays under the terms of the
BSD 3-Clause License.  For more information about these matters, see
the file named LICENSE that is distributed with OpenCoarrays.

[Brads-MacBook-Pro:~/tmp/hello_coarrays] /usr/local/bin/caf hello_coarrays.f90 -o hello_coarrays
ld: warning: directory not found for option '-L/usr/local/Cellar/open-mpi/4.1.3/lib'
ld: warning: dylib (/usr/local/Cellar/gcc/11.3.0_2/lib/gcc/11/libgfortran.dylib) was built for newer macOS version (12.4) than being linked (12.3)
ld: warning: dylib (/usr/local/Cellar/gcc/11.3.0_2/lib/gcc/11/libquadmath.dylib) was built for newer macOS version (12.4) than being linked (12.3)
[Brads-MacBook-Pro:~/tmp/hello_coarrays] /usr/local/bin/cafrun -n 4 ./hello_coarrays
           3           3
           4           4
           1           1
           2           2
[Brads-MBP:18994] *** An error occurred in MPI_Win_detach
[Brads-MBP:18994] *** reported by process [238747649,1]
[Brads-MBP:18994] *** on win rdma window 5
[Brads-MBP:18994] *** MPI_ERR_UNKNOWN: unknown error
[Brads-MBP:18994] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[Brads-MBP:18994] ***    and potentially your MPI job)
[Brads-MBP.tx.rr.com:18992] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[Brads-MBP.tx.rr.com:18992] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Error: Command:
   `/usr/local/bin/mpiexec -n 4 ./hello_coarrays`
failed to run.

Compiled from source on Windows and linked with Intel oneAPI MPI

C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\caf --version

C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\caf" --version

OpenCoarrays Coarray Fortran Compiler Wrapper (caf version 2.10.0-11-gdfde1b9)
Copyright (C) 2015-2022 Sourcery Institute
Copyright (C) 2015-2022 Archaeologic Inc.

OpenCoarrays comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of OpenCoarrays under the terms of the
BSD 3-Clause License.  For more information about these matters, see
the file named LICENSE that is distributed with OpenCoarrays.


C:\Users\brad\Repositories\GitHub\sourceryinstitute>vim hello_coarrays.f90

C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\caf hello_coarrays.f90 -o hello_coarrays

C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\caf" hello_coarrays.f90 -o hello_coarrays

C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\cafrun -n 4 .\hello_coarrays

C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\cafrun" -n 4 .\hello_coarrays
           4           4
           3           3
           1           1
           2           2

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x8ab1344a
#1  0x8ab09303
#2  0x8aaea201
#3  0x4a7c7ff7
#4  0x4c39209e
#5  0x4c341453
#6  0x4c390bcd
#7  0x4c396c3a
#8  0x4c3147b0
#9  0x4a7b9c9b
#10  0x8aad50b5
#11  0x8aac192d
#12  0x8aac13bd
#13  0x8aac14f5
#14  0x4b197033
#15  0x4c342650
#16  0xffffffff

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 2052 RUNNING AT BRADRICHARD5FC1
=   EXIT STATUS: -1 (ffffffff)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 2 PID 5516 RUNNING AT BRADRICHARD5FC1
=   EXIT STATUS: -1 (ffffffff)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 3 PID 5048 RUNNING AT BRADRICHARD5FC1
=   EXIT STATUS: -1 (ffffffff)
===================================================================================
Error: Command:
   `C:/Program Files (x86)/Intel/oneAPI/mpi/latest/bin/mpiexec.exe -n 4 .\hello_coarrays`
failed to run.

Compiled from source on Linux and linked to system Open MPI

(base) [pop-os:~/tmp/hello_coarrays] caf --version                           

OpenCoarrays Coarray Fortran Compiler Wrapper (caf version 2.10.0-11-gdfde1b9)
Copyright (C) 2015-2022 Sourcery Institute
Copyright (C) 2015-2022 Archaeologic Inc.

OpenCoarrays comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of OpenCoarrays under the terms of the
BSD 3-Clause License.  For more information about these matters, see
the file named LICENSE that is distributed with OpenCoarrays.

(base) [pop-os:~/tmp/hello_coarrays] caf --show
/usr/bin/gfortran -I/home/brad/Repositories/GitHub/sourceryinstitute/OpenCoarrays/prerequisites/installations/opencoarrays/2.10.0/include/OpenCoarrays-2.10.0-11-gdfde1b9_GNU-12.0.1 -fcoarray=lib -L/usr/lib/x86_64-linux-gnu/openmpi/lib/fortran/gfortran ${@} /home/brad/Repositories/GitHub/sourceryinstitute/OpenCoarrays/prerequisites/installations/opencoarrays/2.10.0/lib/libcaf_mpi.a /usr/lib/x86_64-linux-gnu/libmpi_usempif08.so /usr/lib/x86_64-linux-gnu/libmpi_usempi_ignore_tkr.so /usr/lib/x86_64-linux-gnu/libmpi_mpifh.so /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so /usr/lib/x86_64-linux-gnu/libopen-rte.so /usr/lib/x86_64-linux-gnu/libopen-pal.so /usr/lib/x86_64-linux-gnu/libhwloc.so /usr/lib/x86_64-linux-gnu/libevent_core.so /usr/lib/x86_64-linux-gnu/libevent_pthreads.so /usr/lib/x86_64-linux-gnu/libm.so /usr/lib/x86_64-linux-gnu/libz.so
(base) [pop-os:~/tmp/hello_coarrays] caf hello_coarrays.f90 -o hello_coarrays
(base) [pop-os:~/tmp/hello_coarrays] cafrun -n 4 ./hello_coarrays            
           1           1
           2           2
           3           3
           4           4
[pop-os:84252] *** An error occurred in MPI_Win_detach
[pop-os:84252] *** reported by process [2910650369,0]
[pop-os:84252] *** on win rdma window 5
[pop-os:84252] *** MPI_ERR_UNKNOWN: unknown error
[pop-os:84252] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[pop-os:84252] ***    and potentially your MPI job)
[pop-os:84243] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[pop-os:84243] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Error: Command:
   `/usr/bin/mpiexec -n 4 ./hello_coarrays`
failed to run.
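
The Open MPI output above says that three more processes sent the same help message and that setting the MCA parameter "orte_base_help_aggregate" to 0 shows all of them. A minimal sketch of rerunning with that parameter follows. It assumes the Open MPI build is in use, that the executable from the run above still exists in the current directory, and that cafrun passes its environment through to mpiexec.

```shell
# Executable name taken from the run above.
exe=./hello_coarrays

# Open MPI reads MCA parameters from OMPI_MCA_<name> environment variables;
# passing "--mca orte_base_help_aggregate 0" to mpiexec is the equivalent flag.
export OMPI_MCA_orte_base_help_aggregate=0

# Only rerun when the wrapper and the executable are actually present.
if command -v cafrun >/dev/null 2>&1 && [ -x "$exe" ]; then
    cafrun -n 4 "$exe"
fi
```

This only changes how much diagnostic text is printed. It is not expected to change whether the MPI_Win_detach error occurs.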

Compiled from source and linked to MPICH compiled from source on Linux

(base) [pop-os:~/tmp/hello_coarrays] caf --version

OpenCoarrays Coarray Fortran Compiler Wrapper (caf version 2.10.0-11-gdfde1b9)
Copyright (C) 2015-2022 Sourcery Institute
Copyright (C) 2015-2022 Archaeologic Inc.

OpenCoarrays comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of OpenCoarrays under the terms of the
BSD 3-Clause License.  For more information about these matters, see
the file named LICENSE that is distributed with OpenCoarrays.

(base) [pop-os:~/tmp/hello_coarrays] caf --show
/usr/bin/gfortran -I/home/brad/Repositories/GitHub/sourceryinstitute/OpenCoarrays/prerequisites/installations/opencoarrays/2.10.0/include/OpenCoarrays-2.10.0-11-gdfde1b9_GNU-12.0.1 -fcoarray=lib -Wl,-rpath -Wl,/home/brad/Repositories/GitHub/sourceryinstitute/OpenCoarrays/prerequisites/installations/lib -Wl,--enable-new-dtags ${@} /home/brad/Repositories/GitHub/sourceryinstitute/OpenCoarrays/prerequisites/installations/opencoarrays/2.10.0/lib/libcaf_mpi.a /home/brad/Repositories/GitHub/sourceryinstitute/OpenCoarrays/prerequisites/installations/lib/libmpifort.so /home/brad/Repositories/GitHub/sourceryinstitute/OpenCoarrays/prerequisites/installations/lib/libmpi.so
(base) [pop-os:~/tmp/hello_coarrays] caf hello_coarrays.f90 -o hello_coarrays
(base) [pop-os:~/tmp/hello_coarrays] cafrun -n 4 ./hello_coarrays
           1           1
           1           1
           1           1
           1           1

Compiled from source on Linux and linked with Intel oneAPI MPI

[pop-os:~/tmp/hello_coarrays] caf --show
/usr/bin/gfortran -I/home/brad/Repositories/GitHub/sourceryinstitute/OpenCoarrays/prerequisites/installations/opencoarrays/2.10.0/include/OpenCoarrays-2.10.0-11-gdfde1b9_GNU-12.0.1 -fcoarray=lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /opt/intel/oneapi/mpi/2021.6.0/lib/release -Xlinker -rpath -Xlinker /opt/intel/oneapi/mpi/2021.6.0/lib -Xlinker --enable-new-dtags ${@} /home/brad/Repositories/GitHub/sourceryinstitute/OpenCoarrays/prerequisites/installations/opencoarrays/2.10.0/lib/libcaf_mpi.a /opt/intel/oneapi/mpi/2021.6.0/lib/libmpifort.so /opt/intel/oneapi/mpi/2021.6.0/lib/release/libmpi.so /usr/lib/x86_64-linux-gnu/librt.a /usr/lib/x86_64-linux-gnu/libpthread.a /usr/lib/x86_64-linux-gnu/libdl.a
[pop-os:~/tmp/hello_coarrays] caf hello_coarrays.f90 -o hello_coarrays
[pop-os:~/tmp/hello_coarrays] cafrun -n 4 ./hello_coarrays
           1           1
           2           2
           3           3
           4           4
double free or corruption (out)
free(): invalid pointer

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
free(): invalid pointer

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
free(): invalid pointer

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x7f512fd47ae0 in ???
#1  0x7f512fd46c45 in ???
#2  0x7f512fb3e51f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#0  0x7ff00df47ae0 in ???
#1  0x7ff00df46c45 in ???
#0  0x7f4979f47ae0 in ???
#1  0x7f4979f46c45 in ???
#0  0x7f5023547ae0 in ???
#1  0x7f5023546c45 in ???
#2  0x7f4979d3e51f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#2  0x7ff00dd3e51f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#2  0x7f502333e51f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x7f512fb92a7c in __pthread_kill_implementation
	at ./nptl/pthread_kill.c:44
#4  0x7f512fb92a7c in __pthread_kill_internal
	at ./nptl/pthread_kill.c:78
#5  0x7f512fb92a7c in __GI___pthread_kill
	at ./nptl/pthread_kill.c:89
#6  0x7f512fb3e475 in __GI_raise
	at ../sysdeps/posix/raise.c:26
#7  0x7f512fb247f2 in __GI_abort
	at ./stdlib/abort.c:79
#3  0x7ff00dd92a7c in __pthread_kill_implementation
	at ./nptl/pthread_kill.c:44
#4  0x7ff00dd92a7c in __pthread_kill_internal
#3  0x7f4979d92a7c in __pthread_kill_implementation
	at ./nptl/pthread_kill.c:44
#4  0x7f4979d92a7c in __pthread_kill_internal
	at ./nptl/pthread_kill.c:78
#5  0x7ff00dd92a7c in __GI___pthread_kill
	at ./nptl/pthread_kill.c:89
	at ./nptl/pthread_kill.c:78
#5  0x7f4979d92a7c in __GI___pthread_kill
	at ./nptl/pthread_kill.c:89
#6  0x7f4979d3e475 in __GI_raise
	at ../sysdeps/posix/raise.c:26
#6  0x7ff00dd3e475 in __GI_raise
	at ../sysdeps/posix/raise.c:26
#3  0x7f5023392a7c in __pthread_kill_implementation
	at ./nptl/pthread_kill.c:44
#4  0x7f5023392a7c in __pthread_kill_internal
	at ./nptl/pthread_kill.c:78
#5  0x7f5023392a7c in __GI___pthread_kill
	at ./nptl/pthread_kill.c:89
#6  0x7f502333e475 in __GI_raise
	at ../sysdeps/posix/raise.c:26
#7  0x7f4979d247f2 in __GI_abort
	at ./stdlib/abort.c:79
#7  0x7ff00dd247f2 in __GI_abort
	at ./stdlib/abort.c:79
#8  0x7f512fb856f5 in __libc_message
	at ../sysdeps/posix/libc_fatal.c:155
#7  0x7f50233247f2 in __GI_abort
	at ./stdlib/abort.c:79
#8  0x7f4979d856f5 in __libc_message
	at ../sysdeps/posix/libc_fatal.c:155
#8  0x7ff00dd856f5 in __libc_message
	at ../sysdeps/posix/libc_fatal.c:155
#8  0x7f50233856f5 in __libc_message
	at ../sysdeps/posix/libc_fatal.c:155
#9  0x7f512fb9cd7b in malloc_printerr
	at ./malloc/malloc.c:5664
#10  0x7f512fb9eeef in _int_free
	at ./malloc/malloc.c:4588
#11  0x7f512fba14d2 in __GI___libc_free
	at ./malloc/malloc.c:3391
#12  0x555a6c656c68 in ???
#13  0x555a6c64fa48 in ???
#9  0x7f502339cd7b in malloc_printerr
	at ./malloc/malloc.c:5664
#10  0x7f502339eac3 in _int_free
	at ./malloc/malloc.c:4439
#9  0x7f4979d9cd7b in malloc_printerr
	at ./malloc/malloc.c:5664
#10  0x7f4979d9eac3 in _int_free
	at ./malloc/malloc.c:4439
#11  0x7f4979da14d2 in __GI___libc_free
	at ./malloc/malloc.c:3391
#12  0x5642223bcc68 in ???
#13  0x5642223b5a48 in ???
#11  0x7f50233a14d2 in __GI___libc_free
	at ./malloc/malloc.c:3391
#12  0x556567f1dc68 in ???
#13  0x556567f16a48 in ???
#9  0x7ff00dd9cd7b in malloc_printerr
	at ./malloc/malloc.c:5664
#10  0x7ff00dd9eac3 in _int_free
	at ./malloc/malloc.c:4439
#11  0x7ff00dda14d2 in __GI___libc_free
	at ./malloc/malloc.c:3391
#12  0x5607988c3c68 in ???
#13  0x5607988bca48 in ???
#14  0x7f512fb25d8f in __libc_start_call_main
	at ../sysdeps/nptl/libc_start_call_main.h:58
#15  0x7f512fb25e3f in __libc_start_main_impl
	at ../csu/libc-start.c:392
#16  0x555a6c64f584 in ???
#17  0xffffffffffffffff in ???
#14  0x7f4979d25d8f in __libc_start_call_main
	at ../sysdeps/nptl/libc_start_call_main.h:58
#15  0x7f4979d25e3f in __libc_start_main_impl
	at ../csu/libc-start.c:392
#16  0x5642223b5584 in ???
#17  0xffffffffffffffff in ???
#14  0x7f5023325d8f in __libc_start_call_main
	at ../sysdeps/nptl/libc_start_call_main.h:58
#15  0x7f5023325e3f in __libc_start_main_impl
	at ../csu/libc-start.c:392
#16  0x556567f16584 in ???
#17  0xffffffffffffffff in ???
#14  0x7ff00dd25d8f in __libc_start_call_main
	at ../sysdeps/nptl/libc_start_call_main.h:58
#15  0x7ff00dd25e3f in __libc_start_main_impl
	at ../csu/libc-start.c:392
#16  0x5607988bc584 in ???
#17  0xffffffffffffffff in ???

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 224054 RUNNING AT pop-os
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 224055 RUNNING AT pop-os
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 2 PID 224056 RUNNING AT pop-os
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 3 PID 224057 RUNNING AT pop-os
=   KILLED BY SIGNAL: 6 (Aborted)
===================================================================================
Error: Command:
   `/opt/intel/oneapi/mpi/2021.6.0/bin/mpiexec -n 4 ./hello_coarrays`
failed to run.
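
Frames #12 and #13 in the backtraces above resolve only to "???", so the log does not show which routine passed the bad pointer to free(). One possible next step is rebuilding with debug information. This is a sketch only. It assumes the caf wrapper forwards extra options to gfortran, as the ${@} in the caf --show output suggests, and the output name hello_coarrays_debug is made up for illustration.

```shell
# -g and -fbacktrace are standard gfortran options. Whether they resolve
# frames inside libcaf_mpi.a depends on how that library itself was built.
debug_flags="-g -fbacktrace"

if command -v caf >/dev/null 2>&1 && [ -f hello_coarrays.f90 ]; then
    # $debug_flags is intentionally unquoted so it splits into two options.
    caf $debug_flags hello_coarrays.f90 -o hello_coarrays_debug
    cafrun -n 4 ./hello_coarrays_debug
fi
```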

What you expected to happen

The program should run to completion without crashing, regardless of which MPI implementation OpenCoarrays is linked against.

Step-by-step reproduction instructions to reproduce the error/bug

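Build OpenCoarrays against an MPI implementation other than MPICH (for example Open MPI or Intel MPI), then compile the program above with caf and run it with cafrun -n 4.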
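
The reproduction can be scripted roughly as follows. This is a sketch that assumes caf and cafrun are on PATH and already point at the MPI implementation under test. REPRO_DIR is a hypothetical override variable, not part of OpenCoarrays.

```shell
# Work in a scratch directory unless REPRO_DIR (hypothetical) is set.
workdir=${REPRO_DIR:-$(mktemp -d)}
src="$workdir/hello_coarrays.f90"

# Write the reproducer from this report to disk.
cat > "$src" <<'EOF'
program hello_coarrays
    implicit none
    type :: array_type
        integer, allocatable :: values(:)
    end type
    type(array_type) :: array[*]
    allocate(array%values(2), source=0)
    array%values = this_image()
    print *, array%values
end program
EOF

if command -v caf >/dev/null 2>&1 && command -v cafrun >/dev/null 2>&1; then
    caf --show    # records which MPI libraries the wrapper links
    if caf "$src" -o "$workdir/hello_coarrays"; then
        cafrun -n 4 "$workdir/hello_coarrays"
        status=$?
    else
        status=1  # compilation failed
    fi
else
    status=127    # wrappers not installed; only the source file was written
fi
echo "exit status: $status"
```

Capturing the caf --show output alongside the exit status makes it easier to tell which MPI library a given result belongs to.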
