Background information
What version of Open MPI are you using?
4.1.2
Describe how Open MPI was installed
Downloaded from https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.2.tar.bz2, unpacked, and built with the following script:
cd openmpi-4.1.2 &&
env \
   FC=/opt/ulm/dublin/cmd/gfortran \
   CC=/opt/ulm/dublin/cmd/gcc \
   CXX=/opt/ulm/dublin/cmd/g++ \
   ./configure --prefix=/opt/ulm/dublin \
      --disable-silent-rules \
      --libdir=/opt/ulm/dublin/lib/amd64 \
      --enable-wrapper-rpath &&
make DESTDIR=/home/pkgdev/dublin/openmpi/proto install
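For completeness, a quick way to confirm that this freshly built 4.1.2 installation is the one actually being picked up (standard Open MPI commands; the only assumption is that /opt/ulm/dublin/bin comes first in PATH after the proto area has been packaged and installed):

# Assumes /opt/ulm/dublin/bin precedes any other MPI installation in PATH.
which mpirun mpicc
mpirun --version                 # should report: mpirun (Open MPI) 4.1.2
ompi_info | grep "Open MPI:"     # version line of the active installation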
Please describe the system on which you are running
- Operating system/version: Solaris 11.4
- Computer hardware: Intel Xeon CPU E5-2650 v4
- Network type: 1 Gbit/s Ethernet
Details of the problem
Intermittently, even the simplest MPI applications that are run locally over shared memory fail in MPI_Finalize() with errors like the following:
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have. It is likely that your MPI job will now either abort or
experience performance degradation.
Local host: theon
System call: unlink(2) /tmp/ompi.theon.120/pid.12048/1/vader_segment.theon.120.675b0001.1
Error: No such file or directory (errno 2)
--------------------------------------------------------------------------
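The message appears to indicate that the backing file of the vader (shared-memory) BTL under the session directory in /tmp is already gone by the time it is unlinked during finalization. For reference, the runs can be steered around those segment files; this is only a sketch of possible workarounds, not a fix (both MCA parameters are standard Open MPI 4.x options, the directory path is just an example):

# Possible workaround 1: place the vader backing files somewhere other than
# the per-job session directory under /tmp (example path).
mpirun --mca btl_vader_backing_directory /var/tmp -np 4 mpi-test

# Possible workaround 2: exclude the vader BTL entirely so the processes
# fall back to TCP (slower, but no shared-memory segment is created).
mpirun --mca btl ^vader -np 4 mpi-test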
This happens even for the most trivial test programs, like the following:
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char** argv) {
   MPI_Init(&argc, &argv);
   int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   int nof_processes; MPI_Comm_size(MPI_COMM_WORLD, &nof_processes);
   if (rank) {
      /* every non-root process sends its rank to rank 0 */
      MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
   } else {
      /* rank 0 collects and prints one message from each other process */
      for (int i = 0; i + 1 < nof_processes; ++i) {
         MPI_Status status;
         int msg;
         MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE,
            0, MPI_COMM_WORLD, &status);
         int count;
         MPI_Get_count(&status, MPI_INT, &count);
         if (count == 1) {
            printf("%d\n", msg);
         }
      }
   }
   MPI_Finalize();
}
Just run mpirun multiple times and it will eventually fail:
theon$ mpicc -o mpi-test mpi-test.c
theon$ mpirun -np 4 mpi-test
3
1
2
theon$ mpirun -np 4 mpi-test
3
2
1
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have. It is likely that your MPI job will now either abort or
experience performance degradation.
Local host: theon
System call: unlink(2) /tmp/ompi.theon.120/pid.13340/1/vader_segment.theon.120.7c570001.1
Error: No such file or directory (errno 2)
--------------------------------------------------------------------------
theon$
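Because the failure is intermittent, it is easiest to trigger in a loop; a minimal sketch in plain shell (nothing is assumed beyond mpirun and the mpi-test binary from above being reachable as in the session shown):

# Keep re-running the reproducer until either mpirun exits with an error
# or the shared-memory warning shows up in its combined output.
i=0
while :; do
   i=`expr $i + 1`
   mpirun -np 4 mpi-test > out.log 2>&1 || break
   grep "shared memory initialization" out.log > /dev/null && break
done
echo "stopped after $i run(s):"
cat out.log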
We had the very same problem with Open MPI 4.1.1, but never saw it with Open MPI 2.1.6.