
Segmentation fault using openmpi 4.1.2 #11129

Open
@Huyuxi08


Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

4.1.2

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Built from the source tarball, following these steps:

Downloaded Open MPI 4.1.2 from https://www.open-mpi.org/software/ompi/v4.1/
cd openmpi-4.1.2
./configure --prefix=/home/software/huyxii/openmpi-4.1.2
make
make install
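Because the install uses a non-default `--prefix`, the shell has to find that tree's `bin` and `lib` directories before any system copies. The sketch below is one way to do that; `OMPI_PREFIX` is a hypothetical variable name, and its default is assumed to be the `--prefix` passed to `./configure` above. Note that the backtraces further down load `libopen-pal.so.40` from `/home/huyxii/software/openmpi-4.1.2/lib`, which is not the same path as that prefix, so it may be worth setting `OMPI_PREFIX` explicitly to whichever tree is actually intended.

```shell
#!/bin/sh
# Assumed install prefix: defaults to the --prefix given to ./configure.
OMPI_PREFIX="${OMPI_PREFIX:-/home/software/huyxii/openmpi-4.1.2}"

# Put this installation's binaries and libraries ahead of any system copies.
export PATH="$OMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Report which mpirun/mpiexec the shell now resolves ("not found" if absent).
for tool in mpirun mpiexec; do
  printf '%s -> %s\n' "$tool" "$(command -v "$tool" || echo 'not found')"
done
```

Sourcing it (`. ./ompi-env.sh`, file name hypothetical) in the shell that launches `mpiexec` keeps the launcher and the libraries it loads from the same installation.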

Please describe the system on which you are running

Operating system/version: CentOS Linux release 7.4.1708 (Core)

Details of the problem

mpirun runs successfully with hello_c from the examples directory:

mpirun -np 5 hello_c

Hello, world, I am 3 of 5, (Open MPI v4.1.2, package: Open MPI huyxii@mu01 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 106)
Hello, world, I am 0 of 5, (Open MPI v4.1.2, package: Open MPI huyxii@mu01 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 106)
Hello, world, I am 1 of 5, (Open MPI v4.1.2, package: Open MPI huyxii@mu01 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 106)
Hello, world, I am 2 of 5, (Open MPI v4.1.2, package: Open MPI huyxii@mu01 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 106)
Hello, world, I am 4 of 5, (Open MPI v4.1.2, package: Open MPI huyxii@mu01 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 106)

When running some other programs with mpiexec, however, I get errors like this:

mpiexec -np 5 maker -base Fn_Male maker_bopts.ctl maker_exe.ctl maker_opts.ctl --ignore_nfs_tmp

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[mu01:30587] *** Process received signal ***
[mu01:30587] Signal: Segmentation fault (11)
[mu01:30587] Signal code: Address not mapped (1)
[mu01:30587] Failing at address: 0x4b0
[mu01:30587] [ 0] /lib64/libpthread.so.0(+0xf6d0)[0x2b89f23636d0]
[mu01:30587] [ 1] /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x2b89f12f3042]
[mu01:30587] [ 2] /lib64/libpthread.so.0(+0xf6d0)[0x2b89f23636d0]
[mu01:30587] [ 3] /lib64/libc.so.6(__poll+0x2d)[0x2b89f2663f0d]
[mu01:30587] [ 4] /home/huyxii/software/openmpi-4.1.2/lib/libopen-pal.so.40(+0x8a3a8)[0x2b89fce5d3a8]
[mu01:30587] [ 5] /home/huyxii/software/openmpi-4.1.2/lib/libopen-pal.so.40(opal_libevent2022_event_base_loop+0x196)[0x2b89fce53e76]
[mu01:30587] [ 6] /home/huyxii/software/openmpi-4.1.2/lib/libopen-pal.so.40(+0x3cfbe)[0x2b89fce0ffbe]
[mu01:30587] [ 7] /lib64/libpthread.so.0(+0x7e25)[0x2b89f235be25]
[mu01:30587] [ 8] /lib64/libc.so.6(clone+0x6d)[0x2b89f266ebad]
[mu01:30587] *** End of error message ***


Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


[mu01:30592] *** Process received signal ***
[mu01:30592] Signal: Segmentation fault (11)
[mu01:30592] Signal code: Address not mapped (1)
[mu01:30592] Failing at address: 0x4b0
[mu01:30592] [ 0] /lib64/libpthread.so.0(+0xf6d0)[0x2b9c0e3996d0]
[mu01:30592] [ 1] /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x2b9c0d329042]
[mu01:30592] [ 2] /lib64/libpthread.so.0(+0xf6d0)[0x2b9c0e3996d0]
[mu01:30592] [ 3] /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x2b9c0d329020]
[mu01:30592] [ 4] /lib64/libpthread.so.0(+0xf6d0)[0x2b9c0e3996d0]
[mu01:30592] [ 5] /lib64/libc.so.6(__poll+0x2d)[0x2b9c0e699f0d]
[mu01:30592] [ 6] /home/huyxii/software/openmpi-4.1.2/lib/libopen-pal.so.40(+0x8a3a8)[0x2b9c18e933a8]
[mu01:30592] [ 7] /home/huyxii/software/openmpi-4.1.2/lib/libopen-pal.so.40(opal_libevent2022_event_base_loop+0x196)[0x2b9c18e89e76]
[mu01:30592] [ 8] /home/huyxii/software/openmpi-4.1.2/lib/libopen-pal.so.40(+0x3cfbe)[0x2b9c18e45fbe]
[mu01:30592] [ 9] /lib64/libpthread.so.0(+0x7e25)[0x2b9c0e391e25]
[mu01:30592] [10] /lib64/libc.so.6(clone+0x6d)[0x2b9c0e6a4bad]
[mu01:30592] *** End of error message ***
SIGTERM received
SIGTERM received
SIGTERM received

mpiexec noticed that process rank 1 with PID 0 on node mu01 exited on signal 11 (Segmentation fault).
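The two backtraces above (PIDs 30587 and 30592) differ mainly in addresses and in how many nested `Perl_csighandler` frames appear. To compare them frame by frame, a small filter can strip the per-process prefix and addresses. This is a hypothetical helper, assuming the log was saved to a file and uses the `[host:pid] [ n] library(symbol+offset)[address]` format shown above.

```shell
#!/bin/sh
# Hypothetical helper: reduce Open MPI backtrace frames to "library(symbol)"
# so traces from different ranks/PIDs can be compared without run-specific addresses.
normalize_trace() {
  # Keep only frame lines such as "[mu01:30587] [ 3] /lib64/libc.so.6(__poll+0x2d)[0x...]"
  grep -E '^\[[^]]+\] \[ *[0-9]+\] ' |
    # Drop the "[host:pid] [ n] " prefix and the trailing "[0x...]" address
    sed -E 's/^\[[^]]+\] \[ *[0-9]+\] //; s/\[0x[0-9a-f]+\]$//' |
    # Drop the offset inside the parentheses, e.g. "+0x2d"
    sed -E 's/\+0x[0-9a-f]+\)$/)/'
}
```

For example, `normalize_trace < mpiexec.log | sort | uniq -c` (log file name assumed) counts how often each frame appears across all failing ranks.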

Thanks in advance for any help.
