v1.10 MTT fails on collective tests #1914

Closed
@thananon

We have been seeing many collective test failures in MTT for two days in a row.

As far as I can tell, all of the failures come from collective operations. Unfortunately, I have not been able to reproduce any of them. I looked through the recent changes in v1.10, and the only commit that might be related to this issue is open-mpi/ompi-release@640bcf6.

This is what most of the stack traces look like. Other collective tests fail in the same way.

[mpi008:15568] *** Process received signal ***
[mpi008:15568] Signal: Segmentation fault (11)
[mpi008:15568] Signal code: Address not mapped (1)
[mpi008:15568] Failing at address: 0x100000030
[mpi008:15569] *** Process received signal ***
[mpi008:15568] [ 0] /lib64/libpthread.so.0[0x3ca080f710]
[mpi008:15568] [ 1] /home/mpiteam/scratches/community/2016-07-26cron/dUiN/installs/WC2g/install/lib/libmpi.so.12(mca_pml_ob1_recv_req_start+0x19e)[0x2aaaaad9aeb3]
[mpi008:15568] [ 2] /home/mpiteam/scratches/community/2016-07-26cron/dUiN/installs/WC2g/install/lib/libmpi.so.12(mca_pml_ob1_irecv+0x318)[0x2aaaaad8e03d]
[mpi008:15568] [ 3] /home/mpiteam/scratches/community/2016-07-26cron/dUiN/installs/WC2g/install/lib/libmpi.so.12(mca_coll_inter_allgather_inter+0x176)[0x2aaaaacac2df]
[mpi008:15568] [ 4] /home/mpiteam/scratches/community/2016-07-26cron/dUiN/installs/WC2g/install/lib/libmpi.so.12(PMPI_Allgather+0x283)[0x2aaaaab4efc8]
[mpi008:15568] [ 5] collective/intercomm/allgather_gap_inter[0x4017a6]
[mpi008:15568] [ 6] collective/intercomm/allgather_gap_inter[0x4014fc]
[mpi008:15568] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3ca041ed1d]
[mpi008:15568] [ 8] collective/intercomm/allgather_gap_inter[0x401259]
[mpi008:15568] *** End of error message ***
[mpi008:15569] Signal: Segmentation fault (11)
[mpi008:15569] Signal code: Address not mapped (1)
[mpi008:15569] Failing at address: 0x100000030
[mpi008:15569] [ 0]
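
The output above interleaves lines from two processes (15568 and 15569), and some lines from different processes are fused together. Here is a minimal sketch for regrouping such output per process, assuming every message starts with a `[host:pid]` prefix as in the trace above. The name `split_by_process` is hypothetical and not part of Open MPI or MTT.

```python
import re
from collections import defaultdict

# Matches the "[host:pid]" prefix seen in the trace, e.g. "[mpi008:15568]".
# Frame markers such as "[ 0]" or "[0x3ca080f710]" contain no "host:pid"
# pair, so they are not mistaken for a prefix.
PREFIX = re.compile(r"\[([\w.-]+):(\d+)\]")


def split_by_process(text):
    """Group trace messages by (host, pid).

    Splits at every prefix rather than at newlines, so two messages
    fused onto one line are still separated.
    """
    groups = defaultdict(list)
    matches = list(PREFIX.finditer(text))
    for i, m in enumerate(matches):
        # The message body runs up to the next prefix (or end of input).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        if body:
            groups[(m.group(1), int(m.group(2)))].append(body)
    return dict(groups)
```

Passing the raw log text to `split_by_process` returns a dictionary keyed by `(host, pid)`, where each value is that process's messages in their original order.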

I will be happy to provide additional information if needed.
