Treematch issues in master #4303

Closed
@rhc54

Description

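The treematch topology component aborts in master when running MTT. An assertion in mca_topo_treematch_dist_graph_create fails during MPI_Dist_graph_create, and the process dies with signal 6 (SIGABRT):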

```
$ mpirun --oversubscribe --bind-to none -np 16 topology/distgraph1
using graph layout 'deterministic complete graph'
testing MPI_Dist_graph_create_adjacent
testing MPI_Dist_graph_create w/ outgoing only
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
Using default
distgraph1: topo_treematch_dist_graph_create.c:656: mca_topo_treematch_dist_graph_create: Assertion `(int)sol->k_length == size' failed.
[rhc001:14020] *** Process received signal ***
[rhc001:14020] Signal: Aborted (6)
[rhc001:14020] Signal code:  (-6)
[rhc001:14020] [ 0] /lib64/libpthread.so.0(+0xf370)[0x7f385b8e8370]
[rhc001:14020] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7f385b54d1d7]
[rhc001:14020] [ 2] /lib64/libc.so.6(abort+0x148)[0x7f385b54e8c8]
[rhc001:14020] [ 3] /lib64/libc.so.6(+0x2e146)[0x7f385b546146]
[rhc001:14020] [ 4] /lib64/libc.so.6(+0x2e1f2)[0x7f385b5461f2]
[rhc001:14020] [ 5] /home/common/openmpi/build/foobar/lib/openmpi/mca_topo_treematch.so(mca_topo_treematch_dist_graph_create+0x21e9)[0x7f384396e702]
[rhc001:14020] [ 6] /home/common/openmpi/build/foobar/lib/libmpi.so.0(PMPI_Dist_graph_create+0x44d)[0x7f385bb7a83e]
[rhc001:14020] [ 7] topology/distgraph1[0x40219e]
[rhc001:14020] [ 8] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f385b539b35]
[rhc001:14020] [ 9] topology/distgraph1[0x400fc9]
[rhc001:14020] *** End of error message ***
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node rhc001 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
```
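For context, a minimal sketch of the two pieces involved, under stated assumptions. The first helper builds the arguments an "outgoing only" MPI_Dist_graph_create call would pass for a complete graph: each rank names one source (itself) and lists its own outgoing edges. Excluding self-loops and using unweighted edges are assumptions; the real distgraph1 test may build its layout differently. The second helper is a non-aborting version of the kind of length check that fails at line 656. It assumes, hypothetically, that a mapping solution can be viewed as a flat array with one entry per process. The real TreeMatch `sol` structure may be laid out differently. Both function names are hypothetical and are not Open MPI or TreeMatch APIs.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: fill dests with every rank except `rank` (complete
 * graph, no self-loops, assumed). dests must have room for size-1 ints.
 * Returns the outgoing degree, i.e. the number of entries written. */
int complete_graph_outgoing(int rank, int size, int *dests)
{
    int degree = 0;
    for (int j = 0; j < size; j++) {
        if (j != rank) {
            dests[degree++] = j;
        }
    }
    return degree;
}

/* Hypothetical check modeled on the failed assertion: a placement is only
 * usable if it has exactly one slot per process (len == size) and is a
 * permutation of 0..size-1. Returns false instead of aborting, so a caller
 * could fall back to the unreordered communicator. */
bool mapping_is_valid(const int *placement, int len, int size)
{
    if (placement == NULL || size <= 0 || len != size) {
        return false;
    }
    bool *seen = calloc((size_t)size, sizeof *seen);
    if (seen == NULL) {
        return false;
    }
    bool ok = true;
    for (int i = 0; i < len && ok; i++) {
        int p = placement[i];
        if (p < 0 || p >= size || seen[p]) {
            ok = false;            /* out of range or duplicate slot */
        } else {
            seen[p] = true;
        }
    }
    free(seen);
    return ok;
}
```

A rank would then pass the result as, e.g., `MPI_Dist_graph_create(comm, 1, &rank, &degree, dests, MPI_UNWEIGHTED, MPI_INFO_NULL, 1, &newcomm)`. With 16 ranks, each rank contributes 15 outgoing edges. Returning false rather than asserting is only one possible design: it would let the component decline to reorder instead of taking down the job.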
