
MTT failure on IBM #10969

Open
@rhc54

The following failure is being reported from both the OMPI v5.0 and main branches:

[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],132] nodeid from the modex.
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],133] nodeid from the modex.
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],134] nodeid from the modex.
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],135] nodeid from the modex.
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],136] nodeid from the modex.
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],137] nodeid from the modex.
....

PRRTE provides the nodeid for every proc in the job as part of the initial job-level info, so the nodeid is not included in the modex. However, I cannot find where this error message is emitted, so I don't know which function call generated it.
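One plausible reading of the error is that the caller looks for the nodeid only in the modex, while the launcher delivered it as job-level info. The toy model below illustrates that mismatch. It is not OMPI, PRRTE, or PMIx code: the dictionaries, the function names, and the layout of 144 ranks over 4 nodes of 36 slots are all hypothetical, chosen only to mirror the reported command line.

```python
from typing import Dict, Optional

# Hypothetical layout for illustration only: 144 ranks (as in the reported
# command line) placed on 4 nodes with 36 slots each.
NPROCS = 144
PROCS_PER_NODE = 36

# Hypothetical job-level info: the launcher supplies every proc's nodeid
# up front, keyed by rank.
job_info: Dict[int, int] = {rank: rank // PROCS_PER_NODE for rank in range(NPROCS)}

# Hypothetical modex: data each proc published itself. No nodeid is
# published here, because the launcher already provided it.
modex: Dict[int, Dict[str, int]] = {rank: {} for rank in range(NPROCS)}


def nodeid_modex_only(rank: int) -> Optional[int]:
    """Consults only the modex, so launcher-provided nodeids are never found."""
    return modex[rank].get("nodeid")


def nodeid_job_info_first(rank: int) -> Optional[int]:
    """Checks job-level info first and falls back to the modex."""
    if rank in job_info:
        return job_info[rank]
    return modex[rank].get("nodeid")


if __name__ == "__main__":
    # Mirror the ranks shown in the report: every modex-only lookup fails.
    for rank in range(132, 138):
        if nodeid_modex_only(rank) is None:
            print(f"rank {rank}: nodeid not found in modex")
    missing = [r for r in range(NPROCS) if nodeid_job_info_first(r) is None]
    print(f"job-info-first lookup: {len(missing)} ranks missing")
```

In this model the modex-only lookup misses every peer while the job-info-first lookup finds all of them, which matches the pattern of one message per peer seen in the output above.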

Could someone please provide me with further info as to how this error is generated?

If it helps, the command executed is: mpirun -n 144 topology/distgraph1 (remember, I do not have access to the ompi-tests repository).