Description
The following failure is being reported on both the OMPI v5.0 and main branches:
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],132] nodeid from the modex.
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],133] nodeid from the modex.
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],134] nodeid from the modex.
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],135] nodeid from the modex.
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],136] nodeid from the modex.
[ip-172-31-10-245:11280] Unable to extract peer [[26806,1],137] nodeid from the modex.
....
PRRTE provides the nodeid for every proc in the job as part of the initial job info, so it is not included in the modex. However, I cannot find where this error message is emitted, so I don't know the precise function call that generated it.
Could someone please provide further info on how this error is generated?
In case it helps, the command executed was:
mpirun -n 144 topology/distgraph1
(Remember, I do not have access to the ompi-tests repository.)
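For anyone helping to track down the emitter, here is a minimal sketch of how the search could be done. The peer name in the message is filled in at runtime, so only the fixed tail of the string is searched for. The helper name `find_msg`, the choice of file extensions, and the checkout path in the usage line are assumptions for illustration, not anything taken from the OMPI or PRRTE trees.

```shell
# Hypothetical helper: search a source tree for the fixed part of a message.
#   $1 = root of a source checkout (assumed path)
#   $2 = fixed substring of the message to look for
find_msg() {
    # -r recurse, -n print line numbers, -F treat the pattern as a literal string.
    # Help-text files (*.txt) are included in case the message lives there
    # rather than in the C sources.
    grep -rnF --include='*.c' --include='*.h' --include='*.txt' -- "$2" "$1"
}
```

Usage, assuming a checkout in ./ompi: `find_msg ./ompi "nodeid from the modex"`. If nothing matches, the string may be assembled from several pieces, in which case searching for a shorter fragment such as "from the modex" is worth trying.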