
OMPI 4.0.1 TCP connection errors beyond 86 nodes #6786

Closed
@dfaraj

Description


Background information

We have an OPA cluster of 288 nodes. All nodes run the same OS image, have passwordless ssh set up, and the firewall is disabled. We run basic OSU osu_mbw_mr tests on 2, 4, ..., 86 nodes and the tests complete successfully. Once we hit 88+ nodes we get:

```
--------------------------------------------------------------------------
ORTE has lost communication with a remote daemon.

  HNP daemon   : [[63011,0],0] on node r1i2n13
  Remote daemon: [[63011,0],40] on node r1i3n17

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
ORTE does not know how to route a message to the specified daemon
located on the indicated node:

  my node:   r1i2n13
  target node:  r1i2n14

This is usually an internal programming error that should be
reported to the developers. In the meantime, a workaround may
be to set the MCA param routed=direct on the command line or
in your environment. We apologize for the problem.
--------------------------------------------------------------------------
```
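The error text itself suggests the `routed=direct` workaround. A sketch of how that could be applied, reusing the hostfile and binary names from the runs reported below (whether this actually avoids the failure is untested here):

```shell
# Workaround suggested by the error message above: set the MCA param
# routed=direct so daemon messages are routed directly rather than
# through the routing tree. Hostfile/binary names as used in this report.
mpirun --mca routed direct -x PATH -x LD_LIBRARY_PATH \
       -np 88 -map-by ppr:1:node -hostfile myhosts ./osu_mbw_mr.ompi
```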

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

4.0.1

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Downloaded 4.0.1 from the Open MPI site, then:

```shell
./configure --prefix=/store/dfaraj/SW/packages/ompi/4.0.1 CC=icc CXX=icpc FC=ifort --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default --with-psm2=/usr --without-verbs --without-psm --without-knem --without-slurm --without-ucx
```

Please describe the system on which you are running

  • Operating system/version: RH 7.6
  • Computer hardware: dual socket Xeon nodes
  • Network type: OPA

Details of the problem

Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.

When we run:

```shell
n=86
mpirun -x PATH -x LD_LIBRARY_PATH -np $((n)) -map-by ppr:1:node -hostfile myhosts ./osu_mbw_mr.ompi
```

it works fine.
With:

```shell
n=88
mpirun -x PATH -x LD_LIBRARY_PATH -np $((n)) -map-by ppr:1:node -hostfile myhosts ./osu_mbw_mr.ompi
```

we get the TCP error described earlier.
If I do:

```shell
n=88
mpirun -x PATH -x LD_LIBRARY_PATH --mca plm_rsh_no_tree_spawn 1 -np $((n)) -map-by ppr:1:node -hostfile myhosts ./osu_mbw_mr.ompi
```

it works.
If I set:

```shell
n=160
mpirun -x PATH -x LD_LIBRARY_PATH --mca plm_rsh_no_tree_spawn 1 -np $((n)) -map-by ppr:1:node -hostfile myhosts ./osu_mbw_mr.ompi
```

it appears to hang. I don't think it is truly hung, though; it is likely ssh-ing to every node one at a time and just going very slowly.
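A back-of-the-envelope sketch of why disabling tree spawn can look like a hang at 160 nodes. The 0.5 s per-ssh startup cost and the fanout of 32 are assumed figures for illustration, not measured or documented Open MPI values:

```python
# Rough launch-time estimate: sequential ssh (plm_rsh_no_tree_spawn=1)
# vs. a tree spawn. Both the 0.5 s per-daemon ssh cost and the fanout
# of 32 are assumptions for illustration only.
import math

SSH_LATENCY_S = 0.5  # assumed time to start one remote daemon via ssh

def sequential_launch_s(nodes: int) -> float:
    """With tree spawn disabled, the HNP sshes to every node one by one."""
    return nodes * SSH_LATENCY_S

def tree_launch_s(nodes: int, fanout: int = 32) -> float:
    """With tree spawn, each level of daemons starts its children in
    parallel, so cost grows with tree depth (~log_fanout(nodes)),
    not with the node count."""
    levels = max(1, math.ceil(math.log(nodes, fanout)))
    return levels * SSH_LATENCY_S

print(sequential_launch_s(160))  # 80.0 -> over a minute just to launch
print(tree_launch_s(160))        # 1.0  -> two levels of parallel sshes
```

Under these assumptions the sequential launch takes on the order of minutes, which is easy to mistake for a hang.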

