Following the discussion on the mailing list about a performance regression between Open MPI 3.1.1 and 4.1.0, here are my findings:
The performance regression can be observed on fat nodes with the built-in FFTW benchmark and the following command line:
mpirun --map-by core --rank-by core --bind-to core --mca pml ob1 --mca btl vader,self  ./mpi-bench -opatient -r1000 -s icf1000000
git bisect pointed me to 7c8a1fb437.
I ran this on a quad-socket Intel node with 18 cores per socket.
At 112 cores, the degradation is over 15%.
Reading between the lines of the commit message, the workaround needed for gcc < 6 was applied to all x86_64 builds on the assumption that it would not introduce any performance penalty.
The data above show otherwise, so we might want to improve this, for example by applying the workaround only for gcc < 6, or by not applying it for gcc >= 6 (depending on how we want to treat non-GNU compilers); a sketch of such a gate is below.
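
For illustration, here is a minimal sketch of what such a compiler-version gate could look like. The macro name `OPAL_ATOMIC_WORKAROUND` is hypothetical, and the real change would live wherever 7c8a1fb437 touched the headers:

```c
/* Hypothetical sketch: apply the workaround only where it is needed,
 * i.e. on x86_64 with gcc older than 6, instead of unconditionally.
 * OPAL_ATOMIC_WORKAROUND is an illustrative macro name, not the one
 * actually used in the Open MPI tree. */
#if defined(__x86_64__) && defined(__GNUC__) && !defined(__clang__) \
    && __GNUC__ < 6
#define OPAL_ATOMIC_WORKAROUND 1  /* old gcc: keep the workaround */
#else
#define OPAL_ATOMIC_WORKAROUND 0  /* gcc >= 6 and others: original path */
#endif
```

Note that clang also defines `__GNUC__` (typically as an old version), so the `!defined(__clang__)` test is one way to answer the non-GNU-compiler question; the alternative is to gate on `__GNUC__ < 6` alone and accept that clang would then also get the workaround.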
@hjelmn could you please comment on this issue?
