v4.0.x: gcc_builtin: fix performance regression on x86_64 #8624

ggouaillardet · 2021-03-16T03:58:14Z

In order to work around a bug in older gcc versions on x86_64,
__atomic_thread_fence (__ATOMIC_SEQ_CST)
was replaced with
__atomic_thread_fence (__ATOMIC_ACQUIRE)
based on the assumption that this did not introduce performance regressions.

It was recently found that this did introduce a performance regression,
mainly at scale on fat nodes.

So simply use an asm memory clobber to both work around the older gcc bugs
and fix the performance regression.
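
For readers unfamiliar with the idiom, here is a minimal sketch of an asm memory clobber next to the two fence builtins named above. The helper names (`compiler_barrier`, `acquire_fence`, `full_fence`, `publish_and_read`) are hypothetical and chosen for illustration; they are not the identifiers used in the Open MPI sources, and this is not the actual patch.

```c
/* Hypothetical helper: a compiler-only barrier.
 * The empty asm statement emits no machine instruction; the "memory"
 * clobber tells gcc that memory may have changed, so it must not
 * reorder or cache memory accesses across this point. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* Hypothetical helper: the acquire fence builtin mentioned above. */
static inline void acquire_fence(void)
{
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
}

/* Hypothetical helper: the sequentially consistent fence builtin.
 * On x86_64, gcc typically emits a real fence instruction for this one. */
static inline void full_fence(void)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

static int data;
static int ready;

/* Store a payload, then a flag, then read both back, with a compiler
 * barrier between the steps so the compiler keeps them in program order. */
static int publish_and_read(int value)
{
    data = value;
    compiler_barrier();
    ready = 1;
    compiler_barrier();
    return ready ? data : -1;
}
```

The compiler barrier only constrains the compiler, not the CPU. That is generally considered sufficient for a read barrier on x86_64 because the hardware does not reorder loads with other loads; on weaker memory models a real fence would still be needed.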

Thanks S. Biplab Raut for bringing this issue to our attention.

Refs. #8603

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit d7e3f87)

jsquyres · 2021-03-16T14:16:27Z

bot:aws:retest

hppritcha · 2021-03-16T15:43:04Z

bot:ompi:retest

ggouaillardet added the Target: v4.0.x label Mar 16, 2021

ggouaillardet added this to the v4.0.6 milestone Mar 16, 2021

ggouaillardet requested a review from hjelmn March 16, 2021 03:58

hjelmn approved these changes Mar 16, 2021

jsquyres changed the title ~~gcc_builtin: fix performance regression on x86_64~~ v4.0.x: gcc_builtin: fix performance regression on x86_64 Mar 16, 2021

hppritcha added the NEWS label Mar 16, 2021

haampie mentioned this pull request Mar 17, 2021

fixing the perf regression issues with OpenMPI v4.0.x till v4.1.0 for x86_64 spack/spack#22350

Merged

hppritcha merged commit cadeb38 into open-mpi:v4.0.x Mar 22, 2021

awlauria mentioned this pull request Jun 29, 2021

Implementation of opal atomic barriers in x86_64 #8532

Closed
