@ggouaillardet
Contributor

In order to work around a bug in older gcc versions on x86_64,
__atomic_thread_fence (__ATOMIC_SEQ_CST)
was replaced with
__atomic_thread_fence (__ATOMIC_ACQUIRE)
based on the assumption that this did not introduce performance regressions.

It was recently found that this did introduce a performance regression,
mainly at scale on fat nodes.

So simply use an asm memory clobber to both work around the older gcc bug
and fix the performance regression.
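
For readers unfamiliar with the idiom, here is a minimal sketch of what an asm memory clobber looks like when used as a read barrier. It is not the actual patch: the names `example_rmb` and `example_try_read` are hypothetical, and the non-x86_64 fallback to `__atomic_thread_fence (__ATOMIC_ACQUIRE)` is an assumption made for illustration. The empty asm statement emits no instruction; it only tells the compiler not to reorder or cache memory accesses across it, which is generally sufficient for load-load ordering on x86_64 because the hardware does not reorder loads with other loads.

```c
#include <stddef.h>

/* Hypothetical helper, NOT the Open MPI function touched by this PR. */
static inline void example_rmb(void)
{
#if defined(__x86_64__)
    /* Empty asm with a "memory" clobber: generates no instruction, but
     * the compiler must not move memory accesses across this point. */
    __asm__ __volatile__("" : : : "memory");
#else
    /* Assumed fallback for other architectures (illustration only). */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
}

/* Hypothetical consumer: check a ready flag, then read the payload.
 * The barrier keeps the payload load from being hoisted above the
 * flag load by the compiler. Returns 1 if the payload was read. */
static inline int example_try_read(const volatile int *ready,
                                   const int *payload, int *out)
{
    if (!*ready) {
        return 0;
    }
    example_rmb();
    *out = *payload;
    return 1;
}
```

A full fence instruction is typically far more expensive than this compiler-only barrier, which is consistent with the regression being visible mainly at scale.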

Thanks S. Biplab Raut for bringing this issue to our attention.

Refs. #8603

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit d7e3f87)

@ggouaillardet added this to the v4.0.6 milestone Mar 16, 2021
@ggouaillardet requested a review from hjelmn March 16, 2021 03:58
@jsquyres changed the title from "gcc_builtin: fix performance regression on x86_64" to "v4.0.x: gcc_builtin: fix performance regression on x86_64" Mar 16, 2021
@jsquyres
Member

bot:aws:retest

@hppritcha added the NEWS label Mar 16, 2021
@hppritcha
Member

bot:ompi:retest
