Description
Thank you for taking the time to submit an issue!
Background information
I'm packaging Open MPI 2.1 for SUSE and am hitting a bug (probably GCC's fault) in the test suite.
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
OpenMPI 2.1.0
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built using tarball from the website
Please describe the system on which you are running
- Operating system/version: openSUSE_Tumbleweed, gcc (SUSE Linux) 6.3.1 20170202 [gcc-6-branch revision 245119]
- Computer hardware: i586
- Network type:
Details of the problem
Running the opal_fifo test (through make check) hangs.
After some debugging, it appears that we end up stuck in an infinite loop here (opal_fifo.h:262):
if (!opal_atomic_cmpset_ptr (&fifo->opal_fifo_tail.data.item, item, &fifo->opal_fifo_ghost)) {
    while (&fifo->opal_fifo_ghost == item->opal_list_next) {
        opal_atomic_rmb ();
    }
The generated assembly for the inner loop looks something like this:
=> 0x08049675 <+357>: cmp %edi,%eax
0x08049677 <+359>: je 0x8049675 <thread_test+357>
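Unless I'm mistaken, this means that GCC has cached both values in registers and no longer reloads them from memory: the je jumps straight back to the cmp, so the comparison can never change.
The opal_atomic_rmb used here comes from the GCC builtins implementation.
Simply adding this: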
diff --git a/opal/include/opal/sys/gcc_builtin/atomic.h b/opal/include/opal/sys/gcc_builtin/atomic.h
index 82b75f47d8..eea743503c 100644
--- a/opal/include/opal/sys/gcc_builtin/atomic.h
+++ b/opal/include/opal/sys/gcc_builtin/atomic.h
@@ -51,6 +51,9 @@ static inline void opal_atomic_mb(void)
 static inline void opal_atomic_rmb(void)
 {
+#if OPAL_ASSEMBLY_ARCH == OPAL_IA32
+    __asm__ __volatile__("": : :"memory");
+#endif
     __atomic_thread_fence (__ATOMIC_ACQUIRE);
 }
fixes the issue.
This really seems like a GCC bug, but I figured it might be worth notifying you.
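For comparison, here is a minimal sketch of a wait loop that does not depend on the fence to force a reload, because the pointer itself is read with an explicit C11 atomic load on every iteration. This is not Open MPI code and not a proposed patch: demo_item_t, demo_wait_not_ghost and the max_spins bound are hypothetical names introduced only for illustration, and the sketch assumes a compiler that provides <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a list item; only the "next" link is modelled. */
typedef struct demo_item {
    _Atomic(struct demo_item *) next;
} demo_item_t;

/*
 * Spin while item->next still points at the ghost element.
 * The acquire load is an atomic access, so the compiler has to re-read
 * item->next from memory on every iteration instead of reusing a value
 * held in a register.
 * Returns the number of extra spins, or -1 if max_spins was reached
 * (the bound is an assumption added here so the sketch cannot hang).
 */
static long demo_wait_not_ghost(demo_item_t *item, demo_item_t *ghost,
                                long max_spins)
{
    long spins = 0;

    assert(max_spins > 0);
    while (atomic_load_explicit(&item->next, memory_order_acquire) == ghost) {
        if (++spins >= max_spins) {
            return -1;
        }
    }
    return spins;
}
```

In a real queue another thread would eventually store a non-ghost pointer into next, at which point the loop exits. The empty asm with a "memory" clobber in the diff above gets a similar effect by a different route: it tells GCC that memory may have changed, so values cached in registers have to be reloaded.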