Skip to content

Conversation

@hjelmn
Copy link
Member

@hjelmn hjelmn commented Aug 26, 2016

The xlc compiler seems to behave in a different way that gcc when it
comes the inline asm. There were two problems with the code with xlc:

  • The TOC read in mca_patcher_base_patch_hook used the syntax
    register unsigned long toc asm("r2") to read $r2 (the TOC
    pointer). With gcc this seems to behave as expected but with xlc
    the result in toc is not the same as $r2. I updated the code to use
    asm volatile ("std 2, %0" : "=m" (toc)) to load the TOC pointer.
  • The OPAL_PATCHER_BEGIN macro is meant to be the first thing in a
    hook. On PPC64 it loads the correct TOC pointer (thanks to
    mca_patcher_base_patch_hook) and saves the old one. The
    OPAL_PATCHER_END macro restores the TOC pointer. Because we need
    the TOC to be correct before it is accessed in the hook the
    OPAL_PATCHER_BEGIN macro MUST come first. We did this and all was
    well with gcc. With xlc on the other hand there was a TOC access
    before the assembly inserted by OPAL_PATCHER_BEGIN. To fix this
    quickly I broke each hook into a pair of function with the
    OPAL_PATCHER_* macros on the top level functions. This works around
    the issue but is not a clean way to fix this. In the future we
    should 1) either update overwrite to not need this, or 2) figure
    out why xlc is not inserting the asm before the first TOC read.

This fixes #1854

Signed-off-by: Nathan Hjelm hjelmn@lanl.gov

The xlc compiler seems to behave in a different way that gcc when it
comes the inline asm. There were two problems with the code with xlc:

 - The TOC read in mca_patcher_base_patch_hook used the syntax
   register unsigned long toc asm("r2") to read $r2 (the TOC
   pointer). With gcc this seems to behave as expected but with xlc
   the result in toc is not the same as $r2. I updated the code to use
   asm volatile ("std 2, %0" : "=m" (toc)) to load the TOC pointer.

 - The OPAL_PATCHER_BEGIN macro is meant to be the first thing in a
   hook. On PPC64 it loads the correct TOC pointer (thanks to
   mca_patcher_base_patch_hook) and saves the old one. The
   OPAL_PATCHER_END macro restores the TOC pointer. Because we *need*
   the TOC to be correct before it is accessed in the hook the
   OPAL_PATCHER_BEGIN macro MUST come first. We did this and all was
   well with gcc. With xlc on the other hand there was a TOC access
   before the assembly inserted by OPAL_PATCHER_BEGIN. To fix this
   quickly I broke each hook into a pair of function with the
   OPAL_PATCHER_* macros on the top level functions. This works around
   the issue but is not a clean way to fix this. In the future we
   should 1) either update overwrite to not need this, or 2) figure
   out why xlc is not inserting the asm before the first TOC read.

This fixes open-mpi#1854

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
@hjelmn
Copy link
Member Author

hjelmn commented Aug 26, 2016

@jjhursey Please double-check. This is running cleanly on my Power8 system with xlc with both --enable-static and without.

At some point we should probably get together with the xlc team and see why the assembly wasn't working as expected.

@hjelmn
Copy link
Member Author

hjelmn commented Aug 26, 2016

@jsquyres Ideally this should be in 2.0.1 but IBM could use this in their own internal version until we get out 2.0.2.

@jjhursey
Copy link
Member

I'll try to test on Monday.
@markalle Can you also take a look at this? I'm not as familiar with the patcher code as you might be.

@hjelmn
Copy link
Member Author

hjelmn commented Aug 27, 2016

Merging to let MTT have a crack at it.

@hjelmn hjelmn merged commit d33204b into open-mpi:master Aug 27, 2016
@PHHargrove
Copy link
Member

I will also try to retest at least three distinct ppc64 platforms this weekend if I have sufficient net access.

@PHHargrove
Copy link
Member

My testing consisted of building the most recent master tarball (openmpi-dev-4711-gd33204b), including:

  • configure
  • make
  • make check
  • make install
  • make clean
  • run "make" in the examples subdir, with new install in PATH and LD_LIBRARY_PATH
  • "mpirun -np 2 ./ring_c" with platform-appropriate "-mca btl FOO" and/or "-mca mtl BAR"

The following two both passed all of those steps, while the 2.0.1rc1 tarball got a SEGV in dlopen_test from "make check" (and thus didn't get as far as a ring_c test which the following ran fine):

Power8 little-endian
IBM XL C/C++ for Linux, V13.1.2 (5725-C73, 5765-J08)
Version: 13.01.0002.0000

Power7 big-endian
IBM XL C/C++ for Linux, V13.1
Version: 13.01.0000.0000

The following had not tested 2.0.1rc1 previously (but when tested today failed in a DIFFERENT way to be reported soon), but passed on master:

Power7 big-endian
IBM XL C/C++ for Blue Gene, V12.1
Version: 12.01.0000.0014
[NOTE: testing was on the Power7 front-end, NOT a Blue Gene compute node]

@PHHargrove
Copy link
Member

The following had not tested 2.0.1rc1 previously (but when tested today failed in a DIFFERENT way to be reported soon), but passed on master:

Power7 big-endian
IBM XL C/C++ for Blue Gene, V12.1
Version: 12.01.0000.0014
[NOTE: testing was on the Power7 front-end, NOT a Blue Gene compute node]

When 2.0.1rc is configured with --enable-builtin-atomics I can get past the "DIFFERENT" failure mentioned above, and reproduce my original SEGV from dlopen_test. So, this platform also represents one "fixed" on master while broken in 2.0.1rc1.

@jjhursey
Copy link
Member

I just tested master with the same environment/configuration as in #1854 - it passed all make check tests. HEAD of branch:
b0d86388 Merge pull request #2015 from jjhursey/topic/mixed-hostnames

I also tested the current v2.x branch and it passed as well. So that's odd as it had been failing before. HEAD of branch:
17bbb5c1 Merge pull request #1338 from hjelmn/v2.x_ugni

Both configured the same way:

./configure --prefix=/tmp/ompi/install CC=xlc CXX=xlC FC=xlf --disable-mpi-fortran

@hjelmn
Copy link
Member Author

hjelmn commented Aug 29, 2016

@jjhursey I couldn't reproduce the issue with make check. I was able to reproduce using a simple MPI application (ring_c).

@hjelmn
Copy link
Member Author

hjelmn commented Aug 29, 2016

@PHHargrove Thanks for testing! Looks like the fix worked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dlopen_test crash with xlc on PPC in v2.x

3 participants