Skip to content

noinline to avoid compiler reading TOC before PATCHER_BEGIN #7799

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 17, 2020

Conversation

markalle
Copy link
Contributor

@markalle markalle commented Jun 9, 2020

This bug was first seen in a different product that's using the same
interception code as OMPI. But I think it's potentially in OMPI too.

In my vanilla build of OMPI master on RH8 if I "gdb libopen-pal.so" and
"disassemble intercept_brk", I'm seeing a suspicious extra instruction
in front of PATCHER_BEGIN:

   0x00000000000d6778 <+40>:    std     r2,24(r1) // something gcc put in front
   0x00000000000d677c <+44>:    std     r2,96(r1) // PATCHER_BEGIN's toc_save
   0x00000000000d6780 <+48>:    nop               // NOPs from PATCHER_BEGIN
   0x00000000000d6784 <+52>:    nop               // that get replaced
   0x00000000000d6788 <+56>:    nop               // by instructions that
   0x00000000000d678c <+60>:    nop               // change r2
   0x00000000000d6790 <+64>:    nop               //

Later there are loads from that location like
0x000000000019e0e4 <+132>: ld r2,24(r1)
that make me nervous since that's the pre-updated value.

I believe this is the same thing Nathan is describing way back in a9bc692
and his solution was to put a second call around each interception, where
the outer call is just

    intercept_brk():
        PATCHER_BEGIN
        _intercept_brk()
        PATCHER_END

and the inner call _intercept_brk() is where the bulk of the code goes.

What I'm seeing is that _intercept_brk() is being inlined and probably
negating Nathan's fix. So I want to add __opal_attribute_noinline__ to
restore the fix.

With this commit in place, the disassembly of intercept_brk becomes tiny
because it's no longer inlining intercept_brk() and the susipicious
early save of r2 is gone. I made the same fix to all the intercept
*
functions, although intercept_brk was the only one that had a suspicious
save of r2.

As far as empirical failures though, we only have those from the non-OMPI
product that's using the same patcher code. I'm not actually getting OMPI
to fail from the above suspicious data being saved in r1+24.

Signed-off-by: Mark Allen markalle@us.ibm.com

This bug was first seen in a different product that's using the same
interception code as OMPI.  But I think it's potentially in OMPI too.

In my vanilla build of OMPI master on RH8 if I "gdb libopen-pal.so" and
"disassemble intercept_brk", I'm seeing a suspicious extra instruction
in front of PATCHER_BEGIN:
   0x00000000000d6778 <+40>:    std     r2,24(r1) // something gcc put in front
   0x00000000000d677c <+44>:    std     r2,96(r1) // PATCHER_BEGIN's toc_save
   0x00000000000d6780 <+48>:    nop               // NOPs from PATCHER_BEGIN
   0x00000000000d6784 <+52>:    nop               // that get replaced
   0x00000000000d6788 <+56>:    nop               // by instructions that
   0x00000000000d678c <+60>:    nop               // change r2
   0x00000000000d6790 <+64>:    nop               //

Later there are loads from that location like
   0x000000000019e0e4 <+132>:   ld      r2,24(r1)
that make me nervous since that's the pre-updated value.

I believe this is the same thing Nathan is describing way back in a9bc692
and his solution was to put a second call around each interception, where
the outer call is just
    intercept_brk():
        PATCHER_BEGIN
        _intercept_brk()
        PATCHER_END
and the inner call _intercept_brk() is where the bulk of the code goes.

What I'm seeing is that _intercept_brk() is being inlined and probably
negating Nathan's fix.  So I want to add __opal_attribute_noinline__ to
restore the fix.

With this commit in place, the disassembly of intercept_brk becomes tiny
because it's no longer inlining _intercept_brk() and the susipicious
early save of r2 is gone.  I made the same fix to all the intercept_*
functions, although intercept_brk was the only one that had a suspicious
save of r2.

As far as empirical failures though, we only have those from the non-OMPI
product that's using the same patcher code.  I'm not actually getting OMPI
to fail from the above suspicious data being saved in r1+24.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
@gpaulsen
Copy link
Member

@markalle Should this block the v4.0.4 release? We were planning to put this out tomorrow. How hard have you tried to make OMPI fail? If you're not able to, while it'd be better to define these functions as no-inline, i'm not sure that this is a blocker. What is your opinion?

@markalle
Copy link
Contributor Author

Considering I don't have any failures from it I can't really declare it to be too high a priority. I'm still seeing if I can whittle down what leads to the actual failure from the non-OMPI usage and then see if I can make that fail using the OMPI patcher

@jsquyres
Copy link
Member

@hjelmn replied to me in email that this PR looks good to him, but he doesn't have easy access to a PPC machine to test.

@jsquyres
Copy link
Member

@gpaulsen mentioned to me in Slack that the issue can occur when you compile with -O3. If that's correct, given that the user can manually specify -O3 (e.g., ./configure CFLAGS=-O3 ...), it might be a good idea to get this into upcoming releases.

@jsquyres
Copy link
Member

For the history: there was more discussion off-issue between @markalle @gpaulsen and @hppritcha, and the conclusion was that this was actually a low risk item (i.e., as @markalle stated in the description, they saw this problem in another product and just brought the fix over to OMPI. They have not (yet?) been able to actually reproduce the problem in OMPI).

@gpaulsen gpaulsen merged commit 692f96e into open-mpi:master Jun 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants