Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@CarolEidt
Copy link

When there are nested calls, and there is a non-ptr on the stack below the last ptr popped by the inner call, the argHigh and argCnt values can get out of sync.

When there are nested calls, and there is a non-ptr on the stack below the last ptr popped by the inner call, the `argHigh` and `argCnt` values can get out of sync.
@CarolEidt
Copy link
Author

test Windows_NT x86 Checked gcstress0xc_minopts_heapverify1


if (hasPartialArgInfo)
{
// We always leave argHigh pointing to the next ptr arg.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this how it always was, or was this changed with RyuJIT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very familiar with this code, and had to do a fair bit of spelunking to figure out how this is supposed to work, but looking at the code above, it assumes that argHigh is always pointing to the next ptr arg. The code has been like this since at least 2013, and I'm guessing that this particular case (partially interruptible ebp frame, nested calls, and non-ptr arg(s) below the last popped arg of the inner call) is rare enough that this hasn't been hit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, my plan was to do some gcstress testing on this, and then get reviews from people who might be more familiar with this code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am familiar with the x86 GC encodings.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@briansull - that's great! Could you have a look?
As I suspected, for this same test on desktop, jit32 un-nests the calls, so that it doesn't run into this issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and in addition to the fix, it would be great if you could look at the comments I added and verify that they are correct. Figuring this out would have been a lot easier with more comments.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the code dealing with partialArgInfo was recently introduced on the CoreCLR side of the world:

19b1ce56f95 (Jonghyun Park   2017-04-11 07:49:08 +0900 2435)     bool      hasPartialArgInfo;
42924c80d75 (Jonghyun Park   2017-04-05 09:30:08 +0900 2436) 
42924c80d75 (Jonghyun Park   2017-04-05 09:30:08 +0900 2437) #ifndef UNIX_X86_ABI
19b1ce56f95 (Jonghyun Park   2017-04-11 07:49:08 +0900 2438)     hasPartialArgInfo = info->ebpFrame;
42924c80d75 (Jonghyun Park   2017-04-05 09:30:08 +0900 2439) #else
42924c80d75 (Jonghyun Park   2017-04-05 09:30:08 +0900 2440)     // For x86/Linux, interruptible code always has full arg info
42924c80d75 (Jonghyun Park   2017-04-05 09:30:08 +0900 2441)     //
42924c80d75 (Jonghyun Park   2017-04-05 09:30:08 +0900 2442)     // This should be aligned with emitFullArgInfo setting at
42924c80d75 (Jonghyun Park   2017-04-05 09:30:08 +0900 2443)     // emitter::emitEndCodeGen (in JIT)
19b1ce56f95 (Jonghyun Park   2017-04-11 07:49:08 +0900 2444)     hasPartialArgInfo = false;
42924c80d75 (Jonghyun Park   2017-04-05 09:30:08 +0900 2445) #endif

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the logic was the same previously, it was just using info->ebpFrame directly instead of hasPartialArgInfo

@BruceForstall
Copy link

Is there a bug number this fixes? Existing or new repro case?

@BruceForstall
Copy link

ok, i'm guessing https://github.com/dotnet/coreclr/issues/17242

@CarolEidt
Copy link
Author

@BruceForstall yes, #17242 - sorry I forgot to mention it

@CarolEidt
Copy link
Author

test Windows_NT x86 Checked gcstress0xc
test Windows_NT x86 Checked gcstress0xc_jitstress1
test Windows_NT x86 Checked gcstress0xc_jitstress2
test Windows_NT x86 Checked gcstress0xc_zapdisable
test Windows_NT x86 Checked gcstress0xc_zapdisable_heapverify1
test Windows_NT x86 Checked gcstress0xc_zapdisable_jitstress2

1 similar comment
@CarolEidt
Copy link
Author

test Windows_NT x86 Checked gcstress0xc
test Windows_NT x86 Checked gcstress0xc_jitstress1
test Windows_NT x86 Checked gcstress0xc_jitstress2
test Windows_NT x86 Checked gcstress0xc_zapdisable
test Windows_NT x86 Checked gcstress0xc_zapdisable_heapverify1
test Windows_NT x86 Checked gcstress0xc_zapdisable_jitstress2

@CarolEidt CarolEidt closed this Mar 31, 2018
@CarolEidt CarolEidt reopened this Mar 31, 2018
@CarolEidt
Copy link
Author

test Windows_NT x86 Checked gcstress0xc
test Windows_NT x86 Checked gcstress0xc_jitstress1
test Windows_NT x86 Checked gcstress0xc_jitstress2
test Windows_NT x86 Checked gcstress0xc_zapdisable
test Windows_NT x86 Checked gcstress0xc_zapdisable_heapverify1
test Windows_NT x86 Checked gcstress0xc_zapdisable_jitstress2

@CarolEidt
Copy link
Author

@dotnet/jit-contrib @jkotas @janvorli - could I get a review from someone with good familiarity with GC encodings on x86? (Please suggest someone else if you have any suggestions)

@CarolEidt
Copy link
Author

The tests failures are basically the same as without this fix, except that the _il_relarray3 test now passes.

@CarolEidt
Copy link
Author

Is the GC infor correct for the nested call site?
I am a bit concern that the number of items pushed is not equal to the number of items pop-ed.

I believe it is correct. Here is how I read/understand the encoding for an ebp frame:

  • For push ptr n, it is specifying a push of a single ptr, along with offset n from base (not top) of arg stack. That is, if n is zero, it is the first argument that's been pushed.
  • For the pop, it is specifying how many GC ptrs are being popped. It knows from previous pushes where those are on the stack.

So, this is how I understand the GC info:

F1 BF 8C       | 0076        push ptr  1 (iptr)    // Push 1 ptr, which is 1 slot from base of stack
BF 9E          | 007C        push ptr  3 (iptr)    // Push 1 ptr, which is 3 slots from base of stack
F0 34          | 0088        reg ESI becoming dead
38             | 0088        reg EDI becoming dead
C8             | 0088        pop  1 ptrs           // Pop a ptr (i.e. the one that's at the 4th arg position. )
                      // *** This is where it needs to pop any non-ptr arg positions, to keep `argCnt`, `argHigh` and `ptrArgs` in sync
F4 CC          | 00B4        pop  1 ptrs           // Pop a ptr (i.e. the one that's at the second arg position)
FF

If this is not how it works, then perhaps someone could explain how it's supposed to work.

ptrArgTP ptrArgs(0); // The mask of stack values that contain pointers.
ptrArgTP iptrArgs(0); // The subset of ptrArgs that are interior pointers.
ptrArgTP argHigh(0); // The current mask position that corresponds to the top of the stack.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments look correct to me

@briansull
Copy link

briansull commented Apr 2, 2018

Yes, I did figure that the GC info was correct after I posted that.
Your comments are right, expect that we also have a 0xBF which makes each of the pointers be "interior" (aka byref) pointers rather than the typical GC ref pointers.

F1 BF 8C       | 0076        push ptr  1 (iptr)    // Push 1 interior ptr, which is 1 slot from base of stack
BF 9E          | 007C        push ptr  3 (iptr)    // Push 1 interior ptr, which is 3 slots from base of stack

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CarolEidt
Copy link
Author

A little more info:

  • This test passes on desktop with jit32 because it flattens the nested calls.
  • However, it also passes on desktop with RyuJIT. After debugging this, I found that the sense of the first condition in the assert was (apparently inadvertently) flipped in [x86/Linux] Use GCInfo for funclet unwinding #10782.

In PR #10782, for x86, checks for info->ebpFrame (which is a bool) were replaced with isPartialArgInfo (then later with hasPartialArgInfo. However, the assert that's firing here:

        _ASSERTE(!hasPartialArgInfo    ||
                 isZero(argHigh)       ||
                (argHigh == CONSTRUCT_ptrArgTP(1, (argCnt-1))));

was previously

        _ASSERTE((info->ebpFrame != 0)    ||
                 isZero(argHigh)          ||
                (argHigh == CONSTRUCT_ptrArgTP(1, (argCnt-1))));

Which reversed the sense (why was checking a bool against 0 is another question...). However, it appears to me that the correct tracking of argCnt was the only thing preventing the assert from being universally applicable - and note that this is, AFAICT the only bug that has surfaced after reversing the condition. I would love some input from others on whether to:

My preference would be the latter, as I think it provides a small amount of additional sanity checking. However, tracking argCnt is not strictly necessary for correctness when there's a frame, since the position of the args is communicated by the pushes, and the pops need only pop the next arg.

@briansull
Copy link

briansull commented Apr 3, 2018

ebpFrame may be a bool today but int he past it was an unsigned char because otherwise packing of bits in the struct didn't work.

Also see this definition:

Note that this got mangled by the upgrade to auto-format our code.

struct InfoHdrSmall {
    unsigned char  prologSize;        // 0
    unsigned char  epilogSize;        // 1
    unsigned char  epilogCount : 3; // 2 [0:2]
    unsigned char  epilogAtEnd : 1; // 2 [3]
    unsigned char  ediSaved : 1; // 2 [4]      which callee-saved regs are pushed onto stack
    unsigned char  esiSaved : 1; // 2 [5]
    unsigned char  ebxSaved : 1; // 2 [6]
    unsigned char  ebpSaved : 1; // 2 [7]
    unsigned char  ebpFrame : 1; // 3 [0]      locals accessed relative to ebp
    unsigned char  interruptible : 1; // 3 [1]      is intr. at all points (except prolog/epilog), not just call-sites
    unsigned char  doubleAlign : 1; // 3 [2]      uses double-aligned stack (ebpFrame will be false)
    unsigned char  security : 1; // 3 [3]      has slot for security object
    unsigned char  handlers : 1; // 3 [4]      has callable handlers
    unsigned char  localloc : 1; // 3 [5]      uses localloc
    unsigned char  editNcontinue : 1; // 3 [6]      was JITed in EnC mode
    unsigned char  varargs : 1; // 3 [7]      function uses varargs calling convention
    unsigned char  profCallbacks : 1; // 4 [0]
    unsigned char  genericsContext : 1;//4 [1]      function reports a generics context parameter is present
    unsigned char  genericsContextIsMethodDesc : 1;//4[2]
    unsigned char  returnKind : 2; // 4 [4]  Available GcInfo v2 onwards, previously undefined 
    unsigned short argCount;          // 5,6        in bytes
    unsigned int   frameSize;         // 7,8,9,10   in bytes
    unsigned int   untrackedCnt;      // 11,12,13,14
    unsigned int   varPtrTableSize;   // 15.16,17,18

@briansull
Copy link

That assert on three conditions needs an explaination as to what it is checking for.

@briansull
Copy link

briansull commented Apr 3, 2018

On the Desktop the assert says:
Either we have an ebpFrame (which means that the assert does nothing for ebp based methods)
or when we have an esp based frame then

  1. argHigh is either zero (meaning no args are currently pushed)
    or
  2. argHigh is thw matching the bit for the current argCnt

@briansull
Copy link

briansull commented Apr 3, 2018

I believe that the old assert was wrong once nested ebp call sites were allowed. (i.e RyuJIT only)

So I believe that going with your latter choice is appropriate.

  • eliminate the first condition altogether (i.e. maintain the assert for either case of hasPartialArgInfo

@AndyAyersMS
Copy link
Member

The various x86 gcstress failures all look like pre-existing issues (you can x-ref vs list in #17330).

@CarolEidt
Copy link
Author

The various x86 gcstress failures all look like pre-existing issues

Right, I mention that above, but I'd like to see the arm tests pass, since it also uses the method I changed. Not sure why they're not completing

@CarolEidt
Copy link
Author

@janvorli - could you have a look at this?

@CarolEidt CarolEidt requested a review from janvorli April 5, 2018 21:22
@CarolEidt CarolEidt merged commit 73e10d2 into dotnet:master Apr 5, 2018
@CarolEidt CarolEidt deleted the Fix17242 branch April 5, 2018 22:22
CarolEidt added a commit to CarolEidt/coreclr that referenced this pull request Apr 5, 2018
This assert was formerly enabled only for the "full arg info case", i.e. when there was not a frame on x86. WIth PR dotnet#10782 the sense of the first condition was inadvertently changed. With PR dotnet#17363 the tracking of `argCnt` vs. `argHigh` was fixed so that the assert is correct for the "partial arg info" case.
The first condition should be removed to make it cover the "full argo info case" as well (as before dotnet#10782).
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants