Skip to content

Adjust stack level while calculating offset for Vector.getElement #38311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 27, 2020

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Jun 24, 2020

Include stack level while calculating the offset from stack pointer when emitting vmovss for getting element of SIMD intrinsic.

       3B7704       cmp      esi, dword ptr [edi+4]
       7339         jae      SHORT G_M33220_IG10
       C5FB5A44F708 vcvtsd2ss xmm0, dword ptr [edi+8*esi+8]
       83EC04       sub      esp, 4                            ; <--- stack adjustment
       C5FA110424   vmovss   dword ptr [esp], xmm0
       8D4604       lea      eax, [esi+4]
       C5FD10442430 vmovupd  ymm0, ymmword ptr[esp+2CH+04H]
       C5FD11442408 vmovupd  ymmword ptr[esp+04H+04H], ymm0
       C5FA104C8404 vmovss   xmm1, dword ptr [esp+4*eax+04H]   ; <--- doesn't include +0x4 because of stack adjustment above
       83EC04       sub      esp, 4
       C5FA110C24   vmovss   dword ptr [esp], xmm1
       E86DFAFFFF   call     Xunit.Assert:Equal(float,float)
       46           inc      esi
       EBA5         jmp      SHORT G_M33220_IG05

This started showing up in stress run because STRESS_DBL_ALN was set for the method which was forcing the stack to be stack pointer based rather than frame pointer.

Fixes: #36614

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 24, 2020
@kunalspathak kunalspathak marked this pull request as ready for review June 24, 2020 16:04
@kunalspathak kunalspathak changed the title Adjust stack level Adjust stack level while calculating offset for Vector.getElement Jun 24, 2020
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

@BruceForstall BruceForstall self-requested a review June 24, 2020 21:27
bool isEBPbased;
unsigned offs = compiler->lvaFrameAddress(simdInitTempVarNum, &isEBPbased);

// Adjust the offset with stack level, if any.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a comment. I generally find it confusing to reason about the stack level, but I believe it's the case that 1) it will always be zero if there's a frame pointer (I was wondering why you weren't checking isEBPbased), and 2) it will always be zero when FEATURE_FIXED_OUT_ARGS is true. Can you confirm or correct?
Also, I'm wondering why this isn't needed further down when a temp is used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, all local access is passed down to the emitter by storing the local number and offset within the local, and emitOutputSV does the appropriate offset translation using:

#if !FEATURE_FIXED_OUT_ARGS
        // Adjust the offset by the amount currently pushed on the CPU stack
        dsp += emitCurStackLvl;
#endif

It looks like the two cases that don't do this are the calls to lvaFrameAddress in this function, genSIMDIntrinsicGetItem.

So it seems like this is ok, but you need the #if !FEATURE_FIXED_OUT_ARGS case (for Linux x86 ABI) or prove Carol's #2. It would probably be better to only do the adjustment for the !isEBPbased case, for clarity.

Also, I think the op1 case above that calls lvaFrameAddress also needs this logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Carol and Bruce for your valuable input. I thought about adding it under isEBPbased but I guess I was assuming that for EBP it will always be zero. I have instead added an assert to check if stacklevel is zero for isEBPbased and FEATURE_FIXED_OUT_ARGS.

Copy link
Member Author

@kunalspathak kunalspathak Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, so just verified that for isEBPbased == true, genStackLevel can be non-zero. I was hitting previously added assert when crossgening System.Numerics.Plane:CreateFromVertices. I have now updated the assert(genStackLevel == 0) to only include when FEATURE_FIXED_OUT_ARGS is true. Below is the code that was getting generated when I hit the assert:

       83EC0C       sub      esp, 12
       F20F110424   movsd    qword ptr [esp], xmm0
       660F70C802   pshufd   xmm1, xmm0, 2
       F30F114C2408 movss    dword ptr [esp+08H], xmm1
       F30F108568FFFFFF movss    xmm0, dword ptr [ebp-98H]  ; <-- This line

In above highlighted line, offset returned by lvaFrameAddress was 0x98 and genStackLevel was 0n12 inside genSimdIntrinsicGetItem. Since it was isEBPBased, the stack level didn't get added to the offset 0x98 and everything now works fine. So what I can conclude is for EBP based, the varDsc->lvStkOff holds accurate value but that doesn't happen when ESP based?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what I can conclude is for EBP based, the varDsc->lvStkOff holds accurate value but that doesn't happen when ESP based?

Yes, that's right - because genStackLevel is relative to SP. EBP remains fixed, so the offset relative to that doesn't change.

Copy link
Contributor

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add a test case or two to the coreclr tests, to cover this.

@kunalspathak
Copy link
Member Author

It would be nice to add a test case or two to the coreclr tests, to cover this.

I spent some time thinking about how to add a test for this. From my understanding, this only happens if isEBPBased == false and it is set to false here or here inside setFrameType(). We need to have doDoubleAlign == true in order for that to happen (done at the top of setFrameType()). doDoubleAlign is set to true here and for that, getCanDoubleAlign here should return MUST_DOUBLE_ALIGN. This only happens if JITStress or JitDoubleAlign is set. I tried setting JitDoubleAlign to set (and reverting my fix). Unfortunately we don't repro the issue. One of the key thing of this issue is the dest array whose element is wrongly loaded is displaced by 4 bytes as well. So we add that as well while calculating the element address. (esp+4*eax+04H , the 04H is the displacement). I don't see such displacement even if I set JitDoubleAlign but JitStress is not set. Additionally, there are other code diffs that might not make the bug easily reproducible. While debugging this issue, minor modification to source code was making it reproducible. Let me know if you have any thoughts on including a test. Also, the existing test for NarrowDouble with JitStress should exercise this code path, right?

@kunalspathak
Copy link
Member Author

I was able to come up with standalone case. Earlier I couldn't repro it because I was not using particular stress modes needed to repro it.

@kunalspathak kunalspathak merged commit 4a08419 into dotnet:master Jun 27, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure: System.Numerics.Tests.GenericVectorTests.NarrowDouble
4 participants