Adding support for the 2-byte VEX encoding to the emitter #21453

tannergooding · 2018-12-09T05:35:11Z

This updates the emitter to support the 2-byte VEX encoding. It does this by one simple change, which is updating emitOutputRexOrVexPrefixIfNeeded to check if all the requirements are met when it tries to emit the prefix.

Resolves https://github.com/dotnet/coreclr/issues/19746

tannergooding · 2018-12-09T05:35:52Z

CC. @CarolEidt, @fiigii

tannergooding · 2018-12-09T07:00:35Z

src/jit/emitxarch.cpp

        //
        // Now output VEX prefix leaving the 4-byte opcode
+
+        if ((vexPrefix & 0xFFFF7F80) == 0x00C46100)


I had initially started doing this by updating AddVexPrefix to support both the 2 and 3-byte encodings and additionally adding a Supports2ByteVexPrefix flag to the instruction metadata in instrsxarch.h. However, that was leading to a cascade of changes and required knowledge of things like: "is the instruction using any extended registers" to be pre-checked and carried around.

I ended up realizing that, at the point we actually emit the encoding, we have all the relevant information and can just make the decision there, so this ended up being significantly simpler to support than initially expected.

That's really great!

Shall we add several assertions in this 2-byte VEX encoding section? e.g.,

assert(!TakesRexWPrefix(ins, attr)); assert(!IsExtendedReg(reg));

At this point, we don't have access to the attr or the reg information, so we can't readily do either of these asserts.

I do think it would be useful to have these types of asserts elsewhere, but I think the only place we logically have all the information is in emitDispIns, where we have both the instrDesc and the emitted code bytes (but emitDispIns needs some of its own cleanup as well). I think this would also be a useful place to have asserts like that INS_mov_i2xmm doesn't have an attr of EA_16BYTE (since one register is implied 16-byte and the attr needs to be used to determine if we are 32-bit or 64-bit instead).

@CarolEidt, thoughts?

In general, I think it's a bad idea to have asserts in emitDispIns, at least those that don't directly relate to displaying the instruction. This is because we don't want to have new asserts appearing in a dump or disassembly. Better would be to add an emitCheckIns or something, that can always be invoked on the instruction when DEBUG is set.

I've logged https://github.com/dotnet/coreclr/issues/21472 to track adding such a function. If no-one has any objections, I think it would be good to handle that in a separate PR (as we would already be in a broken state if the W-bit wasn't set and it should have been at this point, among other checks that emitCheckIns could do).

I think it would be good to handle that in a separate PR

It is okay. I just worried about potential corner case failures, as this is a very fundamental change that impacts all the floating point and SIMD code.

I think it is reasonable, and probably preferable, to do it in a separate PR.

tannergooding · 2018-12-09T07:22:22Z

@dotnet-bot test Windows_NT x64 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x64 Checked jitnox86hwintrinsic

@dotnet-bot test Windows_NT x86 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x86 Checked jitnox86hwintrinsic

@dotnet-bot test Ubuntu x64 Checked jitincompletehwintrinsic
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Ubuntu x64 Checked jitnox86hwintrinsic

tannergooding · 2018-12-09T07:28:36Z

Found 1 files with textual diffs.
PMI Diffs for System.Private.CoreLib.dll for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -19566 (-0.63% of base)
    diff is an improvement.
Top file improvements by size (bytes):
      -19566 : System.Private.CoreLib.dasm (-0.63% of base)
1 total files with size differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements by size (bytes):
       -2064 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:ShiftLeftLogical128BitLane(struct,ubyte):struct (8 methods)
       -2064 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:ShiftRightLogical128BitLane(struct,ubyte):struct (8 methods)
        -516 (-9.14% of base) : System.Private.CoreLib.dasm - Avx:Shuffle(struct,struct,ubyte):struct (2 methods)
        -516 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:Shuffle(struct,ubyte):struct (2 methods)
        -516 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:ShuffleHigh(struct,ubyte):struct (2 methods)
Top method improvements by size (percentage):
          -8 (-14.81% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:Angle(int,int,double):double
          -2 (-12.50% of base) : System.Private.CoreLib.dasm - Double:System.IConvertible.ToSingle(ref):float:this
          -2 (-12.50% of base) : System.Private.CoreLib.dasm - Single:System.IConvertible.ToDouble(ref):double:this
          -3 (-12.50% of base) : System.Private.CoreLib.dasm - Vector128:Create(double):struct
          -3 (-12.50% of base) : System.Private.CoreLib.dasm - Vector128:CreateScalar(double):struct
1607 total methods with size differences (1607 improved, 0 regressed), 12616 unchanged.

tannergooding · 2018-12-09T07:58:37Z

@dotnet-bot test Windows_NT x64 Checked gcstress0x3
@dotnet-bot test Windows_NT x64 Checked gcstress0xc
@dotnet-bot test Windows_NT x64 Checked jitstress1
@dotnet-bot test Windows_NT x64 Checked jitstress2
@dotnet-bot test Windows_NT x64 Checked jitstressregs1
@dotnet-bot test Windows_NT x64 Checked jitstressregs2
@dotnet-bot test Windows_NT x64 Checked jitstressregs3
@dotnet-bot test Windows_NT x64 Checked jitstressregs4
@dotnet-bot test Windows_NT x64 Checked jitstressregs8
@dotnet-bot test Windows_NT x64 Checked jitstressregs0x10
@dotnet-bot test Windows_NT x64 Checked jitstressregs0x80
@dotnet-bot test Windows_NT x64 Checked jitstressregs0x1000

@dotnet-bot test Ubuntu x64 Checked gcstress0x3
@dotnet-bot test Ubuntu x64 Checked gcstress0xc
@dotnet-bot test Ubuntu x64 Checked jitstress1
@dotnet-bot test Ubuntu x64 Checked jitstress2
@dotnet-bot test Ubuntu x64 Checked jitstressregs1
@dotnet-bot test Ubuntu x64 Checked jitstressregs2
@dotnet-bot test Ubuntu x64 Checked jitstressregs3
@dotnet-bot test Ubuntu x64 Checked jitstressregs4
@dotnet-bot test Ubuntu x64 Checked jitstressregs8
@dotnet-bot test Ubuntu x64 Checked jitstressregs0x10
@dotnet-bot test Ubuntu x64 Checked jitstressregs0x80
@dotnet-bot test Ubuntu x64 Checked jitstressregs0x1000

tannergooding · 2018-12-09T08:01:41Z

Found 1225 files with textual diffs.
PMI Diffs for assemblies in E:\Users\tagoo\Repos\coreclr\bin\tests\Windows_NT.x64.Release for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -924277 (-1.28% of base)
    diff is an improvement.
Top file improvements by size (bytes):
      -65739 : JIT\Methodical\fp\exgen\10w5d_cs_d\10w5d_cs_d.dasm (-4.32% of base)
      -65739 : JIT\Methodical\fp\exgen\10w5d_cs_r\10w5d_cs_r.dasm (-4.32% of base)
      -48763 : JIT\HardwareIntrinsics\X86\Avx2\Avx2_r\Avx2_r.dasm (-1.64% of base)
      -45744 : JIT\Methodical\fp\exgen\10w5d_cs_do\10w5d_cs_do.dasm (-4.54% of base)
      -43925 : JIT\HardwareIntrinsics\X86\Avx2\Avx2_ro\Avx2_ro.dasm (-2.51% of base)
1225 total files with size differences (1225 improved, 0 regressed), 1854 unchanged.
Top method improvements by size (bytes):
       -4543 (-4.57% of base) : JIT\Methodical\fp\exgen\1000w1d_cs_d\1000w1d_cs_d.dasm - testout1:Func_0():int
       -4543 (-4.57% of base) : JIT\Methodical\fp\exgen\1000w1d_cs_r\1000w1d_cs_r.dasm - testout1:Func_0():int
       -4389 (-5.33% of base) : JIT\HardwareIntrinsics\X86\Avx2\Avx2_ro\Avx2_ro.dasm - TestStruct:Create():struct (261 methods)
       -4337 (-2.54% of base) : JIT\HardwareIntrinsics\X86\Avx2\Avx2_r\Avx2_r.dasm - TestStruct:Create():struct (261 methods)
       -3805 (-5.15% of base) : JIT\Methodical\fp\exgen\1000w1d_cs_do\1000w1d_cs_do.dasm - testout1:Func_0():int
Top method improvements by size (percentage):
         -13 (-17.33% of base) : JIT\Performance\CodeQuality\SIMD\RayTracer\RayTracer\RayTracer.dasm - Color:.ctor(double,double,double):this
         -13 (-17.33% of base) : JIT\Performance\CodeQuality\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:.ctor(double,double,double):this
         -11 (-17.19% of base) : JIT\Regression\JitBlue\GitHub_15237\GitHub_15237\GitHub_15237.dasm - QuaternionStruct:LengthSquaredNoVectors():float:this
         -12 (-16.90% of base) : JIT\Performance\CodeQuality\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:set_Y(float):this
         -13 (-16.46% of base) : JIT\Regression\JitBlue\GitHub_19022\GitHub_19022\GitHub_19022.dasm - Item:.ctor():this
53747 total methods with size differences (53747 improved, 0 regressed), 174271 unchanged.

tannergooding · 2018-12-09T08:09:08Z

Found 103 files with textual diffs.
PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -204516 (-0.55% of base)
    diff is an improvement.
Top file improvements by size (bytes):
      -48566 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-1.65% of base)
      -23358 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.42% of base)
      -23064 : Microsoft.CodeAnalysis.CSharp.dasm (-0.53% of base)
      -19566 : System.Private.CoreLib.dasm (-0.63% of base)
      -17080 : System.Linq.Parallel.dasm (-1.55% of base)
103 total files with size differences (103 improved, 0 regressed), 26 unchanged.
Top method improvements by size (bytes):
       -2322 (-3.47% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - CtfTraceEventSource:InitEventMap():ref
       -2064 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:ShiftLeftLogical128BitLane(struct,ubyte):struct (8 methods)
       -2064 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:ShiftRightLogical128BitLane(struct,ubyte):struct (8 methods)
       -1956 (-1.65% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:EnumerateTemplates(ref,ref):this
       -1221 (-2.56% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:EnumerateTemplates(ref,ref):this
Top method improvements by size (percentage):
         -26 (-17.45% of base) : System.Numerics.Vectors.dasm - Quaternion:Normalize(struct):struct
         -12 (-17.39% of base) : System.Numerics.Vectors.dasm - Quaternion:Length():float:this
         -11 (-17.19% of base) : System.Numerics.Vectors.dasm - Quaternion:LengthSquared():float:this
          -9 (-16.67% of base) : System.Numerics.Vectors.dasm - Vector4:.ctor(float):this
         -31 (-16.67% of base) : System.Numerics.Vectors.dasm - Quaternion:Inverse(struct):struct
21736 total methods with size differences (21736 improved, 0 regressed), 167232 unchanged.

tannergooding · 2018-12-10T03:30:12Z

PR Stress test failures look to be generally the same as appears in the normal stress test runs. A couple of them are HWIntrinsic and should be investigated further

danmoseley · 2018-12-10T17:29:05Z

Nice work !! If I understand right this reduces our footprint on customer disks by 200KB?

CarolEidt

LGTM, with a requested comment enhancement.

CarolEidt · 2018-12-10T17:51:41Z

src/jit/emitxarch.cpp

        //
        // Now output VEX prefix leaving the 4-byte opcode
+
+        if ((vexPrefix & 0xFFFF7F80) == 0x00C46100)


That's really great!

CarolEidt · 2018-12-10T17:53:20Z

src/jit/emitxarch.cpp

+            // (the 2-byte VEX encoding only supports the 0x0F leading byte). When these
+            // conditions are met, we can change byte-0 from 0xC4 to 0xC5 and then
+            // byte-1 is the logical-or of bit 7 from byte-1 and bits 0-6 from byte 2
+            // from the 3-byte VEX encoding.


I might move this comment up above the check for the condition, and break down the "magic" constants that it uses so it's clearer that the condition matches the comment.

Can do. I'm also going to wait to merge this until after @fiigii can review as well, as he may be able to double-check that I didn't miss some obscurity here.

Moved above and modified to be:

// The 2-byte VEX encoding, requires that the X and B-bits are set (these // bits are inverted from the REX values so set means off), the W-bit is // not set (this bit is not inverted), and that the m-mmmm bits are 0-0001 // (the 2-byte VEX encoding only supports the 0x0F leading byte). When these // conditions are met, we can change byte-0 from 0xC4 to 0xC5 and then // byte-1 is the logical-or of bit 7 from byte-1 and bits 0-6 from byte 2 // from the 3-byte VEX encoding. // // Given the above, the check can be reduced to a simple mask and comparison. // * 0xFFFF7F80 is a mask that ignores any bits whose value we don't care about: // * R can be set or unset (0x7F ignores bit 7) // * vvvv can be any value (0x80 ignores bits 3-6) // * L and be set or unset (0x80 ignores bit 2) // * pp can be any value (0x80 ignores bits 0-1) // * 0x00C46100 is a value that signifies the requirements listed above were met: // * We must be a three-byte VEX opcode (0x00C4) // * X and B must be set (0x61 validates bits 5-6) // * m-mmmm must be 0-00001 (0x61 validates bits 0-4) // * W must be unset (0x00 validates bit 7)

fiigii · 2018-12-10T18:25:21Z

If I understand right this reduces our footprint on customer disks by 200KB?

@danmosemsft Probably not. Crossgen only uses SSE encoding for hardware compatibility, so this work reduces our memory footprint on JITed managed code.

src/jit/emitxarch.cpp

CarolEidt · 2018-12-10T19:25:06Z

src/jit/emitxarch.cpp

        //
        // Now output VEX prefix leaving the 4-byte opcode
+
+        if ((vexPrefix & 0xFFFF7F80) == 0x00C46100)


In general, I think it's a bad idea to have asserts in emitDispIns, at least those that don't directly relate to displaying the instruction. This is because we don't want to have new asserts appearing in a dump or disassembly. Better would be to add an emitCheckIns or something, that can always be invoked on the instruction when DEBUG is set.

…encoding check

briansull

Looks Good

* Adding support for the 2-byte VEX encoding to the emitter * Relocating and expanding the comment which explains the two-byte VEX encoding check

…eclr#21453) * Adding support for the 2-byte VEX encoding to the emitter * Relocating and expanding the comment which explains the two-byte VEX encoding check Commit migrated from dotnet/coreclr@9fe1570

Adding support for the 2-byte VEX encoding to the emitter

aeb1db8

tannergooding commented Dec 9, 2018

View reviewed changes

tannergooding changed the title ~~[WIP] Adding support for the 2-byte VEX encoding to the emitter~~ Adding support for the 2-byte VEX encoding to the emitter Dec 9, 2018

CarolEidt approved these changes Dec 10, 2018

View reviewed changes

CarolEidt reviewed Dec 10, 2018

View reviewed changes

Relocating and expanding the comment which explains the two-byte VEX …

ba76a85

…encoding check

briansull approved these changes Dec 11, 2018

View reviewed changes

tannergooding merged commit 9fe1570 into dotnet:master Dec 11, 2018

morganbr pushed a commit that referenced this pull request Dec 14, 2018

Adding support for the 2-byte VEX encoding to the emitter (#21453)

fb13d9c

* Adding support for the 2-byte VEX encoding to the emitter * Relocating and expanding the comment which explains the two-byte VEX encoding check

tannergooding mentioned this pull request Jan 31, 2020

Consider adding an emitCheckIns that does debug only validation on the instrDesc and emitted bytes dotnet/runtime#11642

Open

kunalspathak mentioned this pull request Dec 8, 2020

JIT: instruction size mis-estimates can lead to CHK/REL diffs in codegen dotnet/runtime#8748

Closed

Adding support for the 2-byte VEX encoding to the emitter #21453

Adding support for the 2-byte VEX encoding to the emitter #21453

Uh oh!

Conversation

tannergooding commented Dec 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tannergooding commented Dec 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fiigii Dec 10, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tannergooding commented Dec 9, 2018

Uh oh!

tannergooding commented Dec 9, 2018

Uh oh!

tannergooding commented Dec 9, 2018

Uh oh!

tannergooding commented Dec 9, 2018

Uh oh!

tannergooding commented Dec 9, 2018

Uh oh!

tannergooding commented Dec 10, 2018

Uh oh!

danmoseley commented Dec 10, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CarolEidt left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fiigii commented Dec 10, 2018

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

briansull left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

tannergooding commented Dec 9, 2018 •

edited

Loading

tannergooding commented Dec 9, 2018 •

edited

Loading

fiigii Dec 10, 2018 •

edited

Loading

danmoseley commented Dec 10, 2018 •

edited

Loading