Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@tannergooding
Copy link
Member

@tannergooding tannergooding commented Dec 9, 2018

This updates the emitter to support the 2-byte VEX encoding. It does this by one simple change, which is updating emitOutputRexOrVexPrefixIfNeeded to check if all the requirements are met when it tries to emit the prefix.

Resolves https://github.com/dotnet/coreclr/issues/19746

@tannergooding
Copy link
Member Author

tannergooding commented Dec 9, 2018

CC. @CarolEidt, @fiigii

//
// Now output VEX prefix leaving the 4-byte opcode

if ((vexPrefix & 0xFFFF7F80) == 0x00C46100)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had initially started doing this by updating AddVexPrefix to support both the 2 and 3-byte encodings and additionally adding a Supports2ByteVexPrefix flag to the instruction metadata in instrsxarch.h. However, that was leading to a cascade of changes and required knowledge of things like: "is the instruction using any extended registers" to be pre-checked and carried around.

I ended up realizing that, at the point we actually emit the encoding, we have all the relevant information and can just make the decision there, so this ended up being significantly simpler to support than initially expected.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's really great!

Copy link

@fiigii fiigii Dec 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add several assertions in this 2-byte VEX encoding section? e.g.,

assert(!TakesRexWPrefix(ins, attr));
assert(!IsExtendedReg(reg));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, we don't have access to the attr or the reg information, so we can't readily do either of these asserts.

I do think it would be useful to have these types of asserts elsewhere, but I think the only place we logically have all the information is in emitDispIns, where we have both the instrDesc and the emitted code bytes (but emitDispIns needs some of its own cleanup as well). I think this would also be a useful place to have asserts like that INS_mov_i2xmm doesn't have an attr of EA_16BYTE (since one register is implied 16-byte and the attr needs to be used to determine if we are 32-bit or 64-bit instead).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think it's a bad idea to have asserts in emitDispIns, at least those that don't directly relate to displaying the instruction. This is because we don't want to have new asserts appearing in a dump or disassembly. Better would be to add an emitCheckIns or something, that can always be invoked on the instruction when DEBUG is set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've logged https://github.com/dotnet/coreclr/issues/21472 to track adding such a function. If no-one has any objections, I think it would be good to handle that in a separate PR (as we would already be in a broken state if the W-bit wasn't set and it should have been at this point, among other checks that emitCheckIns could do).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to handle that in a separate PR

It is okay. I just worried about potential corner case failures, as this is a very fundamental change that impacts all the floating point and SIMD code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is reasonable, and probably preferable, to do it in a separate PR.

@tannergooding tannergooding changed the title [WIP] Adding support for the 2-byte VEX encoding to the emitter Adding support for the 2-byte VEX encoding to the emitter Dec 9, 2018
@tannergooding
Copy link
Member Author

@dotnet-bot test Windows_NT x64 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x64 Checked jitnox86hwintrinsic

@dotnet-bot test Windows_NT x86 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x86 Checked jitnox86hwintrinsic

@dotnet-bot test Ubuntu x64 Checked jitincompletehwintrinsic
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Ubuntu x64 Checked jitnox86hwintrinsic

@tannergooding
Copy link
Member Author

Found 1 files with textual diffs.
PMI Diffs for System.Private.CoreLib.dll for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -19566 (-0.63% of base)
    diff is an improvement.
Top file improvements by size (bytes):
      -19566 : System.Private.CoreLib.dasm (-0.63% of base)
1 total files with size differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements by size (bytes):
       -2064 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:ShiftLeftLogical128BitLane(struct,ubyte):struct (8 methods)
       -2064 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:ShiftRightLogical128BitLane(struct,ubyte):struct (8 methods)
        -516 (-9.14% of base) : System.Private.CoreLib.dasm - Avx:Shuffle(struct,struct,ubyte):struct (2 methods)
        -516 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:Shuffle(struct,ubyte):struct (2 methods)
        -516 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:ShuffleHigh(struct,ubyte):struct (2 methods)
Top method improvements by size (percentage):
          -8 (-14.81% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:Angle(int,int,double):double
          -2 (-12.50% of base) : System.Private.CoreLib.dasm - Double:System.IConvertible.ToSingle(ref):float:this
          -2 (-12.50% of base) : System.Private.CoreLib.dasm - Single:System.IConvertible.ToDouble(ref):double:this
          -3 (-12.50% of base) : System.Private.CoreLib.dasm - Vector128:Create(double):struct
          -3 (-12.50% of base) : System.Private.CoreLib.dasm - Vector128:CreateScalar(double):struct
1607 total methods with size differences (1607 improved, 0 regressed), 12616 unchanged.

@tannergooding
Copy link
Member Author

@dotnet-bot test Windows_NT x64 Checked gcstress0x3
@dotnet-bot test Windows_NT x64 Checked gcstress0xc
@dotnet-bot test Windows_NT x64 Checked jitstress1
@dotnet-bot test Windows_NT x64 Checked jitstress2
@dotnet-bot test Windows_NT x64 Checked jitstressregs1
@dotnet-bot test Windows_NT x64 Checked jitstressregs2
@dotnet-bot test Windows_NT x64 Checked jitstressregs3
@dotnet-bot test Windows_NT x64 Checked jitstressregs4
@dotnet-bot test Windows_NT x64 Checked jitstressregs8
@dotnet-bot test Windows_NT x64 Checked jitstressregs0x10
@dotnet-bot test Windows_NT x64 Checked jitstressregs0x80
@dotnet-bot test Windows_NT x64 Checked jitstressregs0x1000

@dotnet-bot test Ubuntu x64 Checked gcstress0x3
@dotnet-bot test Ubuntu x64 Checked gcstress0xc
@dotnet-bot test Ubuntu x64 Checked jitstress1
@dotnet-bot test Ubuntu x64 Checked jitstress2
@dotnet-bot test Ubuntu x64 Checked jitstressregs1
@dotnet-bot test Ubuntu x64 Checked jitstressregs2
@dotnet-bot test Ubuntu x64 Checked jitstressregs3
@dotnet-bot test Ubuntu x64 Checked jitstressregs4
@dotnet-bot test Ubuntu x64 Checked jitstressregs8
@dotnet-bot test Ubuntu x64 Checked jitstressregs0x10
@dotnet-bot test Ubuntu x64 Checked jitstressregs0x80
@dotnet-bot test Ubuntu x64 Checked jitstressregs0x1000

@tannergooding
Copy link
Member Author

Found 1225 files with textual diffs.
PMI Diffs for assemblies in E:\Users\tagoo\Repos\coreclr\bin\tests\Windows_NT.x64.Release for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -924277 (-1.28% of base)
    diff is an improvement.
Top file improvements by size (bytes):
      -65739 : JIT\Methodical\fp\exgen\10w5d_cs_d\10w5d_cs_d.dasm (-4.32% of base)
      -65739 : JIT\Methodical\fp\exgen\10w5d_cs_r\10w5d_cs_r.dasm (-4.32% of base)
      -48763 : JIT\HardwareIntrinsics\X86\Avx2\Avx2_r\Avx2_r.dasm (-1.64% of base)
      -45744 : JIT\Methodical\fp\exgen\10w5d_cs_do\10w5d_cs_do.dasm (-4.54% of base)
      -43925 : JIT\HardwareIntrinsics\X86\Avx2\Avx2_ro\Avx2_ro.dasm (-2.51% of base)
1225 total files with size differences (1225 improved, 0 regressed), 1854 unchanged.
Top method improvements by size (bytes):
       -4543 (-4.57% of base) : JIT\Methodical\fp\exgen\1000w1d_cs_d\1000w1d_cs_d.dasm - testout1:Func_0():int
       -4543 (-4.57% of base) : JIT\Methodical\fp\exgen\1000w1d_cs_r\1000w1d_cs_r.dasm - testout1:Func_0():int
       -4389 (-5.33% of base) : JIT\HardwareIntrinsics\X86\Avx2\Avx2_ro\Avx2_ro.dasm - TestStruct:Create():struct (261 methods)
       -4337 (-2.54% of base) : JIT\HardwareIntrinsics\X86\Avx2\Avx2_r\Avx2_r.dasm - TestStruct:Create():struct (261 methods)
       -3805 (-5.15% of base) : JIT\Methodical\fp\exgen\1000w1d_cs_do\1000w1d_cs_do.dasm - testout1:Func_0():int
Top method improvements by size (percentage):
         -13 (-17.33% of base) : JIT\Performance\CodeQuality\SIMD\RayTracer\RayTracer\RayTracer.dasm - Color:.ctor(double,double,double):this
         -13 (-17.33% of base) : JIT\Performance\CodeQuality\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:.ctor(double,double,double):this
         -11 (-17.19% of base) : JIT\Regression\JitBlue\GitHub_15237\GitHub_15237\GitHub_15237.dasm - QuaternionStruct:LengthSquaredNoVectors():float:this
         -12 (-16.90% of base) : JIT\Performance\CodeQuality\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:set_Y(float):this
         -13 (-16.46% of base) : JIT\Regression\JitBlue\GitHub_19022\GitHub_19022\GitHub_19022.dasm - Item:.ctor():this
53747 total methods with size differences (53747 improved, 0 regressed), 174271 unchanged.

@tannergooding
Copy link
Member Author

Found 103 files with textual diffs.
PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -204516 (-0.55% of base)
    diff is an improvement.
Top file improvements by size (bytes):
      -48566 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-1.65% of base)
      -23358 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.42% of base)
      -23064 : Microsoft.CodeAnalysis.CSharp.dasm (-0.53% of base)
      -19566 : System.Private.CoreLib.dasm (-0.63% of base)
      -17080 : System.Linq.Parallel.dasm (-1.55% of base)
103 total files with size differences (103 improved, 0 regressed), 26 unchanged.
Top method improvements by size (bytes):
       -2322 (-3.47% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - CtfTraceEventSource:InitEventMap():ref
       -2064 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:ShiftLeftLogical128BitLane(struct,ubyte):struct (8 methods)
       -2064 (-9.16% of base) : System.Private.CoreLib.dasm - Avx2:ShiftRightLogical128BitLane(struct,ubyte):struct (8 methods)
       -1956 (-1.65% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:EnumerateTemplates(ref,ref):this
       -1221 (-2.56% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:EnumerateTemplates(ref,ref):this
Top method improvements by size (percentage):
         -26 (-17.45% of base) : System.Numerics.Vectors.dasm - Quaternion:Normalize(struct):struct
         -12 (-17.39% of base) : System.Numerics.Vectors.dasm - Quaternion:Length():float:this
         -11 (-17.19% of base) : System.Numerics.Vectors.dasm - Quaternion:LengthSquared():float:this
          -9 (-16.67% of base) : System.Numerics.Vectors.dasm - Vector4:.ctor(float):this
         -31 (-16.67% of base) : System.Numerics.Vectors.dasm - Quaternion:Inverse(struct):struct
21736 total methods with size differences (21736 improved, 0 regressed), 167232 unchanged.

@tannergooding
Copy link
Member Author

PR Stress test failures look to be generally the same as appears in the normal stress test runs. A couple of them are HWIntrinsic and should be investigated further

@danmoseley
Copy link
Member

danmoseley commented Dec 10, 2018

Nice work !! If I understand right this reduces our footprint on customer disks by 200KB?

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with a requested comment enhancement.

//
// Now output VEX prefix leaving the 4-byte opcode

if ((vexPrefix & 0xFFFF7F80) == 0x00C46100)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's really great!

// (the 2-byte VEX encoding only supports the 0x0F leading byte). When these
// conditions are met, we can change byte-0 from 0xC4 to 0xC5 and then
// byte-1 is the logical-or of bit 7 from byte-1 and bits 0-6 from byte 2
// from the 3-byte VEX encoding.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might move this comment up above the check for the condition, and break down the "magic" constants that it uses so it's clearer that the condition matches the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do. I'm also going to wait to merge this until after @fiigii can review as well, as he may be able to double-check that I didn't miss some obscurity here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved above and modified to be:

        // The 2-byte VEX encoding, requires that the X and B-bits are set (these
        // bits are inverted from the REX values so set means off), the W-bit is
        // not set (this bit is not inverted), and that the m-mmmm bits are 0-0001
        // (the 2-byte VEX encoding only supports the 0x0F leading byte). When these
        // conditions are met, we can change byte-0 from 0xC4 to 0xC5 and then
        // byte-1 is the logical-or of bit 7 from byte-1 and bits 0-6 from byte 2
        // from the 3-byte VEX encoding.
        //
        // Given the above, the check can be reduced to a simple mask and comparison.
        // * 0xFFFF7F80 is a mask that ignores any bits whose value we don't care about:
        //   * R can be set or unset              (0x7F ignores bit 7)
        //   * vvvv can be any value              (0x80 ignores bits 3-6)
        //   * L and be set or unset              (0x80 ignores bit 2)
        //   * pp can be any value                (0x80 ignores bits 0-1)
        // * 0x00C46100 is a value that signifies the requirements listed above were met:
        //   * We must be a three-byte VEX opcode (0x00C4)
        //   * X and B must be set                (0x61 validates bits 5-6)
        //   * m-mmmm must be 0-00001             (0x61 validates bits 0-4)
        //   * W must be unset                    (0x00 validates bit 7)

@fiigii
Copy link

fiigii commented Dec 10, 2018

If I understand right this reduces our footprint on customer disks by 200KB?

@danmosemsft Probably not. Crossgen only uses SSE encoding for hardware compatibility, so this work reduces our memory footprint on JITed managed code.

//
// Now output VEX prefix leaving the 4-byte opcode

if ((vexPrefix & 0xFFFF7F80) == 0x00C46100)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think it's a bad idea to have asserts in emitDispIns, at least those that don't directly relate to displaying the instruction. This is because we don't want to have new asserts appearing in a dump or disassembly. Better would be to add an emitCheckIns or something, that can always be invoked on the instruction when DEBUG is set.

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good

@tannergooding tannergooding merged commit 9fe1570 into dotnet:master Dec 11, 2018
morganbr pushed a commit that referenced this pull request Dec 14, 2018
* Adding support for the 2-byte VEX encoding to the emitter

* Relocating and expanding the comment which explains the two-byte VEX encoding check
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…eclr#21453)

* Adding support for the 2-byte VEX encoding to the emitter

* Relocating and expanding the comment which explains the two-byte VEX encoding check


Commit migrated from dotnet/coreclr@9fe1570
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update the emitter to support 2-byte VEX prefix

5 participants