[JIT] Enable EGPRs in JIT by adding REX2 encoding to the backend. #106557

Ruihan-Yin · 2024-08-16T18:10:28Z

Overview

This PR follows up on #104637, which added the initial CPUID and XSAVE updates for APX.

This PR adds REX2 encoding for legacy instructions, which enables the use of EGPRs with instructions such as add and sub. Note that this PR covers REX2 encoding only: a follow-up PR will enable EGPR support in the register allocator.

Specification

REX2 is a 2-byte prefix with a leading byte of 0xD5, detailed format below:

Similar to the REX prefix, it provides extension bits for the MODRM.REG field (REX2.R4/R3), the MODRM.R/M field (REX2.B4/B3), and the index register in the SIB byte (REX2.X4/X3). These bits act as the 5th and 4th (upper) bits and combine with the 3-bit field in the MODRM or SIB byte to form a 5-bit register number, which can address up to 32 registers.

The REX2 prefix is generally available for legacy-map-0 and legacy-map-1 instructions, that is, instructions with a 1-byte opcode or a 2-byte opcode with the escape byte 0x0F, with some exceptions.
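
As an illustration of the description above, here is a minimal sketch of how the prefix could be assembled for a register-to-register instruction. The payload bit order (M0, R4, X4, B4, W, R3, X3, B3 from bit 7 down to bit 0) is taken from the Intel APX specification rather than from this PR, and the helper names `rex2_prefix` and `encode_reg_reg` are hypothetical; the JIT emitter itself is written in C++ and is structured differently.

```python
REX2_LEAD = 0xD5  # leading byte of the 2-byte REX2 prefix


def rex2_prefix(reg=0, rm=0, index=0, w=False, map1=False):
    """Build the REX2 prefix for 5-bit register numbers (0..31).

    Assumed payload layout, bit 7 down to bit 0:
    M0, R4, X4, B4, W, R3, X3, B3.
    """
    for n in (reg, rm, index):
        if not 0 <= n <= 31:
            raise ValueError("register number must be in 0..31")
    payload = (
        (int(map1) << 7)               # M0: 1 selects legacy map 1 (0x0F)
        | (((reg >> 4) & 1) << 6)      # R4: bit 4 of MODRM.REG
        | (((index >> 4) & 1) << 5)    # X4: bit 4 of SIB.INDEX
        | (((rm >> 4) & 1) << 4)       # B4: bit 4 of MODRM.R/M
        | (int(w) << 3)                # W: 64-bit operand size
        | (((reg >> 3) & 1) << 2)      # R3: bit 3 of MODRM.REG
        | (((index >> 3) & 1) << 1)    # X3: bit 3 of SIB.INDEX
        | ((rm >> 3) & 1)              # B3: bit 3 of MODRM.R/M
    )
    return bytes([REX2_LEAD, payload])


def encode_reg_reg(opcode, reg, rm, w=True, map1=False):
    """Encode `opcode /r` in register-direct form (MODRM.mod = 0b11)."""
    modrm = 0xC0 | ((reg & 7) << 3) | (rm & 7)
    # With REX2, a map-1 opcode is emitted without its 0x0F escape byte;
    # the M0 bit in the payload carries that information instead.
    return rex2_prefix(reg=reg, rm=rm, w=w, map1=map1) + bytes([opcode, modrm])


# add r16, r17  (ADD r/m64, r64 is opcode 0x01; reg = source, rm = destination)
add_r16_r17 = encode_reg_reg(0x01, reg=17, rm=16)
# imul r16, r17 (IMUL r64, r/m64 is 0F AF /r; reg = destination, rm = source)
imul_r16_r17 = encode_reg_reg(0xAF, reg=16, rm=17, map1=True)
```

Under these assumptions, `add r16, r17` comes out as `D5 58 01 C8`. The low four payload bits sit in the same positions as REX.W/R/X/B, so an instruction that only touches r8 to r15 gets the same low nibble it would have with a plain REX prefix.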

Like VEX/EVEX, REX2 must be the last prefix before the main opcode, so it cannot coexist with REX/VEX/EVEX.

Design

The bulk of the changes occur in the backend emitter.

Since no existing hardware supports APX yet, we added some hacks to bypass the CPUID checks. In this PR, DOTNET_JitStressRex2Encoding forces all eligible instructions to be encoded with REX2, regardless of whether any operand is an EGPR. A second switch, DOTNET_JitBypassAPXCheck, bypasses only the APX CPUID check, so the JIT emits REX2 only when needed; this will be more useful once the LSRA changes land.
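
The interaction of the two switches can be summarized with a small decision sketch. This is only an illustration of the behaviour described above, not the emitter's actual logic: the function name `should_use_rex2` and its parameters are hypothetical, and it assumes the stress switch does not additionally require the CPUID check to pass.

```python
def should_use_rex2(eligible, uses_egpr, cpu_has_apx=False,
                    stress_rex2=False, bypass_apx_check=False):
    """Decide whether one instruction gets a REX2 prefix.

    eligible         -- the instruction may legally carry REX2
    uses_egpr        -- some operand is an extended GPR (r16..r31)
    cpu_has_apx      -- result of the real APX CPUID check
    stress_rex2      -- models DOTNET_JitStressRex2Encoding=1
    bypass_apx_check -- models DOTNET_JitBypassAPXCheck=1
    """
    if not eligible:
        return False          # REX2 is never valid for this instruction
    if stress_rex2:
        return True           # stress mode: force REX2 even without EGPRs
    apx_usable = cpu_has_apx or bypass_apx_check
    return apx_usable and uses_egpr   # otherwise encode REX2 only when needed
```

For example, `should_use_rex2(True, False, stress_rex2=True)` is true, which is why the REX2-on SuperPMI run grows nearly every method, while `should_use_rex2(True, False, bypass_apx_check=True)` is false because no EGPR is involved.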

Note: REX2 can be used to address the lower 16 vector registers (XMM0~XMM15). For simplicity, this PR does not add support for that, since EGPR functionality for SIMD instructions can be achieved with EVEX. We are open to discussing this and adjusting the design in follow-up PRs.

Testing

We followed a multi-step testing plan to verify the encoding correctness and the semantic correctness.

Testing results will be presented below.

1. Emitter unit tests

In codegenxarch.cpp, similar to genAmd64EmitterUnitTestsSse2, we used the JitLateDisasm feature to insert instructions to be encoded as emitter unit tests. LateDisasm invokes LLVM to disassemble the code stream, which lets us cross-validate the disassembly from the JIT against LLVM's. The goal of this step is to verify that the emit paths generate "correct" code that does not trigger #UD or have the wrong semantics.

Note that we are using a custom coredistools.dll which uses a recent LLVM that supports APX decoding.

2. SuperPMI

In this step, we ran the SuperPMI pipeline to get asmdiffs with REX2 on and off, using all the MCH files as inputs. This step lets us check for assertion failures or internal errors within the JIT, and because the pipeline also invokes coredistools.dll, it verifies encoding correctness at a larger scale.

To ensure the changes do not regress throughput on the existing code paths, we ran tpdiff with the base JIT built from the main-branch commit the changes are based on, and the diff JIT built with all the REX2 changes.

3. JIT unit tests

The two steps above mainly verify the encoding correctness of the generated binary code. The last step examines the semantic correctness of the generated code: since we simply force all compatible instructions to be encoded with REX2, the original semantics should not change, so we expect exactly the same output with REX2 on and off.

We used the existing CoreCLR JIT unit test set and ran it in the Intel SDE emulator.

Follow-up plans

This PR is only intended to provide REX2 encoding functionality in the JIT backend. As for how to use it properly, we are preparing another PR with LSRA updates so that the JIT allocates EGPRs only when needed and generates optimal code.

Ruihan-Yin · 2024-08-16T18:10:55Z

Testing results

1. Emitter unit tests

2. SuperPMI

2.1 AsmDiffs - REX2 off (No diffs expected)

Diffs are based on 2,830,588 contexts (1,185,269 MinOpts, 1,645,319 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 11 (0.00%)

Overall (-100 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
coreclr_tests.run.windows.x64.checked.mch	409,086,766	-82	-1.35%
libraries.pmi.windows.x64.checked.mch	63,022,393	+3	0.00%
smoke_tests.nativeaot.windows.x64.checked.mch	5,023,568	-21	0.00%

MinOpts (-40 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
coreclr_tests.run.windows.x64.checked.mch	287,081,075	-40	-2.31%

FullOpts (-60 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
coreclr_tests.run.windows.x64.checked.mch	122,005,691	-42	0.00%
libraries.pmi.windows.x64.checked.mch	62,909,432	+3	0.00%
smoke_tests.nativeaot.windows.x64.checked.mch	5,022,597	-21	0.00%

Example diffs

coreclr_tests.run.windows.x64.checked.mch

-3 (-50.00%) : 509186.dasm - System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M37565_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M37565_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M37565_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

-3 (-50.00%) : 579101.dasm - Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M19947_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M19947_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M19947_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=801bb214) for method Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=801bb214) for method Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

-32 (-35.16%) : 510272.dasm - IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)

@@ -12,8 +12,8 @@
 ;* V01 loc1         [V01    ] (  0,  0   )     ref  ->  zero-ref    class-hnd <<unknown class>>
 ;  V02 OutArgs      [V02    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V03 tmp1         [V03    ] (  0,  0   )     int  ->  zero-ref   
-;  V04 tmp2         [V04,T01] (  2,  0   )     ref  ->  rdx         class-hnd single-def "impSpillSpecialSideEff" <<unknown class>>
-;  V05 tmp3         [V05,T02] (  2,  0   )     int  ->  [rbp-0x04]  do-not-enreg[M] EH-live
+;* V04 tmp2         [V04    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "impSpillSpecialSideEff" <<unknown class>>
+;* V05 tmp3         [V05    ] (  0,  0   )     int  ->  zero-ref   
 ;  V06 PSPSym       [V06,T00] (  1,  1   )    long  ->  [rbp-0x10]  do-not-enreg[V] "PSPSym"
 ;
 ; Lcl frame size = 48
@@ -30,41 +30,30 @@ G_M30609_IG02:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        int3     
 						;; size=6 bbWeight=0 PerfScore 0.00
 G_M30609_IG03:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, dword ptr [rbp-0x04]
-						;; size=3 bbWeight=0 PerfScore 0.00
+       xor      eax, eax
+						;; size=2 bbWeight=0 PerfScore 0.00
 G_M30609_IG04:        ; bbWeight=0, epilog, nogc, extend
        add      rsp, 48
        pop      rbp
        ret      
 						;; size=6 bbWeight=0 PerfScore 0.00
-G_M30609_IG05:        ; bbWeight=0, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref, funclet prolog, nogc
-       ; gcrRegs +[rdx]
+G_M30609_IG05:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, funclet prolog, nogc
        push     rbp
        sub      rsp, 48
        mov      rbp, qword ptr [rcx+0x20]
        mov      qword ptr [rsp+0x20], rbp
        lea      rbp, [rbp+0x30]
 						;; size=18 bbWeight=0 PerfScore 0.00
-G_M30609_IG06:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, gcvars, byref
-       mov      rcx, 0xD1FFAB1E      ; <unknown class>
-       call     CORINFO_HELP_ISINSTANCEOFCLASS
-       ; gcrRegs -[rdx] +[rax]
-       ; gcr arg pop 0
-       xor      ecx, ecx
-       mov      edx, 100
-       test     rax, rax
-       cmovne   ecx, edx
-       mov      dword ptr [rbp-0x04], ecx
+G_M30609_IG06:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
        lea      rax, G_M30609_IG03
-       ; gcrRegs -[rax]
-						;; size=38 bbWeight=0 PerfScore 0.00
+						;; size=7 bbWeight=0 PerfScore 0.00
 G_M30609_IG07:        ; bbWeight=0, funclet epilog, nogc, extend
        add      rsp, 48
        pop      rbp
        ret      
 						;; size=6 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 91, prolog size 14, PerfScore 0.00, instruction count 26, allocated bytes for code 91 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)
+; Total bytes of code 59, prolog size 14, PerfScore 0.00, instruction count 19, allocated bytes for code 59 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)
 ; ============================================================
 
 Unwind Info:

-1 (-0.04%) : 510371.dasm - IntelHardwareIntrinsicTest.General.Program:IsSupported() (FullOpts)

@@ -915,13 +915,13 @@ G_M58490_IG59:        ; bbWeight=0.50, gcrefRegs=0041 {rax rsi}, byrefRegs=0000
        ; gcrRegs +[rcx]
        call     [System.Convert:ToBoolean(System.Object):ubyte]
        ; gcrRegs -[rax rcx]
-       cmp      eax, 1
+       test     eax, eax
        jne      G_M58490_IG66
        xor      rcx, rcx
        ; gcrRegs +[rcx]
        mov      gword ptr [rsp+0x20], rcx
        mov      dword ptr [rsp+0x28], 3
-						;; size=59 bbWeight=0.50 PerfScore 7.88
+						;; size=58 bbWeight=0.50 PerfScore 7.88
 G_M58490_IG60:        ; bbWeight=0.50, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref
        ; gcrRegs -[rcx]
        mov      gword ptr [rsp+0x30], rcx
@@ -1041,7 +1041,7 @@ G_M58490_IG70:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        int3     
 						;; size=28 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 2296, prolog size 27, PerfScore 264.21, instruction count 525, allocated bytes for code 2296 (MethodHash=a9531b85) for method IntelHardwareIntrinsicTest.General.Program:IsSupported() (FullOpts)
+; Total bytes of code 2295, prolog size 27, PerfScore 264.21, instruction count 525, allocated bytes for code 2295 (MethodHash=a9531b85) for method IntelHardwareIntrinsicTest.General.Program:IsSupported() (FullOpts)
 ; ============================================================
 
 Unwind Info:

-3 (-0.03%) : 579023.dasm - Runtime_34587:TestEntryPoint():int (FullOpts)

@@ -2680,7 +2680,7 @@ G_M52152_IG121:        ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rs
 G_M52152_IG122:        ; bbWeight=1.00, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
        ; byrRegs -[rsi]
        lea      rcx, [rsp+0x50]
-       mov      edx, 1
+       xor      edx, edx
        call     [<unknown method>]
        ; gcr arg pop 0
        lea      rcx, [rsp+0x50]
@@ -2703,7 +2703,7 @@ G_M52152_IG122:        ; bbWeight=1.00, gcrefRegs=0008 {rbx}, byrefRegs=0000 {},
        mov      gword ptr [rsp+0x58], rax
        test     rax, rax
        je       G_M52152_IG205
-						;; size=71 bbWeight=1.00 PerfScore 17.50
+						;; size=68 bbWeight=1.00 PerfScore 17.50
 G_M52152_IG123:        ; bbWeight=0.50, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, byref
        lea      rsi, bword ptr [rax+0x10]
        ; byrRegs +[rsi]
@@ -4142,7 +4142,7 @@ G_M52152_IG207:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        int3     
 						;; size=7 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 9099, prolog size 37, PerfScore 1583.63, instruction count 1959, allocated bytes for code 9099 (MethodHash=30563447) for method Runtime_34587:TestEntryPoint():int (FullOpts)
+; Total bytes of code 9096, prolog size 37, PerfScore 1583.63, instruction count 1959, allocated bytes for code 9096 (MethodHash=30563447) for method Runtime_34587:TestEntryPoint():int (FullOpts)
 ; ============================================================
 
 Unwind Info:

+4 (+1.99%) : 205245.dasm - IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (MinOpts)

@@ -39,14 +39,13 @@ G_M30609_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
 						;; size=5 bbWeight=0.50 PerfScore 0.50
 G_M30609_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        nop      
-       xor      eax, eax
-       mov      dword ptr [rbp-0x18], eax
+       mov      dword ptr [rbp-0x18], 1
        cmp      dword ptr [rbp-0x18], 0
        jne      SHORT G_M30609_IG05
        xor      eax, eax
        mov      dword ptr [rbp-0x1C], eax
        jmp      SHORT G_M30609_IG06
-						;; size=19 bbWeight=1 PerfScore 7.75
+						;; size=21 bbWeight=1 PerfScore 7.50
 G_M30609_IG05:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      dword ptr [rbp-0x1C], 100
 						;; size=7 bbWeight=1 PerfScore 1.00
@@ -87,14 +86,12 @@ G_M30609_IG11:        ; bbWeight=1, gcVars=0000000000000000 {}, gcrefRegs=0004 {
        ; gcrRegs +[rax]
        mov      gword ptr [rbp-0x10], rax
        nop      
-       xor      eax, eax
-       ; gcrRegs -[rax]
-       mov      dword ptr [rbp-0x2C], eax
+       mov      dword ptr [rbp-0x2C], 1
        cmp      dword ptr [rbp-0x2C], 0
        jne      SHORT G_M30609_IG13
-						;; size=24 bbWeight=1 PerfScore 7.50
+						;; size=26 bbWeight=1 PerfScore 7.25
 G_M30609_IG12:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rdx]
+       ; gcrRegs -[rax rdx]
        mov      rdx, gword ptr [rbp-0x10]
        ; gcrRegs +[rdx]
        mov      rcx, 0xD1FFAB1E      ; <unknown class>
@@ -124,7 +121,7 @@ G_M30609_IG15:        ; bbWeight=0, funclet epilog, nogc, extend
        ret      
 						;; size=6 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 201, prolog size 28, PerfScore 43.58, instruction count 63, allocated bytes for code 201 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (MinOpts)
+; Total bytes of code 205, prolog size 28, PerfScore 43.08, instruction count 61, allocated bytes for code 205 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (MinOpts)
 ; ============================================================
 
 Unwind Info:

libraries.pmi.windows.x64.checked.mch

-3 (-50.00%) : 29280.dasm - System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M37565_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M37565_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M37565_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

+3 (+100.00%) : 29884.dasm - System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M34763_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M34763_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=5 bbWeight=1 PerfScore 0.25
 G_M34763_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=bcd77834) for method System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=bcd77834) for method System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

+3 (+100.00%) : 29207.dasm - System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M31227_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M31227_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=5 bbWeight=1 PerfScore 0.25
 G_M31227_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=e6278604) for method System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=e6278604) for method System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

smoke_tests.nativeaot.windows.x64.checked.mch

-3 (-50.00%) : 14296.dasm - Program:X86SerializeX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M13406_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13406_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M13406_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

-3 (-50.00%) : 21532.dasm - Program:AesX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M55817_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M55817_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M55817_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=c5da25f6) for method Program:AesX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=c5da25f6) for method Program:AesX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

-3 (-50.00%) : 19229.dasm - Program:X86SerializeX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M13406_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13406_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M13406_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

+3 (+100.00%) : 19199.dasm - Program:AvxVnniX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M60430_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M60430_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=5 bbWeight=1 PerfScore 0.25
 G_M60430_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=e20b13f1) for method Program:AvxVnniX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=e20b13f1) for method Program:AvxVnniX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

+3 (+100.00%) : 21515.dasm - Program:FmaX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M2260_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M2260_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=5 bbWeight=1 PerfScore 0.25
 G_M2260_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=36a7f72b) for method Program:FmaX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=36a7f72b) for method Program:FmaX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

+3 (+100.00%) : 21526.dasm - Program:Avx2X64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M13187_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13187_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=5 bbWeight=1 PerfScore 0.25
 G_M13187_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=f683cc7c) for method Program:Avx2X64IsSupported():ubyte (FullOpts)
+; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=f683cc7c) for method Program:Avx2X64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

Details

Size improvements/regressions per collection

Collection	Contexts with diffs	Improvements	Regressions	Improvements (bytes)	Regressions (bytes)
aspnet.run.windows.x64.checked.mch	0	0	0	-0	+0
benchmarks.run.windows.x64.checked.mch	0	0	0	-0	+0
benchmarks.run_pgo.windows.x64.checked.mch	0	0	0	-0	+0
benchmarks.run_tiered.windows.x64.checked.mch	0	0	0	-0	+0
coreclr_tests.run.windows.x64.checked.mch	12	11	1	-86	+4
libraries.crossgen2.windows.x64.checked.mch	0	0	0	-0	+0
libraries.pmi.windows.x64.checked.mch	3	1	2	-3	+6
libraries_tests.run.windows.x64.Release.mch	0	0	0	-0	+0
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	0	0	0	-0	+0
realworld.run.windows.x64.checked.mch	0	0	0	-0	+0
smoke_tests.nativeaot.windows.x64.checked.mch	17	12	5	-36	+15
	32	24	8	-125	+25

PerfScore improvements/regressions per collection

Collection	Contexts with diffs	Improvements	Regressions	Same PerfScore	Improvements (PerfScore)	Regressions (PerfScore)	PerfScore Overall in FullOpts
aspnet.run.windows.x64.checked.mch	0	0	0	0	0.00%	0.00%	0.0000%
benchmarks.run.windows.x64.checked.mch	0	0	0	0	0.00%	0.00%	0.0000%
benchmarks.run_pgo.windows.x64.checked.mch	0	0	0	0	0.00%	0.00%	0.0000%
benchmarks.run_tiered.windows.x64.checked.mch	0	0	0	0	0.00%	0.00%	0.0000%
coreclr_tests.run.windows.x64.checked.mch	12	2	1	9	-7.87%	+0.03%	0.0000%
libraries.crossgen2.windows.x64.checked.mch	0	0	0	0	0.00%	0.00%	0.0000%
libraries.pmi.windows.x64.checked.mch	3	0	0	3	0.00%	0.00%	0.0000%
libraries_tests.run.windows.x64.Release.mch	0	0	0	0	0.00%	0.00%	0.0000%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	0	0	0	0	0.00%	0.00%	0.0000%
realworld.run.windows.x64.checked.mch	0	0	0	0	0.00%	0.00%	0.0000%
smoke_tests.nativeaot.windows.x64.checked.mch	17	0	0	17	0.00%	0.00%	0.0000%

Context information

Collection	Diffed contexts	MinOpts	FullOpts	Missed, base	Missed, diff
aspnet.run.windows.x64.checked.mch	141,224	77,324	63,900	0 (0.00%)	0 (0.00%)
benchmarks.run.windows.x64.checked.mch	38,352	6	38,346	0 (0.00%)	0 (0.00%)
benchmarks.run_pgo.windows.x64.checked.mch	120,280	68,103	52,177	0 (0.00%)	0 (0.00%)
benchmarks.run_tiered.windows.x64.checked.mch	76,876	56,358	20,518	0 (0.00%)	0 (0.00%)
coreclr_tests.run.windows.x64.checked.mch	642,813	393,776	249,037	0 (0.00%)	5 (0.00%)
libraries.crossgen2.windows.x64.checked.mch	276,889	15	276,874	0 (0.00%)	2 (0.00%)
libraries.pmi.windows.x64.checked.mch	316,010	6	316,004	0 (0.00%)	1 (0.00%)
libraries_tests.run.windows.x64.Release.mch	814,679	567,674	247,005	0 (0.00%)	0 (0.00%)
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	343,895	21,994	321,901	0 (0.00%)	0 (0.00%)
realworld.run.windows.x64.checked.mch	28,368	3	28,365	0 (0.00%)	0 (0.00%)
smoke_tests.nativeaot.windows.x64.checked.mch	31,202	10	31,192	0 (0.00%)	3 (0.01%)
	2,830,588	1,185,269	1,645,319	0 (0.00%)	11 (0.00%)

jit-analyze output

Comment:
SuperPMI pipeline with REX2 off:
In theory, this run should be clean when compared with the base corerun. The diffs found here come from the changes in the ISA definition: as can be seen, the diffs are flips from xor eax, eax to mov eax, 1 or the reverse, which indicates that the runtime is reporting inconsistent ISA availability. This is expected to be resolved once the public CPUID PR is merged.
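
As a quick sanity check of that explanation, the sketch below compares the two encodings and tests whether a collection's byte totals are exact multiples of the flip size. The instruction bytes are the standard x64 encodings (they match the size=5 and size=2 annotations in the dumps); the counts are copied from the "Size improvements/regressions per collection" table, and `is_pure_flip` is a hypothetical helper.

```python
MOV_EAX_1 = bytes([0xB8, 0x01, 0x00, 0x00, 0x00])  # mov eax, 1   (B8 + imm32)
XOR_EAX_EAX = bytes([0x33, 0xC0])                  # xor eax, eax (33 /r)

# Size change when an IsSupported-style method flips its constant result.
FLIP = len(MOV_EAX_1) - len(XOR_EAX_EAX)

# (improvements, regressions, improvement bytes, regression bytes)
rows = {
    "coreclr_tests.run": (11, 1, -86, +4),
    "libraries.pmi": (1, 2, -3, +6),
    "smoke_tests.nativeaot": (12, 5, -36, +15),
}


def is_pure_flip(improved, regressed, imp_bytes, reg_bytes):
    """True when every diff in the row is exactly one mov/xor flip."""
    return imp_bytes == -FLIP * improved and reg_bytes == FLIP * regressed


pure = {name: is_pure_flip(*row) for name, row in rows.items()}
net = sum(row[2] + row[3] for row in rows.values())
```

The libraries.pmi and smoke_tests rows are exact multiples of the 3-byte flip. The coreclr_tests row is not, which is consistent with the larger example diffs shown earlier (-32 and +4), presumably cases where a flipped IsSupported value lets the JIT fold away, or forces it to keep, more code.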

2.2 AsmDiffs - REX2 on

SuperPMI pipeline:

Diffs are based on 2,830,588 contexts (1,185,269 MinOpts, 1,645,319 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 11 (0.00%)

Diff JIT options: JitStressRex2Encoding=1

Overall (+243,564,575 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
aspnet.run.windows.x64.checked.mch	49,406,065	+10,392,179	0.00%
benchmarks.run.windows.x64.checked.mch	12,230,572	+3,013,399	0.00%
benchmarks.run_pgo.windows.x64.checked.mch	40,192,955	+8,962,474	0.00%
benchmarks.run_tiered.windows.x64.checked.mch	17,606,620	+4,199,746	0.00%
coreclr_tests.run.windows.x64.checked.mch	409,086,766	+84,314,011	-0.00%
libraries.crossgen2.windows.x64.checked.mch	45,250,222	+11,739,139	0.00%
libraries.pmi.windows.x64.checked.mch	63,022,393	+15,125,725	0.00%
libraries_tests.run.windows.x64.Release.mch	336,307,360	+69,356,394	0.00%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	147,986,092	+32,521,941	0.00%
realworld.run.windows.x64.checked.mch	11,552,911	+2,545,126	0.00%
smoke_tests.nativeaot.windows.x64.checked.mch	5,023,568	+1,394,441	0.00%

MinOpts (+113,554,608 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
aspnet.run.windows.x64.checked.mch	23,379,337	+4,569,510	0.00%
benchmarks.run.windows.x64.checked.mch	588	+163	0.00%
benchmarks.run_pgo.windows.x64.checked.mch	18,796,230	+4,022,239	0.00%
benchmarks.run_tiered.windows.x64.checked.mch	13,707,415	+3,160,967	0.00%
coreclr_tests.run.windows.x64.checked.mch	287,081,075	+59,383,662	-0.00%
libraries.crossgen2.windows.x64.checked.mch	1,705	+442	0.00%
libraries.pmi.windows.x64.checked.mch	112,961	+15,358	0.00%
libraries_tests.run.windows.x64.Release.mch	203,705,533	+39,847,511	0.00%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	10,696,900	+2,483,962	0.00%
realworld.run.windows.x64.checked.mch	412,968	+70,590	0.00%
smoke_tests.nativeaot.windows.x64.checked.mch	971	+204	0.00%

FullOpts (+130,009,967 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
aspnet.run.windows.x64.checked.mch	26,026,728	+5,822,669	0.00%
benchmarks.run.windows.x64.checked.mch	12,229,984	+3,013,236	0.00%
benchmarks.run_pgo.windows.x64.checked.mch	21,396,725	+4,940,235	0.00%
benchmarks.run_tiered.windows.x64.checked.mch	3,899,205	+1,038,779	0.00%
coreclr_tests.run.windows.x64.checked.mch	122,005,691	+24,930,349	0.00%
libraries.crossgen2.windows.x64.checked.mch	45,248,517	+11,738,697	0.00%
libraries.pmi.windows.x64.checked.mch	62,909,432	+15,110,367	0.00%
libraries_tests.run.windows.x64.Release.mch	132,601,827	+29,508,883	0.00%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	137,289,192	+30,037,979	0.00%
realworld.run.windows.x64.checked.mch	11,139,943	+2,474,536	0.00%
smoke_tests.nativeaot.windows.x64.checked.mch	5,022,597	+1,394,237	0.00%
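
To put the stress-mode numbers in perspective, the sketch below re-adds the per-collection figures from the Overall table and derives the relative code-size growth. The values are copied from the tables above; the dictionary and variable names are only illustrative.

```python
# collection -> (base size in bytes, diff size in bytes), from the Overall table
rex2_on = {
    "aspnet.run": (49_406_065, 10_392_179),
    "benchmarks.run": (12_230_572, 3_013_399),
    "benchmarks.run_pgo": (40_192_955, 8_962_474),
    "benchmarks.run_tiered": (17_606_620, 4_199_746),
    "coreclr_tests.run": (409_086_766, 84_314_011),
    "libraries.crossgen2": (45_250_222, 11_739_139),
    "libraries.pmi": (63_022_393, 15_125_725),
    "libraries_tests.run": (336_307_360, 69_356_394),
    "libraries_tests_no_tiered_compilation.run": (147_986_092, 32_521_941),
    "realworld.run": (11_552_911, 2_545_126),
    "smoke_tests.nativeaot": (5_023_568, 1_394_441),
}

total_base = sum(base for base, _ in rex2_on.values())
total_diff = sum(diff for _, diff in rex2_on.values())

# The MinOpts and FullOpts headline numbers should add up to the overall one.
minopts_diff, fullopts_diff = 113_554_608, 130_009_967

growth_pct = 100.0 * total_diff / total_base
print(f"total growth: +{total_diff:,} bytes ({growth_pct:.1f}% of base)")
```

The totals are consistent with the headline figures, and the overall growth is a little over one fifth of the base code size. That size increase is expected here, since JitStressRex2Encoding=1 adds the prefix to every eligible instruction whether or not it needs one.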

Example diffs

aspnet.run.windows.x64.checked.mch

+2 (+0.97%) : 32079.dasm - Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)

@@ -29,7 +29,7 @@ G_M40993_IG01:        ; bbWeight=0.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        vmovaps  xmmword ptr [rsp+0x20], xmm12
        vmovaps  xmm6, xmm0
        vmovaps  xmm7, xmm1
-						;; size=60 bbWeight=0.25 PerfScore 3.69
+						;; size=61 bbWeight=0.25 PerfScore 3.69
 G_M40993_IG02:        ; bbWeight=0.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vxorps   xmm8, xmm8, xmm8
        vmovsd   xmm9, qword ptr [reloc @RWD00]
@@ -72,13 +72,13 @@ G_M40993_IG08:        ; bbWeight=1, epilog, nogc, extend
        vmovaps  xmm12, xmmword ptr [rsp+0x20]
        add      rsp, 152
        ret      
-						;; size=53 bbWeight=1 PerfScore 29.25
+						;; size=54 bbWeight=1 PerfScore 29.25
 RWD00  	dq	408F400000000000h	;         1000
 RWD08  	dq	3FE0000000000000h	;          0.5
 RWD16  	dq	3E112E0BE826D695h	;        1e-09
 
 
-; Total bytes of code 207, prolog size 52, PerfScore 120.27, instruction count 38, allocated bytes for code 207 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)
+; Total bytes of code 209, prolog size 53, PerfScore 120.27, instruction count 38, allocated bytes for code 209 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -86,24 +86,24 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x34
+  SizeOfProlog      : 0x35
   CountOfUnwindCodes: 16
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x34 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
+    CodeOffset: 0x35 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
       Scaled Small Offset: 2 * 16 = 32 = 0x00020
-    CodeOffset: 0x2E UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
+    CodeOffset: 0x2F UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x28 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x29 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x22 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x23 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 5 * 16 = 80 = 0x00050
-    CodeOffset: 0x1C UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x1D UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 6 * 16 = 96 = 0x00060
-    CodeOffset: 0x16 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x17 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 7 * 16 = 112 = 0x00070
-    CodeOffset: 0x10 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x11 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 8 * 16 = 128 = 0x00080
-    CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
       Size: 19 * 8 = 152 = 0x00098

+2 (+0.97%) : 66029.dasm - Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)

@@ -30,7 +30,7 @@ G_M40993_IG01:        ; bbWeight=0.98, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        vmovaps  xmmword ptr [rsp+0x20], xmm12
        vmovaps  xmm6, xmm0
        vmovaps  xmm7, xmm1
-						;; size=60 bbWeight=0.98 PerfScore 14.39
+						;; size=61 bbWeight=0.98 PerfScore 14.39
 G_M40993_IG02:        ; bbWeight=0.98, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vxorps   xmm8, xmm8, xmm8
        vmovsd   xmm9, qword ptr [reloc @RWD00]
@@ -69,7 +69,7 @@ G_M40993_IG07:        ; bbWeight=1.00, epilog, nogc, extend
        vmovaps  xmm12, xmmword ptr [rsp+0x20]
        add      rsp, 152
        ret      
-						;; size=53 bbWeight=1.00 PerfScore 29.25
+						;; size=54 bbWeight=1.00 PerfScore 29.25
 G_M40993_IG08:        ; bbWeight=15.73, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
        vmovaps  xmm8, xmm12
        jmp      SHORT G_M40993_IG05
@@ -79,7 +79,7 @@ RWD08  	dq	3FE0000000000000h	;          0.5
 RWD16  	dq	3E112E0BE826D695h	;        1e-09
 
 
-; Total bytes of code 207, prolog size 52, PerfScore 840.20, instruction count 38, allocated bytes for code 207 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)
+; Total bytes of code 209, prolog size 53, PerfScore 840.20, instruction count 38, allocated bytes for code 209 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -87,24 +87,24 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x34
+  SizeOfProlog      : 0x35
   CountOfUnwindCodes: 16
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x34 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
+    CodeOffset: 0x35 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
       Scaled Small Offset: 2 * 16 = 32 = 0x00020
-    CodeOffset: 0x2E UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
+    CodeOffset: 0x2F UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x28 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x29 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x22 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x23 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 5 * 16 = 80 = 0x00050
-    CodeOffset: 0x1C UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x1D UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 6 * 16 = 96 = 0x00060
-    CodeOffset: 0x16 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x17 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 7 * 16 = 112 = 0x00070
-    CodeOffset: 0x10 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x11 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 8 * 16 = 128 = 0x00080
-    CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
       Size: 19 * 8 = 152 = 0x00098

+2 (+0.97%) : 60922.dasm - Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (FullOpts)

@@ -29,7 +29,7 @@ G_M40993_IG01:        ; bbWeight=0.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        vmovaps  xmmword ptr [rsp+0x20], xmm12
        vmovaps  xmm6, xmm0
        vmovaps  xmm7, xmm1
-						;; size=60 bbWeight=0.25 PerfScore 3.69
+						;; size=61 bbWeight=0.25 PerfScore 3.69
 G_M40993_IG02:        ; bbWeight=0.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vxorps   xmm8, xmm8, xmm8
        vmovsd   xmm9, qword ptr [reloc @RWD00]
@@ -72,13 +72,13 @@ G_M40993_IG08:        ; bbWeight=1, epilog, nogc, extend
        vmovaps  xmm12, xmmword ptr [rsp+0x20]
        add      rsp, 152
        ret      
-						;; size=53 bbWeight=1 PerfScore 29.25
+						;; size=54 bbWeight=1 PerfScore 29.25
 RWD00  	dq	408F400000000000h	;         1000
 RWD08  	dq	3FE0000000000000h	;          0.5
 RWD16  	dq	3E112E0BE826D695h	;        1e-09
 
 
-; Total bytes of code 207, prolog size 52, PerfScore 120.27, instruction count 38, allocated bytes for code 207 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (FullOpts)
+; Total bytes of code 209, prolog size 53, PerfScore 120.27, instruction count 38, allocated bytes for code 209 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -86,24 +86,24 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x34
+  SizeOfProlog      : 0x35
   CountOfUnwindCodes: 16
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x34 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
+    CodeOffset: 0x35 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
       Scaled Small Offset: 2 * 16 = 32 = 0x00020
-    CodeOffset: 0x2E UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
+    CodeOffset: 0x2F UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x28 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x29 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x22 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x23 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 5 * 16 = 80 = 0x00050
-    CodeOffset: 0x1C UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x1D UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 6 * 16 = 96 = 0x00060
-    CodeOffset: 0x16 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x17 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 7 * 16 = 112 = 0x00070
-    CodeOffset: 0x10 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x11 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 8 * 16 = 128 = 0x00080
-    CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
       Size: 19 * 8 = 152 = 0x00098

+7 (+87.50%) : 23755.dasm - System.Runtime.Intrinsics.X86.X86Serialize:get_IsSupported():ubyte (Tier0)

@@ -12,16 +12,16 @@
 G_M10906_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M10906_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M10906_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=da05d565) for method System.Runtime.Intrinsics.X86.X86Serialize:get_IsSupported():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=da05d565) for method System.Runtime.Intrinsics.X86.X86Serialize:get_IsSupported():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+7 (+87.50%) : 9608.dasm - System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[System.Text.Json.JsonDocument+DbRow]():ubyte (Tier0)

@@ -12,16 +12,16 @@
 G_M31768_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M31768_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M31768_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=2e9583e7) for method System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[System.Text.Json.JsonDocument+DbRow]():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=2e9583e7) for method System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[System.Text.Json.JsonDocument+DbRow]():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+7 (+87.50%) : 861.dasm - System.Buffers.IndexOfAnyAsciiSearcher+ContainsAnyResultMapper`1[short]:get_NotFound():ubyte (Tier0)

@@ -12,16 +12,16 @@
 G_M16088_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M16088_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M16088_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=da12c127) for method System.Buffers.IndexOfAnyAsciiSearcher+ContainsAnyResultMapper`1[short]:get_NotFound():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=da12c127) for method System.Buffers.IndexOfAnyAsciiSearcher+ContainsAnyResultMapper`1[short]:get_NotFound():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
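
The +7 byte growth in the three Tier0 methods above can be accounted for with a simple model. This is a rough sketch, assuming the diffs were collected with REX2 encoding forced on every eligible instruction: a 2-byte REX2 prefix adds 2 bytes to an instruction that had no REX prefix, and a net 1 byte to one that already carried a 1-byte REX prefix, since REX2 replaces REX. The base sizes are the usual x64 encodings of these instructions. Treating `ret` as unchanged is inferred from the instruction-group sizes in the diff, not stated by the PR. The helper name `rex2_growth` is hypothetical.

```python
REX2_PREFIX_BYTES = 2  # REX2 is a 2-byte prefix with leading byte 0xD5
REX_PREFIX_BYTES = 1   # legacy REX prefix, which REX2 replaces


def rex2_growth(has_rex: bool, gets_rex2: bool) -> int:
    """Extra bytes when an instruction is re-encoded with REX2 (hypothetical helper)."""
    if not gets_rex2:
        return 0
    return REX2_PREFIX_BYTES - (REX_PREFIX_BYTES if has_rex else 0)


# (mnemonic, base size in bytes, has REX prefix, assumed to get REX2)
instructions = [
    ("push rbp",     1, False, True),
    ("mov rbp, rsp", 3, True,  True),   # REX.W already present
    ("xor eax, eax", 2, False, True),
    ("pop rbp",      1, False, True),
    ("ret",          1, False, False),  # assumption: size unchanged in the diff
]

base_size = sum(size for _, size, _, _ in instructions)
stressed_size = sum(
    size + rex2_growth(has_rex, gets_rex2)
    for _, size, has_rex, gets_rex2 in instructions
)

print(base_size, stressed_size)  # 8 15
```

Under the same model, `push rbp` alone grows from 1 to 3 bytes, which matches the `SizeOfProlog` and `CodeOffset` change from 0x01 to 0x03 in the unwind info.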

benchmarks.run.windows.x64.checked.mch

+1 (+0.14%) : 30038.dasm - System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)

@@ -199,14 +199,14 @@ G_M8955_IG03:        ; bbWeight=1, extend
        vmovups  xmmword ptr [rcx+0x30], xmm0
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=354 bbWeight=1 PerfScore 161.25
+						;; size=355 bbWeight=1 PerfScore 161.25
 G_M8955_IG04:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 RWD00  	dd	3F800000h		;         1
 
 
-; Total bytes of code 693, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 693 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
+; Total bytes of code 694, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 694 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
 ; ============================================================
 
 Unwind Info:

+1 (+0.20%) : 21351.dasm - System.Numerics.Quaternion:CreateFromRotationMatrix(System.Numerics.Matrix4x4):System.Numerics.Quaternion (FullOpts)

@@ -140,7 +140,7 @@ G_M15800_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byr
        vmovups  xmmword ptr [rcx], xmm4
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=7 bbWeight=1 PerfScore 2.25
+						;; size=8 bbWeight=1 PerfScore 2.25
 G_M15800_IG08:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
@@ -148,7 +148,7 @@ RWD00  	dd	3F800000h		;         1
 RWD04  	dd	3F000000h		;       0.5
 
 
-; Total bytes of code 494, prolog size 0, PerfScore 173.42, instruction count 93, allocated bytes for code 494 (MethodHash=096ec247) for method System.Numerics.Quaternion:CreateFromRotationMatrix(System.Numerics.Matrix4x4):System.Numerics.Quaternion (FullOpts)
+; Total bytes of code 495, prolog size 0, PerfScore 173.42, instruction count 93, allocated bytes for code 495 (MethodHash=096ec247) for method System.Numerics.Quaternion:CreateFromRotationMatrix(System.Numerics.Matrix4x4):System.Numerics.Quaternion (FullOpts)
 ; ============================================================
 
 Unwind Info:

+1 (+0.23%) : 33197.dasm - System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[uint],System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[float]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)

@@ -140,7 +140,7 @@ G_M41960_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byr
        vmovups  zmmword ptr [rcx], zmm1
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=9 bbWeight=1 PerfScore 2.25
+						;; size=10 bbWeight=1 PerfScore 2.25
 G_M41960_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -163,7 +163,7 @@ RWD588 	dd	00000000h, 00000000h, 00000000h, 00000000h, 00000000h, 00000000h
 RWD640 	dq	7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h
 
 
-; Total bytes of code 438, prolog size 0, PerfScore 166.17, instruction count 66, allocated bytes for code 441 (MethodHash=326b5c17) for method System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[uint],System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[float]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 439, prolog size 0, PerfScore 166.17, instruction count 66, allocated bytes for code 442 (MethodHash=326b5c17) for method System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[uint],System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[float]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
 ; ============================================================
 
 Unwind Info:

+4 (+80.00%) : 32809.dasm - System.Linq.Tests.Perf_Enumerable+<>c:<OrderByThenBy>b__25_1(int):int:this (FullOpts)

@@ -18,12 +18,12 @@ G_M1177_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M1177_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      eax, edx
        neg      eax
-						;; size=4 bbWeight=1 PerfScore 0.50
+						;; size=8 bbWeight=1 PerfScore 0.50
 G_M1177_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 5, prolog size 0, PerfScore 1.50, instruction count 3, allocated bytes for code 5 (MethodHash=bea0fb66) for method System.Linq.Tests.Perf_Enumerable+<>c:<OrderByThenBy>b__25_1(int):int:this (FullOpts)
+; Total bytes of code 9, prolog size 0, PerfScore 1.50, instruction count 3, allocated bytes for code 9 (MethodHash=bea0fb66) for method System.Linq.Tests.Perf_Enumerable+<>c:<OrderByThenBy>b__25_1(int):int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:

+4 (+80.00%) : 38257.dasm - Microsoft.Extensions.Primitives.StringSegmentBenchmark:Equals_Object_Invalid():ubyte:this (FullOpts)

@@ -98,12 +98,12 @@ G_M24192_IG02:        ; bbWeight=1, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byr
        ; gcrRegs +[rcx]
        cmp      byte  ptr [rcx], cl
        xor      eax, eax
-						;; size=4 bbWeight=1 PerfScore 3.25
+						;; size=8 bbWeight=1 PerfScore 3.25
 G_M24192_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 5, prolog size 0, PerfScore 4.25, instruction count 3, allocated bytes for code 5 (MethodHash=7f17a17f) for method Microsoft.Extensions.Primitives.StringSegmentBenchmark:Equals_Object_Invalid():ubyte:this (FullOpts)
+; Total bytes of code 9, prolog size 0, PerfScore 4.25, instruction count 3, allocated bytes for code 9 (MethodHash=7f17a17f) for method Microsoft.Extensions.Primitives.StringSegmentBenchmark:Equals_Object_Invalid():ubyte:this (FullOpts)
 ; ============================================================
 
 Unwind Info:

+4 (+80.00%) : 22418.dasm - System.Xml.XmlDictionaryReader:TryGetArrayLength(byref):ubyte:this (FullOpts)

@@ -19,13 +19,13 @@ G_M47878_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byr
        ; byrRegs +[rdx]
        xor      eax, eax
        mov      dword ptr [rdx], eax
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=8 bbWeight=1 PerfScore 1.25
 G_M47878_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
        ; byrRegs -[rdx]
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 5, prolog size 0, PerfScore 2.25, instruction count 3, allocated bytes for code 5 (MethodHash=068744f9) for method System.Xml.XmlDictionaryReader:TryGetArrayLength(byref):ubyte:this (FullOpts)
+; Total bytes of code 9, prolog size 0, PerfScore 2.25, instruction count 3, allocated bytes for code 9 (MethodHash=068744f9) for method System.Xml.XmlDictionaryReader:TryGetArrayLength(byref):ubyte:this (FullOpts)
 ; ============================================================
 
 Unwind Info:

benchmarks.run_pgo.windows.x64.checked.mch

+19 (+0.74%) : 95347.dasm - System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)

@@ -18,7 +18,7 @@ G_M9309_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        mov      bword ptr [rbp+0x10], rcx
        mov      bword ptr [rbp+0x18], rdx
        mov      dword ptr [rbp+0x20], r8d
-						;; size=16 bbWeight=1 PerfScore 4.25
+						;; size=22 bbWeight=1 PerfScore 4.25
 G_M9309_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      rax, bword ptr [rbp+0x18]
        ; byrRegs +[rax]
@@ -31,7 +31,7 @@ G_M9309_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        lea      rcx, G_M9309_IG02
        add      rdx, rcx
        jmp      rdx
-						;; size=36 bbWeight=1 PerfScore 12.00
+						;; size=45 bbWeight=1 PerfScore 12.00
 G_M9309_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vpsrldq  xmm0, xmm0, 0
        jmp      G_M9309_IG259
@@ -1061,11 +1061,11 @@ G_M9309_IG259:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        ; byrRegs +[rax]
        vmovups  xmmword ptr [rax], xmm0
        mov      rax, bword ptr [rbp+0x10]
-						;; size=12 bbWeight=1 PerfScore 4.00
+						;; size=14 bbWeight=1 PerfScore 4.00
 G_M9309_IG260:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 RWD00  	dd	G_M9309_IG03 - G_M9309_IG02
        	dd	G_M9309_IG04 - G_M9309_IG02
        	dd	G_M9309_IG05 - G_M9309_IG02
@@ -1324,7 +1324,7 @@ RWD00  	dd	G_M9309_IG03 - G_M9309_IG02
        	dd	G_M9309_IG258 - G_M9309_IG02
 
 
-; Total bytes of code 2572, prolog size 4, PerfScore 789.75, instruction count 531, allocated bytes for code 2572 (MethodHash=c6eadba2) for method System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)
+; Total bytes of code 2591, prolog size 7, PerfScore 789.75, instruction count 531, allocated bytes for code 2591 (MethodHash=c6eadba2) for method System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -1332,9 +1332,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+1 (+0.85%) : 74356.dasm - Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)

@@ -75,7 +75,7 @@ G_M56301_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byr
        vextractps dword ptr [rdx+0x08], xmm0, 2
        mov      rax, rdx
        ; byrRegs +[rax]
-						;; size=31 bbWeight=1 PerfScore 18.25
+						;; size=32 bbWeight=1 PerfScore 18.25
 G_M56301_IG05:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
@@ -88,7 +88,7 @@ RWD00  	dd	3F800000h		;         1
 RWD04  	dd	7F800000h		;       inf
 
 
-; Total bytes of code 118, prolog size 0, PerfScore 86.58, instruction count 26, allocated bytes for code 118 (MethodHash=3e632412) for method Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)
+; Total bytes of code 119, prolog size 0, PerfScore 86.58, instruction count 26, allocated bytes for code 119 (MethodHash=3e632412) for method Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)
 ; ============================================================
 
 Unwind Info:

+1 (+0.99%) : 74345.dasm - Benchmarks.SIMD.RayTracer.Vector:Norm(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector (Tier1)

@@ -60,7 +60,7 @@ G_M16924_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byr
        vextractps dword ptr [rcx+0x08], xmm0, 2
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=31 bbWeight=1 PerfScore 18.25
+						;; size=32 bbWeight=1 PerfScore 18.25
 G_M16924_IG05:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
@@ -73,7 +73,7 @@ RWD00  	dd	3F800000h		;         1
 RWD04  	dd	7F800000h		;       inf
 
 
-; Total bytes of code 101, prolog size 0, PerfScore 76.58, instruction count 23, allocated bytes for code 101 (MethodHash=5eedbde3) for method Benchmarks.SIMD.RayTracer.Vector:Norm(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector (Tier1)
+; Total bytes of code 102, prolog size 0, PerfScore 76.58, instruction count 23, allocated bytes for code 102 (MethodHash=5eedbde3) for method Benchmarks.SIMD.RayTracer.Vector:Norm(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector (Tier1)
 ; ============================================================
 
 Unwind Info:

+7 (+87.50%) : 1151.dasm - System.OperatingSystem:IsBrowser():ubyte (Tier0)

@@ -12,16 +12,16 @@
 G_M61665_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M61665_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M61665_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=f0b70f1e) for method System.OperatingSystem:IsBrowser():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=f0b70f1e) for method System.OperatingSystem:IsBrowser():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+7 (+87.50%) : 76175.dasm - Microsoft.CodeAnalysis.Collections.Internal.RoslynUnsafe:NullRef[int]():byref (Tier0)

@@ -12,16 +12,16 @@
 G_M47256_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M47256_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M47256_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=8e824767) for method Microsoft.CodeAnalysis.Collections.Internal.RoslynUnsafe:NullRef[int]():byref (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=8e824767) for method Microsoft.CodeAnalysis.Collections.Internal.RoslynUnsafe:NullRef[int]():byref (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+7 (+87.50%) : 27756.dasm - System.SByte:System.Numerics.INumberBase<System.SByte>.get_Zero():byte (Tier0)

@@ -12,16 +12,16 @@
 G_M14356_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M14356_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M14356_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=1d31c7eb) for method System.SByte:System.Numerics.INumberBase<System.SByte>.get_Zero():byte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=1d31c7eb) for method System.SByte:System.Numerics.INumberBase<System.SByte>.get_Zero():byte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
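
The repeated +7 (+87.50%) on these Tier0 stubs follows from the stress mode described in the Design section: every eligible instruction gets the 2-byte REX2 prefix, and an existing 1-byte REX prefix is replaced rather than kept. The sketch below is a rough size model, not JIT code. The legacy byte encodings are the usual x64 ones, the names `STUB`, `stressed_size` and `total_sizes` are hypothetical, and treating `ret` as not REX2-eligible is an assumption inferred from the epilog growing by only 2 bytes in the diffs.

```python
# Hypothetical size model for the Tier0 stub diffs; illustrative only.
REX2_PREFIX_SIZE = 2  # leading byte 0xD5 plus one payload byte

# (mnemonic, legacy encoding, has a REX prefix, assumed REX2-eligible)
STUB = [
    ("push rbp",     bytes([0x55]),             False, True),
    ("mov rbp, rsp", bytes([0x48, 0x8B, 0xEC]), True,  True),
    ("xor eax, eax", bytes([0x33, 0xC0]),       False, True),
    ("pop rbp",      bytes([0x5D]),             False, True),
    ("ret",          bytes([0xC3]),             False, False),  # assumption
]


def stressed_size(legacy, has_rex, eligible):
    """Size of one instruction when REX2 is forced on eligible instructions."""
    if not eligible:
        return len(legacy)
    # REX2 takes the place of REX, so drop the REX byte before adding REX2.
    body = len(legacy) - (1 if has_rex else 0)
    return REX2_PREFIX_SIZE + body


def total_sizes(instrs):
    """Return (legacy total, stressed total) in bytes."""
    before = sum(len(enc) for _, enc, _, _ in instrs)
    after = sum(stressed_size(enc, rex, ok) for _, enc, rex, ok in instrs)
    return before, after


if __name__ == "__main__":
    for name, enc, rex, ok in STUB:
        print(f"{name:14s} {len(enc)} -> {stressed_size(enc, rex, ok)}")
    print(total_sizes(STUB))
```

Under these assumptions the model gives 8 bytes before and 15 after, which lines up with the per-group sizes in the diffs (prolog 4 to 7, body 2 to 4, epilog 2 to 4).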

benchmarks.run_tiered.windows.x64.checked.mch

+1 (+0.26%) : 61218.dasm - System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[ulong]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)

@@ -139,7 +139,7 @@ G_M61896_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byr
        vmovups  ymmword ptr [rcx], ymm1
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=7 bbWeight=1 PerfScore 2.25
+						;; size=8 bbWeight=1 PerfScore 2.25
 G_M61896_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -161,7 +161,7 @@ RWD352 	dq	42B1721842B17218h, 42B1721842B17218h, 42B1721842B17218h, 42B1721842B1
 RWD384 	dq	7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h
 
 
-; Total bytes of code 378, prolog size 0, PerfScore 188.17, instruction count 65, allocated bytes for code 380 (MethodHash=a8e80e37) for method System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[ulong]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
+; Total bytes of code 379, prolog size 0, PerfScore 188.17, instruction count 65, allocated bytes for code 381 (MethodHash=a8e80e37) for method System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[ulong]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
 ; ============================================================
 
 Unwind Info:

+19 (+0.74%) : 59614.dasm - System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)

@@ -18,7 +18,7 @@ G_M9309_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        mov      bword ptr [rbp+0x10], rcx
        mov      bword ptr [rbp+0x18], rdx
        mov      dword ptr [rbp+0x20], r8d
-						;; size=16 bbWeight=1 PerfScore 4.25
+						;; size=22 bbWeight=1 PerfScore 4.25
 G_M9309_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      rax, bword ptr [rbp+0x18]
        ; byrRegs +[rax]
@@ -31,7 +31,7 @@ G_M9309_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        lea      rcx, G_M9309_IG02
        add      rdx, rcx
        jmp      rdx
-						;; size=36 bbWeight=1 PerfScore 12.00
+						;; size=45 bbWeight=1 PerfScore 12.00
 G_M9309_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vpsrldq  xmm0, xmm0, 0
        jmp      G_M9309_IG259
@@ -1061,11 +1061,11 @@ G_M9309_IG259:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        ; byrRegs +[rax]
        vmovups  xmmword ptr [rax], xmm0
        mov      rax, bword ptr [rbp+0x10]
-						;; size=12 bbWeight=1 PerfScore 4.00
+						;; size=14 bbWeight=1 PerfScore 4.00
 G_M9309_IG260:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 RWD00  	dd	G_M9309_IG03 - G_M9309_IG02
        	dd	G_M9309_IG04 - G_M9309_IG02
        	dd	G_M9309_IG05 - G_M9309_IG02
@@ -1324,7 +1324,7 @@ RWD00  	dd	G_M9309_IG03 - G_M9309_IG02
        	dd	G_M9309_IG258 - G_M9309_IG02
 
 
-; Total bytes of code 2572, prolog size 4, PerfScore 789.75, instruction count 531, allocated bytes for code 2572 (MethodHash=c6eadba2) for method System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)
+; Total bytes of code 2591, prolog size 7, PerfScore 789.75, instruction count 531, allocated bytes for code 2591 (MethodHash=c6eadba2) for method System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -1332,9 +1332,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+1 (+0.85%) : 56215.dasm - Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)

@@ -77,7 +77,7 @@ G_M56301_IG05:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byr
        vextractps dword ptr [rdx+0x08], xmm0, 2
        mov      rax, rdx
        ; byrRegs +[rax]
-						;; size=31 bbWeight=1 PerfScore 18.25
+						;; size=32 bbWeight=1 PerfScore 18.25
 G_M56301_IG06:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
@@ -85,7 +85,7 @@ RWD00  	dd	3F800000h		;         1
 RWD04  	dd	7F800000h		;       inf
 
 
-; Total bytes of code 118, prolog size 0, PerfScore 82.58, instruction count 26, allocated bytes for code 118 (MethodHash=3e632412) for method Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)
+; Total bytes of code 119, prolog size 0, PerfScore 82.58, instruction count 26, allocated bytes for code 119 (MethodHash=3e632412) for method Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)
 ; ============================================================
 
 Unwind Info:

+7 (+87.50%) : 12704.dasm - System.UInt16:System.Numerics.INumberBase<System.UInt16>.get_Zero():ushort (Tier0)

@@ -12,16 +12,16 @@
 G_M3961_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M3961_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M3961_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=985cf086) for method System.UInt16:System.Numerics.INumberBase<System.UInt16>.get_Zero():ushort (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=985cf086) for method System.UInt16:System.Numerics.INumberBase<System.UInt16>.get_Zero():ushort (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+7 (+87.50%) : 73294.dasm - System.Byte:System.Numerics.INumberBase<System.Byte>.get_Zero():ubyte (Tier0)

@@ -12,16 +12,16 @@
 G_M54785_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M54785_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M54785_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=e45a29fe) for method System.Byte:System.Numerics.INumberBase<System.Byte>.get_Zero():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=e45a29fe) for method System.Byte:System.Numerics.INumberBase<System.Byte>.get_Zero():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+7 (+87.50%) : 15012.dasm - System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Reflection.Emit.OpCode]():ubyte (Tier0)

@@ -12,16 +12,16 @@
 G_M969_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M969_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M969_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=f362fc36) for method System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Reflection.Emit.OpCode]():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=f362fc36) for method System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Reflection.Emit.OpCode]():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

coreclr_tests.run.windows.x64.checked.mch

-1 (-16.67%) : 509186.dasm - System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M37565_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M37565_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M37565_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

-1 (-16.67%) : 579101.dasm - Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M19947_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M19947_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M19947_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=801bb214) for method Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=801bb214) for method Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

-12 (-13.19%) : 510272.dasm - IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)

@@ -12,8 +12,8 @@
 ;* V01 loc1         [V01    ] (  0,  0   )     ref  ->  zero-ref    class-hnd <<unknown class>>
 ;  V02 OutArgs      [V02    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V03 tmp1         [V03    ] (  0,  0   )     int  ->  zero-ref   
-;  V04 tmp2         [V04,T01] (  2,  0   )     ref  ->  rdx         class-hnd single-def "impSpillSpecialSideEff" <<unknown class>>
-;  V05 tmp3         [V05,T02] (  2,  0   )     int  ->  [rbp-0x04]  do-not-enreg[M] EH-live
+;* V04 tmp2         [V04    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "impSpillSpecialSideEff" <<unknown class>>
+;* V05 tmp3         [V05    ] (  0,  0   )     int  ->  zero-ref   
 ;  V06 PSPSym       [V06,T00] (  1,  1   )    long  ->  [rbp-0x10]  do-not-enreg[V] "PSPSym"
 ;
 ; Lcl frame size = 48
@@ -23,48 +23,37 @@ G_M30609_IG01:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
        sub      rsp, 48
        lea      rbp, [rsp+0x30]
        mov      qword ptr [rbp-0x10], rsp
-						;; size=14 bbWeight=0 PerfScore 0.00
+						;; size=19 bbWeight=0 PerfScore 0.00
 G_M30609_IG02:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        call     CORINFO_HELP_THROW_PLATFORM_NOT_SUPPORTED
        ; gcr arg pop 0
        int3     
 						;; size=6 bbWeight=0 PerfScore 0.00
 G_M30609_IG03:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, dword ptr [rbp-0x04]
-						;; size=3 bbWeight=0 PerfScore 0.00
+       xor      eax, eax
+						;; size=4 bbWeight=0 PerfScore 0.00
 G_M30609_IG04:        ; bbWeight=0, epilog, nogc, extend
        add      rsp, 48
        pop      rbp
        ret      
-						;; size=6 bbWeight=0 PerfScore 0.00
-G_M30609_IG05:        ; bbWeight=0, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref, funclet prolog, nogc
-       ; gcrRegs +[rdx]
+						;; size=9 bbWeight=0 PerfScore 0.00
+G_M30609_IG05:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, funclet prolog, nogc
        push     rbp
        sub      rsp, 48
        mov      rbp, qword ptr [rcx+0x20]
        mov      qword ptr [rsp+0x20], rbp
        lea      rbp, [rbp+0x30]
-						;; size=18 bbWeight=0 PerfScore 0.00
-G_M30609_IG06:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, gcvars, byref
-       mov      rcx, 0xD1FFAB1E      ; <unknown class>
-       call     CORINFO_HELP_ISINSTANCEOFCLASS
-       ; gcrRegs -[rdx] +[rax]
-       ; gcr arg pop 0
-       xor      ecx, ecx
-       mov      edx, 100
-       test     rax, rax
-       cmovne   ecx, edx
-       mov      dword ptr [rbp-0x04], ecx
+						;; size=24 bbWeight=0 PerfScore 0.00
+G_M30609_IG06:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
        lea      rax, G_M30609_IG03
-       ; gcrRegs -[rax]
-						;; size=38 bbWeight=0 PerfScore 0.00
+						;; size=8 bbWeight=0 PerfScore 0.00
 G_M30609_IG07:        ; bbWeight=0, funclet epilog, nogc, extend
        add      rsp, 48
        pop      rbp
        ret      
-						;; size=6 bbWeight=0 PerfScore 0.00
+						;; size=9 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 91, prolog size 14, PerfScore 0.00, instruction count 26, allocated bytes for code 91 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)
+; Total bytes of code 79, prolog size 19, PerfScore 0.00, instruction count 19, allocated bytes for code 79 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -72,25 +61,25 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x05
+  SizeOfProlog      : 0x08
   CountOfUnwindCodes: 2
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x05 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 5 * 8 + 8 = 48 = 0x30
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 5 * 8 + 8 = 48 = 0x30
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
 Unwind Info:
   >> Start offset   : 0xd1ffab1e (not in unwind data)
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x05
+  SizeOfProlog      : 0x08
   CountOfUnwindCodes: 2
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x05 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 5 * 8 + 8 = 48 = 0x30
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 5 * 8 + 8 = 48 = 0x30
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
 *************** EH table for IntelHardwareIntrinsicTest.Program:TestEntryPoint():int
 1 EH table entries, 0 duplicate clauses, 0 cloned finallys, 1 total EH entries reported to VM
 EH#0: try [G_M30609_IG02..G_M30609_IG03) handled by [G_M30609_IG05..END) (class: 100000B)
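
The unwind info changes in these diffs are a side effect of the larger prolog instructions: each `CodeOffset` appears to be the offset just past the prolog instruction it describes, so it shifts by the accumulated REX2 growth. A minimal sketch of that arithmetic for the `push rbp` / `sub rsp, 48` prolog above, assuming the usual legacy sizes (1 and 4 bytes) and the REX2-stressed sizes (3 and 5 bytes); `unwind_offsets` is a hypothetical helper, not part of the JIT.

```python
from itertools import accumulate


def unwind_offsets(sizes):
    """End offset of each prolog instruction, given its size in bytes."""
    return list(accumulate(sizes))


# Prolog: push rbp; sub rsp, 48
legacy = unwind_offsets([1, 4])    # legacy encodings
stressed = unwind_offsets([3, 5])  # REX2 forced on both instructions

if __name__ == "__main__":
    print([hex(o) for o in legacy])
    print([hex(o) for o in stressed])
```

With these sizes the offsets move from 0x01 and 0x05 to 0x03 and 0x08, the same values reported for `UWOP_PUSH_NONVOL` and `UWOP_ALLOC_SMALL` in the diff.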

+7 (+87.50%) : 149326.dasm - System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[double]():ubyte (Tier0)

@@ -12,16 +12,16 @@
 G_M50835_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M50835_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M50835_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=94b8396c) for method System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[double]():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=94b8396c) for method System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[double]():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+7 (+87.50%) : 270935.dasm - Benchstone.BenchF.LLoops:Clock():int (Instrumented Tier0)

@@ -12,16 +12,16 @@
 G_M63398_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M63398_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M63398_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=e4d00859) for method Benchstone.BenchF.LLoops:Clock():int (Instrumented Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=e4d00859) for method Benchstone.BenchF.LLoops:Clock():int (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+10 (+90.91%) : 526916.dasm - BringUpTest_NotAndNeg:NotAndNeg(int,int):int (FullOpts)

@@ -21,12 +21,12 @@ G_M23640_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      ecx, edx
        not      ecx
        xor      eax, ecx
-						;; size=10 bbWeight=1 PerfScore 1.25
+						;; size=20 bbWeight=1 PerfScore 1.25
 G_M23640_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 11, prolog size 0, PerfScore 2.25, instruction count 6, allocated bytes for code 11 (MethodHash=81d2a3a7) for method BringUpTest_NotAndNeg:NotAndNeg(int,int):int (FullOpts)
+; Total bytes of code 21, prolog size 0, PerfScore 2.25, instruction count 6, allocated bytes for code 21 (MethodHash=81d2a3a7) for method BringUpTest_NotAndNeg:NotAndNeg(int,int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:

libraries.crossgen2.windows.x64.checked.mch

+2 (+0.08%) : 30851.dasm - System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (FullOpts)

@@ -174,7 +174,7 @@ G_M54838_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        movaps   xmmword ptr [rsp+0x30], xmm9
        movaps   xmmword ptr [rsp+0x20], xmm10
        movaps   xmm6, xmm0
-						;; size=35 bbWeight=1 PerfScore 10.50
+						;; size=36 bbWeight=1 PerfScore 10.50
 G_M54838_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        movaps   xmm0, xmm6
        mulsd    xmm0, qword ptr [reloc @RWD00]
@@ -643,7 +643,7 @@ G_M54838_IG08:        ; bbWeight=1, epilog, nogc, extend
        movaps   xmm10, xmmword ptr [rsp+0x20]
        add      rsp, 120
        ret      
-						;; size=33 bbWeight=1 PerfScore 21.25
+						;; size=34 bbWeight=1 PerfScore 21.25
 RWD00  	dq	3FEDB8A420DC189Ah	;    0.9287892
 RWD08  	dq	4070E8C71B478423h	;    270.54861
 RWD16  	dq	400921FB54442D18h	;   3.14159265
@@ -788,7 +788,7 @@ RWD1120	dq	40F5FD9C72B020C5h	;    90073.778
 RWD1128	dq	4062433333333333h	;        146.1
 
 
-; Total bytes of code 2395, prolog size 32, PerfScore 1804.33, instruction count 413, allocated bytes for code 2395 (MethodHash=72e629c9) for method System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (FullOpts)
+; Total bytes of code 2397, prolog size 33, PerfScore 1804.33, instruction count 413, allocated bytes for code 2397 (MethodHash=72e629c9) for method System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -796,19 +796,19 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x20
+  SizeOfProlog      : 0x21
   CountOfUnwindCodes: 11
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x20 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x21 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 2 * 16 = 32 = 0x00020
-    CodeOffset: 0x1A UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x1B UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x14 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x15 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x0E UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x0F UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 5 * 16 = 80 = 0x00050
-    CodeOffset: 0x09 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x0A UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 6 * 16 = 96 = 0x00060
-    CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 14 * 8 + 8 = 120 = 0x78
+    CodeOffset: 0x05 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 14 * 8 + 8 = 120 = 0x78

+1 (+0.11%) : 15946.dasm - System.Runtime.Intrinsics.Vector512:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector512`1[byte] (FullOpts)

@@ -245,12 +245,12 @@ G_M31854_IG03:        ; bbWeight=1, extend
        movups   xmmword ptr [rcx+0x30], xmm3
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=436 bbWeight=1 PerfScore 175.25
+						;; size=437 bbWeight=1 PerfScore 175.25
 G_M31854_IG04:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 923, prolog size 0, PerfScore 381.00, instruction count 134, allocated bytes for code 923 (MethodHash=6feb8391) for method System.Runtime.Intrinsics.Vector512:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector512`1[byte] (FullOpts)
+; Total bytes of code 924, prolog size 0, PerfScore 381.00, instruction count 134, allocated bytes for code 924 (MethodHash=6feb8391) for method System.Runtime.Intrinsics.Vector512:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector512`1[byte] (FullOpts)
 ; ============================================================
 
 Unwind Info:

+1 (+0.23%) : 15798.dasm - System.Runtime.Intrinsics.Vector256:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector256`1[byte] (FullOpts)

@@ -126,12 +126,12 @@ G_M51694_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byr
        movups   xmmword ptr [rcx+0x10], xmm1
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=438 bbWeight=1 PerfScore 186.00
+						;; size=439 bbWeight=1 PerfScore 186.00
 G_M51694_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 439, prolog size 0, PerfScore 187.00, instruction count 68, allocated bytes for code 439 (MethodHash=4b683611) for method System.Runtime.Intrinsics.Vector256:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector256`1[byte] (FullOpts)
+; Total bytes of code 440, prolog size 0, PerfScore 187.00, instruction count 68, allocated bytes for code 440 (MethodHash=4b683611) for method System.Runtime.Intrinsics.Vector256:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector256`1[byte] (FullOpts)
 ; ============================================================
 
 Unwind Info:

+6 (+85.71%) : 11664.dasm - System.UInt32:System.Numerics.IShiftOperators<System.UInt32,System.Int32,System.UInt32>.op_LeftShift(uint,int):uint (FullOpts)

@@ -16,16 +16,16 @@
 
 G_M41089_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        mov      eax, ecx
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M41089_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      ecx, edx
        shl      eax, cl
-						;; size=4 bbWeight=1 PerfScore 2.25
+						;; size=8 bbWeight=1 PerfScore 2.25
 G_M41089_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 7, prolog size 0, PerfScore 3.50, instruction count 4, allocated bytes for code 7 (MethodHash=9e7e5f7e) for method System.UInt32:System.Numerics.IShiftOperators<System.UInt32,System.Int32,System.UInt32>.op_LeftShift(uint,int):uint (FullOpts)
+; Total bytes of code 13, prolog size 0, PerfScore 3.50, instruction count 4, allocated bytes for code 13 (MethodHash=9e7e5f7e) for method System.UInt32:System.Numerics.IShiftOperators<System.UInt32,System.Int32,System.UInt32>.op_LeftShift(uint,int):uint (FullOpts)
 ; ============================================================
 
 Unwind Info:
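
The +6 bytes in the `op_LeftShift` diff above are consistent with each of the three non-`ret` instructions gaining a 2-byte REX2 prefix. That is what one would expect if the diff was collected with `DOTNET_JitStressRex2Encoding` set, but the diff itself does not state the configuration, so treat that as an assumption. A minimal Python sketch of the size arithmetic follows; the helper `rex2_size` is hypothetical, the baseline sizes are the usual 2-byte x64 encodings of these instructions, and `ret` is treated as not REX2-encoded because its size is unchanged in the diff.

```python
# Hypothetical helper: estimated instruction size when a REX2 prefix is forced.
# Assumptions: REX2 is 2 bytes (0xD5 + payload byte), it replaces an existing
# 1-byte REX prefix, and ineligible instructions (here: ret) are left unchanged.
REX2_SIZE = 2


def rex2_size(base_size, has_rex=False, eligible=True):
    if not eligible:
        return base_size
    return base_size + REX2_SIZE - (1 if has_rex else 0)


# (mnemonic, baseline size in bytes, has REX prefix, treated as REX2-eligible)
op_left_shift = [
    ("mov eax, ecx", 2, False, True),
    ("mov ecx, edx", 2, False, True),
    ("shl eax, cl", 2, False, True),
    ("ret", 1, False, False),
]

base_total = sum(size for _, size, _, _ in op_left_shift)
rex2_total = sum(rex2_size(size, rex, ok) for _, size, rex, ok in op_left_shift)
growth = 100.0 * (rex2_total - base_total) / base_total

# Compare against the "Total bytes of code" lines and the diff header.
print(base_total, rex2_total, f"{growth:.2f}%")
```

Under these assumptions the sketch reproduces the 7 and 13 byte totals and the +85.71% figure reported in the diff header.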

+34 (+87.18%) : 170393.dasm - Microsoft.CodeAnalysis.CachingBase`1[System.__Canon]:AlignSize(int):int (FullOpts)

@@ -34,12 +34,12 @@ G_M65205_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        sar      eax, 16
        or       eax, edx
        inc      eax
-						;; size=38 bbWeight=1 PerfScore 5.50
+						;; size=72 bbWeight=1 PerfScore 5.50
 G_M65205_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 39, prolog size 0, PerfScore 6.50, instruction count 18, allocated bytes for code 39 (MethodHash=8bf1014a) for method Microsoft.CodeAnalysis.CachingBase`1[System.__Canon]:AlignSize(int):int (FullOpts)
+; Total bytes of code 73, prolog size 0, PerfScore 6.50, instruction count 18, allocated bytes for code 73 (MethodHash=8bf1014a) for method Microsoft.CodeAnalysis.CachingBase`1[System.__Canon]:AlignSize(int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:

+8 (+88.89%) : 126160.dasm - Microsoft.Diagnostics.Tracing.Ctf.IntHelpers:AlignDown(int,int):int (FullOpts)

@@ -21,12 +21,12 @@ G_M35496_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      eax, edx
        not      eax
        and      eax, ecx
-						;; size=8 bbWeight=1 PerfScore 1.00
+						;; size=16 bbWeight=1 PerfScore 1.00
 G_M35496_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 9, prolog size 0, PerfScore 2.00, instruction count 5, allocated bytes for code 9 (MethodHash=596f7557) for method Microsoft.Diagnostics.Tracing.Ctf.IntHelpers:AlignDown(int,int):int (FullOpts)
+; Total bytes of code 17, prolog size 0, PerfScore 2.00, instruction count 5, allocated bytes for code 17 (MethodHash=596f7557) for method Microsoft.Diagnostics.Tracing.Ctf.IntHelpers:AlignDown(int,int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:

libraries.pmi.windows.x64.checked.mch

-1 (-16.67%) : 29280.dasm - System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M37565_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M37565_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M37565_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

+1 (+0.14%) : 22972.dasm - System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)

@@ -199,14 +199,14 @@ G_M8955_IG03:        ; bbWeight=1, extend
        vmovups  xmmword ptr [rcx+0x30], xmm0
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=354 bbWeight=1 PerfScore 161.25
+						;; size=355 bbWeight=1 PerfScore 161.25
 G_M8955_IG04:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 RWD00  	dd	3F800000h		;         1
 
 
-; Total bytes of code 693, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 693 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
+; Total bytes of code 694, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 694 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
 ; ============================================================
 
 Unwind Info:

+5 (+0.16%) : 28844.dasm - System.Runtime.Intrinsics.X86.Avx512F:Shuffle(System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[float],ubyte):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)

@@ -29,7 +29,7 @@ G_M43564_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r
        ; byrRegs -[rdx]
        add      r8, rdx
        jmp      r8
-						;; size=40 bbWeight=1 PerfScore 14.00
+						;; size=44 bbWeight=1 PerfScore 14.00
 G_M43564_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byref
        vshufps  zmm0, zmm0, zmm1, 0
        jmp      G_M43564_IG259
@@ -1058,7 +1058,7 @@ G_M43564_IG259:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, by
        vmovups  zmmword ptr [rcx], zmm0
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=9 bbWeight=1 PerfScore 2.25
+						;; size=10 bbWeight=1 PerfScore 2.25
 G_M43564_IG260:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -1321,7 +1321,7 @@ RWD00  	dd	G_M43564_IG03 - G_M43564_IG02
        	dd	G_M43564_IG258 - G_M43564_IG02
 
 
-; Total bytes of code 3083, prolog size 0, PerfScore 786.25, instruction count 524, allocated bytes for code 3083 (MethodHash=20a155d3) for method System.Runtime.Intrinsics.X86.Avx512F:Shuffle(System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[float],ubyte):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 3088, prolog size 0, PerfScore 786.25, instruction count 524, allocated bytes for code 3088 (MethodHash=20a155d3) for method System.Runtime.Intrinsics.X86.Avx512F:Shuffle(System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[float],ubyte):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
 ; ============================================================
 
 Unwind Info:

+34 (+87.18%) : 137491.dasm - Microsoft.CodeAnalysis.CachingBase`1[ubyte]:AlignSize(int):int (FullOpts)

@@ -32,12 +32,12 @@ G_M17100_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        sar      eax, 16
        or       eax, ecx
        inc      eax
-						;; size=38 bbWeight=1 PerfScore 5.50
+						;; size=72 bbWeight=1 PerfScore 5.50
 G_M17100_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 39, prolog size 0, PerfScore 6.50, instruction count 18, allocated bytes for code 39 (MethodHash=31c8bd33) for method Microsoft.CodeAnalysis.CachingBase`1[ubyte]:AlignSize(int):int (FullOpts)
+; Total bytes of code 73, prolog size 0, PerfScore 6.50, instruction count 18, allocated bytes for code 73 (MethodHash=31c8bd33) for method Microsoft.CodeAnalysis.CachingBase`1[ubyte]:AlignSize(int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:

+5 (+166.67%) : 29884.dasm - System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M34763_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M34763_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=7 bbWeight=1 PerfScore 0.25
 G_M34763_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=bcd77834) for method System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 8, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 8 (MethodHash=bcd77834) for method System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

+5 (+166.67%) : 29207.dasm - System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M31227_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M31227_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=7 bbWeight=1 PerfScore 0.25
 G_M31227_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=e6278604) for method System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 8, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 8 (MethodHash=e6278604) for method System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

libraries_tests.run.windows.x64.Release.mch

+2 (+0.08%) : 600803.dasm - System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (Instrumented Tier1)

@@ -176,7 +176,7 @@ G_M54838_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        vmovaps  xmmword ptr [rsp+0x40], xmm9
        vmovaps  xmmword ptr [rsp+0x30], xmm10
        vmovaps  xmm6, xmm0
-						;; size=41 bbWeight=1 PerfScore 10.50
+						;; size=42 bbWeight=1 PerfScore 10.50
 G_M54838_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmulsd   xmm0, xmm6, qword ptr [reloc @RWD00]
        vaddsd   xmm0, xmm0, qword ptr [reloc @RWD08]
@@ -622,7 +622,7 @@ G_M54838_IG09:        ; bbWeight=1, epilog, nogc, extend
        vmovaps  xmm10, xmmword ptr [rsp+0x30]
        add      rsp, 136
        ret      
-						;; size=38 bbWeight=1 PerfScore 21.25
+						;; size=39 bbWeight=1 PerfScore 21.25
 RWD00  	dq	3FEDB8A420DC189Ah	;    0.9287892
 RWD08  	dq	4070E8C71B478423h	;    270.54861
 RWD16  	dq	400921FB54442D18h	;   3.14159265
@@ -767,7 +767,7 @@ RWD1120	dq	40F5FD9C72B020C5h	;    90073.778
 RWD1128	dq	4062433333333333h	;        146.1
 
 
-; Total bytes of code 2372, prolog size 37, PerfScore 1766.08, instruction count 388, allocated bytes for code 2372 (MethodHash=72e629c9) for method System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (Instrumented Tier1)
+; Total bytes of code 2374, prolog size 38, PerfScore 1766.08, instruction count 388, allocated bytes for code 2374 (MethodHash=72e629c9) for method System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (Instrumented Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -775,20 +775,20 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x25
+  SizeOfProlog      : 0x26
   CountOfUnwindCodes: 12
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x25 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x26 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x1F UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x20 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x19 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x1A UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 5 * 16 = 80 = 0x00050
-    CodeOffset: 0x13 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x14 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 6 * 16 = 96 = 0x00060
-    CodeOffset: 0x0D UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x0E UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 7 * 16 = 112 = 0x00070
-    CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
       Size: 17 * 8 = 136 = 0x00088

+1 (+0.23%) : 497356.dasm - System.Runtime.Intrinsics.VectorMath:LogSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[uint]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)

@@ -138,7 +138,7 @@ G_M36528_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        vmovups  ymmword ptr [rcx], ymm2
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=243 bbWeight=1 PerfScore 117.75
+						;; size=244 bbWeight=1 PerfScore 117.75
 G_M36528_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -167,7 +167,7 @@ RWD232 	dd	BF000002h		;      -0.5
 RWD236 	dd	3F317218h		;  0.693147
 
 
-; Total bytes of code 432, prolog size 0, PerfScore 173.03, instruction count 68, allocated bytes for code 432 (MethodHash=421e714f) for method System.Runtime.Intrinsics.VectorMath:LogSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[uint]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
+; Total bytes of code 433, prolog size 0, PerfScore 173.03, instruction count 68, allocated bytes for code 433 (MethodHash=421e714f) for method System.Runtime.Intrinsics.VectorMath:LogSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[uint]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
 ; ============================================================
 
 Unwind Info:

+1 (+0.23%) : 484509.dasm - System.Numerics.Tensors.TensorPrimitives+LogOperatorSingle:Invoke(System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)

@@ -108,7 +108,7 @@ G_M39372_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        vmovups  ymmword ptr [rcx], ymm2
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=243 bbWeight=1 PerfScore 117.75
+						;; size=244 bbWeight=1 PerfScore 117.75
 G_M39372_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -137,7 +137,7 @@ RWD232 	dd	BF000002h		;      -0.5
 RWD236 	dd	3F317218h		;  0.693147
 
 
-; Total bytes of code 432, prolog size 0, PerfScore 161.72, instruction count 68, allocated bytes for code 432 (MethodHash=a2e86633) for method System.Numerics.Tensors.TensorPrimitives+LogOperatorSingle:Invoke(System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
+; Total bytes of code 433, prolog size 0, PerfScore 161.72, instruction count 68, allocated bytes for code 433 (MethodHash=a2e86633) for method System.Numerics.Tensors.TensorPrimitives+LogOperatorSingle:Invoke(System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
 ; ============================================================
 
 Unwind Info:

+7 (+87.50%) : 344948.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.declarations.backwardscompatible.dynamictypedeclared017.dynamictypedeclared017.A:MainMethod():int (Tier0)

@@ -12,16 +12,16 @@
 G_M39926_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M39926_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M39926_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=d45d6409) for method ManagedTests.DynamicCSharp.Conformance.dynamic.declarations.backwardscompatible.dynamictypedeclared017.dynamictypedeclared017.A:MainMethod():int (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=d45d6409) for method ManagedTests.DynamicCSharp.Conformance.dynamic.declarations.backwardscompatible.dynamictypedeclared017.dynamictypedeclared017.A:MainMethod():int (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
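
The prolog growth from 4 to 7 bytes, and the `UWOP_PUSH_NONVOL` offset moving from 0x01 to 0x03, can be explained by `push rbp` growing from 1 to 3 bytes and `mov rbp, rsp` growing from 3 to 4 bytes because REX2 replaces the existing REX.W prefix. The Python sketch below builds the REX2 prefix by hand to illustrate this. The function name `rex2_prefix` is hypothetical and is not a JIT emitter API; the byte sequences use one valid encoding of each instruction and are an assumption, since the diff only confirms the sizes.

```python
# Sketch of REX2 prefix construction; names are hypothetical, not JIT emitter APIs.
# Payload bit layout (high to low): M0 R4 X4 B4 W R3 X3 B3.


def rex2_prefix(w=0, r=0, x=0, b=0, m0=0):
    """Build the 2-byte REX2 prefix.

    r, x, b are 2-bit values: bit 1 is R4/X4/B4, bit 0 is R3/X3/B3.
    m0 selects legacy map 1 (the 0x0F escape) when set.
    """
    payload = (
        (m0 << 7)
        | ((r >> 1) << 6)
        | ((x >> 1) << 5)
        | ((b >> 1) << 4)
        | (w << 3)
        | ((r & 1) << 2)
        | ((x & 1) << 1)
        | (b & 1)
    )
    return bytes([0xD5, payload])


# push rbp: opcode 0x55; the baseline needs no REX prefix (1 byte).
push_rbp = rex2_prefix() + bytes([0x55])

# mov rbp, rsp: baseline is REX.W 0x8B 0xEC (3 bytes); REX2 with W=1 replaces REX.
mov_rbp_rsp = rex2_prefix(w=1) + bytes([0x8B, 0xEC])

# The unwind code offset is the offset just past the push instruction.
push_end_offset = len(push_rbp)
prolog_size = len(push_rbp) + len(mov_rbp_rsp)

print(push_rbp.hex(), mov_rbp_rsp.hex(), hex(push_end_offset), prolog_size)
```

With these encodings, `push_end_offset` is 3 and `prolog_size` is 7, which agrees with the `CodeOffset: 0x03` and `prolog size 7` values in the diff.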

+7 (+87.50%) : 561668.dasm - System.Runtime.Intrinsics.Vector128`1[System.Int128]:get_IsSupported():ubyte (Tier0)

@@ -12,16 +12,16 @@
 G_M6228_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M6228_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M6228_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=9c00e7ab) for method System.Runtime.Intrinsics.Vector128`1[System.Int128]:get_IsSupported():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=9c00e7ab) for method System.Runtime.Intrinsics.Vector128`1[System.Int128]:get_IsSupported():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

+7 (+87.50%) : 259812.dasm - System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Collections.Frozen.Tests.SimpleNonComparableStruct]():ubyte (Tier0)

@@ -12,16 +12,16 @@
 G_M54291_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M54291_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M54291_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=c4122bec) for method System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Collections.Frozen.Tests.SimpleNonComparableStruct]():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=c4122bec) for method System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Collections.Frozen.Tests.SimpleNonComparableStruct]():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch

+1 (+0.14%) : 179042.dasm - System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)

@@ -199,14 +199,14 @@ G_M8955_IG03:        ; bbWeight=1, extend
        vmovups  xmmword ptr [rcx+0x30], xmm0
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=354 bbWeight=1 PerfScore 161.25
+						;; size=355 bbWeight=1 PerfScore 161.25
 G_M8955_IG04:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 RWD00  	dd	3F800000h		;         1
 
 
-; Total bytes of code 693, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 693 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
+; Total bytes of code 694, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 694 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
 ; ============================================================
 
 Unwind Info:

+1 (+0.15%) : 212365.dasm - System.Runtime.Intrinsics.VectorMath:LogDouble[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)

@@ -161,7 +161,7 @@ G_M22781_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        vmovups  zmmword ptr [rcx], zmm2
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=437 bbWeight=1 PerfScore 201.42
+						;; size=438 bbWeight=1 PerfScore 201.42
 G_M22781_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -203,7 +203,7 @@ RWD488 	dq	BF2BD0105C610CA8h	; -0.00021219444
 RWD496 	dq	3FE6300000000000h	;  0.693359375
 
 
-; Total bytes of code 665, prolog size 0, PerfScore 237.75, instruction count 91, allocated bytes for code 667 (MethodHash=3c50a702) for method System.Runtime.Intrinsics.VectorMath:LogDouble[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)
+; Total bytes of code 666, prolog size 0, PerfScore 237.75, instruction count 91, allocated bytes for code 668 (MethodHash=3c50a702) for method System.Runtime.Intrinsics.VectorMath:LogDouble[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)
 ; ============================================================
 
 Unwind Info:

+1 (+0.15%) : 211948.dasm - System.Runtime.Intrinsics.VectorMath:Log2Double[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)

@@ -161,7 +161,7 @@ G_M63855_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        vmovups  zmmword ptr [rcx], zmm2
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=437 bbWeight=1 PerfScore 201.42
+						;; size=438 bbWeight=1 PerfScore 201.42
 G_M63855_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -203,7 +203,7 @@ RWD488 	dq	3ECB295C17F0BBBEh	; 3.23791045e-06
 RWD496 	dq	3FF7154400000000h	;    1.4426918
 
 
-; Total bytes of code 665, prolog size 0, PerfScore 237.75, instruction count 91, allocated bytes for code 667 (MethodHash=4d7b0690) for method System.Runtime.Intrinsics.VectorMath:Log2Double[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)
+; Total bytes of code 666, prolog size 0, PerfScore 237.75, instruction count 91, allocated bytes for code 668 (MethodHash=4d7b0690) for method System.Runtime.Intrinsics.VectorMath:Log2Double[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)
 ; ============================================================
 
 Unwind Info:

+4 (+80.00%) : 18935.dasm - LibraryImportGenerator.IntegrationTests.FunctionPointerTests:<CalledWithArgumentsInOrder>g__Callback|2_0(int,int):int (FullOpts)

@@ -18,12 +18,12 @@ G_M41111_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M41111_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      eax, ecx
        sub      eax, edx
-						;; size=4 bbWeight=1 PerfScore 0.50
+						;; size=8 bbWeight=1 PerfScore 0.50
 G_M41111_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 5, prolog size 0, PerfScore 1.50, instruction count 3, allocated bytes for code 5 (MethodHash=f1415f68) for method LibraryImportGenerator.IntegrationTests.FunctionPointerTests:<CalledWithArgumentsInOrder>g__Callback|2_0(int,int):int (FullOpts)
+; Total bytes of code 9, prolog size 0, PerfScore 1.50, instruction count 3, allocated bytes for code 9 (MethodHash=f1415f68) for method LibraryImportGenerator.IntegrationTests.FunctionPointerTests:<CalledWithArgumentsInOrder>g__Callback|2_0(int,int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:

+14 (+82.35%) : 146650.dasm - System.Linq.Parallel.Tests.JoinTests+LeftOrderingCollisionTest+<>c:<ReorderLeft>b__0_0(int):int:this (FullOpts)

@@ -23,12 +23,12 @@ G_M5520_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      ecx, edx
        sub      ecx, eax
        mov      eax, ecx
-						;; size=16 bbWeight=1 PerfScore 2.00
+						;; size=30 bbWeight=1 PerfScore 2.00
 G_M5520_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 17, prolog size 0, PerfScore 3.00, instruction count 8, allocated bytes for code 17 (MethodHash=3061ea6f) for method System.Linq.Parallel.Tests.JoinTests+LeftOrderingCollisionTest+<>c:<ReorderLeft>b__0_0(int):int:this (FullOpts)
+; Total bytes of code 31, prolog size 0, PerfScore 3.00, instruction count 8, allocated bytes for code 31 (MethodHash=3061ea6f) for method System.Linq.Parallel.Tests.JoinTests+LeftOrderingCollisionTest+<>c:<ReorderLeft>b__0_0(int):int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:

+6 (+85.71%) : 176318.dasm - System.Numerics.Tensors.TensorPrimitives+InvertedBinaryOperator`2[System.Numerics.Tensors.TensorPrimitives+DivideOperator`1[uint],uint]:Invoke(uint,uint):uint (FullOpts)

@@ -20,12 +20,12 @@ G_M23137_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      eax, edx
        xor      edx, edx
        div      edx:eax, ecx
-						;; size=6 bbWeight=1 PerfScore 25.50
+						;; size=12 bbWeight=1 PerfScore 25.50
 G_M23137_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 7, prolog size 0, PerfScore 26.50, instruction count 4, allocated bytes for code 7 (MethodHash=86f0a59e) for method System.Numerics.Tensors.TensorPrimitives+InvertedBinaryOperator`2[System.Numerics.Tensors.TensorPrimitives+DivideOperator`1[uint],uint]:Invoke(uint,uint):uint (FullOpts)
+; Total bytes of code 13, prolog size 0, PerfScore 26.50, instruction count 4, allocated bytes for code 13 (MethodHash=86f0a59e) for method System.Numerics.Tensors.TensorPrimitives+InvertedBinaryOperator`2[System.Numerics.Tensors.TensorPrimitives+DivideOperator`1[uint],uint]:Invoke(uint,uint):uint (FullOpts)
 ; ============================================================
 
 Unwind Info:

realworld.run.windows.x64.checked.mch

+1 (+0.11%) : 1372.dasm - BepuPhysics.Collidables.MeshInertiaHelper:ComputeTetrahedronContribution(byref,byref,byref,float,byref) (FullOpts)

@@ -56,7 +56,7 @@
 G_M56806_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        mov      rax, bword ptr [rsp+0x28]
        ; byrRegs +[rax]
-						;; size=5 bbWeight=1 PerfScore 1.00
+						;; size=6 bbWeight=1 PerfScore 1.00
 G_M56806_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0107 {rax rcx rdx r8}, byref
        ; byrRegs +[rcx rdx r8]
        vmulss   xmm0, xmm3, dword ptr [reloc @RWD00]
@@ -236,7 +236,7 @@ RWD08  	dd	C0000000h		;        -2
 RWD12  	dd	40000000h		;         2
 
 
-; Total bytes of code 870, prolog size 0, PerfScore 531.00, instruction count 165, allocated bytes for code 870 (MethodHash=1b132219) for method BepuPhysics.Collidables.MeshInertiaHelper:ComputeTetrahedronContribution(byref,byref,byref,float,byref) (FullOpts)
+; Total bytes of code 871, prolog size 0, PerfScore 531.00, instruction count 165, allocated bytes for code 871 (MethodHash=1b132219) for method BepuPhysics.Collidables.MeshInertiaHelper:ComputeTetrahedronContribution(byref,byref,byref,float,byref) (FullOpts)
 ; ============================================================
 
 Unwind Info:

+1 (+0.21%) : 1203.dasm - BepuPhysics.Collidables.Compound:GetRotatedChildPose(byref,byref,byref,byref) (FullOpts)

@@ -100,7 +100,7 @@ G_M10677_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
        vmulps   ymm17, ymm3, ymm0
        vmulps   ymm0, ymm5, ymm0
        vmulps   ymm18, ymm3, ymm2
-						;; size=275 bbWeight=1 PerfScore 256.50
+						;; size=276 bbWeight=1 PerfScore 256.50
 G_M10677_IG03:        ; bbWeight=1, extend
        vmulps   ymm2, ymm5, ymm2
        vmulps   ymm4, ymm5, ymm4
@@ -146,7 +146,7 @@ G_M10677_IG04:        ; bbWeight=1, epilog, nogc, extend
 RWD00  	dq	3F8000003F800000h, 3F8000003F800000h, 3F8000003F800000h, 3F8000003F800000h
 
 
-; Total bytes of code 476, prolog size 0, PerfScore 393.50, instruction count 97, allocated bytes for code 476 (MethodHash=d5c8d64a) for method BepuPhysics.Collidables.Compound:GetRotatedChildPose(byref,byref,byref,byref) (FullOpts)
+; Total bytes of code 477, prolog size 0, PerfScore 393.50, instruction count 97, allocated bytes for code 477 (MethodHash=d5c8d64a) for method BepuPhysics.Collidables.Compound:GetRotatedChildPose(byref,byref,byref,byref) (FullOpts)
 ; ============================================================
 
 Unwind Info:

+5 (+0.36%) : 1326.dasm - BepuUtilities.Symmetric6x6Wide:LDLTSolve(byref,byref,byref,byref,byref,byref,byref) (FullOpts)

@@ -59,7 +59,7 @@ G_M52182_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        ; byrRegs +[rax]
        mov      r10, bword ptr [rsp+0x90]
        ; byrRegs +[r10]
-						;; size=57 bbWeight=1 PerfScore 13.25
+						;; size=61 bbWeight=1 PerfScore 13.25
 G_M52182_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0F07 {rax rcx rdx r8 r9 r10 r11}, byref
        ; byrRegs +[rcx rdx r8-r9]
        vmovups  ymm0, ymmword ptr [r8]
@@ -305,11 +305,11 @@ G_M52182_IG06:        ; bbWeight=1, epilog, nogc, extend
        vmovaps  xmm10, xmmword ptr [rsp]
        add      rsp, 88
        ret      
-						;; size=37 bbWeight=1 PerfScore 22.25
+						;; size=38 bbWeight=1 PerfScore 22.25
 RWD00  	dq	3F8000003F800000h, 3F8000003F800000h, 3F8000003F800000h, 3F8000003F800000h
 
 
-; Total bytes of code 1406, prolog size 33, PerfScore 914.50, instruction count 244, allocated bytes for code 1406 (MethodHash=9c303429) for method BepuUtilities.Symmetric6x6Wide:LDLTSolve(byref,byref,byref,byref,byref,byref,byref) (FullOpts)
+; Total bytes of code 1411, prolog size 34, PerfScore 914.50, instruction count 244, allocated bytes for code 1411 (MethodHash=9c303429) for method BepuUtilities.Symmetric6x6Wide:LDLTSolve(byref,byref,byref,byref,byref,byref,byref) (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -317,19 +317,19 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x21
+  SizeOfProlog      : 0x22
   CountOfUnwindCodes: 11
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x21 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x22 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 0 * 16 = 0 = 0x00000
-    CodeOffset: 0x1C UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x1D UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 1 * 16 = 16 = 0x00010
-    CodeOffset: 0x16 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x17 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 2 * 16 = 32 = 0x00020
-    CodeOffset: 0x10 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x11 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x0A UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x0B UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 10 * 8 + 8 = 88 = 0x58
+    CodeOffset: 0x05 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 10 * 8 + 8 = 88 = 0x58

+6 (+75.00%) : 8466.dasm - Microsoft.ML.Data.VectorDataViewType+<>c:<.ctor>b__4_0(int):ubyte:this (FullOpts)

@@ -19,12 +19,12 @@ G_M20425_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      eax, edx
        not      eax
        shr      eax, 31
-						;; size=7 bbWeight=1 PerfScore 1.00
+						;; size=13 bbWeight=1 PerfScore 1.00
 G_M20425_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 8, prolog size 0, PerfScore 2.00, instruction count 4, allocated bytes for code 8 (MethodHash=067ab036) for method Microsoft.ML.Data.VectorDataViewType+<>c:<.ctor>b__4_0(int):ubyte:this (FullOpts)
+; Total bytes of code 14, prolog size 0, PerfScore 2.00, instruction count 4, allocated bytes for code 14 (MethodHash=067ab036) for method Microsoft.ML.Data.VectorDataViewType+<>c:<.ctor>b__4_0(int):ubyte:this (FullOpts)
 ; ============================================================
 
 Unwind Info:

+10 (+76.92%) : 20363.dasm - Microsoft.CodeAnalysis.CSharp.NullableWalker+LocalState:get_Capacity():int:this (FullOpts)

@@ -25,12 +25,12 @@ G_M6705_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byre
        shr      ecx, 31
        add      eax, ecx
        sar      eax, 1
-						;; size=12 bbWeight=1 PerfScore 3.50
+						;; size=22 bbWeight=1 PerfScore 3.50
 G_M6705_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 13, prolog size 0, PerfScore 4.50, instruction count 6, allocated bytes for code 13 (MethodHash=ab86e5ce) for method Microsoft.CodeAnalysis.CSharp.NullableWalker+LocalState:get_Capacity():int:this (FullOpts)
+; Total bytes of code 23, prolog size 0, PerfScore 4.50, instruction count 6, allocated bytes for code 23 (MethodHash=ab86e5ce) for method Microsoft.CodeAnalysis.CSharp.NullableWalker+LocalState:get_Capacity():int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:

+4 (+80.00%) : 17010.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.ErrorTypeSymbol:HasInlineArrayAttribute(byref):ubyte:this (FullOpts)

@@ -19,13 +19,13 @@ G_M52199_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byr
        ; byrRegs +[rdx]
        xor      eax, eax
        mov      dword ptr [rdx], eax
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=8 bbWeight=1 PerfScore 1.25
 G_M52199_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
        ; byrRegs -[rdx]
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 5, prolog size 0, PerfScore 2.25, instruction count 3, allocated bytes for code 5 (MethodHash=79e23418) for method Microsoft.CodeAnalysis.CSharp.Symbols.ErrorTypeSymbol:HasInlineArrayAttribute(byref):ubyte:this (FullOpts)
+; Total bytes of code 9, prolog size 0, PerfScore 2.25, instruction count 3, allocated bytes for code 9 (MethodHash=79e23418) for method Microsoft.CodeAnalysis.CSharp.Symbols.ErrorTypeSymbol:HasInlineArrayAttribute(byref):ubyte:this (FullOpts)
 ; ============================================================
 
 Unwind Info:

smoke_tests.nativeaot.windows.x64.checked.mch

-1 (-16.67%) : 14296.dasm - Program:X86SerializeX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M13406_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13406_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M13406_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

-1 (-16.67%) : 21532.dasm - Program:AesX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M55817_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M55817_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M55817_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=c5da25f6) for method Program:AesX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=c5da25f6) for method Program:AesX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

-1 (-16.67%) : 19229.dasm - Program:X86SerializeX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M13406_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13406_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M13406_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

+5 (+166.67%) : 19199.dasm - Program:AvxVnniX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M60430_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M60430_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=7 bbWeight=1 PerfScore 0.25
 G_M60430_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=e20b13f1) for method Program:AvxVnniX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 8, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 8 (MethodHash=e20b13f1) for method Program:AvxVnniX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

+5 (+166.67%) : 21515.dasm - Program:FmaX64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M2260_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M2260_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=7 bbWeight=1 PerfScore 0.25
 G_M2260_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=36a7f72b) for method Program:FmaX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 8, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 8 (MethodHash=36a7f72b) for method Program:FmaX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

+5 (+166.67%) : 21526.dasm - Program:Avx2X64IsSupported():ubyte (FullOpts)

@@ -14,13 +14,13 @@
 G_M13187_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13187_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=7 bbWeight=1 PerfScore 0.25
 G_M13187_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=f683cc7c) for method Program:Avx2X64IsSupported():ubyte (FullOpts)
+; Total bytes of code 8, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 8 (MethodHash=f683cc7c) for method Program:Avx2X64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:

Details

Size improvements/regressions per collection

Collection	Contexts with diffs	Improvements	Regressions	Improvements (bytes)	Regressions (bytes)
aspnet.run.windows.x64.checked.mch	140,527	0	140,527	-0	+10,392,179
benchmarks.run.windows.x64.checked.mch	37,922	0	37,922	-0	+3,013,399
benchmarks.run_pgo.windows.x64.checked.mch	120,020	0	120,020	-0	+8,962,474
benchmarks.run_tiered.windows.x64.checked.mch	76,575	0	76,575	-0	+4,199,746
coreclr_tests.run.windows.x64.checked.mch	639,890	3	639,887	-14	+84,314,025
libraries.crossgen2.windows.x64.checked.mch	274,848	0	274,848	-0	+11,739,139
libraries.pmi.windows.x64.checked.mch	309,149	1	309,148	-1	+15,125,726
libraries_tests.run.windows.x64.Release.mch	811,914	0	811,914	-0	+69,356,394
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	339,284	0	339,284	-0	+32,521,941
realworld.run.windows.x64.checked.mch	28,087	0	28,087	-0	+2,545,126
smoke_tests.nativeaot.windows.x64.checked.mch	30,547	8	30,539	-8	+1,394,449
	2,808,763	12	2,808,751	-23	+243,564,598

PerfScore improvements/regressions per collection

Collection	Contexts with diffs	Improvements	Regressions	Same PerfScore	Improvements (PerfScore)	Regressions (PerfScore)	PerfScore Overall in FullOpts
aspnet.run.windows.x64.checked.mch	140,527	0	0	140,527	0.00%	0.00%	0.0000%
benchmarks.run.windows.x64.checked.mch	37,922	0	0	37,922	0.00%	0.00%	0.0000%
benchmarks.run_pgo.windows.x64.checked.mch	120,020	0	0	120,020	0.00%	0.00%	0.0000%
benchmarks.run_tiered.windows.x64.checked.mch	76,575	0	0	76,575	0.00%	0.00%	0.0000%
coreclr_tests.run.windows.x64.checked.mch	639,890	2	1	639,887	-7.87%	+0.03%	0.0000%
libraries.crossgen2.windows.x64.checked.mch	274,848	0	0	274,848	0.00%	0.00%	0.0000%
libraries.pmi.windows.x64.checked.mch	309,149	0	0	309,149	0.00%	0.00%	0.0000%
libraries_tests.run.windows.x64.Release.mch	811,914	0	0	811,914	0.00%	0.00%	0.0000%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	339,284	0	0	339,284	0.00%	0.00%	0.0000%
realworld.run.windows.x64.checked.mch	28,087	0	0	28,087	0.00%	0.00%	0.0000%
smoke_tests.nativeaot.windows.x64.checked.mch	30,547	0	0	30,547	0.00%	0.00%	0.0000%

Context information

Collection	Diffed contexts	MinOpts	FullOpts	Missed, base	Missed, diff
aspnet.run.windows.x64.checked.mch	141,224	77,324	63,900	0 (0.00%)	0 (0.00%)
benchmarks.run.windows.x64.checked.mch	38,352	6	38,346	0 (0.00%)	0 (0.00%)
benchmarks.run_pgo.windows.x64.checked.mch	120,280	68,103	52,177	0 (0.00%)	0 (0.00%)
benchmarks.run_tiered.windows.x64.checked.mch	76,876	56,358	20,518	0 (0.00%)	0 (0.00%)
coreclr_tests.run.windows.x64.checked.mch	642,813	393,776	249,037	0 (0.00%)	5 (0.00%)
libraries.crossgen2.windows.x64.checked.mch	276,889	15	276,874	0 (0.00%)	2 (0.00%)
libraries.pmi.windows.x64.checked.mch	316,010	6	316,004	0 (0.00%)	1 (0.00%)
libraries_tests.run.windows.x64.Release.mch	814,679	567,674	247,005	0 (0.00%)	0 (0.00%)
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	343,895	21,994	321,901	0 (0.00%)	0 (0.00%)
realworld.run.windows.x64.checked.mch	28,368	3	28,365	0 (0.00%)	0 (0.00%)
smoke_tests.nativeaot.windows.x64.checked.mch	31,202	10	31,192	0 (0.00%)	3 (0.01%)
	2,830,588	1,185,269	1,645,319	0 (0.00%)	11 (0.00%)

jit-analyze output

Comments: No decode failures or assertion failures are reported in the logs, except for some assert failures about unsupported ISAs, which should also be attributed to the APX CPUID changes. The large code-size increase is expected, as we are forcing all compatible legacy instructions to be encoded with REX2 regardless of whether it is needed.
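The size deltas in the diffs above are consistent with the prefix rules from the overview: REX2 is a 2-byte prefix and cannot coexist with REX, so under stress an eligible instruction with no prefix grows by 2 bytes, while one that already carried a 1-byte REX prefix grows by 1. A minimal sketch of that arithmetic follows; `Rex2StressGrowth` and `BlockGrowth` are hypothetical names used only for illustration and are not part of the JIT.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not JIT code: estimated size growth of one eligible
// legacy instruction when it is forced into REX2 encoding.
// Assumption: REX2 (2 bytes) replaces an existing REX prefix (1 byte) and is
// simply prepended to instructions that had no REX prefix.
constexpr std::size_t Rex2StressGrowth(bool hadRexPrefix)
{
    constexpr std::size_t rex2Size = 2;
    constexpr std::size_t rexSize  = 1;
    return hadRexPrefix ? (rex2Size - rexSize) : rex2Size;
}

// Growth of an instruction group containing `noRex` instructions without a
// REX prefix and `withRex` instructions that already had one.
constexpr std::size_t BlockGrowth(std::size_t noRex, std::size_t withRex)
{
    return noRex * Rex2StressGrowth(false) + withRex * Rex2StressGrowth(true);
}
```

For the `mov eax, edx` / `xor edx, edx` / `div` group shown earlier, three prefix-free instructions give `BlockGrowth(3, 0) == 6`, matching the change from 6 to 12 bytes. `mov rax, bword ptr [rsp+0x28]` already needs REX.W, so `BlockGrowth(0, 1) == 1` matches the change from 5 to 6 bytes.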

2.3 TpDiff - REX2 off (no or little tp impact expected)

TP impact with REX2 off compared with base main:

Overall (+0.08% to +0.19%)

Collection	PDIFF
aspnet.run.windows.x64.checked.mch	+0.13%
benchmarks.run.windows.x64.checked.mch	+0.08%
benchmarks.run_pgo.windows.x64.checked.mch	+0.12%
benchmarks.run_tiered.windows.x64.checked.mch	+0.19%
coreclr_tests.run.windows.x64.checked.mch	+0.18%
libraries.crossgen2.windows.x64.checked.mch	+0.11%
libraries.pmi.windows.x64.checked.mch	+0.09%
libraries_tests.run.windows.x64.Release.mch	+0.15%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	+0.10%
realworld.run.windows.x64.checked.mch	+0.09%
smoke_tests.nativeaot.windows.x64.checked.mch	+0.08%

MinOpts (+0.24% to +0.43%)

Collection	PDIFF
aspnet.run.windows.x64.checked.mch	+0.37%
benchmarks.run.windows.x64.checked.mch	+0.36%
benchmarks.run_pgo.windows.x64.checked.mch	+0.35%
benchmarks.run_tiered.windows.x64.checked.mch	+0.34%
coreclr_tests.run.windows.x64.checked.mch	+0.27%
libraries.crossgen2.windows.x64.checked.mch	+0.36%
libraries.pmi.windows.x64.checked.mch	+0.24%
libraries_tests.run.windows.x64.Release.mch	+0.37%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	+0.30%
realworld.run.windows.x64.checked.mch	+0.43%
smoke_tests.nativeaot.windows.x64.checked.mch	+0.29%

FullOpts (+0.07% to +0.11%)

Collection	PDIFF
aspnet.run.windows.x64.checked.mch	+0.08%
benchmarks.run.windows.x64.checked.mch	+0.08%
benchmarks.run_pgo.windows.x64.checked.mch	+0.07%
benchmarks.run_tiered.windows.x64.checked.mch	+0.08%
coreclr_tests.run.windows.x64.checked.mch	+0.10%
libraries.crossgen2.windows.x64.checked.mch	+0.11%
libraries.pmi.windows.x64.checked.mch	+0.09%
libraries_tests.run.windows.x64.Release.mch	+0.08%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	+0.10%
realworld.run.windows.x64.checked.mch	+0.08%
smoke_tests.nativeaot.windows.x64.checked.mch	+0.08%

Details

All contexts:

Collection	Base # instructions	Diff # instructions	PDIFF
aspnet.run.windows.x64.checked.mch	142,394,809,547	142,582,075,252	+0.13%
benchmarks.run.windows.x64.checked.mch	55,370,986,624	55,417,510,475	+0.08%
benchmarks.run_pgo.windows.x64.checked.mch	121,883,543,862	122,027,057,184	+0.12%
benchmarks.run_tiered.windows.x64.checked.mch	34,231,112,724	34,297,405,288	+0.19%
coreclr_tests.run.windows.x64.checked.mch	809,468,778,745	810,902,734,493	+0.18%
libraries.crossgen2.windows.x64.checked.mch	154,853,677,569	155,028,932,749	+0.11%
libraries.pmi.windows.x64.checked.mch	269,020,941,364	269,270,900,769	+0.09%
libraries_tests.run.windows.x64.Release.mch	815,708,776,365	816,960,864,737	+0.15%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	577,085,986,658	577,668,396,449	+0.10%
realworld.run.windows.x64.checked.mch	49,400,011,363	49,442,097,903	+0.09%
smoke_tests.nativeaot.windows.x64.checked.mch	22,690,369,631	22,708,625,728	+0.08%

MinOpts contexts:

Collection	Base # instructions	Diff # instructions	PDIFF
aspnet.run.windows.x64.checked.mch	24,056,213,491	24,145,743,102	+0.37%
benchmarks.run.windows.x64.checked.mch	705,633	708,145	+0.36%
benchmarks.run_pgo.windows.x64.checked.mch	19,880,799,806	19,950,333,595	+0.35%
benchmarks.run_tiered.windows.x64.checked.mch	15,022,302,541	15,073,432,822	+0.34%
coreclr_tests.run.windows.x64.checked.mch	347,233,426,241	348,186,424,612	+0.27%
libraries.crossgen2.windows.x64.checked.mch	2,084,909	2,092,477	+0.36%
libraries.pmi.windows.x64.checked.mch	132,525,396	132,849,510	+0.24%
libraries_tests.run.windows.x64.Release.mch	206,423,819,906	207,191,499,654	+0.37%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	12,115,105,172	12,152,028,039	+0.30%
realworld.run.windows.x64.checked.mch	348,063,722	349,547,131	+0.43%
smoke_tests.nativeaot.windows.x64.checked.mch	1,254,167	1,257,840	+0.29%

FullOpts contexts:

Collection	Base # instructions	Diff # instructions	PDIFF
aspnet.run.windows.x64.checked.mch	118,338,596,056	118,436,332,150	+0.08%
benchmarks.run.windows.x64.checked.mch	55,370,280,991	55,416,802,330	+0.08%
benchmarks.run_pgo.windows.x64.checked.mch	102,002,744,056	102,076,723,589	+0.07%
benchmarks.run_tiered.windows.x64.checked.mch	19,208,810,183	19,223,972,466	+0.08%
coreclr_tests.run.windows.x64.checked.mch	462,235,352,504	462,716,309,881	+0.10%
libraries.crossgen2.windows.x64.checked.mch	154,851,592,660	155,026,840,272	+0.11%
libraries.pmi.windows.x64.checked.mch	268,888,415,968	269,138,051,259	+0.09%
libraries_tests.run.windows.x64.Release.mch	609,284,956,459	609,769,365,083	+0.08%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	564,970,881,486	565,516,368,410	+0.10%
realworld.run.windows.x64.checked.mch	49,051,947,641	49,092,550,772	+0.08%
smoke_tests.nativeaot.windows.x64.checked.mch	22,689,115,464	22,707,367,888	+0.08%
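
The PDIFF column in the tables above appears to be the relative change in instruction count between base and diff, rounded to two decimals. A small sketch under that assumption (the helper names `PercentDiff` and `RoundTo2` are made up for this example):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: relative change in executed instruction count, in percent.
// Assumption: PDIFF = (diff - base) / base * 100.
double PercentDiff(std::uint64_t base, std::uint64_t diff)
{
    const double b = static_cast<double>(base);
    const double d = static_cast<double>(diff);
    return (d - b) / b * 100.0;
}

// Round to two decimals, the precision used in the tables.
double RoundTo2(double value)
{
    return std::round(value * 100.0) / 100.0;
}
```

With the "All contexts" numbers for aspnet.run (142,394,809,547 base and 142,582,075,252 diff), `RoundTo2(PercentDiff(...))` gives 0.13, in line with the reported +0.13%.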

3. JIT unit tests

Comments: We are not using the full JIT test suite because the emulator has its own limitations: when the test set is too big, the emulator itself shows some non-deterministic behavior. To avoid this, we put some effort into finding the best coverage that still produces stable test results.

Comments: Within the subset shown in the screenshot, all tests pass both without REX2 (DOTNET_JitStressRex2Encoding=0) and with REX2 (DOTNET_JitStressRex2Encoding=1), with some known exceptions caused by the emulator itself: CodegenBringUpTests and IL_Conformance break because they contain existing try-catch structures, and some exceptions that are supposed to be caught by the runtime are caught by the emulator first.

dotnet-policy-service · 2024-08-16T18:11:25Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Ruihan-Yin · 2024-08-16T18:18:38Z

The base of the two APX-related PRs (CPUID: #104637 and REX2: #106557) is outdated; I will work offline to resolve the conflicts and rebase the branch.

We are happy to discuss the design and tests here; please feel free to leave a comment if you have any questions or suggestions.

dotnet-policy-service · 2024-10-05T21:35:58Z

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

anthonycanino · 2024-10-21T21:35:36Z

@dotnet/avx512-contrib can we reopen this as a PR ready to review?

BruceForstall · 2024-10-21T22:05:38Z

@anthonycanino I re-opened it (it wasn't clear to me if your question implied you did not have permission to do so). Either you or @Ruihan-Yin need to update to latest main and resolve the conflicts, then mark it ready-for-review.

Ruihan-Yin · 2024-11-18T18:59:22Z

Hi @tannergooding, thanks for the reviews in #104637. It seems the CPUID changes are just pending merge and no major changes are expected, so while waiting, I wonder if we can start the conversation on this PR?

tannergooding · 2024-11-18T20:24:34Z

@Ruihan-Yin, just got #104637 merged. If we could get this PR updated so it contains just the new changes, that should make it a lot simpler to review and get in.

Ruihan-Yin · 2024-11-18T20:26:43Z

Thanks! I will work on it soon.

BruceForstall · 2024-11-20T19:46:30Z

Do you have plans to bump the LLVM version for .NET 10?

Yes. Also, I expect to re-tool the coredistools build process to make it easier than it is currently to update the dependent LLVM version.

@MichalPetryka thanks for that link. That might make it easier to build coredistools with our AzureLinux containers (currently, I think the libc in Ubuntu 16.04 is limiting our ability to build new LLVM). However, note that the coredistools version of LLVM and the .NET 10 version of LLVM are not currently related, and hopefully can still be chosen independently in the future.

Ruihan-Yin · 2024-11-22T19:49:17Z

Hi @tannergooding, I tried to refactor the stress mode for REX2, can you please check if it is as expected?

Plus, there are a few questions regarding the comments; I would appreciate it if you could take a look.

Thanks!

BruceForstall · 2024-12-02T18:51:18Z

@tannergooding ping on @Ruihan-Yin 's latest questions

tannergooding · 2024-12-04T17:09:04Z

src/coreclr/jit/compiler.cpp

@@ -2297,6 +2297,10 @@ void Compiler::compSetProcessor()
            codeGen->GetEmitter()->SetUseEvexEncoding(true);
            // TODO-XArch-AVX512 : Revisit other flags to be set once avx512 instructions are added.


Not related to this PR, but this comment is likely irrelevant now.

tannergooding · 2024-12-04T17:13:12Z

src/coreclr/jit/emitxarch.cpp

+    // TODO-apx:
+    // there are duplicated stress logics here and in HasExtendedGPReg()
+    // need to clean up later.


This comment looks outdated now

tannergooding · 2024-12-04T17:14:56Z

src/coreclr/jit/emitxarch.cpp

+// TODO-apx: It would be better to have stress mode on LSRA to forcely allocate EGPRs,
+//           instead of stressing here.
+#if defined(DEBUG)
+    if (emitComp->DoJitStressRex2Encoding())
+    {
+        return true;
+    }
+#endif // DEBUG


We should make sure a tracking issue exists for this.

Actually, I'm not quite sure why this check is needed. We don't have/need such a path for EVEX, we simply emit the EVEX encoding always under stress mode.

We then use the JitStressReg knobs to preference different register sets.

Yes, I agree that this part is now not needed. I will remove it.

tannergooding · 2024-12-04T17:18:46Z

src/coreclr/jit/emitxarch.cpp

+    }
+#endif // DEBUG
+
+    if (UseRex2Encoding())


Is this check actually necessary?

IsExtendedReg freely returns true for XMM16-XMM31 without a corresponding EVEX check. So I would have expected this can simply be (reg >= REG_R16) && (reg <= REG_R31) -- with a temporary path doing || (DoJitStressRex2Encoding() && (reg >= REG_RAX) && (reg <= REG_R15)) until LSRA is properly updated, but as per the above I'm not sure that's really needed

IsExtendedGPReg is only used in HasExtendedGPReg, which in turn is used in TakesRex2Prefix, so I think it should be fine to leave a simple check like (reg >= REG_R16) && (reg <= REG_R31) there. If we stress REX2 encoding, TakesRex2Prefix will return true through the DoJitStressRex2Encoding check and never reach the register check.

For now, since the EGPR definition is still missing, I will simply return false there and come back when EGPRs are defined. I will document this with an issue. (I'd expect this could be addressed within the PR for the register allocator.)
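
The two register checks discussed in this thread can be sketched as follows. The `Reg` enum and its numbering are hypothetical stand-ins for illustration only; the real JIT register enum is different and, per the discussion above, does not define the EGPRs yet. `IsExtendedGPRegUnderStress` is likewise a made-up name for the temporary stress variant mentioned in the review comment.

```cpp
#include <cassert>

// Hypothetical register numbering, for illustration only.
enum Reg : int
{
    REG_RAX  = 0,
    REG_R15  = 15,
    REG_R16  = 16,
    REG_R31  = 31,
    REG_XMM0 = 32,
};

// Simple range check from the review: true only for r16-r31.
constexpr bool IsExtendedGPReg(Reg reg)
{
    return (reg >= REG_R16) && (reg <= REG_R31);
}

// Temporary variant from the review comment: under REX2 stress, rax-r15
// are also treated as if they needed the extended encoding.
constexpr bool IsExtendedGPRegUnderStress(Reg reg, bool stressRex2)
{
    return IsExtendedGPReg(reg) || (stressRex2 && (reg >= REG_RAX) && (reg <= REG_R15));
}
```

If the stress decision is made earlier, in a caller such as TakesRex2Prefix, only the first function is needed, which is the direction the reply above takes.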

tannergooding

The changes generally look good/correct to me now for a foundational PR.

I think we need at least one tracking issue covering the TODOs in here and ensuring they're cleaned up as the other work gets completed.

This needs an additional review from @dotnet/jit-contrib

Ruihan-Yin · 2024-12-04T19:55:21Z

Thanks for the review!

TODOs are now tracked in #110414.

BruceForstall

A few somewhat minor suggestions and comments.

BruceForstall · 2024-12-14T00:34:23Z

src/coreclr/jit/codegenxarch.cpp

+    genDefineTempLabel(genCreateTempLabel());
+
+    // This test suite needs REX2 enabled.
+    assert(theEmitter->UseRex2Encoding() || theEmitter->emitComp->DoJitStressRex2Encoding());


Is it the case that these are all "normal" non-APX / non-REX2 instructions, that only test REX2 if REX2 stress is enabled? So to use this, you need to set DOTNET_JitStressRex2Encoding=1 and DOTNET_JitEmitUnitTestsSections=apx?

Note that these asserts will cause DOTNET_JitEmitUnitTestsSections=all (without DOTNET_JitStressRex2Encoding=1 set) to assert, which is undesirable. Instead of assert, maybe just return if REX2 is not being stressed? Or maybe it's possible that if you get here, to temporarily "turn on" REX2 stress/encoding for the duration of this function?

Presumably there will be additional cases, with EGPR high registers, where the REX2 encodings will be required.

If we temporarily turn on REX2 stress only in this function, only the code generator will do its work with REX2 encoding; once we exit the function and turn REX2 stress off, the corresponding emitter will emit the code with REX2 off.

I can make it an early return if REX2 is not stressed.

And yes, currently the emitter unit test can only perform encoding on GPR0~15, since we are still waiting for the EGPR definition, which will come with the PR for LSRA. I can add a TODO item for more unit tests with EGPRs to #110414 if needed.
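
A rough sketch of the early-return option discussed above, assuming the test entry point can simply skip its work when REX2 is neither enabled nor stressed. `Rex2State` and `GenApxEmitterUnitTests` are hypothetical names; the real function and the emitter queries it uses live in codegenxarch.cpp and are not reproduced here.

```cpp
#include <cassert>

// Hypothetical stand-in for the state queried in the diff.
struct Rex2State
{
    bool useRex2Encoding;    // stands in for theEmitter->UseRex2Encoding()
    bool stressRex2Encoding; // stands in for emitComp->DoJitStressRex2Encoding()
};

// Returns true if the REX2 test section ran, false if it was skipped.
bool GenApxEmitterUnitTests(const Rex2State& state)
{
    // Early return instead of an assert, so that enabling every unit-test
    // section without REX2 stress skips this section rather than asserting.
    if (!state.useRex2Encoding && !state.stressRex2Encoding)
    {
        return false;
    }

    // ... the REX2 instruction encodings would be generated here ...
    return true;
}
```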

BruceForstall · 2024-12-14T00:41:30Z

src/coreclr/jit/codegenxarch.cpp

+    // it might fail due to stack value unavailable/mismatch, since these tests are mainly for
+    // encoding correctness check, this kind of failures may be considered as not harmful.
+
+    GenTree* stkNum = theEmitter->emitComp->stackState.esStack[0].val;


I don't see the relationship to the stackState variable.

Couldn't you:

GenTreePhysReg physReg(REG_EDX); GenTreeIndir load(TYP_INT, &physReg);

to create [EDX] addressing mode?

BruceForstall · 2024-12-14T00:54:43Z

src/coreclr/jit/compiler.h

+        {
+            // we should make sure EVEX is also stressed when REX2 is stressed, as we will need to guarantee EGPR
+            // functionality is properly turned on for every instructions when REX2 is stress.
+            assert(JitConfig.JitStressEvexEncoding());


This means that if you set DOTNET_JitStressRex2Encoding=1 but don't set DOTNET_JitStressEvexEncoding=1 you will get an assert. That's annoying. Is it really necessary? That is, can you stress REX2 without also stressing EVEX?

If it's required to also stress EVEX, I suggest creating a small helper function:

bool JitStressEvexEncoding() const { return JitConfig.JitStressEvexEncoding() || JitConfig.JitStressRex2Encoding(); }

and replace all calls to JitConfig.JitStressEvexEncoding() with calls to this new helper function. Then you can remove this assert.

We had more discussion on this here: #106557 (comment)

I will replace the EVEX encoding check with the suggested helper, thanks for pointing that out.
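
The effect of the suggested helper can be shown in isolation. In this sketch `JitConfigSketch` is a hypothetical stand-in for the two JitConfig knobs; only the OR-combination itself comes from the review comment.

```cpp
#include <cassert>

// Hypothetical stand-in for the two stress knobs read from JitConfig.
struct JitConfigSketch
{
    bool stressEvexEncoding; // DOTNET_JitStressEvexEncoding
    bool stressRex2Encoding; // DOTNET_JitStressRex2Encoding
};

// EVEX stress is considered on whenever either knob is set, so setting
// only the REX2 knob no longer needs an assert to enforce the pairing.
constexpr bool JitStressEvexEncoding(const JitConfigSketch& config)
{
    return config.stressEvexEncoding || config.stressRex2Encoding;
}
```

With this shape, setting only DOTNET_JitStressRex2Encoding=1 implies EVEX stress as well, while setting only DOTNET_JitStressEvexEncoding=1 behaves as before.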

BruceForstall · 2024-12-14T01:17:34Z

src/coreclr/jit/emitxarch.cpp

+        // 2-byte
+        return true;
+    }
+    if ((code & 0xFF0000) == 0x0F0000)


Does this need to be:

if ((code & 0xFFFF0000) == 0x000F0000)

(that is, check that high byte 4 is zero)? If we don't check this, couldn't a 4-byte code with 0F second byte be caught?
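The difference between the two masks can be shown with a small standalone sketch. The function names and the sample opcode values below are made up for illustration; only the two mask expressions come from the review comment.

```cpp
#include <cassert>
#include <cstdint>

// Check as written in the diff: only inspects bits 16..23.
constexpr bool MatchesLoose(uint32_t code)
{
    return (code & 0xFF0000) == 0x0F0000;
}

// Check as suggested in review: additionally requires bits 24..31 to be zero.
constexpr bool MatchesStrict(uint32_t code)
{
    return (code & 0xFFFF0000) == 0x000F0000;
}

// Illustrative values (hypothetical, not real instruction encodings).
inline void CheckMaskExamples()
{
    const uint32_t threeByte = 0x000F12ABu; // top byte is zero
    const uint32_t fourByte  = 0x380F12ABu; // top byte is non-zero, second byte is 0x0F

    assert(MatchesLoose(threeByte) && MatchesStrict(threeByte));
    assert(MatchesLoose(fourByte));   // the loose mask also catches the 4-byte value
    assert(!MatchesStrict(fourByte)); // the strict mask rejects it
}
```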

BruceForstall · 2024-12-14T01:18:18Z

src/coreclr/jit/emitxarch.cpp

+
+    if ((code & 0xFF000000) == 0x0F000000)
+    {
+        // 4-byte, need to check if PP is prefixs


nit

Suggested change

// 4-byte, need to check if PP is prefixs

// 4-byte, need to check if PP is a prefix

BruceForstall · 2024-12-14T01:25:36Z

src/coreclr/jit/emitxarch.cpp

+    // TODO-xarch-apx:
+    // At this stage, we are only using REX2 in the case that non-simd integer instructions
+    // with EGPRs being used in its operands, it could be either direct register uses, or
+    // memory addresssig operands, i.e. index and base.


typo

Suggested change

// memory addresssig operands, i.e. index and base.

// memory addressing operands, i.e. index and base.

BruceForstall · 2024-12-14T01:26:56Z

src/coreclr/jit/emitxarch.cpp

+        return false;
+    }
+
+#if defined(DEBUG)


It doesn't really matter, but it seems like the stress check should be last, just before the final return false;

BruceForstall · 2024-12-14T01:30:29Z

src/coreclr/jit/emitxarch.cpp

@@ -1657,6 +1785,36 @@ bool emitter::HasHighSIMDReg(const instrDesc* id) const
    return false;
 }

+//------------------------------------------------------------------------
+// HasExtendedGPReg: Checks if an instruction uses a extended general purpose registers - EGPRs (r16-r31)


nit

Suggested change

// HasExtendedGPReg: Checks if an instruction uses a extended general purpose registers - EGPRs (r16-r31)

// HasExtendedGPReg: Checks if an instruction uses an extended general-purpose register - EGPR (r16-r31)

BruceForstall · 2024-12-14T01:32:52Z

src/coreclr/jit/emitxarch.cpp

+        if ((code & 0xFF) == 0x0F)
+        {
+            // some map-1 instructions have opcode in forms like:
+            // XX0F, remove the leading 0x0F byte as it have been recoreded in REX2.


Suggested change

// XX0F, remove the leading 0x0F byte as it have been recoreded in REX2.

// XX0F, remove the leading 0x0F byte as it has been recorded in REX2.
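As a rough illustration of what the comment describes, the sketch below folds a 0x0F escape stored in the low byte of a packed opcode into the REX2 payload. It assumes the payload's bit 7 is the map-select bit (M0), as I understand the APX specification; `Rex2Opcode`, `FoldEscapeIntoRex2`, and the sample value are hypothetical and are not the PR's code.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: REX2 payload bit 7 (M0) selects legacy map 1, i.e. the 0x0F escape.
constexpr uint8_t REX2_M0 = 0x80;

// Hypothetical result type for this sketch.
struct Rex2Opcode
{
    uint8_t  payload; // REX2 payload bits contributed by the opcode map
    uint32_t opcode;  // opcode with the escape byte removed, if it was present
};

// If the low byte of the packed opcode is the 0x0F escape, record it as M0 in
// the REX2 payload and drop it from the opcode; otherwise leave both untouched.
inline Rex2Opcode FoldEscapeIntoRex2(uint32_t code)
{
    Rex2Opcode result{0, code};
    if ((code & 0xFF) == 0x0F)
    {
        result.payload = REX2_M0;
        result.opcode  = code >> 8;
    }
    return result;
}

// Illustrative check with a made-up "XX0F"-style packed value.
inline void CheckFoldExample()
{
    const Rex2Opcode folded = FoldEscapeIntoRex2(0xAF0Fu);
    assert(folded.payload == REX2_M0);
    assert(folded.opcode == 0xAFu);
}
```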

BruceForstall · 2024-12-14T01:35:46Z

src/coreclr/jit/emitxarch.cpp

@@ -3233,12 +3554,18 @@ inline unsigned emitter::insEncodeReg012(const instrDesc* id, regNumber reg, emi
        {
            *code = AddRexBPrefix(id, *code); // REX.B
        }
+        if (false /*reg >= REG_R16 && reg <= REG_R31*/)
+        {
+            // seperate the encoding for REX2.B3/B4, REX2.B3 will be handled in `AddRexBPrefix`.


Suggested change

// seperate the encoding for REX2.B3/B4, REX2.B3 will be handled in `AddRexBPrefix`.

// Separate the encoding for REX2.B3/B4, REX2.B3 will be handled in `AddRexBPrefix`.
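To make the B3/B4 split concrete, here is a small sketch of how a 5-bit register index could be divided between the 3-bit ModRM.r/m field and the two REX2 extension bits. The payload bit positions (B3 at bit 0, B4 at bit 4) follow my reading of the APX specification and should be treated as an assumption; `RmEncoding` and `EncodeBaseReg` are hypothetical names, not the emitter's.

```cpp
#include <cassert>
#include <cstdint>

// Assumed REX2 payload bit positions.
constexpr uint8_t REX2_B3 = 0x01; // 4th bit of the r/m (or base) register index
constexpr uint8_t REX2_B4 = 0x10; // 5th bit of the r/m (or base) register index

// Hypothetical result type for this sketch.
struct RmEncoding
{
    uint8_t low3;     // goes into ModRM.r/m
    uint8_t rex2Bits; // B3/B4 bits to OR into the REX2 payload
};

// Split a register index in [0, 31] into ModRM.r/m plus REX2.B3/B4.
inline RmEncoding EncodeBaseReg(unsigned regIndex)
{
    assert(regIndex < 32);
    RmEncoding e{};
    e.low3 = static_cast<uint8_t>(regIndex & 0x7);
    if ((regIndex & 0x08) != 0)
    {
        e.rex2Bits |= REX2_B3; // registers 8..15 and 24..31
    }
    if ((regIndex & 0x10) != 0)
    {
        e.rex2Bits |= REX2_B4; // registers 16..31, i.e. the EGPRs
    }
    return e;
}
```

For example, index 17 (binary 10001) yields r/m = 1 with only B4 set, while index 9 (binary 01001) yields r/m = 1 with only B3 set, which is why the two bits can be handled in separate places.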

Ruihan-Yin · 2024-12-17T01:06:51Z

Thanks all for the reviews and suggestions!

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 16, 2024

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 16, 2024

Ruihan-Yin mentioned this pull request Aug 16, 2024

Update the CPUID and XSAVE logics for APX #104637

Merged

BruceForstall added the apx Related to the Intel Advanced Performance Extensions (APX) label Sep 5, 2024

dotnet-policy-service bot closed this Oct 5, 2024

Ruihan-Yin mentioned this pull request Oct 11, 2024

[JIT] Add legacy extended EVEX encoding and EVEX.ND/NF feature to x64 emitter backend #108796

Merged

BruceForstall mentioned this pull request Oct 15, 2024

Intel architecture improvements for .NET 10 #108869

Open

46 tasks

BruceForstall reopened this Oct 21, 2024

Ruihan-Yin marked this pull request as ready for review November 13, 2024 01:27

Ruihan-Yin requested a review from MichalStrehovsky as a code owner November 13, 2024 01:27

Ruihan-Yin force-pushed the apx-rex2-july branch from 866253d to 42c6cfc Compare November 20, 2024 18:55

Ruihan-Yin added 2 commits November 22, 2024 10:13

resolve comments

2e2eb01

add more emitter tests.

3d298b7

resolve comments.

25a54d3

build-analysis bot mentioned this pull request Dec 3, 2024

System.Formats.Nrbf.Tests timeouts #110285

Closed

Ruihan-Yin requested a review from tannergooding December 3, 2024 22:57

tannergooding reviewed Dec 4, 2024

tannergooding approved these changes Dec 4, 2024

Ruihan-Yin added 2 commits December 4, 2024 11:32

clean up some comments and tweak the REX2 stress logic

791b505

clean up

094e76b

Ruihan-Yin mentioned this pull request Dec 4, 2024

[APX-REX2] Left-over TODOs after REX2 encoding changes #110414

Closed

3 tasks

formatting.

6502ae1

This was referenced Dec 4, 2024

iOS test fails with "App is not signed" #110395

Closed

[OSX]: AMDeviceSecureInstallApplicationBundle returned: 0xe800801c #110403

Open

BruceForstall reviewed Dec 14, 2024

resolve comments.

5d3cca2

BruceForstall approved these changes Dec 16, 2024

BruceForstall merged commit 3410c76 into dotnet:main Dec 17, 2024
115 checks passed

This was referenced Dec 18, 2024

[JIT] Add ccmp and enable conditional compares for X64 #110826

Closed

[JIT] Enable ccmp in X86 emitter backend. #110881

Closed

anthonycanino mentioned this pull request Jan 3, 2025

[JIT] Enable conditional chaining for Intel APX #111072

Merged

github-actions bot locked and limited conversation to collaborators Jan 16, 2025
