RyuJIT: Optimize -X and MathF.Abs(X) for floats #42164

EgorBo · 2020-09-12T19:13:20Z

Contributes to #1342

public static float XorTest(float x) => -x;

public static float AbsTest(float x) => MathF.Abs(x);

Current codegen:

; Method Test:XorTest(float):float
       C5F877               vzeroupper 
       C5FA100D0D000000     vmovss   xmm1, dword ptr [reloc @RWD00]
       C5F857C1             vxorps   xmm0, xmm1
       C3                   ret      
RWD00  dd	80000000h
; Total bytes of code: 16


; Method Test:AbsTest(float):float
       C5F877               vzeroupper 
       C5FA100D0D000000     vmovss   xmm1, dword ptr [reloc @RWD00]
       C5F854C1             vandps   xmm0, xmm1
       C3                   ret      
RWD00  dd	7FFFFFFFh
; Total bytes of code: 16

New codegen:

; Method XorTest(float):float
       C5F877               vzeroupper 
       C5F8570505000000     vxorps   xmm0, xmm0, dword ptr [reloc @RWD00]
       C3                   ret      
RWD00  dd	80000000h
; Total bytes of code: 12


; Method AbsTest(float):float
       C5F877               vzeroupper 
       C5F8540505000000     vandps   xmm0, xmm0, dword ptr [reloc @RWD00]
       C3                   ret      
RWD00  dd	7FFFFFFFh
; Total bytes of code: 12

godbolt: https://godbolt.org/z/5K5dGr

jit-diff:

C:\prj>jit-diff diff --output C:\prj\jitdiffs -f --core_root C:\prj\runtime-1\artifacts\tests\coreclr\Windows_NT.x64.Release\Tests\Core_Root --base C:\prj\runtime-1\artifacts\bin\coreclr\Windows_NT.x64.Checked_base --diff C:\prj\runtime-1\artifacts\bin\coreclr\Windows_NT.x64.Checked --pmi
Beginning PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies
- Finished 267/267 Base 267/267 Diff [490.6 sec]
Completed PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies in 490.77s
Diffs (if any) can be viewed by comparing: C:\prj\jitdiffs\dasmset_10\base C:\prj\jitdiffs\dasmset_10\diff
Analyzing CodeSize diffs...
Found 14 files with textual diffs.
PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -1097 (-0.00% of base)
    diff is an improvement.
Top file improvements (bytes):
        -484 : System.Private.CoreLib.dasm (-0.01% of base)
        -153 : Microsoft.VisualBasic.Core.dasm (-0.03% of base)
        -152 : System.Runtime.Numerics.dasm (-0.21% of base)
         -92 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.00% of base)
         -84 : System.Drawing.Common.dasm (-0.03% of base)
         -36 : Newtonsoft.Json.dasm (-0.00% of base)
         -28 : System.Data.Common.dasm (-0.00% of base)
         -20 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.00% of base)
         -16 : System.Private.Xml.dasm (-0.00% of base)
          -8 : FSharp.Core.dasm (-0.00% of base)
          -8 : System.Linq.Expressions.dasm (-0.00% of base)
          -8 : System.Private.DataContractSerialization.dasm (-0.00% of base)
          -4 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base)
          -4 : System.Net.Mail.dasm (-0.00% of base)
14 total files with Code Size differences (14 improved, 0 regressed), 253 unchanged.
Top method improvements (bytes):
         -41 (-1.90% of base) : System.Private.CoreLib.dasm - Matrix4x4:<Invert>g__SoftwareFallback|65_1(Matrix4x4,byref):bool
         -36 (-6.19% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateShadow(Vector3,Plane):Matrix4x4
         -28 (-1.36% of base) : Microsoft.VisualBasic.Core.dasm - ObjectType:InternalNegObj(Object,IConvertible,int):Object
         -25 (-7.84% of base) : System.Private.CoreLib.dasm - MathF:IEEERemainder(float,float):float
         -24 (-1.21% of base) : Microsoft.VisualBasic.Core.dasm - Operators:NegateObject(Object):Object
         -24 (-5.93% of base) : System.Drawing.Common.dasm - Matrix:RotateAt(float,PointF,int):this
         -24 (-14.20% of base) : System.Private.CoreLib.dasm - Matrix3x2:op_UnaryNegation(Matrix3x2):Matrix3x2
         -21 (-7.00% of base) : System.Private.CoreLib.dasm - Math:IEEERemainder(double,double):double
         -20 (-1.43% of base) : Microsoft.VisualBasic.Core.dasm - Financial:IRR(byref,double):double
         -20 (-1.21% of base) : System.Private.CoreLib.dasm - Matrix4x4:Decompose(Matrix4x4,byref,byref,byref):bool
         -20 (-9.39% of base) : System.Runtime.Numerics.dasm - Complex:Asin(Complex):Complex
         -20 (-3.77% of base) : System.Runtime.Numerics.dasm - Complex:Sqrt(Complex):Complex
         -16 (-0.57% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - ExpressionEvaluator:EvaluateUnaryExpression(UnaryExpressionSyntax):CConst:this
         -16 (-9.64% of base) : System.Drawing.Common.dasm - PrinterSettings:CreateMeasurementGraphics(bool):Graphics:this
         -16 (-17.20% of base) : System.Private.CoreLib.dasm - Quaternion:op_UnaryNegation(Quaternion):Quaternion
         -16 (-17.20% of base) : System.Private.CoreLib.dasm - Quaternion:Negate(Quaternion):Quaternion
         -16 (-13.68% of base) : System.Private.CoreLib.dasm - Vector4:Abs(Vector4):Vector4
         -16 (-2.26% of base) : System.Private.CoreLib.dasm - Vector`1:Abs(Vector`1):Vector`1 (6 methods)
         -16 (-11.51% of base) : System.Runtime.Numerics.dasm - Complex:op_Division(double,Complex):Complex
         -16 (-7.24% of base) : System.Runtime.Numerics.dasm - Complex:Acos(Complex):Complex
Top method improvements (percentages):
          -4 (-25.00% of base) : FSharp.Core.dasm - -cctor@6135-13:Invoke(double):double:this
          -4 (-25.00% of base) : FSharp.Core.dasm - -cctor@6136-15:Invoke(float):float:this
          -4 (-25.00% of base) : System.Private.CoreLib.dasm - MathF:Abs(float):float
         -16 (-17.20% of base) : System.Private.CoreLib.dasm - Quaternion:op_UnaryNegation(Quaternion):Quaternion
         -16 (-17.20% of base) : System.Private.CoreLib.dasm - Quaternion:Negate(Quaternion):Quaternion
          -8 (-16.33% of base) : System.Runtime.Numerics.dasm - Complex:Negate(Complex):Complex
          -8 (-16.33% of base) : System.Runtime.Numerics.dasm - Complex:op_UnaryNegation(Complex):Complex
          -8 (-15.38% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Fix(double):double
         -12 (-14.81% of base) : System.Private.CoreLib.dasm - Quaternion:Conjugate(Quaternion):Quaternion
         -24 (-14.20% of base) : System.Private.CoreLib.dasm - Matrix3x2:op_UnaryNegation(Matrix3x2):Matrix3x2
         -16 (-13.68% of base) : System.Private.CoreLib.dasm - Vector4:Abs(Vector4):Vector4
          -8 (-12.50% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - <>c:<ByIDSortedExclusiveMetric>b__15_0(CallTreeNodeBase,CallTreeNodeBase):int:this
          -8 (-12.50% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - <>c:<.ctor>b__0_0(CallTreeNodeBase,CallTreeNodeBase):int:this
          -8 (-12.50% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - <>c:<.ctor>b__0_1(CallTreeNodeBase,CallTreeNodeBase):int:this
          -8 (-12.50% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - <>c:<GetCallees>b__5_0(CallTreeNode,CallTreeNode):int:this
         -12 (-12.37% of base) : Newtonsoft.Json.dasm - MathUtils:ApproxEquals(double,double):bool
         -12 (-11.88% of base) : System.Private.CoreLib.dasm - Vector3:Abs(Vector3):Vector3
          -8 (-11.76% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Fix(float):float
          -4 (-11.76% of base) : Newtonsoft.Json.dasm - JsonValidatingReader:IsZero(double):bool
         -16 (-11.51% of base) : System.Runtime.Numerics.dasm - Complex:op_Division(double,Complex):Complex
127 total methods with Code Size differences (127 improved, 0 regressed), 258503 unchanged.
Completed analysis in 28.55s

/cc @tannergooding

Saving 0.0 constant into a temp reg makes sense when it's used more than once but it doesn't work anyway: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACMhgYQYG8aHunGAzADYRsGBgFls2ABSDhohGgayRDAJ4BKBgF4AfA0QMAVPtUBuLjwD0lhgEtRQ7ABNcDAEQAGAHQe3DfHgA1gwYAO62YDCKAJIM2PgMAHYQojAAbjCJ/pkYthCJtokA5voIhnCqDLgAFhAArgJODMAwDBAADrn4tgBeME0YEAxlqjQAvkA===

am11 · 2020-09-12T21:43:37Z

godbolt: https://godbolt.org/z/5K5dGr

@EgorBo, do you know why clang is producing vxorps, for double precision (as well as for float); while gcc seems to be doing the right thing, producing vxorpd with double and vxorps for float?

EgorBo · 2020-09-12T22:05:47Z

godbolt: https://godbolt.org/z/5K5dGr

@EgorBo, do you know why clang is producing vxorps, for double precision (as well as for float); while gcc seems to be doing the right thing, producing vxorpd with double and vxorps for float?

I guess there is no difference for xor what to xor - it xors bits https://godbolt.org/z/9TsWq1 🙂
UPD: it looks like there is a small difference for the 256bit xor: https://stackoverflow.com/questions/26942952/difference-between-the-avx-instructions-vxorpd-and-vpxor

src/coreclr/src/jit/codegenxarch.cpp

tannergooding · 2020-09-12T22:19:09Z

do you know why clang is producing vxorps, for double precision (as well as for float); while gcc seems to be doing the right thing, producing vxorpd with double and vxorps for float?

For the legacy encoding, the ps version is generally 1 byte smaller than the pd or integral versions. For VEX they should be the same size.
movaps, movups, movntps, xorps, orps, andps, and shufps are all other instructions that can also be substituted due to being a byte smaller but functionally operating on bits rather than on "32-bit floats".

sandreenko · 2020-09-13T23:37:52Z

@dotnet/jit-contrib

EgorBo · 2020-09-14T15:06:35Z

some tests fail, a simplified repro:

using System;
using System.Runtime.CompilerServices;

class Prog
{
    static void Main()
    {
        Console.WriteLine(
            BitConverter.DoubleToInt64Bits(Egor(0, double.NaN)));
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static double Egor(double xmm0, double xmm1) => -xmm1;
}

happens only when AVX (VEX) is not available (e.g R2R).
Output:

Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at Prog.Egor(Double xmm0, Double xmm1)
   at Prog.Main()

codegen for Egor:

G_M17335_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M17335_IG02:
       0F28C1               movaps   xmm0, xmm1
       0F57051E000000       xorps    xmm0, qword ptr [reloc @RWD24]
						;; bbWeight=1    PerfScore 2.25
G_M17335_IG03:
       C3                   ret
						;; bbWeight=1    PerfScore 1.00
RWD00  dq	0000000000000000h
RWD08  dq	8000000000000000h
RWD16  dq	0000000000000000h
RWD24  dq	8000000000000000h

tannergooding · 2020-09-14T15:11:06Z

Shouldn't that be reloc @RWD00, not @RWD24?

EgorBo · 2020-09-14T15:15:54Z

@tannergooding 🤔 hm.. shouldn't the constant be 0x8000000000000000 (with sign bit on) but the section looks weird indeed.

fixed my code a bit so now it's:

; Assembly listing for method Prog:Egor(double,double):double
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 arg0         [V00    ] (  0,  0   )  double  ->  zero-ref
;  V01 arg1         [V01,T00] (  3,  3   )  double  ->  mm1
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M17335_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M17335_IG02:
       0F28C1               movaps   xmm0, xmm1
       0F57050E000000       xorps    xmm0, qword ptr [reloc @RWD08]
						;; bbWeight=1    PerfScore 2.25
G_M17335_IG03:
       C3                   ret
						;; bbWeight=1    PerfScore 1.00
RWD00  dq	0000000000000000h
RWD08  dq	8000000000000000h

but it still NREs.
here is the old (working) codegen that works:

; Assembly listing for method Prog:Egor(double,double):double
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 arg0         [V00    ] (  0,  0   )  double  ->  zero-ref
;  V01 arg1         [V01,T00] (  3,  3   )  double  ->  mm1
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M17335_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M17335_IG02:
       F20F100510000000     movsd    xmm0, qword ptr [reloc @RWD08]
       0F57C1               xorps    xmm0, xmm1
						;; bbWeight=1    PerfScore 2.33
G_M17335_IG03:
       C3                   ret
						;; bbWeight=1    PerfScore 1.00
RWD00  dq	0000000000000000h
RWD08  dq	8000000000000000h


; Total bytes of code 12, prolog size 0, PerfScore 4.53, instruction count 3 (MethodHash=71e0bc48) for method Prog:Egor(double,double):double
; ============================================================

tannergooding · 2020-09-14T15:19:13Z

There is no scalar xor operation, just a packed version, so the constant must be 16-bytes with at least element 0 being -0.0 (although we currently set all elements to be -0.0 or -0.0f).

EgorBo · 2020-09-14T15:22:32Z

@tannergooding I've updated my previous comment - do you see why the second (current) codegen works fine?
(it saves the constant to the targetReg and then does xorps reg/reg instead of xorps reg/mem for the this PR version)

tannergooding · 2020-09-14T15:48:48Z

movsd xmm0, qword ptr [reloc @RWD08] reads a scalar double (hence sd), so it will only read 8-bytes and using RWD08 when the constant at that address is 8-bytes is fine.
xorps xmm0, qword ptr [reloc @RWD08] reads a packed single (hence ps), so it reads 16-bytes. You are reading from RWD08 and so it is reading RWD08-RWD24 (where there is no data at RWD16).

The failure, however, isn't due to the overreading (in this case), its because the data isn't aligned. The VEX encoding allows contained memory operands to be unaligned, while the legacy encoding (generally speaking) requires them to be aligned.

Fixing the overreading issue should also fix the alignment issue. You just need to read from RWD00.

tannergooding · 2020-09-14T15:53:25Z

Fixing the overreading issue should also fix the alignment issue

Assuming you are specifying the constant should be 16-byte aligned when being emitted, there is an option for this (emitAnyCns takes a cnsAlign parameter that is respected).

EgorBo · 2020-09-14T17:29:08Z

@tannergooding thank you for the explanation!
not sure my fix for it looks good but at least it works now

src/coreclr/src/jit/instr.cpp

src/coreclr/src/jit/emit.cpp

EgorBo · 2020-09-14T18:12:10Z

@tannergooding here is the current codegen:

Double:

static double Test(double xmm0, double xmm1) => -xmm1;

; VEX
       C5F877               vzeroupper
       C5F0570505000000     vxorps   xmm0, xmm1, qword ptr [reloc @RWD00]
       C3                   ret
RWD00  dq       8000000000000000h
RWD08  dq       8000000000000000h


; legacy
       0F28C1               movaps   xmm0, xmm1
       0F570516000000       xorps    xmm0, qword ptr [reloc @RWD16]
       C3                   ret
RWD00  db       000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h
RWD16  dq       8000000000000000h
RWD24  dq       8000000000000000h

Float:

static float Test(float xmm0, float xmm1) => -xmm1;

; VEX
       C5F877               vzeroupper
       C5F0570505000000     vxorps   xmm0, xmm1, dword ptr [reloc @RWD00]
       C3                   ret
RWD00  dd       80000000h
RWD04  dd       80000000h
RWD08  dd       80000000h
RWD12  dd       80000000h


; legacy
       0F28C1               movaps   xmm0, xmm1
       0F570516000000       xorps    xmm0, dword ptr [reloc @RWD16]
       C3                   ret
RWD00  db       000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h
RWD16  dd       80000000h
RWD20  dd       80000000h
RWD24  dd       80000000h
RWD28  dd       80000000h

tannergooding · 2020-09-14T18:15:14Z

👍, that looks correct now. However, I'm unsure why there are 16-bytes of padding before the constant. It looks like it should just be able to start at RWD00 (with that being properly aligned).

EgorBo · 2020-09-14T19:07:03Z

👍, that looks correct now. However, I'm unsure why there are 16-bytes of padding before the constant. It looks like it should just be able to start at RWD00 (with that being properly aligned).

@tannergooding Fixed! as a bonus it fixes redundant paddings everywhere, e.g:

static double Test(double x) => x * 10 * 10;

       C5F877               vzeroupper 
       C5FB590515000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD08]
       C5FB59051D000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD24]
       C3                   ret      
RWD00  dq	0000000000000000h
RWD08  dq	4024000000000000h
RWD16  dq	0000000000000000h
RWD24  dq	4024000000000000h

to this:

       C5F877               vzeroupper 
       C5FB59050D000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD00]
       C5FB59050D000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD08]
       C3                   ret      
RWD00  dq	4024000000000000h
RWD08  dq	4024000000000000h

(I guess jit-diff ignores data sections to get the actual impact of this change)

…loats

src/coreclr/src/jit/codegenxarch.cpp

EgorBo · 2020-09-15T13:17:08Z

@dotnet/jit-contrib @tannergooding PTAL, it's ready for final review

Summary:
This PR cleans up genSSE2BitwiseOp and it now emits 3-operands versions (VEX) of xorps/andps instead of movsd+xorps
Also, it fixes alignment for emitter::emitDataGenBeg, it used to emit paddings even if offset is already aligned or zero.
(my System.Private.Corelib.dll(R2R) is now 4kb less in size, jit-diff doesn't take data section into account) it's also responsible for redundant gaps between other types of constants, e.g. vectors.

A sample to cover both problems this PR fixes:

double Test(double x, double y) => -x * 10 * 5;

Current asm:

       C5F877               vzeroupper 
       C5FB100525000000     vmovsd   xmm0, qword ptr [reloc @RWD08] ; <-- can be eliminated for vxorps reg,reg,[mem]
       C5F857C1             vxorps   xmm0, xmm1
       C5FB590529000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD24]
       C5FB590531000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD40]
       C3                   ret      

RWD00  dq	0000000000000000h ; <-- redundant padding
RWD08  dq	8000000000000000h
RWD16  dq	0000000000000000h ; <-- redundant padding
RWD24  dq	4024000000000000h
RWD32  dq	0000000000000000h ; <-- redundant padding
RWD40  dq	4014000000000000h

New asm:

G_M60258_IG01:
       C5F877               vzeroupper 
G_M60258_IG02:
       C5F0570515000000     vxorps   xmm0, xmm1, qword ptr [reloc @RWD00]
       C5FB59051D000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD16]
       C5FB59051D000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD24]
G_M60258_IG03:
       C3                   ret      

RWD00  db	000h, 000h, 000h, 000h, 000h, 000h, 000h, 080h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 080h
RWD16  dq	4024000000000000h
RWD24  dq	4014000000000000h

New asm for non-VEX cpu:

       0F28C1               movaps   xmm0, xmm1
       0F570516000000       xorps    xmm0, qword ptr [reloc @RWD00]
       F20F59051E000000     mulsd    xmm0, qword ptr [reloc @RWD16]
       F20F59051E000000     mulsd    xmm0, qword ptr [reloc @RWD24]
       C3                   ret      

RWD00  db	000h, 000h, 000h, 000h, 000h, 000h, 000h, 080h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 080h
RWD16  dq	4024000000000000h
RWD24  dq	4014000000000000h

RWD00 is a 16bytes pack of masks for xorps (it's also 16bytes aligned)

src/coreclr/src/jit/codegenxarch.cpp

Co-authored-by: Tanner Gooding <tagoo@outlook.com>

CarolEidt

LGTM - thanks!

Optimize -X and MathF.Abs(X) for floats

2ec23e6

EgorBo force-pushed the opt-xor-and-floats branch from 214efa9 to 2ec23e6 Compare September 12, 2020 20:05

tannergooding reviewed Sep 12, 2020

View reviewed changes

src/coreclr/src/jit/codegenxarch.cpp Outdated Show resolved Hide resolved

EgorBo added 2 commits September 13, 2020 01:35

Address feedback

c28bbce

Fix constant

7af07bf

jkotas added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 12, 2020

Clean up

4275557

EgorBo mentioned this pull request Sep 13, 2020

RyuJIT: inst_RV_RV_TT doesn't check if constants already exists in the data section #42178

Closed

cache NEG and ABS masks

12f4d94

fix mask alignment for double for xorps/andps

d62fe88

tannergooding reviewed Sep 14, 2020

View reviewed changes

src/coreclr/src/jit/instr.cpp Outdated Show resolved Hide resolved

tannergooding reviewed Sep 14, 2020

View reviewed changes

src/coreclr/src/jit/emit.cpp Outdated Show resolved Hide resolved

Address feedback

73fe0a5

EgorBo added 2 commits September 14, 2020 22:00

don't insert padding for 0 offset

7734862

fix alignment

1d392da

EgorBo added 6 commits September 14, 2020 22:21

Merge branch 'master' of github.com:dotnet/runtime into opt-xor-and-f…

f53d882

…loats

code formatting

d94e5ff

Clean up

3e1d67a

fix build issue

c21217c

Refactoring

65a8ace

Clean up

492f6bd

am11 reviewed Sep 15, 2020

View reviewed changes

src/coreclr/src/jit/codegenxarch.cpp Outdated Show resolved Hide resolved

EgorBo added 6 commits September 15, 2020 15:15

Refactoring

3b1b8a3

Refactoring

8c99b5b

Fix alignment in emitDataGenBeg

cc6d0ad

Add comments back

ff007e8

jit-format

ec5526a

jit-format

33ceca8

fix failing tests

0773861

tannergooding reviewed Sep 15, 2020

View reviewed changes

src/coreclr/src/jit/codegenxarch.cpp Outdated Show resolved Hide resolved

Update src/coreclr/src/jit/codegenxarch.cpp

e3e3be3

Co-authored-by: Tanner Gooding <tagoo@outlook.com>

tannergooding approved these changes Sep 15, 2020

View reviewed changes

CarolEidt approved these changes Oct 2, 2020

View reviewed changes

CarolEidt merged commit 271f008 into dotnet:master Oct 2, 2020

ghost locked as resolved and limited conversation to collaborators Dec 7, 2020

RyuJIT: Optimize -X and MathF.Abs(X) for floats #42164

RyuJIT: Optimize -X and MathF.Abs(X) for floats #42164

Uh oh!

Conversation

EgorBo commented Sep 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Current codegen:

New codegen:

jit-diff:

Uh oh!

am11 commented Sep 12, 2020

Uh oh!

EgorBo commented Sep 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

tannergooding commented Sep 12, 2020

Uh oh!

sandreenko commented Sep 13, 2020

Uh oh!

EgorBo commented Sep 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tannergooding commented Sep 14, 2020

Uh oh!

EgorBo commented Sep 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tannergooding commented Sep 14, 2020

Uh oh!

EgorBo commented Sep 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tannergooding commented Sep 14, 2020

Uh oh!

tannergooding commented Sep 14, 2020

Uh oh!

EgorBo commented Sep 14, 2020

Uh oh!

Uh oh!

Uh oh!

EgorBo commented Sep 14, 2020

Uh oh!

tannergooding commented Sep 14, 2020

Uh oh!

EgorBo commented Sep 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

EgorBo commented Sep 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

CarolEidt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

EgorBo commented Sep 12, 2020 •

edited

Loading

EgorBo commented Sep 12, 2020 •

edited

Loading

EgorBo commented Sep 14, 2020 •

edited

Loading

EgorBo commented Sep 14, 2020 •

edited

Loading

EgorBo commented Sep 14, 2020 •

edited

Loading

EgorBo commented Sep 14, 2020 •

edited

Loading

EgorBo commented Sep 15, 2020 •

edited

Loading