Improve RA for LowerBlockStore #83627

Merged
merged 1 commit into from
Mar 21, 2023

EgorBo
Member

@EgorBo EgorBo commented Mar 17, 2023

This PR does two things:

  1. getUnrollThreshold was always invoked with /*canUseSimd*/ false, so the unroll threshold on x64 was 128 bytes instead of 256 bytes.
  2. We don't need an integer register if the whole block can be initialized via SIMD.

An example for the second point:
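
To make the first point concrete, here is a minimal sketch of how the flag changes the decision to unroll. It is a simplified model, not the JIT's code: the function names `unrollThresholdBytes` and `shouldUnrollMemset` are hypothetical, the inclusive comparison is an assumption, and only the 128 and 256 byte values are taken from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of the x64 memset unroll threshold.
// The two values (128 and 256 bytes) come from the PR description;
// the names and the shape of the function are illustrative only.
constexpr std::size_t unrollThresholdBytes(bool canUseSimd)
{
    return canUseSimd ? 256 : 128;
}

// In this model a block is zeroed inline only if it fits under the
// threshold; anything larger falls back to a memset helper call.
// The inclusive comparison is an assumption of the sketch.
constexpr bool shouldUnrollMemset(std::size_t sizeBytes, bool canUseSimd)
{
    return sizeBytes <= unrollThresholdBytes(canUseSimd);
}
```

Under this model a 152-byte block is left to the helper call when the flag is false and is unrolled when the flag is true, so passing false unconditionally would keep such blocks on the slower path.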

struct MyStruct
{
    public fixed byte Data[30];
}

MyStruct Test() => new MyStruct();
; Method Test():Program+MyStruct:this
G_M45940_IG01:              
       4883EC38             sub      rsp, 56
       C5F877               vzeroupper 
       48B878563412F0DEBC9A mov      rax, 0x9ABCDEF012345678
       4889442430           mov      qword ptr [rsp+30H], rax
G_M45940_IG02:              
-      33C0                 xor      eax, eax  ;; we don't use RAX to zero the struct
       C5F857C0             vxorps   xmm0, xmm0
       C5FA7F02             vmovdqu  xmmword ptr [rdx], xmm0
       C5FA7F420E           vmovdqu  xmmword ptr [rdx+0EH], xmm0   ;; overlapped with previous mov
       488BC2               mov      rax, rdx
       48B978563412F0DEBC9A mov      rcx, 0x9ABCDEF012345678
       48394C2430           cmp      qword ptr [rsp+30H], rcx
       7405                 je       SHORT G_M45940_IG03
       E8C2CB4B5F           call     CORINFO_HELP_FAIL_FAST
G_M45940_IG03:              
       90                   nop      
G_M45940_IG04:              
       4883C438             add      rsp, 56
       C3                   ret      
-; Total bytes of code: 68
+; Total bytes of code: 66
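
The two `vmovdqu` stores above cover 30 bytes with 16-byte vectors by letting the second store overlap the first. A rough sketch of how such store offsets could be planned follows; `planOverlappedStores` is a hypothetical helper written for illustration and is not taken from the JIT sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the byte offsets at which stores of `vectorBytes` each cover a
// block of `sizeBytes`. If the size is not a multiple of the vector width,
// the last store is shifted back so that it ends exactly at the end of the
// block, overlapping the previous store.
std::vector<std::size_t> planOverlappedStores(std::size_t sizeBytes,
                                              std::size_t vectorBytes)
{
    // The sketch assumes the block is at least one vector wide.
    assert(vectorBytes > 0 && sizeBytes >= vectorBytes);

    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    // Emit as many whole, non-overlapping stores as fit.
    while (offset + vectorBytes <= sizeBytes) {
        offsets.push_back(offset);
        offset += vectorBytes;
    }
    // Cover the remaining tail with one store aligned to the block's end.
    if (offset < sizeBytes) {
        offsets.push_back(sizeBytes - vectorBytes);
    }
    return offsets;
}
```

For a 30-byte block and 16-byte vectors this yields offsets 0 and 14 (0EH), matching the disassembly. Since every byte is written by a vector store, no integer register holding zero is needed for a scalar tail.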

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 17, 2023
@ghost ghost assigned EgorBo Mar 17, 2023
@ghost

ghost commented Mar 17, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

@EgorBo
Member Author

EgorBo commented Mar 18, 2023

@TIHan @dotnet/jit-contrib PTAL

Diffs

The size regressions come from memset calls that are now unrolled inline, e.g.:

-       xor      edx, edx
-       lea      rcx, bword ptr [rsp+38H]
-       ; byrRegs +[rcx]
-       mov      r8d, 152
-       call     CORINFO_HELP_MEMSET
-       ; byrRegs -[rcx]
-       ; gcr arg pop 0
        xor      r9d, r9d
+       vxorps   ymm0, ymm0
+       vmovdqu  ymmword ptr[rsp+38H], ymm0
+       vmovdqu  ymmword ptr[rsp+58H], ymm0
+       vmovdqu  ymmword ptr[rsp+78H], ymm0
+       vmovdqu  ymmword ptr[rsp+98H], ymm0
+       vmovdqu  ymmword ptr[rsp+B0H], ymm0
        mov      dword ptr [rsp+20H], r9d
        mov      dword ptr [rsp+28H], r9d
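
As a sanity check on the diff above, the five 32-byte stores can be verified to cover exactly the 152 bytes that the removed helper call used to zero, starting at [rsp+38H]. The checker below is a small illustrative sketch; `storesCoverBlock` is a hypothetical name and the code is not part of the JIT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if stores of `storeBytes` at the given offsets write every
// byte of [blockStart, blockStart + blockSize) and nothing outside it.
bool storesCoverBlock(const std::vector<std::size_t>& storeOffsets,
                      std::size_t storeBytes,
                      std::size_t blockStart,
                      std::size_t blockSize)
{
    std::vector<bool> written(blockSize, false);
    for (std::size_t offset : storeOffsets) {
        // Reject any store that starts before or ends after the block.
        if (offset < blockStart ||
            offset + storeBytes > blockStart + blockSize) {
            return false;
        }
        for (std::size_t i = 0; i < storeBytes; ++i) {
            written[offset - blockStart + i] = true;
        }
    }
    // Every byte of the block must have been written at least once.
    for (bool w : written) {
        if (!w) {
            return false;
        }
    }
    return true;
}
```

Calling it with the offsets 38H, 58H, 78H, 98H and B0H, a store width of 32, a block start of 38H and a size of 152 returns true. Dropping the last store returns false, because the bytes from B8H up to D0H would be left unwritten.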

Contributor

@TIHan TIHan left a comment

The regressions make sense. LGTM.

@EgorBo EgorBo merged commit 0c9568a into dotnet:main Mar 21, 2023
@EgorBo EgorBo deleted the fix-unrolling branch March 21, 2023 00:49
@ghost ghost locked as resolved and limited conversation to collaborators Apr 20, 2023