Skip to content

JIT: Support bitwise field insertions for return registers #113178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Mar 8, 2025

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Mar 5, 2025

Based on the new FIELD_LIST support for returns this PR adds support for the JIT to combine smaller fields via bitwise operations when returned, instead of spilling these to stack.

win-x64 examples:

static int? Test()
{
    return Environment.TickCount;
}
        call     System.Environment:get_TickCount():int
-       mov      dword ptr [rsp+0x24], eax
-       mov      byte  ptr [rsp+0x20], 1
-       mov      rax, qword ptr [rsp+0x20]
-						;; size=19 bbWeight=1 PerfScore 4.00
+       mov      eax, eax
+       shl      rax, 32
+       or       rax, 1
+						;; size=15 bbWeight=1 PerfScore 2.00

(the mov eax, eax is unnecessary, but not that simple to get rid of)

static (int x, float y) Test(int x, float y)
{
   return (x, y);
}
-       mov      dword ptr [rsp], ecx
-       vmovss   dword ptr [rsp+0x04], xmm1
-       mov      rax, qword ptr [rsp]
+       vmovd    eax, xmm1
+       shl      rax, 32
+       mov      ecx, ecx
+       or       rax, rcx
 						;; size=13 bbWeight=1 PerfScore 3.00

An arm64 example:

static Memory<int> ToMemory(int[] arr)
{
    return arr.AsMemory();
}
 G_M45070_IG01:  ;; offset=0x0000
-            stp     fp, lr, [sp, #-0x20]!
+            stp     fp, lr, [sp, #-0x10]!
             mov     fp, sp
-            str     xzr, [fp, #0x10]	// [V03 tmp2]
-						;; size=12 bbWeight=1 PerfScore 2.50
-G_M45070_IG02:  ;; offset=0x000C
+						;; size=8 bbWeight=1 PerfScore 1.50
+G_M45070_IG02:  ;; offset=0x0008
             cbz     x0, G_M45070_IG06
 						;; size=4 bbWeight=1 PerfScore 1.00
-G_M45070_IG03:  ;; offset=0x0010
-            str     x0, [fp, #0x10]	// [V07 tmp6]
-            str     wzr, [fp, #0x18]	// [V08 tmp7]
-            ldr     x0, [fp, #0x10]	// [V07 tmp6]
-            ldr     w0, [x0, #0x08]
-            str     w0, [fp, #0x1C]	// [V09 tmp8]
-						;; size=20 bbWeight=0.80 PerfScore 6.40
-G_M45070_IG04:  ;; offset=0x0024
-            ldp     x0, x1, [fp, #0x10]	// [V03 tmp2], [V03 tmp2+0x08]
-						;; size=4 bbWeight=1 PerfScore 3.00
-G_M45070_IG05:  ;; offset=0x0028
-            ldp     fp, lr, [sp], #0x20
+G_M45070_IG03:  ;; offset=0x000C
+            ldr     w1, [x0, #0x08]
+						;; size=4 bbWeight=0.80 PerfScore 2.40
+G_M45070_IG04:  ;; offset=0x0010
+            mov     w1, w1
+            mov     x2, xzr
+            orr     x1, x2, x1,  LSL #32
+						;; size=12 bbWeight=1 PerfScore 2.00
+G_M45070_IG05:  ;; offset=0x001C
+            ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
-G_M45070_IG06:  ;; offset=0x0030
-            str     xzr, [fp, #0x10]	// [V07 tmp6]
-            str     xzr, [fp, #0x18]
+G_M45070_IG06:  ;; offset=0x0024
+            mov     x0, xzr
+            mov     w1, wzr
             b       G_M45070_IG04
-						;; size=12 bbWeight=0.20 PerfScore 0.60
+						;; size=12 bbWeight=0.20 PerfScore 0.40

(sneak peek -- this codegen requires some supplementary changes, and there's additional opportunities here)

This is the return counterpart to #112740. That PR has a bunch of regressions that makes it look like we need to support returns/call arguments first, before we try to support parameters.

There's a few follow-ups here:

  • Support for float->float insertions (when a float value needs to be returned as the 1st, 2nd, .... field of a SIMD register)
  • Support for coalescing memory loads, particularly because the fields of the FIELD_LIST come from a promoted struct that ended up DNER. In those cases we should be able to recombine the fields back to a single large field, instead of combining them with bitwise operations.
  • Support for constant folding the bitwise insertions. This requires some more constant folding support in lowering.
  • The JIT has lots of (now outdated) restrictions based around multi-reg returns that get in the way. Lifting these should improve things considerably.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 5, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@neon-sunset
Copy link
Contributor

Incredible, thank you so much!

@jakobbotsch jakobbotsch marked this pull request as ready for review March 7, 2025 21:13
@Copilot Copilot AI review requested due to automatic review settings March 7, 2025 21:13
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @EgorBo

Diffs. Generally size and perfscore improvements.

There are a number of regressions. I have looked through and written a follow-up list above -- many of them being missing constant folding in lowering.

There's also some bad looking diffs from this one:

The JIT has lots of (now outdated) restrictions based around multi-reg returns that get in the way. Lifting these should improve things considerably.

This is the cause of diffs like
image

The underlying problem is this restriction of LSRA:

// The current implementation of multi-reg structs that are referenced collectively
// (i.e. by referring to the parent lclVar rather than each field separately) relies
// on all or none of the fields being candidates.
if (varDsc->lvIsStructField)
{
LclVarDsc* parentVarDsc = compiler->lvaGetDesc(varDsc->lvParentLcl);
if (parentVarDsc->lvIsMultiRegRet && !parentVarDsc->lvDoNotEnregister)
{
JITDUMP("Setting multi-reg struct V%02u as not enregisterable:", varDsc->lvParentLcl);
compiler->lvaSetVarDoNotEnregister(varDsc->lvParentLcl DEBUGARG(DoNotEnregisterReason::BlockOp));

The new implementation expands returned promoted locals into FIELD_LIST of the fields in morph. Frequently we are then able to constant propagate or otherwise eliminate one of these fields. As soon as we eliminate all uses of one of the fields that field becomes untracked. Then the above restriction kicks in and DNERs the entire struct, which affects all the other fields. I think the future solution is going to be to avoid marking these locals as lvIsMultiRegRet entirely.

}

GenTree* folded = comp->gtFoldExprConst(node);
assert(folded == node);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand why it calls gtFoldExprConst here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because I didn't want to handle CAST(CNS_INT) for all combinations here

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Looking forward to seeing the same for struct args

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants