Skip to content

Conversation

tannergooding
Copy link
Member

No description provided.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 30, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding tannergooding marked this pull request as ready for review June 30, 2024 05:29
@tannergooding
Copy link
Member Author

CC. @EgorBo, this fixes the minor codegen issue we found

@tannergooding
Copy link
Member Author

tannergooding commented Jun 30, 2024

Guess it's not so minor, since we weren't doing containment at all (for the address or for the index):

Overall (-34,888 bytes)
MinOpts (-5,128 bytes)
FullOpts (-29,760 bytes)

Most of these are simple cases such as:

-            mov     w0, #2
             dup     s16, v16.s[2]

-or-

-            mov     w6, #1
             umov    x6, v16.d[1]

But then we also get cases such as:

-            ldr     q17, [fp, #0x30]	// [V02 loc0]
-            mov     w0, #3
-            dup     s17, v17.s[3]
+            ldr     s17, [fp, #0x3C]	// [V02 loc0+0x0c]

There's a lot of code that ends up extracting or moving elements, so this gets some decent payoffs overall.

Edit: The below is part of a blr (method call) which is why its still getting loaded, it is actually used so we're all good 🎉
It does, however, look like there's some other missed containment opportunity as well:

-            mov     w1, #2
             mov     v8.d[1], v9.d[0]
             dup     s0, v8.s[2]
+            mov     w1, #2
             ldr     x0, [x20, #0x08]

Going to see if I can identify where that is, but I wouldn't block this on that.

HARDWARE_INTRINSIC(Vector128, Floor, 16, 1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_InvalidNodeId)
HARDWARE_INTRINSIC(Vector128, FusedMultiplyAdd, 16, 3, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_InvalidNodeId)
HARDWARE_INTRINSIC(Vector128, GetElement, 16, 2, true, {INS_smov, INS_umov, INS_smov, INS_umov, INS_smov, INS_umov, INS_umov, INS_umov, INS_dup, INS_dup}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_BaseTypeFromFirstArg)
HARDWARE_INTRINSIC(Vector128, GetElement, 16, 2, true, {INS_smov, INS_umov, INS_smov, INS_umov, INS_smov, INS_umov, INS_umov, INS_umov, INS_dup, INS_dup}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_BaseTypeFromFirstArg|HW_Flag_SupportsContainment)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing I noticed that GetElement is not marked as HW_Flag_SupportsContainment for x64 but it still works there 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They’re functionally inverses of each other.

On Arm64 containment is a rare consideration and so is opt-in, while on x64 it’s a common thing for almost every operation and is instead opt-out

@tannergooding tannergooding merged commit a7709e5 into dotnet:main Jun 30, 2024
@tannergooding tannergooding deleted the arm-contain branch June 30, 2024 13:06
@github-actions github-actions bot locked and limited conversation to collaborators Jul 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants