ARM64-SVE: refactor lsra buildHWIntrinsic #107459


Merged: 35 commits, Oct 7, 2024


a74nh
Contributor

@a74nh a74nh commented Sep 6, 2024

Fixes #104842

The logic for hwintrinsics has become convoluted. Refactor it for both SVE and AdvSimd.

Add functions to get the operand (if any) for each requirement - delay slot, consecutive registers, address, etc.

Then use a simple for loop to iterate through each operand and build depending on which requirements match for that operand.
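
As a rough illustration of the shape described above (one query function per requirement, then a single loop over the operands), here is a minimal sketch. Every type and function name in it (`IntrinsicInfo`, `UseKind`, `delayFreeOperandOf`, and so on) is hypothetical and invented for this example, and the order in which requirements are checked is an assumption; it is not the code in lsraarm64.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not the JIT's real types.
enum class UseKind
{
    Normal,
    DelayFree,
    Consecutive,
    Address
};

struct IntrinsicInfo
{
    std::size_t numOperands;
    std::size_t delayFreeOpNum;   // 1-based operand number, 0 means "none"
    std::size_t consecutiveOpNum; // 1-based operand number, 0 means "none"
    std::size_t addressOpNum;     // 1-based operand number, 0 means "none"
};

// One small query per requirement. Each returns the number of the operand
// that has the requirement, or 0 if no operand does.
std::size_t delayFreeOperandOf(const IntrinsicInfo& info) { return info.delayFreeOpNum; }
std::size_t consecutiveOperandOf(const IntrinsicInfo& info) { return info.consecutiveOpNum; }
std::size_t addressOperandOf(const IntrinsicInfo& info) { return info.addressOpNum; }

// A single loop over the operands decides how each one is built.
// The check order (address, consecutive, delay-free) is an assumption.
std::vector<UseKind> buildOperandUses(const IntrinsicInfo& info)
{
    assert(info.delayFreeOpNum <= info.numOperands);
    assert(info.consecutiveOpNum <= info.numOperands);
    assert(info.addressOpNum <= info.numOperands);

    std::vector<UseKind> uses;
    for (std::size_t opNum = 1; opNum <= info.numOperands; opNum++)
    {
        if (opNum == addressOperandOf(info))
        {
            uses.push_back(UseKind::Address);
        }
        else if (opNum == consecutiveOperandOf(info))
        {
            uses.push_back(UseKind::Consecutive);
        }
        else if (opNum == delayFreeOperandOf(info))
        {
            uses.push_back(UseKind::DelayFree);
        }
        else
        {
            uses.push_back(UseKind::Normal);
        }
    }
    return uses;
}
```

Because operand numbers start at 1, a query that returns 0 never matches inside the loop, so "no such operand" needs no special casing.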

Tested by using stress_test.py on the entire HardwareIntrinsics_Arm set.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 6, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Sep 6, 2024
@a74nh
Contributor Author

a74nh commented Sep 6, 2024

@kunalspathak - Still WIP. For now ignore anything outside of lsraarm64.cpp and lsra.hpp. All the other changes are from other PRs and will be removed once those have been merged.

@a74nh a74nh marked this pull request as ready for review September 9, 2024 15:41
@a74nh
Contributor Author

a74nh commented Sep 9, 2024

This PR is ready now.

Requires #107084, #107180 and a workaround for #107537 in order for all the hwintrinsic tests to pass.

Apologies, this is a large change to review, and the GitHub diff is confused about functions I haven't touched. It's probably best to start a review from the new version of BuildHWIntrinsic().

I recommend this is not merged until after we've gone past the .NET 9 RC2 deadline.

@dotnet/arm64-contrib

I'll do a spmidiff next.

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Sep 12, 2024
@a74nh
Contributor Author

a74nh commented Sep 13, 2024

Got some asmdiffs for the SVE tests. Spotted two differences, and one of them is due to issues in HEAD.

I'll raise PRs to fix these (plus one for LoadAndInsertScalar), and then rebase this once they are merged. I'd like there to be no asmdiff differences in this PR.

./4546.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_BitwiseClear_long RunClassFldScenario() this (FullOpts)
./4000.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_AddSaturate_byte RunBasicScenario_Load() this (FullOpts)
./4130.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_sbyte RunClassFldScenario() this (FullOpts)
./27034.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_byte RunBasicScenario_Load() this (FullOpts)
./22619.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_sbyte RunClassFldScenario() this (FullOpts)
./22615.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_sbyte RunBasicScenario_Load() this (FullOpts)
./4170.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_int RunBasicScenario_Load() this (FullOpts)
./26818.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_int RunClassFldScenario() this (FullOpts)
./26730.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_uint RunClassFldScenario() this (FullOpts)
./4026.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_AddSaturate_ushort RunClassFldScenario() this (FullOpts)
./22659.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_int RunBasicScenario_Load() this (FullOpts)
./4280.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_ulong RunBasicScenario_Load() this (FullOpts)
./4258.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_uint RunBasicScenario_Load() this (FullOpts)
./4192.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_long RunBasicScenario_Load() this (FullOpts)
./26726.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_uint RunBasicScenario_Load() this (FullOpts)
./26906.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_uint RunClassFldScenario() this (FullOpts)
./3481.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_long RunBasicScenario_Load() this (FullOpts)
./26642.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_int RunClassFldScenario() this (FullOpts)
./3547.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_uint RunBasicScenario_Load() this (FullOpts)
./4608.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_BitwiseClear_uint RunBasicScenario_Load() this (FullOpts)
./26946.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_sbyte RunBasicScenario_Load() this (FullOpts)
./27082.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_uint RunClassFldScenario() this (FullOpts)
./3529.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_ushort RunClassFldScenario() this (FullOpts)
./26968.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_short RunBasicScenario_Load() this (FullOpts)
./27012.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_long RunBasicScenario_Load() this (FullOpts)
./22681.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_long RunBasicScenario_Load() this (FullOpts)
./22769.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_ulong RunBasicScenario_Load() this (FullOpts)
./4218.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_byte RunClassFldScenario() this (FullOpts)
./27056.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_ushort RunBasicScenario_Load() this (FullOpts)
./4520.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_BitwiseClear_int RunBasicScenario_Load() this (FullOpts)
./26704.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_ushort RunBasicScenario_Load() this (FullOpts)
./26994.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_int RunClassFldScenario() this (FullOpts)
./4214.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_byte RunBasicScenario_Load() this (FullOpts)
./22747.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_uint RunBasicScenario_Load() this (FullOpts)
./22729.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_ushort RunClassFldScenario() this (FullOpts)
./26792.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_short RunBasicScenario_Load() this (FullOpts)
./3415.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_sbyte RunBasicScenario_Load() this (FullOpts)
./3441.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_short RunClassFldScenario() this (FullOpts)
./26554.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_float RunClassFldScenario() this (FullOpts)
./26902.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_uint RunBasicScenario_Load() this (FullOpts)
./4634.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_BitwiseClear_ulong RunClassFldScenario() this (FullOpts)
./22637.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_short RunBasicScenario_Load() this (FullOpts)
./3507.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_byte RunClassFldScenario() this (FullOpts)
./26616.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_short RunBasicScenario_Load() this (FullOpts)
./26880.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_ushort RunBasicScenario_Load() this (FullOpts)
./22707.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_byte RunClassFldScenario() this (FullOpts)
./22703.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_byte RunBasicScenario_Load() this (FullOpts)
./3459.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_int RunBasicScenario_Load() this (FullOpts)
./22641.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_short RunClassFldScenario() this (FullOpts)
./3503.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_byte RunBasicScenario_Load() this (FullOpts)
./26990.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_int RunBasicScenario_Load() this (FullOpts)
./3419.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_sbyte RunClassFldScenario() this (FullOpts)
./26814.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_int RunBasicScenario_Load() this (FullOpts)
./27078.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_uint RunBasicScenario_Load() this (FullOpts)

@a74nh
Contributor Author

a74nh commented Sep 13, 2024

Latest version fixes up the diffs that were caused by this PR.
Once #107786 and #107791 are merged there should be no remaining diffs in this PR.

@kunalspathak
Member

> Latest version fixes up the diffs that were caused by this PR. Once #107786 and #107791 are merged there should be no remaining diffs in this PR.

Let's rebase this PR once the above mentioned PRs are merged to confirm there is zero asmdiff.

@a74nh
Contributor Author

a74nh commented Sep 26, 2024

Rebased on top of the other fixes. As mentioned in #107786, fixed it so that BuildDelayFreeUses() is only called for matching register types. Need to confirm that there are no spmi diffs.
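
To make the "matching register types" condition concrete, here is a minimal sketch of that kind of guard. It assumes that a delay-free use only matters when the operand and the result compete for the same register file; that reasoning, and the names `RegType`, `UseKind` and `chooseUseKind`, are assumptions made for illustration and are not taken from the JIT sources.

```cpp
#include <cassert>

// Hypothetical register classes and use kinds, for illustration only.
enum class RegType
{
    General,
    Vector,
    Mask
};

enum class UseKind
{
    Normal,
    DelayFree
};

// Ask for a delay-free use only when the operand and the result live in the
// same kind of register. If they are in different register files, the result
// cannot be given the operand's register, so a normal use is enough.
UseKind chooseUseKind(RegType operandType, RegType resultType, bool wantsDelayFree)
{
    if (wantsDelayFree && (operandType == resultType))
    {
        return UseKind::DelayFree;
    }
    return UseKind::Normal;
}
```

For example, under these assumptions a mask operand feeding a vector result would be built as a normal use even when it is nominally the delay-free operand.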

@a74nh
Contributor Author

a74nh commented Sep 27, 2024

No asm diffs now:

❯ python3 ./src/coreclr/scripts/superpmi.py collect $CORE_ROOT/corerun "./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll"
[08:27:14] ================ Logging to /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/superpmi.log
[08:27:14] SuperPMI collect
[08:27:14] SuperPMI JIT Path: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libclrjit.so
[08:27:14] Collecting using command:
[08:27:14]   /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll
[08:30:53] Merging MC files
[08:30:57] Copy base MCH file to final MCH file
[08:31:08] Creating TOC file
[08:31:10] Generated MCH file: /home/alahay01/dotnet/runtime_sve_api/linux.arm64.Checked.mch

❯ python3 ./src/coreclr/scripts/superpmi.py asmdiffs -mch_files /home/alahay01/dotnet/runtime_sve_api/linux.arm64.Checked.mch
[08:35:57] ================ Logging to /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/superpmi.1.log
[08:35:57] Using JIT/EE Version from jiteeversionguid.h: b75a5475-ff22-4078-9551-2024ce03d383
[08:35:58] Baseline hash: cdc8418a7f4e51b771db2ae7ee5cde5f479cde7e
[08:35:58] Download: https://clrjit2.blob.core.windows.net/jitrollingbuild/builds/f1bcbeb5fa2fe84698b62d88dd35199f0d7fbedb/linux/arm64/Checked/libclrjit.so -> /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/basejit/f1bcbeb5fa2fe84698b62d88dd35199f0d7fbedb.linux.arm64.Checked/libclrjit.so
Downloading 5.6/5.6 MB...
[08:35:59] Downloaded https://clrjit2.blob.core.windows.net/jitrollingbuild/builds/f1bcbeb5fa2fe84698b62d88dd35199f0d7fbedb/linux/arm64/Checked/libclrjit.so
[08:35:59] Using baseline /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/basejit/f1bcbeb5fa2fe84698b62d88dd35199f0d7fbedb.linux.arm64.Checked/libclrjit.so
[08:35:59] Using coredistools found at /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libcoredistools.so
[08:35:59] SuperPMI ASM diffs
[08:35:59] Base JIT Path: /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/basejit/f1bcbeb5fa2fe84698b62d88dd35199f0d7fbedb.linux.arm64.Checked/libclrjit.so
[08:35:59] Diff JIT Path: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libclrjit.so
[08:35:59] Using MCH files:
[08:35:59]   /home/alahay01/dotnet/runtime_sve_api/linux.arm64.Checked.mch
[08:35:59] Running asm diffs of /home/alahay01/dotnet/runtime_sve_api/linux.arm64.Checked.mch
[08:36:39] Clean SuperPMI diff (72927 contexts processed)
[08:36:39] Asm diffs summary:
[08:36:39]   Summary Markdown file: /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/diff_summary.md
[08:36:39]   Short Summary Markdown file: /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/diff_short_summary.md
[08:36:39]   No asm diffs

❯ cat /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/diff_summary.md
Diffs are based on <span style="color:#1460aa">72,927</span> contexts (<span style="color:#1460aa">1</span> MinOpts, <span style="color:#1460aa">72,926</span> FullOpts).

No diffs found.

<details>
<summary>Details</summary>
<div style="margin-left:1em">

#### Context information

|Collection|Diffed contexts|MinOpts|FullOpts|Missed, base|Missed, diff|
|---|--:|--:|--:|--:|--:|
|linux.arm64.Checked.mch|72,927|1|72,926|0 (0.00%)|0 (0.00%)|




</div></details>

Member

@kunalspathak kunalspathak left a comment

Added some minor comments.


if (resultOpNum != 0)
{
    embeddedDelayFreeOp = embeddedOpNode->Op(resultOpNum);
Member

hhm, can you confirm if we just overwrite the embeddedDelayFreeOp that was passed as the parameter to this function?

Contributor Author

> hhm, can you confirm if we just overwrite the embeddedDelayFreeOp that was passed as the parameter to this function?

Yes. The value being passed into BuildEmbeddedOperandUses() is the address of embeddedDelayFreeOp. That value can be changed within BuildEmbeddedOperandUses() and it won't affect any variables outside BuildEmbeddedOperandUses() that were also set to the same value.
I think that's what you were asking about.
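
The point in this reply is ordinary C++ pass-by-value semantics for pointers. The sketch below assumes the parameter is a plain pointer passed by value, which is what the assignment in the quoted snippet suggests; `Node`, `retarget` and `retargetByRef` are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical node type, standing in for a tree node.
struct Node
{
    int id;
};

// The pointer is passed by value, so `op` is a local copy of the caller's
// pointer. Reassigning it changes only that copy.
void retarget(Node* op, Node* replacement)
{
    op = replacement;
    (void)op; // silence "set but not used" warnings
}

// For contrast: the pointer is passed by reference, so the caller's own
// pointer variable is reassigned.
void retargetByRef(Node*& op, Node* replacement)
{
    op = replacement;
}
```

After calling `retarget(p, &b)`, the caller's `p` still points at its original node; only `retargetByRef(p, &b)` would change it.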

@a74nh
Contributor Author

a74nh commented Oct 2, 2024

All fixed up as suggested.

@kunalspathak
Member

I was hoping this to be zero diff, but there are still diffs related to LoadAndInsertScalar intrinsics. Can you please double check?

[image]

@a74nh
Contributor Author

a74nh commented Oct 3, 2024

> I was hoping this to be zero diff, but there are still diffs related to LoadAndInsertScalar intrinsics. Can you please double check?

Looks like the reason for this is that in HEAD, op1 is generated via:

            assert(isRMW);
            assert(intrin.op1->OperIs(GT_FIELD_LIST));

            GenTreeFieldList* op1 = intrin.op1->AsFieldList();
            assert(compiler->info.compNeedsConsecutiveRegisters);

            for (GenTreeFieldList::Use& use : op1->Uses())
            {
                BuildDelayFreeUses(use.GetNode(), intrinsicTree);
                srcCount++;
            }

My patch generates op1 via:

            srcCount += BuildConsecutiveRegistersForUse(operand, delayFreeOp);

BuildConsecutiveRegistersForUse() was added for generating op1 for NI_AdvSimd_VectorTableLookup and NI_AdvSimd_Arm64_VectorTableLookup.

AIUI, BuildConsecutiveRegistersForUse() is a more generic function and so should be useable for LoadAndInsertScalarVector...

@a74nh
Contributor Author

a74nh commented Oct 3, 2024

> I was hoping this to be zero diff, but there are still diffs related to LoadAndInsertScalar intrinsics. Can you please double check?

With this patch there are no diffs again. It would be a shame to put something so specific in this patch.

diff --git a/src/coreclr/jit/lsraarm64.cpp b/src/coreclr/jit/lsraarm64.cpp
index d124404beb0..6646791f790 100644
--- a/src/coreclr/jit/lsraarm64.cpp
+++ b/src/coreclr/jit/lsraarm64.cpp
@@ -1439,8 +1439,32 @@ int LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree, int* pDstCou
         {
             assert(candidates == RBM_NONE);

-            // Some operands have consective op which is also a delay free op
-            srcCount += BuildConsecutiveRegistersForUse(operand, delayFreeOp);
+            switch (intrin.id)
+            {
+                case NI_AdvSimd_LoadAndInsertScalarVector64x2:
+                case NI_AdvSimd_LoadAndInsertScalarVector64x3:
+                case NI_AdvSimd_LoadAndInsertScalarVector64x4:
+                case NI_AdvSimd_Arm64_LoadAndInsertScalarVector128x2:
+                case NI_AdvSimd_Arm64_LoadAndInsertScalarVector128x3:
+                case NI_AdvSimd_Arm64_LoadAndInsertScalarVector128x4:
+                {
+                    assert(consecutiveOp->OperIs(GT_FIELD_LIST));
+
+                    GenTreeFieldList* fieldOp = operand->AsFieldList();
+
+                    for (GenTreeFieldList::Use& use : fieldOp->Uses())
+                    {
+                        BuildDelayFreeUses(use.GetNode(), intrinsicTree);
+                        srcCount++;
+                    }
+                    break;
+                }
+
+                default:
+                    // Some operands have consective op which is also a delay free op
+                    srcCount += BuildConsecutiveRegistersForUse(operand, delayFreeOp);
+                    break;
+            }
         }
         else if (delayFreeOp == operand)
         {

@kunalspathak
Member

> With this patch there are no diffs again. It would be a shame to put something so specific in this patch.

Alright, I did some digging and it turns out that for upperVectorRestore, we were not marking them as "delay-free", which this PR does.

[image]

Member

@kunalspathak kunalspathak left a comment

LGTM

@kunalspathak kunalspathak merged commit 6c0b392 into dotnet:main Oct 7, 2024
109 of 111 checks passed
@a74nh a74nh deleted the rmwnum_github branch October 8, 2024 07:21
sirntar pushed a commit to sirntar/runtime that referenced this pull request Oct 8, 2024
* Add BuildConditionalSelectHWIntrinsic()

* Add GetRMWOp()

* Use GetDelayFreeOp() in BuildConditionalSelectWithEmbeddedOp()

* simplify op2 handling

* Add getVectorAddrOperand()

* Add getConsecutiveRegistersOperand

* Add BuildOperand()

* Use BuildOperand for op1

* Add buildHWIntrinsicImmediate

* Add getOperandCandidates()

* Remove BuildOperand()

* remove delayFreeMultiple

* Fixes from other PRs to be removed

* Fix formatting

* Use BuildHWIntrinsicImmediate for conditional select

* Remove IsRMW

* Replace BuildConditionalSelectWithEmbeddedOp() with BuildEmbeddedOperandUses()

* Revert "Fixes from other PRs to be removed"

* Move functions

* Move functions

* Remove failing unary tests

* Fix opNum type

* Revert "Remove failing unary tests"

* Remove cases from getDelayFreeOperand that are handled by default

* review cleanups

* Simplify masks in getOperandCandidates()

* Remove IsMaskedOperation()

* Check for optional embedded masks in getDelayFreeOperand

* Only call BuildDelayFreeUses when register types match

* Assert only on Arm64

* Comment fixups
@github-actions github-actions bot locked and limited conversation to collaborators Nov 7, 2024
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member
Linked issue: JIT: SVE Cleanup - Simplify handling of RMW intrinsics in LSRA
2 participants