Arm64/Sve: Fix overzealous assert for embedded mask intrinsics with RMW semantics #105541

kunalspathak · 2024-07-26T05:29:36Z

For RMW intrinsics, we delay-free the SVE operands with respect to the RMW operand node, which means that for some embedded mask operations like FMLA, the operands can be given the same register as the target of the parent CndSel node. Thus, relax some asserts that require the embedded mask operands' registers not to match the CndSel's target register.

Resolves: #105474

kunalspathak · 2024-07-26T14:32:40Z

@dotnet/jit-contrib @dotnet/arm64-contrib

amanasifkhalid · 2024-07-26T14:56:55Z

src/tests/JIT/Regression/JitBlue/Runtime_105474/Runtime_105474.csproj

+<Project Sdk="Microsoft.NET.Sdk">
+  <PropertyGroup>
+    <Optimize>True</Optimize>
+  </PropertyGroup>


Should we add <CLRTestTargetUnsupported Condition="'$(TargetArchitecture)' != 'arm64'">true</CLRTestTargetUnsupported> to avoid building this test unnecessarily?

kunalspathak · 2024-07-26

We could, but there are a lot of tests in this folder that do not have this, so I would rather skip it here and do it for all the tests together.

tannergooding · 2024-07-26T16:20:10Z

src/coreclr/jit/lsraarm64.cpp

-            srcCount += BuildDelayFreeUses(emitOp2, emitOp1);
-            srcCount += BuildDelayFreeUses(emitOp3, emitOp1);
-            srcCount += BuildDelayFreeUses(intrin.op3, emitOp1);
+            srcCount += BuildDelayFreeUses(emitOp2, emitOp1, intrinsicTree);


I don't understand this change. FMLA doesn't have 2x RMW nodes; it has 1.

Given dst = CndSel(mask, fmla(op1, op2, op3), merge)

1. movprfx (predicated), fmla (predicated) - contained
2. movprfx (unpredicated), fmla (predicated) - contained
3. fmla (predicated), sel (predicated) - not contained

In the first case, movprfx (predicated) works as dst = CndSel(mask, op1, dst), so the merge value is actually the RMW node. We then emit fmla dst, mask, op2, op3, so inactive elements stay as is (values from dst) and active elements are overwritten. This is the form we emit for merge == zero and, in many cases, for other merge values.

In the second case, movprfx (unpredicated) works simply as dst = op1, so op1 is the RMW node. We then emit fmla dst, mask, op2, op3, so inactive elements stay as is. This is the form we emit for mask == ptrue or merge == op1.

In the final case, two independent instruction sequences are emitted. The fmla is considered only with regard to itself and will often take the movprfx (unpredicated) path, because its mask must be treated as all true; movprfx (unpredicated) is skipped only when dst == op1 already. The sel then independently merges the result with merge, and there is no RMW consideration.
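As a rough illustration of the three sequences described above, the following C++ sketch models each one lane by lane and compares it against the intended CndSel result. It is not JIT code: the four-lane width and every name in it (kLanes, Vec, Mask, Fmla, Reference, and the three *Form helpers) are assumptions made for the example, and the predicated movprfx is modeled only in its merging form.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Assumption: a fixed 4-lane vector stands in for a scalable SVE register.
constexpr std::size_t kLanes = 4;
using Vec  = std::array<double, kLanes>;
using Mask = std::array<bool, kLanes>;

// fmla dst, mask, op2, op3: active lanes accumulate op2 * op3 into dst;
// inactive lanes keep whatever dst already held.
void Fmla(Vec& dst, const Mask& mask, const Vec& op2, const Vec& op3)
{
    for (std::size_t i = 0; i < kLanes; i++)
    {
        if (mask[i])
        {
            dst[i] = std::fma(op2[i], op3[i], dst[i]);
        }
    }
}

// Intended result of dst = CndSel(mask, fmla(op1, op2, op3), merge).
Vec Reference(const Mask& mask, const Vec& op1, const Vec& op2, const Vec& op3, const Vec& merge)
{
    Vec dst{};
    for (std::size_t i = 0; i < kLanes; i++)
    {
        dst[i] = mask[i] ? std::fma(op2[i], op3[i], op1[i]) : merge[i];
    }
    return dst;
}

// Sequence 1: dst already holds merge, so merge is the RMW value.
Vec PredicatedMovprfxForm(const Mask& mask, const Vec& op1, const Vec& op2, const Vec& op3, const Vec& merge)
{
    Vec dst = merge;
    for (std::size_t i = 0; i < kLanes; i++)
    {
        if (mask[i])
        {
            dst[i] = op1[i]; // movprfx (predicated, merging)
        }
    }
    Fmla(dst, mask, op2, op3);
    return dst;
}

// Sequence 2: dst = op1, so op1 is the RMW value and inactive lanes keep op1.
Vec UnpredicatedMovprfxForm(const Mask& mask, const Vec& op1, const Vec& op2, const Vec& op3)
{
    Vec dst = op1; // movprfx (unpredicated)
    Fmla(dst, mask, op2, op3);
    return dst;
}

// Sequence 3: fmla over all lanes, then an independent sel against merge.
Vec FmlaThenSelForm(const Mask& mask, const Vec& op1, const Vec& op2, const Vec& op3, const Vec& merge)
{
    Mask allTrue;
    allTrue.fill(true);
    Vec tmp = op1;
    Fmla(tmp, allTrue, op2, op3);
    Vec dst{};
    for (std::size_t i = 0; i < kLanes; i++)
    {
        dst[i] = mask[i] ? tmp[i] : merge[i]; // sel
    }
    return dst;
}

// Usage: each sequence matches the reference under the conditions named above.
void DemoEmbeddedMaskForms()
{
    const Mask mask  = {true, false, true, false};
    const Vec  op1   = {1, 2, 3, 4};
    const Vec  op2   = {2, 2, 2, 2};
    const Vec  op3   = {3, 3, 3, 3};
    const Vec  merge = {9, 9, 9, 9};

    assert(PredicatedMovprfxForm(mask, op1, op2, op3, merge) == Reference(mask, op1, op2, op3, merge));
    // Sequence 2 is only equivalent when merge == op1 (or the mask is all true).
    assert(UnpredicatedMovprfxForm(mask, op1, op2, op3) == Reference(mask, op1, op2, op3, op1));
    assert(FmlaThenSelForm(mask, op1, op2, op3, merge) == Reference(mask, op1, op2, op3, merge));
}
```

In this model the register that must be preserved until the fmla executes differs per sequence (merge, op1, or neither), which is the distinction the comment above draws.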

tannergooding · 2024-07-27T15:52:16Z

src/tests/JIT/Regression/JitBlue/Runtime_105474/Runtime_105474.cs

+    private static void TestMethod2(Vector<double> mask)
+    {
+        var vr1 = Vector128.CreateScalar((double)10).AsVector();
+        s_3 = Sve.ConditionalSelect(mask, Sve.FusedMultiplyAdd(vr1, s_3, s_3), s_3);


Is this testing what we want?

s_3 is a static field, so I'd expect this relies heavily on CSE and other opts to get the "right shape".

Sve.FusedMultiplyAdd should also be capable of emitting either FMAD (Zdn = Za + Zdn * Zm) or FMLA (Zda = Zda + Zn * Zm); that is, between these two instructions any operand can be the RMW one. It might be good to ensure we're covering that range here, with some comments or even assertions (explicit validation using the disassembly checking functionality) of what codegen is expected.
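To make the FMAD/FMLA distinction concrete, here is a small C++ sketch of the two register forms quoted above. It is only a model: the predicate is omitted (all lanes treated as active), the four-lane width is arbitrary, and the names kLaneCount, Lanes, FmlaInPlace, FmadInPlace, addend, left, and right are assumptions chosen for the example rather than anything taken from the JIT or the Sve API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Assumption: a fixed 4-lane vector stands in for a scalable SVE register.
constexpr std::size_t kLaneCount = 4;
using Lanes = std::array<double, kLaneCount>;

// FMLA: Zda = Zda + Zn * Zm. The addend register is read, then overwritten.
void FmlaInPlace(Lanes& zda, const Lanes& zn, const Lanes& zm)
{
    for (std::size_t i = 0; i < kLaneCount; i++)
    {
        zda[i] = std::fma(zn[i], zm[i], zda[i]);
    }
}

// FMAD: Zdn = Za + Zdn * Zm. A multiplicand register is read, then overwritten.
void FmadInPlace(Lanes& zdn, const Lanes& zm, const Lanes& za)
{
    for (std::size_t i = 0; i < kLaneCount; i++)
    {
        zdn[i] = std::fma(zdn[i], zm[i], za[i]);
    }
}

// Usage: both forms compute addend + left * right; they differ only in which
// input shares a register with the result.
void DemoFmlaVersusFmad()
{
    const Lanes addend   = {10, 10, 10, 10};
    const Lanes left     = {1, 2, 3, 4};
    const Lanes right    = {2, 2, 2, 2};
    const Lanes expected = {12, 14, 16, 18};

    Lanes viaFmla = addend; // target register starts out holding the addend
    FmlaInPlace(viaFmla, left, right);

    Lanes viaFmad = left; // target register starts out holding a multiplicand
    FmadInPlace(viaFmad, right, addend);

    assert(viaFmla == expected);
    assert(viaFmad == expected);
}
```

Because multiplication commutes, FmadInPlace could equally start from right, so between the two forms any of the three inputs can be the one that shares a register with the result.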

kunalspathak · 2024-07-27

I hit an assert with this scenario, so I added it.

amanasifkhalid · 2024-07-30

@tannergooding how would we go about explicitly validating the codegen for this? On my machine, the choice between fmad and fmla for a test changes depending on whether we're optimizing (though I'm not hitting asserts either way with this change).

tannergooding · 2024-07-30

By using the disassembly checks described in https://github.com/dotnet/runtime/blob/main/docs/workflow/testing/coreclr/disasm-checks.md

We should be able to add tests that validate each of the operands given Sve.FusedMultiplyAdd(a, b, c) end up as the target. We should then be able to further expand those base tests over ConditionalSelect to ensure the right behavior still occurs.

It might be a release and TieredCompilation=0 only test, but it should be possible to get setup and validated.

amanasifkhalid · 2024-08-05

@tannergooding thanks for pointing that out -- I've updated the tests to check for expected codegen.

amanasifkhalid · 2024-08-06T14:26:53Z

@TIHan the disasm check logic in the test is failing because it cannot find the JitDisasm output file. I can't repro this failure locally, so I think the test script is handling paths correctly. Have you run into issues like this with disasm tests in CI before?

markples · 2024-08-06T17:04:43Z

> @TIHan the disasm check logic in the test is failing because it cannot find the JitDisasm output file. I can't repro this failure locally, so I think the test script is handling paths correctly. Have you run into issues like this with disasm tests in CI before?

Is it possible that Sve.IsSupported is true on your machine and not on the CI machine? I don't think the file will exist if nothing is written.

TIHan · 2024-08-06T17:16:34Z

@markples I believe that's right. If the machine doesn't support it, it won't write it. For this scenario, we would need to support alt-jit to support SVE for disasm-checks which should be possible.

amanasifkhalid · 2024-08-06T18:35:30Z

@TIHan @markples good point, thank you for clarifying this -- I'm guessing there's no trivial way to conditionalize the disasm check, right? I tried naively setting DOTNET_AltJit in the csproj file to see if I could get this working locally on an x64 machine, and this breaks the dotnet.exe command for running SuperFileCheck, so you're right that some work will be needed to get this supported.

@tannergooding since the assert this PR fixes is popping up in Fuzzlyn and Antigen, are you ok with taking this without any disasm checks in the test? I can update the annotations to just be comments documenting the expected codegen.

markples · 2024-08-06T19:12:00Z

For x64 I think you could just disable the overall test. Or, as long as any output file gets generated, none of the // ARM64 checks apply anyway (not sure if we avoid opening the file if there are no checks -- maybe we do and that's why x64 worked in ci?)

I think it's going to be harder for arm64 machines that may or may not have sve support unless there's a way to not run the test at all based on sve rather than arm64. Or // SVE instead of // ARM64 and using whatever is making x64 work today?

Longer term, I wonder if there's a way to add first class support to our testing infrastructure.

amanasifkhalid · 2024-08-07T19:44:34Z

> I think it's going to be harder for arm64 machines that may or may not have sve support unless there's a way to not run the test at all based on sve rather than arm64. Or // SVE instead of // ARM64 and using whatever is making x64 work today?

Yeah, this seems to be the primary blocker -- at the moment, I don't think the disasm check infra supports conditionalizing on the ISA, and I don't know of a way to skip this test in CI if the machine doesn't support SVE. I think we should pursue support for disasm checks in this test as a follow-up, since this PR is also fixing a blocking issue.

tannergooding · 2024-08-07T19:49:10Z

I'm fine with not adding it, but I think we should ensure an issue is logged to track the ability to add such support long term.

The main goal of requesting it be added was to ensure we're hitting the relevant code paths and generating the expected codegen for each of them, to help ensure no corner cases were missed

amanasifkhalid · 2024-08-07T20:07:15Z

I opened #106093 to track this.

markples · 2024-08-07T20:10:42Z

I agree to get this PR in. If you wanted a short-term fix for here, I think you could change CLRTest.Jit.targets to use (something like) $([<namespace>.SVE]::IsSupported) to conditionally construct an additional --check-prefixes value such as // ARM64-SVE and use that instead in the C# test code.

amanasifkhalid · 2024-08-08T16:49:18Z

@markples thank you for the suggestion! Do you know what subset of the runtime I need to build to pick up changes to CLRTest.Jit.targets? I tweaked one of the test cases to use ARM64-SVE-FULL-LINE and made the expected codegen nonsensical to try to get it to fail, but after building clr.runtime, regenerating Core_Root, and rebuilding the test, it still isn't failing.

amanasifkhalid · 2024-08-08T19:06:07Z

I spoke with @markples offline, and the CLRTest.Jit.targets solution would require adding a new property function to check Sve.IsSupported, so it's not as clean a fix as we initially thought. Supporting AltJits with disasm checks seems like a simpler and more flexible solution.

@tannergooding are you ok with me merging this as-is?

amanasifkhalid · 2024-08-09T17:00:35Z

ping @tannergooding

amanasifkhalid · 2024-08-13T16:15:05Z

ping @tannergooding

JulieLeeMSFT · 2024-08-14T13:18:03Z

Adding @AndyAyersMS as a reviewer.

AndyAyersMS

Since this PR is now just changing assert behavior, can we update the description to describe it more accurately?

I'm not at all familiar with this part of the jit, so while I'll conditionally approve this it would be good for somebody who is more familiar to look it over too.

amanasifkhalid · 2024-08-14T15:07:27Z

> Since this PR is now just changing assert behavior, can we update the description to describe it more accurately?

Sure thing; fixed.

> I'm not at all familiar with this part of the jit, so while I'll conditionally approve this it would be good for somebody who is more familiar to look it over too.

@TIHan @a74nh could you PTAL? Thanks

tannergooding

LGTM.

-- Sorry for the delay, missed this one in the swarm of other PRs I had been also reviewing

amanasifkhalid · 2024-08-14T17:32:54Z

> Sorry for the delay, missed this one in the swarm of other PRs I had been also reviewing

No worries, thanks!

kunalspathak added 2 commits July 25, 2024 22:26

DelayFreeUses wrt CndSel node

547b8e5

wrap in TARGET_ARM64

Add test case

20477eb

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 26, 2024

dotnet-policy-service bot assigned kunalspathak Jul 26, 2024

This was referenced Jul 26, 2024

System.IO.Net5Compat.Tests and System.IO.Tests suddenly exiting with error 137 #100558

Open

msbuild crashes with "MSB0001: Internal MSBuild Error: must be valid" dotnet/dnceng#3304

Open

fix test

7ffd332

amanasifkhalid approved these changes Jul 26, 2024

kunalspathak added 2 commits July 26, 2024 08:21

fix a type-cast

025f7e5

jit format

3c60b9d

tannergooding reviewed Jul 26, 2024

kunalspathak added 4 commits July 26, 2024 11:10

Revert "DelayFreeUses wrt CndSel node"

2567404

This reverts commit 547b8e5.

relax some asserts

101f666

some more test coverage

8fe1e9c

jit format

07e0133

This was referenced Jul 26, 2024

TimeProviderTests.TestProviderTimer failed in CI #103459

Closed

System.Numerics.Tensors.Tests.TensorSpanTests test failure #103525

Closed

System.IO.Tests crash in CI (Linux arm64) #100441

Closed

tannergooding reviewed Jul 27, 2024

jakobbotsch mentioned this pull request Jul 30, 2024

JIT: SVE Assertion failed 'targetReg != embMaskOp2Reg' during 'Generate code' #105474

Closed

amanasifkhalid added 3 commits July 30, 2024 11:29

temp move

6830b8a

Merge branch 'main' into delayfree-fmla

a296e40

Fix test naming

574242c

JulieLeeMSFT assigned amanasifkhalid and unassigned kunalspathak Jul 30, 2024

JulieLeeMSFT added this to the 9.0.0 milestone Jul 30, 2024

amanasifkhalid added 2 commits August 5, 2024 17:50

Use disasm checker in tests

239940c

Suppress experimental warning

a79a2fd

Remove disasm check

fb1d411

amanasifkhalid mentioned this pull request Aug 7, 2024

Enable AltJit usage with disassembly-checked tests #106093

Open

JulieLeeMSFT requested review from tannergooding and AndyAyersMS August 14, 2024 11:57

AndyAyersMS approved these changes Aug 14, 2024

amanasifkhalid changed the title ~~Arm64/Sve: DelayfreeUses for FMLA should take into account CndSel for RMW~~ Arm64/Sve: Fix overzealous assert for embedded mask intrinsics with RMW semantics Aug 14, 2024

JulieLeeMSFT requested a review from a74nh August 14, 2024 16:31

tannergooding approved these changes Aug 14, 2024

amanasifkhalid merged commit d03b2d0 into dotnet:main Aug 14, 2024
117 checks passed

github-actions bot locked and limited conversation to collaborators Sep 14, 2024
