Skip to content

Arm64/Sve: Fix overzealous assert for embedded mask intrinsics with RMW semantics #105541

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Aug 14, 2024

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Jul 26, 2024

For RMW intrinsics, we delay free the SVE operands wrt to the RMW operand node, which means for some embedded mask operations like FMLA, the operands can be given the same register as the target of the parent CndSel node. Thus, relax some asserts around the embedded mask operands' registers not matching the CndSel's target register.

Resolves: #105474

@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib @dotnet/arm64-contrib

<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<Optimize>True</Optimize>
</PropertyGroup>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add <CLRTestTargetUnsupported Condition="'$(TargetArchitecture)' != 'arm64'">true</CLRTestTargetUnsupported> to avoid building this test unnecessarily?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but there are lot of tests in this folder that do not have this. So would rather skip it and do it together for all the tests.

srcCount += BuildDelayFreeUses(emitOp2, emitOp1);
srcCount += BuildDelayFreeUses(emitOp3, emitOp1);
srcCount += BuildDelayFreeUses(intrin.op3, emitOp1);
srcCount += BuildDelayFreeUses(emitOp2, emitOp1, intrinsicTree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change, FMLA doesn't have 2x RMW nodes, it has 1.

Given dst = CndSel(mask, fmla(op1, op2, op3), merge)

movprfx (predicated)
fmla (predicated) - contained

movprfx (unpredicated)
fmla (predicated) - contained

fmla (predicated)
sel (predicated) - not contained

In the case of the first one, movprfx (predicated) works as dst = CndSel(mask, op1, dst), so therefore the merge value is actually the RMW node. We then emit fmla dst, mask, op2, op3 So inactive elements stay as is (values from dst) and active elements are overwritten (this is the form we emit for merge == zero and for other merge values in many cases).

In the case of the second one, movprfx (unpredicated) works simply as dst = op1, so therefore op1 is the RMW node. We then emit fmla dst, mask, op2, op3, so inactive elements stay as is (this is the form we emit for mask == ptrue or merge == op1)

In the case of the final one we have two independent instruction sequences that get emitted. So the first fmla only considers with regards to itself and will often execute the movprfx (unpredicated) path because the mask needs to be considered as all true only not emitting movprfx (unpredicated) when dst == op1 already. The sel then independently merges the result with merge and there is no RMW consideration

private static void TestMethod2(Vector<double> mask)
{
var vr1 = Vector128.CreateScalar((double)10).AsVector();
s_3 = Sve.ConditionalSelect(mask, Sve.FusedMultiplyAdd(vr1, s_3, s_3), s_3);
Copy link
Member

@tannergooding tannergooding Jul 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this testing what we want?

s_3 is a static field, so I'd expect this is relying extra on CSE and other opts to get the "right shape"

Sve.FusedMultiplyAdd should also be capable of emitting either FMAD (Zdn = Za + Zdn * Zm) or FMLA (Zda = Zda + Zn * Zm), which is to say between these two instructions any operand can be the RMW one, so it might be good to ensure we're covering the range here with some comments or even assertions (explicit validation using the disassembly checking functionality) of what codegen is expected

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit an assert with this scenario so added it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding how would we go about explicitly validating the codegen for this? On my machine, the decision to use fmad or fmla for a test changes depending on if we're optimizing or not (though I'm not hitting asserts either way with this change).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By utilizing https://github.com/dotnet/runtime/blob/main/docs/workflow/testing/coreclr/disasm-checks.md

We should be able to add tests that validate each of the operands given Sve.FusedMultiplyAdd(a, b, c) end up as the target. We should then be able to further expand those base tests over ConditionalSelect to ensure the right behavior still occurs.

It might be a release and TieredCompilation=0 only test, but it should be possible to get setup and validated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding thanks for pointing that out -- I've updated the tests to check for expected codegen.

@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Jul 30, 2024
@amanasifkhalid
Copy link
Member

@TIHan the disasm check logic in the test is failing because it cannot find the JitDisasm output file. I can't repro this failure locally, so I think the test script is handling paths correctly. Have you run into issues like this with disasm tests in CI before?

@markples
Copy link
Contributor

markples commented Aug 6, 2024

@TIHan the disasm check logic in the test is failing because it cannot find the JitDisasm output file. I can't repro this failure locally, so I think the test script is handling paths correctly. Have you run into issues like this with disasm tests in CI before?

Is it possible that Sve.IsSupported is true on your machine and not on the CI machine? I don't think the file will exist if nothing is written.

@TIHan
Copy link
Contributor

TIHan commented Aug 6, 2024

@markples I believe that's right. If the machine doesn't support it, it won't write it. For this scenario, we would need to support alt-jit to support SVE for disasm-checks which should be possible.

@amanasifkhalid
Copy link
Member

amanasifkhalid commented Aug 6, 2024

@TIHan @markples good point, thank you for clarifying this -- I'm guessing there's no trivial way to conditionalize the disasm check, right? I tried naively setting DOTNET_AltJit in the csproj file to see if I can get this working locally on a x64 machine, and this breaks the dotnet.exe command for running SuperFileCheck, so you're right that some work will be needed to get this supported.

@tannergooding since the assert this PR fixes is popping up in Fuzzlyn and Antigen, are you ok with taking this without any disasm checks in the test? I can update the annotations to just be comments documenting the expected codegen.

@markples
Copy link
Contributor

markples commented Aug 6, 2024

For x64 I think you could just disable the overall test. Or, as long as any output file gets generated, none of the // ARM64 checks apply anyway (not sure if we avoid opening the file if there are no checks -- maybe we do and that's why x64 worked in ci?)

I think it's going to be harder for arm64 machines that may or may not have sve support unless there's a way to not run the test at all based on sve rather than arm64. Or // SVE instead of // ARM64 and using whatever is making x64 work today?

Longer term, I wonder if there's a way to add first class support to our testing infrastructure.

@amanasifkhalid
Copy link
Member

I think it's going to be harder for arm64 machines that may or may not have sve support unless there's a way to not run the test at all based on sve rather than arm64. Or // SVE instead of // ARM64 and using whatever is making x64 work today?

Yeah, this seems to be the primary blocker -- at the moment, I don't think the disasm check infra supports conditionalizing on the ISA, and I don't know of a way to skip this test in CI if the machine doesn't support SVE. I think we should pursue support for disasm checks in this test as a follow-up, since this PR is also fixing a blocking issue.

@tannergooding
Copy link
Member

tannergooding commented Aug 7, 2024

I'm fine with not adding it, but I think we should ensure an issue is logged to track the ability to add such support long term.

The main goal of requesting it be added was to ensure we're hitting the relevant code paths and generating the expected codegen for each of them, to help ensure no corner cases were missed

@amanasifkhalid
Copy link
Member

I opened #106093 to track this.

@markples
Copy link
Contributor

markples commented Aug 7, 2024

I agree to get this PR in. If you wanted a short-term fix for here, I think you could change CLRTest.Jit.targets to use (something like) $([<namespace>.SVE]::IsSupported) to conditionally construct an additional --check-prefixes value such as // ARM64-SVE and use that instead in the C# test code.

@amanasifkhalid
Copy link
Member

@markples thank you for the suggestion! Do you know what subset of the runtime I need to build to pick up changes to CLRTest.Jit.targets? I tweaked one of the test cases to use ARM64-SVE-FULL-LINE and made the expected codegen nonsensical to try to get it to fail, but after building clr.runtime, regenerating Core_Root, and rebuilding the test, it's not failing

@amanasifkhalid
Copy link
Member

I spoke with @markples offline, and the CLRTest.Jit.targets solution would require adding a new property function to check Sve.IsSupported, so it's not as clean of a fix as we initially thought. Supporting AltJits with disasm checks seems like a simpler and more flexible solution.

@tannergooding are you ok with me merging this as-is?

@amanasifkhalid
Copy link
Member

ping @tannergooding

1 similar comment
@amanasifkhalid
Copy link
Member

ping @tannergooding

@JulieLeeMSFT
Copy link
Member

Adding @AndyAyersMS as a reviewer.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR is now just changing assert behavior, can we update the description to describe it more accurately?

I'm not at all familiar with this part of the jit, so while I'll conditionally approve this it would be good for somebody who is more familiar to look it over too.

@amanasifkhalid amanasifkhalid changed the title Arm64/Sve: DelayfreeUses for FMLA should take into account CndSel for RMW Arm64/Sve: Fix overzealous assert for embedded mask intrinsics with RMW semantics Aug 14, 2024
@amanasifkhalid
Copy link
Member

Since this PR is now just changing assert behavior, can we update the description to describe it more accurately?

Sure thing; fixed.

I'm not at all familiar with this part of the jit, so while I'll conditionally approve this it would be good for somebody who is more familiar to look it over too.

@TIHan @a74nh could you PTAL? Thanks

@JulieLeeMSFT JulieLeeMSFT requested a review from a74nh August 14, 2024 16:31
Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

-- Sorry for the delay, missed this one in the swarm of other PRs I had been also reviewing

@amanasifkhalid
Copy link
Member

Sorry for the delay, missed this one in the swarm of other PRs I had been also reviewing

No worries, thanks!

@amanasifkhalid amanasifkhalid merged commit d03b2d0 into dotnet:main Aug 14, 2024
117 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Sep 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT: SVE Assertion failed 'targetReg != embMaskOp2Reg' during 'Generate code'
7 participants