Skip to content

Arm64/Sve: Fix overzealous assert for embedded mask intrinsics with RMW semantics #105541

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Aug 14, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions src/coreclr/jit/hwintrinsiccodegenarm64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -522,7 +522,7 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
// If `falseReg` is zero, then move the first operand of `intrinEmbMask` in the
// destination using /Z.

assert(targetReg != embMaskOp2Reg);
assert((targetReg != embMaskOp2Reg) || (embMaskOp1Reg == embMaskOp2Reg));
assert(intrin.op3->isContained() || !intrin.op1->IsMaskAllBitsSet());
GetEmitter()->emitInsSve_R_R_R(INS_sve_movprfx, emitSize, targetReg, maskReg, embMaskOp1Reg, opt);
}
Expand Down Expand Up @@ -806,7 +806,6 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)

if (HWIntrinsicInfo::IsFmaIntrinsic(intrinEmbMask.id))
{
assert(falseReg != embMaskOp3Reg);
// For FMA, the operation we are trying to perform is:
// result = op1 + (op2 * op3)
//
Expand Down
75 changes: 53 additions & 22 deletions src/tests/JIT/Regression/JitBlue/Runtime_105474/Runtime_105474.cs
Original file line number Diff line number Diff line change
Expand Up @@ -3,42 +3,73 @@

using System;
using System.Runtime.CompilerServices;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;
using Xunit;

#nullable disable

public class Runtime_105474_A
public class Runtime_105474
{
private void Method0()
private static Vector<double> s_3;

[Fact]
public static void TestEntryPoint()
{
Vector128<ulong> vr0 = Vector128.CreateScalar(1698800584428641629UL);
AdvSimd.ShiftLeftLogicalSaturate(vr0, 229);
if (Sve.IsSupported)
{
TestMethod1();
TestMethod2(Vector<double>.Zero);
TestMethod3(Vector<double>.Zero);
TestMethod4(Vector<double>.Zero);
TestMethod5(Vector<double>.Zero);
TestMethod6(Vector<double>.Zero);
}
}

private void Method1()
[method: MethodImpl(MethodImplOptions.NoInlining)]
private static void TestMethod1()
{
Vector128<float> vr1 = default;
Avx.Compare(vr1, vr1, (FloatComparisonMode)255);
// Expected codegen: fmad z17.d, p0/m, z17.d, z16.d
var vr1 = Vector128.CreateScalar((double)10).AsVector();
s_3 = Sve.FusedMultiplyAdd(vr1, s_3, s_3);
}

[Fact]
public static void TestEntryPointArm()
[method: MethodImpl(MethodImplOptions.NoInlining)]
private static void TestMethod2(Vector<double> mask)
{
if (AdvSimd.IsSupported)
{
Assert.Throws<ArgumentOutOfRangeException>(() => new Runtime_105474_A().Method0());
}
// Expected codegen: fmla z16.d, p0/m, z17.d, z17.d
var vr1 = Vector128.CreateScalar((double)10).AsVector();
s_3 = Sve.ConditionalSelect(mask, Sve.FusedMultiplyAdd(vr1, s_3, s_3), s_3);
Copy link
Member

@tannergooding tannergooding Jul 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this testing what we want?

s_3 is a static field, so I'd expect this is relying extra on CSE and other opts to get the "right shape"

Sve.FusedMultiplyAdd should also be capable of emitting either FMAD (Zdn = Za + Zdn * Zm) or FMLA (Zda = Zda + Zn * Zm), which is to say between these two instructions any operand can be the RMW one, so it might be good to ensure we're covering the range here with some comments or even assertions (explicit validation using the disassembly checking functionality) of what codegen is expected

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit an assert with this scenario so added it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding how would we go about explicitly validating the codegen for this? On my machine, the decision to use fmad or fmla for a test changes depending on if we're optimizing or not (though I'm not hitting asserts either way with this change).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By utilizing https://github.com/dotnet/runtime/blob/main/docs/workflow/testing/coreclr/disasm-checks.md

We should be able to add tests that validate each of the operands given Sve.FusedMultiplyAdd(a, b, c) end up as the target. We should then be able to further expand those base tests over ConditionalSelect to ensure the right behavior still occurs.

It might be a release and TieredCompilation=0 only test, but it should be possible to get setup and validated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding thanks for pointing that out -- I've updated the tests to check for expected codegen.

}

[Fact]
public static void TestEntryPoint()
[method: MethodImpl(MethodImplOptions.NoInlining)]
private static void TestMethod3(Vector<double> mask)
{
if (Avx.IsSupported)
{
Assert.Throws<ArgumentOutOfRangeException>(() => new Runtime_105474_A().Method1());
}
// Expected codegen: fmad z16.d, p0/m, z16.d, z16.d
s_3 = Sve.ConditionalSelect(mask, Sve.FusedMultiplyAdd(s_3, s_3, s_3), s_3);
}

[method: MethodImpl(MethodImplOptions.NoInlining)]
private static void TestMethod4(Vector<double> mask)
{
// Expected codegen: fmad z16.d, p0/m, z17.d, z16.d
var vr1 = Vector128.CreateScalar((double)10).AsVector();
s_3 = Sve.ConditionalSelect(mask, Sve.FusedMultiplyAdd(s_3, vr1, s_3), s_3);
}

[method: MethodImpl(MethodImplOptions.NoInlining)]
private static void TestMethod5(Vector<double> mask)
{
// Expected codegen: fmad z16.d, p0/m, z16.d, z17.d
var vr1 = Vector128.CreateScalar((double)10).AsVector();
s_3 = Sve.ConditionalSelect(mask, Sve.FusedMultiplyAdd(s_3, vr1, vr1), s_3);
}

[method: MethodImpl(MethodImplOptions.NoInlining)]
private static void TestMethod6(Vector<double> mask)
{
// Expected codegen: fmad z16.d, p0/m, z16.d, z16.d
var vr1 = Vector128.CreateScalar((double)10).AsVector();
s_3 = Sve.ConditionalSelect(mask, Sve.FusedMultiplyAdd(vr1, vr1, vr1), s_3);
}
}
Original file line number Diff line number Diff line change
@@ -1,8 +1,14 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<!-- Needed for CLRTestEnvironmentVariable -->
<RequiresProcessIsolation>true</RequiresProcessIsolation>
<DebugType>None</DebugType>
<Optimize>True</Optimize>
<NoWarn>$(NoWarn);SYSLIB5003</NoWarn>
</PropertyGroup>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add <CLRTestTargetUnsupported Condition="'$(TargetArchitecture)' != 'arm64'">true</CLRTestTargetUnsupported> to avoid building this test unnecessarily?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but there are lot of tests in this folder that do not have this. So would rather skip it and do it together for all the tests.

<ItemGroup>
<Compile Include="$(MSBuildProjectName).cs" />
<CLRTestEnvironmentVariable Include="DOTNET_TieredCompilation" Value="0" />
<CLRTestEnvironmentVariable Include="DOTNET_JITMinOpts" Value="0" />
</ItemGroup>
</Project>
44 changes: 44 additions & 0 deletions src/tests/JIT/Regression/JitBlue/Runtime_105479/Runtime_105479.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;
using Xunit;

#nullable disable

public class Runtime_105479_A
{
private void Method0()
{
Vector128<ulong> vr0 = Vector128.CreateScalar(1698800584428641629UL);
AdvSimd.ShiftLeftLogicalSaturate(vr0, 229);
}

private void Method1()
{
Vector128<float> vr1 = default;
Avx.Compare(vr1, vr1, (FloatComparisonMode)255);
}

[Fact]
public static void TestEntryPointArm()
{
if (AdvSimd.IsSupported)
{
Assert.Throws<ArgumentOutOfRangeException>(() => new Runtime_105479_A().Method0());
}
}

[Fact]
public static void TestEntryPoint()
{
if (Avx.IsSupported)
{
Assert.Throws<ArgumentOutOfRangeException>(() => new Runtime_105479_A().Method1());
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<Optimize>True</Optimize>
</PropertyGroup>
<ItemGroup>
<Compile Include="$(MSBuildProjectName).cs" />
</ItemGroup>
</Project>
Loading