Fixing the InstructionSetDesc implications #86486

tannergooding · 2023-05-19T05:42:46Z

Extracted from #85551

This primarily ensures that the InstructionSetDesc implications are correct. In particular, it ensures that Avx512F is dependent on both AVX2 and FMA. It likewise ensures that the nested VL classes are dependent on both the base VL class and the containing class (e.g. Avx512BW.VL is dependent on both Avx512BW and Avx512F.VL).
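As a rough illustration of what these implications mean, the following C++ sketch expands a requested ISA into everything it transitively requires. The table, the string ISA names, and `ExpandImplications` are hypothetical and only cover the edges mentioned above. They are not the InstructionSetDesc.txt format or the code generated from it.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified subset of the edges described above.
// Each entry reads "this ISA implies (requires) all of these ISAs".
const std::map<std::string, std::vector<std::string>> kImplications = {
    {"AVX2", {"AVX"}},
    {"FMA", {"AVX"}},
    {"AVX512F", {"AVX2", "FMA"}},
    {"AVX512F_VL", {"AVX512F"}},
    {"AVX512BW", {"AVX512F"}},
    {"AVX512BW_VL", {"AVX512BW", "AVX512F_VL"}},
};

// Returns the requested ISAs plus everything they transitively imply.
std::set<std::string> ExpandImplications(std::set<std::string> isas) {
    std::vector<std::string> worklist(isas.begin(), isas.end());
    while (!worklist.empty()) {
        std::string isa = worklist.back();
        worklist.pop_back();
        auto it = kImplications.find(isa);
        if (it == kImplications.end()) {
            continue;  // no outgoing implications for this ISA
        }
        for (const std::string& implied : it->second) {
            // insert().second is true only the first time an ISA is added,
            // so each ISA is queued at most once.
            if (isas.insert(implied).second) {
                worklist.push_back(implied);
            }
        }
    }
    return isas;
}
```

With this table, requesting only `AVX512BW_VL` also pulls in AVX512BW, AVX512F_VL, AVX512F, AVX2, FMA, and AVX.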

It also adds a recursive implication that ensures that F + BW + CD + DQ + VL are only ever enabled as a set. This is done to simplify the current implementation of the JIT without tying us to that behavior permanently. It would be non-trivial work and not "pay for play" to allow this to be represented as a single R2R flag instead.
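A minimal sketch of why a recursive implication forces the ISAs to be enabled as a set, assuming it is modeled as a ring in which each member implies the next and the last implies the first. `kRing` and `CloseOverRing` are hypothetical names, and the real descriptor may express the cycle with different edges; the point is only that closing over a cycle from any single member reaches every member.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical ring of implications: F -> BW -> CD -> DQ -> F_VL -> F.
const std::map<std::string, std::string> kRing = {
    {"AVX512F", "AVX512BW"},
    {"AVX512BW", "AVX512CD"},
    {"AVX512CD", "AVX512DQ"},
    {"AVX512DQ", "AVX512F_VL"},
    {"AVX512F_VL", "AVX512F"},
};

// Follows the ring from every present ISA until no new ISA is added.
std::set<std::string> CloseOverRing(std::set<std::string> isas) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& edge : kRing) {
            // If the source is present and the target was newly inserted,
            // another pass is needed to follow the target's edge.
            if (isas.count(edge.first) != 0 && isas.insert(edge.second).second) {
                changed = true;
            }
        }
    }
    return isas;
}
```

Opting into any one member (for example AVX512CD) yields all five, while an ISA outside the ring is left untouched.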

Finally, fixing this surfaced a couple of minor bugs. In particular, avx-vnni was marked as opportunistic, which in turn marked avx2 as opportunistic. This is problematic if the user decides to target avx but not avx2, and it was causing Vector<T> and Vector256<T> to behave incorrectly. This was handled by ensuring avx-vnni is only marked as opportunistic if avx2 is explicitly supported, and by ensuring that NI_IsSupported_Dynamic is only returned for opportunistic APIs.
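The avx-vnni part of the fix can be sketched as a guard in front of the "mark opportunistic" step. This is a simplified model, not the InstructionSetHelpers.cs code: `IsaSets`, `AddOpportunistic`, and `AddOpportunisticAvxVnni` are hypothetical, and the only implication modeled is AVXVNNI implying AVX2.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the compiler's ISA bookkeeping.
struct IsaSets {
    std::set<std::string> supported;      // ISAs the user explicitly targets
    std::set<std::string> opportunistic;  // ISAs that may light up at runtime
};

// Marking an ISA opportunistic also marks what it implies. In this simplified
// model the only edge is AVXVNNI -> AVX2.
void AddOpportunistic(IsaSets& sets, const std::string& isa) {
    sets.opportunistic.insert(isa);
    if (isa == "AVXVNNI") {
        sets.opportunistic.insert("AVX2");
    }
}

// Sketch of the fix: only mark AVX-VNNI opportunistic when AVX2 is explicitly
// supported, so AVX2 never becomes opportunistic as a side effect.
void AddOpportunisticAvxVnni(IsaSets& sets) {
    if (sets.supported.count("AVX2") != 0) {
        AddOpportunistic(sets, "AVXVNNI");
    }
}
```

When the user targets AVX but not AVX2, the guard leaves the opportunistic set empty, so AVX2 cannot leak in through AVX-VNNI.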

Tests were added for NAOT, CG2, and the JIT, validating that the full set of currently exposed ISAs behaves correctly, including when users explicitly opt into or out of a given ISA.

tannergooding · 2023-05-19T05:43:33Z

CC. @jkotas, @davidwrighton

This is the last bit to break apart from #85551 and will leave that as just the Vector<T> changes.

jkotas · 2023-05-19T17:21:20Z

src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt

+implication        ,X86   ,AVX512VBMI_VL        ,AVX512VBMI
 implication        ,X86   ,AVX512VBMI_VL        ,AVX512BW_VL

+; While the AVX-512 ISAs can be individually lit-up, they really


@davidwrighton Could you please review these new implications?

These implications don't entirely make sense to me. What they indicate is that if any of avx512cd, bw, f, bmi, or the vl variants are enabled, then all the various instruction sets will be considered to be enabled. If that's the case, then why do we need so many flags? I don't understand. In addition, if you're making the implications have this circular flow that isn't actually required by the flags the processor specifies, we need to treat them similarly in the avx512 detection logic in codeman.cpp. Thus, we can't enable ANY of the AVX512 support in the runtime unless all of the associated cpu flags are enabled.

What they indicate is that if any of avx512cd, bw, f, bmi, or the vl variants are enabled, then all the various instruction sets will be considered to be enabled

Yes, and the comment is meant to explain that. If you have any ideas on how to reword it, please feel free to suggest them.

If that's the case, then why do we need so many flags

We have an actual split in CPUID checks and in other corresponding logic that ties class names to flag names. We may also want to actually allow splitting this in the future, such as if we decide to support Knights Landing or if some future hardware decides to only support a subset of AVX512.

In addition, if you're making the implications have this circular flow that isn't actually required by the flags the processor specifies, we need to treat them similarly in the avx512 detection logic in codeman.cpp

This is already handled in codeman by virtue of the following calls:

CPUCompileFlags.Set64BitInstructionSetVariants();
CPUCompileFlags.EnsureValidInstructionSetSupport();

If Avx512F was supported but Avx512BW was not, then Avx512F would be turned off by the EnsureValidInstructionSetSupport call.
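The effect described here can be sketched as a fixed-point pass that drops any ISA whose requirements are not all present. This only illustrates the idea under a simplified requirement table (the AVX-512 group modeled as a ring, plus AVX512F requiring AVX2 and FMA); `kRequires` and `DropUnsatisfied` are hypothetical and are not the runtime's EnsureValidInstructionSetSupport implementation.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical requirement table. The AVX-512 group members require each
// other via a ring, and AVX512F additionally requires AVX2 and FMA.
const std::map<std::string, std::vector<std::string>> kRequires = {
    {"AVX512F", {"AVX2", "FMA", "AVX512BW"}},
    {"AVX512BW", {"AVX512CD"}},
    {"AVX512CD", {"AVX512DQ"}},
    {"AVX512DQ", {"AVX512F_VL"}},
    {"AVX512F_VL", {"AVX512F"}},
};

// Repeatedly removes any ISA whose requirements are not all present,
// until a full pass makes no change.
std::set<std::string> DropUnsatisfied(std::set<std::string> isas) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (auto it = isas.begin(); it != isas.end();) {
            bool satisfied = true;
            auto req = kRequires.find(*it);
            if (req != kRequires.end()) {
                for (const std::string& r : req->second) {
                    if (isas.count(r) == 0) {
                        satisfied = false;
                        break;
                    }
                }
            }
            if (satisfied) {
                ++it;
            } else {
                it = isas.erase(it);  // erase returns the next valid iterator
                changed = true;
            }
        }
    }
    return isas;
}
```

With AVX512BW missing from the hardware set, AVX512F is dropped first, and the ring then removes AVX512F_VL, AVX512DQ, and AVX512CD in turn, leaving only AVX, AVX2, and FMA.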

… work correctly for Vector128/256/512.IsHardwareAccelerated

MichalStrehovsky

The NativeAOT smoke test change looks good. Thanks for extending the coverage!

ghost · 2023-05-22T21:26:05Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Extracted from #85551

Author:	tannergooding
Assignees:	tannergooding
Labels:	`area-CodeGen-coreclr`, `arch-avx512`, `needs-area-label`
Milestone:	-

davidwrighton

I need a better explanation for what we're doing here. See my comments.

davidwrighton · 2023-05-30T17:04:03Z

src/coreclr/tools/Common/InstructionSetHelpers.cs

-                    optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avxvnni");
+                }
+
+                if (supportedInstructionSet.HasInstructionSet(InstructionSet.X64_AVX2))


What is this for?

This avoids having avx2 be opportunistic, because there are potentially deeper problems with how that impacts Vector<T>, as you called out on the other PR.

AvxVnni inherits Avx2, so AvxVnni being opportunistic means that Avx2 is also opportunistic. This shipped that way in .NET 7, so it's possible there is some R2R issue there.

davidwrighton · 2023-05-30T17:11:54Z

src/coreclr/jit/hwintrinsic.cpp

+        if (isIsaSupported && comp->compSupportsHWIntrinsic(isa))
        {
-            if (comp->compExactlyDependsOn(isa))
+            if (!comp->IsTargetAbi(CORINFO_NATIVEAOT_ABI) || comp->compExactlyDependsOn(isa))


Could you explain what change is happening here? I don't see a description of what we're changing around NativeAOT behavior in the change description.

There was an existing bug here that surfaced when crossgen/naot targeted avx but not avx2.

For most cases, the ISA initially checked and tracked as part of isIsaSupported is the same as the one tracked by the InstructionSetDesc.

However, Vector256 in particular is a case where the implication is on Avx, and we will accelerate some APIs when only Avx is supported, but we only want IsHardwareAccelerated to report true when Avx2 is also supported.

We then ended up failing to handle IsHardwareAccelerated for the recursive case when avx was supported but avx2 was not: isIsaSupported (AVX) would be true, the compExactlyDependsOn check for AVX2 would then fail, and we would return NI_IsSupported_Dynamic. That was incorrect, since avx2 was not opportunistic.

This fixes that so we now ensure that we only go down the true/dynamic path if the compiler could support AVX2 at all.
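A hedged sketch of the decision as described above, reduced to booleans. The parameter names are hypothetical stand-ins: `avxSupported` for isIsaSupported, `avx2Possible` for "the compiler could support AVX2 at all", and `avx2Exact` for an exact dependency on AVX2. The mapping of the inner branches to "accelerated" versus "dynamic" is our reading of the explanation, not the JIT's actual code.

```cpp
// Hypothetical result kinds for Vector256.IsHardwareAccelerated.
enum class Accel { NotAccelerated, Accelerated, Dynamic };

Accel Vector256IsHardwareAccelerated(bool avxSupported, bool avx2Possible,
                                     bool avx2Exact, bool isNativeAot) {
    // Only take the true/dynamic path if AVX2 could be supported at all.
    if (avxSupported && avx2Possible) {
        // Outside NativeAOT, or with an exact AVX2 dependency, report a fixed answer.
        if (!isNativeAot || avx2Exact) {
            return Accel::Accelerated;
        }
        // Otherwise defer to a runtime check (the NI_IsSupported_Dynamic case).
        return Accel::Dynamic;
    }
    return Accel::NotAccelerated;
}
```

Before the fix, the equivalent of `avx2Possible` was not checked, so targeting AVX without AVX2 under NativeAOT could fall through to the dynamic result even though AVX2 was never opportunistic.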

davidwrighton · 2023-05-30T17:26:41Z

…hierarchy

…ISA opt-in

…ported on MacOS

Fixing the InstructionSetDesc implications

d5e1c77

tannergooding requested a review from MichalStrehovsky as a code owner May 19, 2023 05:42

ghost assigned tannergooding May 19, 2023

ghost added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 19, 2023

tannergooding commented May 19, 2023

src/coreclr/tools/Common/InstructionSetHelpers.cs

tannergooding added 3 commits May 19, 2023 06:59

Merge remote-tracking branch 'dotnet/main' into prefer-vector-width-4

2cf307d

Adding more NAOT smoke tests covering the missed instruction sets

8e518a6

Simplify the HasInstructionSet(Avx512F) check in compSetProcessor

5e4eb47

tannergooding force-pushed the prefer-vector-width-4 branch 2 times, most recently from be5596a to 381f759 May 19, 2023 15:58

Fixing the NAOT smoke tests

89b5aff

tannergooding force-pushed the prefer-vector-width-4 branch from 381f759 to 89b5aff May 19, 2023 16:49