Skip to content

Rework how EVEX+AVX512+AVX10 are supported by the JIT to greatly simplify things #115983

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 31, 2025

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented May 26, 2025

This is relying on the fact that we require AVX512 F+BW+CD+DQ+VL to all be provided simultaneously and that with the spec changes from Intel, AVX10 is now a strict superset of AVX512 and requires V512 support to be provided.

This namely allows us to reduce the number of checks we have in the JIT, reduce the number of named intrinsic table definitions required, and should provide a small throughput improvement.

A similar simplification was done for the xarch baseline ISAs (combining X86Base, SSE, and SSE2).

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 26, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding tannergooding force-pushed the simplify-isa-checks branch from e327263 to c50d6cd Compare May 27, 2025 16:10
@tannergooding tannergooding force-pushed the simplify-isa-checks branch from c50d6cd to 64eb932 Compare May 27, 2025 18:08
@tannergooding tannergooding marked this pull request as ready for review May 27, 2025 18:14
@Copilot Copilot AI review requested due to automatic review settings May 27, 2025 18:14
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR simplifies the way the JIT handles EVEX, AVX512, and AVX10 by consolidating multiple legacy ISA flags and intrinsic names into unified AVX512 support across the codebase. Key changes include:

  • Removing separate AVX512F, AVX512BW, AVX512CD, AVX512DQ, and related _VL configurations in favor of a single unified “EnableAVX512” flag.
  • Updating intrinsic names (e.g. NI_AVX512F_… → NI_AVX512_…) and associated ISA checks in various JIT source files.
  • Revising opportunistic dependency checks from AVX10 to AVX512 and updating related assertions and comments.

Reviewed Changes

Copilot reviewed 48 out of 48 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/coreclr/jit/jitconfigvalues.h Consolidates legacy AVX512 configuration flags into a unified flag.
src/coreclr/jit/instr.cpp Renames intrinsic case labels from AVX512F to AVX512.
src/coreclr/jit/importercalls.cpp Updates ISA dependency checks; replaces AVX10 references with AVX512.
src/coreclr/jit/hwintrinsiccodegenxarch.cpp Revises intrinsic assertions to use new AVX512 naming.
src/coreclr/jit/gentree.h Changes EVEX-based mask conversion intrinsics to AVX512 ones.
Comments suppressed due to low confidence (5)

src/coreclr/jit/instr.cpp:931

  • [nitpick] Renaming from NI_AVX512F_BroadcastScalarToVector512 to NI_AVX512_BroadcastScalarToVector512 improves naming consistency; ensure that all related intrinsic uses follow this unified naming convention.
case NI_AVX512_BroadcastScalarToVector512:

src/coreclr/jit/hwintrinsiccodegenxarch.cpp:422

  • Updating the assertion from NI_EVEX_BlendVariableMask to NI_AVX512_BlendVariableMask ensures consistent naming. Confirm that all similar intrinsic assertions have been updated accordingly.
assert(intrinsicId == NI_AVX512_BlendVariableMask);

src/coreclr/jit/gentree.h:6467

  • [nitpick] Replacing NI_EVEX_ConvertMaskToVector with NI_AVX512_ConvertMaskToVector aligns with the unified intrinsic naming. Please verify that downstream code correctly handles this change.
return OperIsHWIntrinsic(NI_AVX512_ConvertMaskToVector);

src/coreclr/jit/jitconfigvalues.h:402

  • [nitpick] The consolidation of individual AVX512 intrinsics flags into a single 'EnableAVX512' configuration simplifies the code. Please verify that all documentation and external references are updated to reflect this unified approach.
RELEASE_CONFIG_INTEGER(EnableAVX512,                "EnableAVX512",              1) // Allows AVX512+ hardware intrinsics to be disabled

src/coreclr/jit/importercalls.cpp:10082

  • Replacing opportunistic checks from AVX10 to a unified AVX512 simplifies the ISA dependency logic. Verify that this change correctly reflects the hardware capabilities and does not impact expected behavior.
if (compOpportunisticallyDependsOn(InstructionSet_AVX512))

@tannergooding
Copy link
Member Author

tannergooding commented May 27, 2025

CC. @dotnet/jit-contrib, @dotnet/intel

With the change that AVX10 now requires V512 support to be provided, we no longer need to have as many complex code paths in the JIT trying to handle AVX512.VL vs AVX10/256 support and no longer need a synthetic EVEX ISA.

This PR simplifies all of this to just have an AVX512 instruction set which comprises the AVX512F+BW+CD+DQ+VL ISAs, as they are provided together or not at all and thus allows significantly fewer checks. It additionally does something similar for SSE+SSE2, which are likewise provided together and which had a non-trivial number of code paths trying to support them being independent.

I did keep the various older config knobs and flags for backwards compatibility, but just maps them to the newer centralized disablement. There's likely more cleanup and simplification possible, but this represents the bulk of the work to get things normalized and cleaned up without doing a full audit of the code. Removed after it was suggested to be fine by Jan.

@tannergooding
Copy link
Member Author

@saucecontrol worth noting this doesn't entirely replace #115592 quite yet.

Once this goes in, we'd want to do a much smaller change to allow the baseline ISA usage in all phases of the JIT and only block the user calls to Isa.Method(...) if the relevant config knob is disabled

@tannergooding
Copy link
Member Author

@MihuBot

Comment on lines -125 to -127
<TestEnvironment Include="jitstress_isas_x86_nobmi1" EnableBMI1="0" /> <!-- No dependencies -->
<TestEnvironment Include="jitstress_isas_x86_nobmi2" EnableBMI2="0" /> <!-- No dependencies -->
<TestEnvironment Include="jitstress_isas_x86_nofma" EnableFMA="0" /> <!-- Depends on AVX -->
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these ISAs aren't meaningful to test anymore, especially since we aren't doing bringup of the ISAs.

For the most part, we just care about the unique ISA paths used in corelib or the JIT, which is primarily the SSE and AVX families that are already covering cases like FMA or BMI being disabled.

I think we can even get it so that we don't need to test nossse3, nosse41, and nosse42 as well; but we'd need to do a bit of cleanup in the JIT to achieve that.

-- I'd like to get our support matrix down to effectively:

  • x86-64-v1: This is the baseline and is currently targeted by NAOT
  • x86-64-v2: This is everything up through SSE4.2+POPCNT
  • x86-64-v2 + AVX: This tests the VEX encoding and is known to have a decent user share
  • x86-64-v3: This is AVX2+FMA+BMI1+BMI2
  • x86-64-v4: This is AVX512 and covers the EVEX encoding

For x86-64-v2:

  • this is what is supported by the x64 on Arm emulation provided by Apple and Windows
  • it would be nice if NAOT could target this as the default, as its an 18 year old baseline
  • this is the target required by Win11 24H2 and boot is blocked if the ISAs aren't available
  • Win11 prior to 24H2 is documented as requiring SSE4.1 otherwise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice if NAOT could target this as the default

I do not think we want to have the baseline supported piecemeal (it is unlikely we would be able to it correctly since it is impossible to test). If we want to raise the baseline, we should do it for the whole product. I would not be opposed to raising the product baseline to x86-64-v2.

its an 18 year old baseline

When did Intel/Amd stop selling the last processor without x86-64-v2? It is the more interesting date for this discussion.

Win11 prior to 24H2 is documented as requiring SSE4.1 otherwise

Where is it documented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we want to have the baseline supported piecemeal (it is unlikely we would be able to it correctly since it is impossible to test). If we want to raise the baseline, we should do it for the whole product. I would not be opposed to raising the product baseline to x86-64-v2.

👍. I think it would provide a nice simplification in the JIT and libraries for what code paths we need to support. I don't want to raise the bar "too high" and negatively impact a significant number of customers, but I do think that x86-64-v2 and armv8.1-a are reasonable baselines at this point.

Where is it documented?

In https://learn.microsoft.com/en-us/windows-hardware/design/minimum/minimum-hardware-requirements-overview, specifically under the Windows 11 document (last updated 2021)

image

We got the notification that SSE4.2+POPCNT in 24H2+ when we had last reached out about Arm RDM support; although there's also been some news about the boot blocker that exists that went around last year as well.

Azure, AWS, Google Cloud, and other major cloud providers that specify what hardware they provide are entirely on x86-64-v3 or later. There are also other 3rd party numbers out there, such as the Steam Hardware Survey, which shows 99.78% of reporting users have x86-64-v2 (while 97.31% have AVX and 94.66% have x86-64-v3).

When did Intel/Amd stop selling the last processor without x86-64-v2? It is the more interesting date for this discussion.

For Intel the last CPUs pre x86-64-v2 were discontinued in 2013, around the time x86-64-v3 was launched. This would have been the Bonnell microarchitecture (part of the Intel Atom lineup for low end machines). The Conroe/Merom and Penryn/Wolfdale chipsets had been discontinued in 2011-2012. AMD discontinued their 10h and Bobcat lineups in 2012-2013 as well.

So if we did try to raise the baseline in .NET 11, it should only be for 12-18 year old computers which are no longer supported by the CPU manufacturers, by MacOS, or by current Windows.

@jkotas
Copy link
Member

jkotas commented May 28, 2025

Ensure the ARM64IntrinsicConstants continue matching the ARM64_*_FEATURE_FLAG_BIT

Would it better to update the FEATURE_FLAG_BIT value to avoid odd hole in the values?

@tannergooding tannergooding force-pushed the simplify-isa-checks branch from 0bb29fc to ec9d4fb Compare May 28, 2025 18:43
@tannergooding
Copy link
Member Author

Would it better to update the FEATURE_FLAG_BIT value to avoid odd hole in the values?

Ah, I mistook this to be one of the Linux defined flags and should've looked deeper. I've fixed it and the corresponding definition to remove the gap.

@EgorBo
Copy link
Member

EgorBo commented May 29, 2025

/azp run runtime-coreclr jitstress-isas-x86, Fuzzlyn

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@tannergooding
Copy link
Member Author

Fuzzlyn failure is tracked by #116125, it reproduces in main

JitStress ISAs failures are #116060 and #110173

@tannergooding
Copy link
Member Author

@jkotas, any other feedback on the VM/R2R changes, or do they look good now?

@@ -35,7 +35,7 @@ public static InstructionSetSupport ConfigureInstructionSetSupport(string instru
}
else
{
instructionSetSupportBuilder.AddSupportedInstructionSet("neon"); // Lower baselines included by implication
instructionSetSupportBuilder.AddSupportedInstructionSet("neon");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "base"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not while we still have a separate InstructionSet_AdvSimd due to how the instruction set tracking works.

We could squash AdvSimd and Arm64Base together but there's also some complexities in doing so since Arm64 uses the same instructions and names (such as clz and LeadingZeroCount) across both SIMD and Scalar, so we'd need to add extra handling to differentiate the two. Also AdvSimd is fairly "self contained" and so you typically get all base types supported at once, which avoids the problem we had with SSE vs SSE2 or AVX vs AVX2 and needing to do a lot of redundant checks that weren't applicable to real world apps

@jkotas
Copy link
Member

jkotas commented May 30, 2025

VM/AOT changes LGTM. Nice simplification!

@tannergooding tannergooding merged commit c048ff0 into dotnet:main May 31, 2025
175 of 183 checks passed
@tannergooding tannergooding deleted the simplify-isa-checks branch May 31, 2025 01:35
@filipnavara
Copy link
Member

Congratulations on landing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants