-
Notifications
You must be signed in to change notification settings - Fork 5k
Rework how EVEX+AVX512+AVX10 are supported by the JIT to greatly simplify things #115983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
bbdb73f
to
917a050
Compare
be3662f
to
e327263
Compare
e327263
to
c50d6cd
Compare
c50d6cd
to
64eb932
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR simplifies the way the JIT handles EVEX, AVX512, and AVX10 by consolidating multiple legacy ISA flags and intrinsic names into unified AVX512 support across the codebase. Key changes include:
- Removing separate AVX512F, AVX512BW, AVX512CD, AVX512DQ, and related _VL configurations in favor of a single unified “EnableAVX512” flag.
- Updating intrinsic names (e.g. NI_AVX512F_… → NI_AVX512_…) and associated ISA checks in various JIT source files.
- Revising opportunistic dependency checks from AVX10 to AVX512 and updating related assertions and comments.
Reviewed Changes
Copilot reviewed 48 out of 48 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
src/coreclr/jit/jitconfigvalues.h | Consolidates legacy AVX512 configuration flags into a unified flag. |
src/coreclr/jit/instr.cpp | Renames intrinsic case labels from AVX512F to AVX512. |
src/coreclr/jit/importercalls.cpp | Updates ISA dependency checks; replaces AVX10 references with AVX512. |
src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Revises intrinsic assertions to use new AVX512 naming. |
src/coreclr/jit/gentree.h | Changes EVEX-based mask conversion intrinsics to AVX512 ones. |
Comments suppressed due to low confidence (5)
src/coreclr/jit/instr.cpp:931
- [nitpick] Renaming from NI_AVX512F_BroadcastScalarToVector512 to NI_AVX512_BroadcastScalarToVector512 improves naming consistency; ensure that all related intrinsic uses follow this unified naming convention.
case NI_AVX512_BroadcastScalarToVector512:
src/coreclr/jit/hwintrinsiccodegenxarch.cpp:422
- Updating the assertion from NI_EVEX_BlendVariableMask to NI_AVX512_BlendVariableMask ensures consistent naming. Confirm that all similar intrinsic assertions have been updated accordingly.
assert(intrinsicId == NI_AVX512_BlendVariableMask);
src/coreclr/jit/gentree.h:6467
- [nitpick] Replacing NI_EVEX_ConvertMaskToVector with NI_AVX512_ConvertMaskToVector aligns with the unified intrinsic naming. Please verify that downstream code correctly handles this change.
return OperIsHWIntrinsic(NI_AVX512_ConvertMaskToVector);
src/coreclr/jit/jitconfigvalues.h:402
- [nitpick] The consolidation of individual AVX512 intrinsics flags into a single 'EnableAVX512' configuration simplifies the code. Please verify that all documentation and external references are updated to reflect this unified approach.
RELEASE_CONFIG_INTEGER(EnableAVX512, "EnableAVX512", 1) // Allows AVX512+ hardware intrinsics to be disabled
src/coreclr/jit/importercalls.cpp:10082
- Replacing opportunistic checks from AVX10 to a unified AVX512 simplifies the ISA dependency logic. Verify that this change correctly reflects the hardware capabilities and does not impact expected behavior.
if (compOpportunisticallyDependsOn(InstructionSet_AVX512))
CC. @dotnet/jit-contrib, @dotnet/intel With the change that AVX10 now requires V512 support to be provided, we no longer need to have as many complex code paths in the JIT trying to handle This PR simplifies all of this to just have an AVX512 instruction set which comprises the AVX512F+BW+CD+DQ+VL ISAs, as they are provided together or not at all and thus allows significantly fewer checks. It additionally does something similar for SSE+SSE2, which are likewise provided together and which had a non-trivial number of code paths trying to support them being independent.
|
@saucecontrol worth noting this doesn't entirely replace #115592 quite yet. Once this goes in, we'd want to do a much smaller change to allow the baseline ISA usage in all phases of the JIT and only block the user calls to |
src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt
Show resolved
Hide resolved
<TestEnvironment Include="jitstress_isas_x86_nobmi1" EnableBMI1="0" /> <!-- No dependencies --> | ||
<TestEnvironment Include="jitstress_isas_x86_nobmi2" EnableBMI2="0" /> <!-- No dependencies --> | ||
<TestEnvironment Include="jitstress_isas_x86_nofma" EnableFMA="0" /> <!-- Depends on AVX --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these ISAs aren't meaningful to test anymore, especially since we aren't doing bringup of the ISAs.
For the most part, we just care about the unique ISA paths used in corelib or the JIT, which is primarily the SSE and AVX families that are already covering cases like FMA or BMI being disabled.
I think we can even get it so that we don't need to test nossse3, nosse41, and nosse42
as well; but we'd need to do a bit of cleanup in the JIT to achieve that.
-- I'd like to get our support matrix down to effectively:
- x86-64-v1: This is the baseline and is currently targeted by NAOT
- x86-64-v2: This is everything up through SSE4.2+POPCNT
- x86-64-v2 + AVX: This tests the VEX encoding and is known to have a decent user share
- x86-64-v3: This is AVX2+FMA+BMI1+BMI2
- x86-64-v4: This is AVX512 and covers the EVEX encoding
For x86-64-v2:
- this is what is supported by the x64 on Arm emulation provided by Apple and Windows
- it would be nice if NAOT could target this as the default, as its an 18 year old baseline
- this is the target required by Win11 24H2 and boot is blocked if the ISAs aren't available
- Win11 prior to 24H2 is documented as requiring SSE4.1 otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be nice if NAOT could target this as the default
I do not think we want to have the baseline supported piecemeal (it is unlikely we would be able to it correctly since it is impossible to test). If we want to raise the baseline, we should do it for the whole product. I would not be opposed to raising the product baseline to x86-64-v2.
its an 18 year old baseline
When did Intel/Amd stop selling the last processor without x86-64-v2? It is the more interesting date for this discussion.
Win11 prior to 24H2 is documented as requiring SSE4.1 otherwise
Where is it documented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we want to have the baseline supported piecemeal (it is unlikely we would be able to it correctly since it is impossible to test). If we want to raise the baseline, we should do it for the whole product. I would not be opposed to raising the product baseline to x86-64-v2.
👍. I think it would provide a nice simplification in the JIT and libraries for what code paths we need to support. I don't want to raise the bar "too high" and negatively impact a significant number of customers, but I do think that x86-64-v2
and armv8.1-a
are reasonable baselines at this point.
Where is it documented?
In https://learn.microsoft.com/en-us/windows-hardware/design/minimum/minimum-hardware-requirements-overview, specifically under the Windows 11 document (last updated 2021)
We got the notification that SSE4.2+POPCNT in 24H2+ when we had last reached out about Arm RDM support; although there's also been some news about the boot blocker that exists that went around last year as well.
Azure, AWS, Google Cloud, and other major cloud providers that specify what hardware they provide are entirely on x86-64-v3 or later. There are also other 3rd party numbers out there, such as the Steam Hardware Survey, which shows 99.78% of reporting users have x86-64-v2 (while 97.31% have AVX and 94.66% have x86-64-v3).
When did Intel/Amd stop selling the last processor without x86-64-v2? It is the more interesting date for this discussion.
For Intel the last CPUs pre x86-64-v2 were discontinued in 2013, around the time x86-64-v3 was launched. This would have been the Bonnell microarchitecture (part of the Intel Atom lineup for low end machines). The Conroe/Merom and Penryn/Wolfdale chipsets had been discontinued in 2011-2012. AMD discontinued their 10h and Bobcat lineups in 2012-2013 as well.
So if we did try to raise the baseline in .NET 11, it should only be for 12-18 year old computers which are no longer supported by the CPU manufacturers, by MacOS, or by current Windows.
Would it better to update the |
0bb29fc
to
ec9d4fb
Compare
Ah, I mistook this to be one of the Linux defined flags and should've looked deeper. I've fixed it and the corresponding definition to remove the gap. |
/azp run runtime-coreclr jitstress-isas-x86, Fuzzlyn |
Azure Pipelines successfully started running 2 pipeline(s). |
@jkotas, any other feedback on the VM/R2R changes, or do they look good now? |
@@ -35,7 +35,7 @@ public static InstructionSetSupport ConfigureInstructionSetSupport(string instru | |||
} | |||
else | |||
{ | |||
instructionSetSupportBuilder.AddSupportedInstructionSet("neon"); // Lower baselines included by implication | |||
instructionSetSupportBuilder.AddSupportedInstructionSet("neon"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be "base"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not while we still have a separate InstructionSet_AdvSimd
due to how the instruction set tracking works.
We could squash AdvSimd
and Arm64Base
together but there's also some complexities in doing so since Arm64 uses the same instructions and names (such as clz
and LeadingZeroCount
) across both SIMD and Scalar, so we'd need to add extra handling to differentiate the two. Also AdvSimd
is fairly "self contained" and so you typically get all base types supported at once, which avoids the problem we had with SSE vs SSE2 or AVX vs AVX2 and needing to do a lot of redundant checks that weren't applicable to real world apps
VM/AOT changes LGTM. Nice simplification! |
Congratulations on landing this! |
This is relying on the fact that we require AVX512 F+BW+CD+DQ+VL to all be provided simultaneously and that with the spec changes from Intel, AVX10 is now a strict superset of AVX512 and requires V512 support to be provided.
This namely allows us to reduce the number of checks we have in the JIT, reduce the number of named intrinsic table definitions required, and should provide a small throughput improvement.
A similar simplification was done for the xarch baseline ISAs (combining
X86Base
,SSE
, andSSE2
).