-
Notifications
You must be signed in to change notification settings - Fork 5k
Add CPUID for AvxVnniInt8 and AvxVnniInt16 #113956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6714249
to
141d643
Compare
@tannergooding This is first of the 2 PRs needed for AVX VNNI INT* API introduction #112586 |
src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt
Show resolved
Hide resolved
141d643
to
98fc970
Compare
@tannergooding @saucecontrol I have added the CPUID, API surface, JIT handling and template tests here. |
e6cf454
to
90fa072
Compare
@tannergooding this PR is in good shape now. CI failures look unrelated. Can you help review this PR? |
Co-authored-by: Tanner Gooding <tagoo@outlook.com>
bool emitter::IsAVXVNNIINT8Instruction(instruction ins) | ||
{ | ||
return (ins >= INS_FIRST_AVXVNNIINT8_INSTRUCTION) && (ins <= INS_LAST_AVXVNNIINT8_INSTRUCTION); | ||
} | ||
|
||
bool emitter::IsAVXVNNIINT16Instruction(instruction ins) | ||
{ | ||
return (ins >= INS_FIRST_AVXVNNIINT16_INSTRUCTION) && (ins <= INS_LAST_AVXVNNIINT16_INSTRUCTION); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the likelihood that these ISAs are provided separately in the real world? Is it something we want to support in the JIT today?
With the simplification I'm doing in: #115983, I'd basically like to consider places that are unlikely to be impactful to real world scenarios where we can simplify the JIT and overall support.
For example, 115983 is collapsing X86BASE+SSE+SSE2
into just X86BASE
because they are required to be provided together and form the baseline. It is collapsing AVX512F+BW+CD+DQ+VL
into simply AVX512
for similar reasons.
I expect we could also collapse AVX2+FMA+BMI1+BMI2
into a similar joined set since no hardware has ever existed that provided any of these independently and allowing individual light-up isn't meaningful for the JIT. Allowing for AVX
to be standalone from the rest is beneficial, however.
I would expect that allowing AVXVNNI
to be standalone is similarly beneficial. However, I'd expect that in the practical AVXVNNIINT8/16
will always be provided together and we might be able to represent these in the JIT as a single ISA.
It's also unclear given the change in unification strategy for Avx10 and it now requiring V512 support if we will ever actually see hardware that doesn't provide them together. -- We do want the managed API surface to be correct, we still want AvxVnniInt8 and AvxVnniInt16 to exist and expose the actual CPUID bits. These questions are more about if we can simplify the JIT support, the R2R and NAOT checks, etc to be a little more practical given expected real world usecases.
// Evex versions of AvxVnniInt8 and AvxVnniInt16 will be supported | ||
// with Avx10.2 ISA. | ||
return emitComp->compOpportunisticallyDependsOn(InstructionSet_AVX10v2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct?
Will there be hardware where EVEX is supported and AvxVnniInt8/16 are supported, but EVEX encoding of AvxVnniInt8/16 is not supported?
I'd expect that the nuance is much like Gfni
or Vpclmul
where if EVEX
is supported, then the EVEX encodings of all EVEX encodable instructions must be supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was kept due to the fact that in the PRE-AVX10.2 hardware, we only had VEX
support for AVXVNNIINT8/16
. If we look at the manual, AVXVNNIINT8/16
only have VEX
whereas AVX10.2
introduces same instructions with EVEX
support. Hence, here, when we are enabling AVXVNNIINT8/16
, we should expect that in PRE-AVX10.2 hardware, we will see only VEX
support.
I will be adding CPUID detection for AVX10_VNNI_INT
to enable the same ISAs. But at this point, I think it is better to have both VEX
and EVEX
support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was kept due to the fact that in the PRE-AVX10.2 hardware, we only had VEX support for AVXVNNIINT8/16. If we look at the manual, AVXVNNIINT8/16 only have VEX whereas AVX10.2 introduces same instructions with EVEX support. Hence, here, when we are enabling AVXVNNIINT8/16, we should expect that in PRE-AVX10.2 hardware, we will see only VEX support.
Right, but what I'm interested in is if we expect any hardware where EVEX
is supported and AVXVNNIINT8/16
is supported, but EVEX encoded AVXVNNIINT8/16
is not supported.
If we expect this to be non-existent, but technically allowed, then I think we can just configure the cpufeature checks to account for it and report the feature as unavailable in the unlikely scenario that occurs. This would allow us to simplify the JIT support while still supporting the primary real world scenarios.
If we expect it to actually occur in the wild, particularly if we know of real CPUs that will be in such a setup, then the more verbose handling is justified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as per the manual, we can have EVEX
but no support for EVEX
encoded AVXVNNIINT8/16
. But I will let you know once I confirm that real world scenarios we can have in hardware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding Ohkay. So we know the following things so far,
-
AVXVNNIINT8/16
cannot be merged because they are present on different machines.
-
There will not be a case where
EVEX
is supported butAVXVNNIINT8/16
are present (Prior toAvx10.2
) -
With
AVX10.2
, we will have a new CPUIDAVX10_VNNI_INT
which can enable these ISAs. So withAVX10.2
, we will haveEVEX
versions of these ISAs supported.
This means that we cannot combine the checks for these ISAs into a single ISA since there VEX
versions prior to AVX10.2
can be available independently.
With AVX10.2
we would enable them with AVX10_VNNI_INT
CPUID bit and handle EVEX
support as it is configured right now in HasEvexEncoding
Let me know if this makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding after our discussion last week, what do you think makes more sense?
- Have 2 separate ISAs, one for VEX(256) and one for EVEX(512)
- Have a single ISA but only have either Vex support or EVEX support
I am okay with both. Let me know what makes more sense considering the combinations we can have.
CPUID Avx_Vnni_INT8 (VEX)
CPUID AVX10.2 (EVEX)
CPUID AVX10_VNNI_INT (EVEX)
All of them can be independently available and hence I would like to go with option 1 above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to do it like AVX-VNNI
vs AVX512-VNNI
So, I think we should be able to have AVX-VNNI-INT
and AVX10.2
. We can potentially expand this in the future if required, but this gives us support for both VEX and EVEX scenarios without requiring us to use and track 4 separate CPUID bits.
That should allow core client and server scenarios to work. -- CC. @anthonycanino does this match what we had discussed in the last sync? Want to make sure I've not missed one of the scenarios that was indicated as important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to do it like
AVX-VNNI
vsAVX512-VNNI
So, I think we should be able to have
AVX-VNNI-INT
andAVX10.2
. We can potentially expand this in the future if required, but this gives us support for both VEX and EVEX scenarios without requiring us to use and track 4 separate CPUID bits.That should allow core client and server scenarios to work. -- CC. @anthonycanino does this match what we had discussed in the last sync? Want to make sure I've not missed one of the scenarios that was indicated as important.
@tannergooding can you explain what do you mean by
"We should be able to have AVX-VNNI-INT
and AVX10.2
"? So are we not splitting the V256
and V512
scenarios into different ISAs? Can you clarify that what ISAs we will have in JITwith what you are saying? And I a, assuming that the API surface will remain the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The managed API surface should remain the same, where we have AvxVnniInt8
, AvxVnniInt8.V512
, AvxVnniInt16
, and AvxVnniInt16.V512
.
For the JIT side, I believe we can simplify this down to two tracked instruction sets:
instructionset ,X86 ,AvxVnniInt8 , ,60 ,AVXVNNIINT ,avxvnniint
instructionset ,X86 ,AvxVnniInt8_V512 , ,61 ,AVXVNNIINT_V512 ,avxvnniint_v512
instructionset ,X86 ,AvxVnniInt16 , ,62 ,AVXVNNIINT ,avxvnniint
instructionset ,X86 ,AvxVnniInt16_V512 , ,63 ,AVXVNNIINT_V512 ,avxvnniint_v512
This would then have the following implications:
implication ,X86 ,AVXVNNIINT ,AVX2
implication ,X86 ,AVXVNNIINT_V512 ,AVXVNNIINT
implication ,X86 ,AVXVNNIINT_V512 ,AVX10v1
...
implication ,X86 ,AVX10v2 ,AVXVNNIINT_V512
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will make the required changes and update the PR.
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
This PR adds support for CPUID for
AVX-VNNI-INT8
&AVX-VNNI-INT16
ISAsDesign
The changes are made in a way to enable the 2 ISAs when
Avx10.2
is enabled orThis is w.r.t the discussions done in API proposal #112586
Testing
Note1: Emitter unit tests not ran since they are added and verified along with AVX10.2 PR #111209
Note2: Superpmi results are not accurate since we are adding a new CPUID and it leads to a new jiteeversionguid. Even after changing the jiteeversion manually, superpmi run shows errors and failures based on the old mch files which can be ignored.
Run JIT subtree with AVXVNNIINT* enabled / disabled
AVXVNNIINT* Enabled

AVXVNNIINT* disabled
