Skip to content

Add CPUID for AvxVnniInt8 and AvxVnniInt16 #113956

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

khushal1996
Copy link
Member

@khushal1996 khushal1996 commented Mar 27, 2025

This PR adds support for CPUID for AVX-VNNI-INT8 & AVX-VNNI-INT16 ISAs

Design

image
image

The changes are made in a way to enable the 2 ISAs when

  1. Avx10.2 is enabled or
  2. CPUID for both ISAs are enabled

This is w.r.t the discussions done in API proposal #112586

Testing

Note1: Emitter unit tests not ran since they are added and verified along with AVX10.2 PR #111209

Note2: Superpmi results are not accurate since we are adding a new CPUID and it leads to a new jiteeversionguid. Even after changing the jiteeversion manually, superpmi run shows errors and failures based on the old mch files which can be ignored.

Run JIT subtree with AVXVNNIINT* enabled / disabled


AVXVNNIINT* Enabled
image

AVXVNNIINT* disabled
image

@khushal1996
Copy link
Member Author

@tannergooding This is first of the 2 PRs needed for AVX VNNI INT* API introduction #112586

@khushal1996 khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from 141d643 to 98fc970 Compare April 14, 2025 21:40
@khushal1996
Copy link
Member Author

@tannergooding @saucecontrol I have added the CPUID, API surface, JIT handling and template tests here.

@tannergooding tannergooding self-requested a review April 14, 2025 21:44
@tannergooding tannergooding self-assigned this Apr 14, 2025
@khushal1996
Copy link
Member Author

@tannergooding this PR is in good shape now. CI failures look unrelated. Can you help review this PR?

Co-authored-by: Tanner Gooding <tagoo@outlook.com>
Comment on lines +103 to +111
bool emitter::IsAVXVNNIINT8Instruction(instruction ins)
{
return (ins >= INS_FIRST_AVXVNNIINT8_INSTRUCTION) && (ins <= INS_LAST_AVXVNNIINT8_INSTRUCTION);
}

bool emitter::IsAVXVNNIINT16Instruction(instruction ins)
{
return (ins >= INS_FIRST_AVXVNNIINT16_INSTRUCTION) && (ins <= INS_LAST_AVXVNNIINT16_INSTRUCTION);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the likelihood that these ISAs are provided separately in the real world? Is it something we want to support in the JIT today?

With the simplification I'm doing in: #115983, I'd basically like to consider places that are unlikely to be impactful to real world scenarios where we can simplify the JIT and overall support.


For example, 115983 is collapsing X86BASE+SSE+SSE2 into just X86BASE because they are required to be provided together and form the baseline. It is collapsing AVX512F+BW+CD+DQ+VL into simply AVX512 for similar reasons.

I expect we could also collapse AVX2+FMA+BMI1+BMI2 into a similar joined set since no hardware has ever existed that provided any of these independently and allowing individual light-up isn't meaningful for the JIT. Allowing for AVX to be standalone from the rest is beneficial, however.

I would expect that allowing AVXVNNI to be standalone is similarly beneficial. However, I'd expect that in the practical AVXVNNIINT8/16 will always be provided together and we might be able to represent these in the JIT as a single ISA.

It's also unclear given the change in unification strategy for Avx10 and it now requiring V512 support if we will ever actually see hardware that doesn't provide them together. -- We do want the managed API surface to be correct, we still want AvxVnniInt8 and AvxVnniInt16 to exist and expose the actual CPUID bits. These questions are more about if we can simplify the JIT support, the R2R and NAOT checks, etc to be a little more practical given expected real world usecases.

Comment on lines +343 to +345
// Evex versions of AvxVnniInt8 and AvxVnniInt16 will be supported
// with Avx10.2 ISA.
return emitComp->compOpportunisticallyDependsOn(InstructionSet_AVX10v2);
Copy link
Member

@tannergooding tannergooding May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct?

Will there be hardware where EVEX is supported and AvxVnniInt8/16 are supported, but EVEX encoding of AvxVnniInt8/16 is not supported?

I'd expect that the nuance is much like Gfni or Vpclmul where if EVEX is supported, then the EVEX encodings of all EVEX encodable instructions must be supported

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was kept due to the fact that in the PRE-AVX10.2 hardware, we only had VEX support for AVXVNNIINT8/16. If we look at the manual, AVXVNNIINT8/16 only have VEX whereas AVX10.2 introduces same instructions with EVEX support. Hence, here, when we are enabling AVXVNNIINT8/16, we should expect that in PRE-AVX10.2 hardware, we will see only VEX support.

AVXVNNIINT16
image
image

AVX10.2
image
image

CPUID
image

I will be adding CPUID detection for AVX10_VNNI_INT to enable the same ISAs. But at this point, I think it is better to have both VEX and EVEX support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was kept due to the fact that in the PRE-AVX10.2 hardware, we only had VEX support for AVXVNNIINT8/16. If we look at the manual, AVXVNNIINT8/16 only have VEX whereas AVX10.2 introduces same instructions with EVEX support. Hence, here, when we are enabling AVXVNNIINT8/16, we should expect that in PRE-AVX10.2 hardware, we will see only VEX support.

Right, but what I'm interested in is if we expect any hardware where EVEX is supported and AVXVNNIINT8/16 is supported, but EVEX encoded AVXVNNIINT8/16 is not supported.

If we expect this to be non-existent, but technically allowed, then I think we can just configure the cpufeature checks to account for it and report the feature as unavailable in the unlikely scenario that occurs. This would allow us to simplify the JIT support while still supporting the primary real world scenarios.

If we expect it to actually occur in the wild, particularly if we know of real CPUs that will be in such a setup, then the more verbose handling is justified.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as per the manual, we can have EVEX but no support for EVEX encoded AVXVNNIINT8/16. But I will let you know once I confirm that real world scenarios we can have in hardware.

Copy link
Member Author

@khushal1996 khushal1996 May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding Ohkay. So we know the following things so far,

  • AVXVNNIINT8/16 cannot be merged because they are present on different machines.
    image

  • There will not be a case where EVEX is supported but AVXVNNIINT8/16 are present (Prior to Avx10.2)

  • With AVX10.2, we will have a new CPUID AVX10_VNNI_INT which can enable these ISAs. So with AVX10.2, we will have EVEX versions of these ISAs supported.

This means that we cannot combine the checks for these ISAs into a single ISA since there VEX versions prior to AVX10.2 can be available independently.

With AVX10.2 we would enable them with AVX10_VNNI_INT CPUID bit and handle EVEX support as it is configured right now in HasEvexEncoding

Let me know if this makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding after our discussion last week, what do you think makes more sense?

  1. Have 2 separate ISAs, one for VEX(256) and one for EVEX(512)
  2. Have a single ISA but only have either Vex support or EVEX support

I am okay with both. Let me know what makes more sense considering the combinations we can have.

CPUID Avx_Vnni_INT8 (VEX)
CPUID AVX10.2 (EVEX)
CPUID AVX10_VNNI_INT (EVEX)

All of them can be independently available and hence I would like to go with option 1 above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to do it like AVX-VNNI vs AVX512-VNNI

So, I think we should be able to have AVX-VNNI-INT and AVX10.2. We can potentially expand this in the future if required, but this gives us support for both VEX and EVEX scenarios without requiring us to use and track 4 separate CPUID bits.

That should allow core client and server scenarios to work. -- CC. @anthonycanino does this match what we had discussed in the last sync? Want to make sure I've not missed one of the scenarios that was indicated as important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to do it like AVX-VNNI vs AVX512-VNNI

So, I think we should be able to have AVX-VNNI-INT and AVX10.2. We can potentially expand this in the future if required, but this gives us support for both VEX and EVEX scenarios without requiring us to use and track 4 separate CPUID bits.

That should allow core client and server scenarios to work. -- CC. @anthonycanino does this match what we had discussed in the last sync? Want to make sure I've not missed one of the scenarios that was indicated as important.

@tannergooding can you explain what do you mean by
"We should be able to have AVX-VNNI-INT and AVX10.2"? So are we not splitting the V256 and V512 scenarios into different ISAs? Can you clarify that what ISAs we will have in JITwith what you are saying? And I a, assuming that the API surface will remain the same.

Copy link
Member

@tannergooding tannergooding Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The managed API surface should remain the same, where we have AvxVnniInt8, AvxVnniInt8.V512, AvxVnniInt16, and AvxVnniInt16.V512.

For the JIT side, I believe we can simplify this down to two tracked instruction sets:

instructionset     ,X86   ,AvxVnniInt8          ,                   ,60 ,AVXVNNIINT            ,avxvnniint
instructionset     ,X86   ,AvxVnniInt8_V512     ,                   ,61 ,AVXVNNIINT_V512       ,avxvnniint_v512
instructionset     ,X86   ,AvxVnniInt16         ,                   ,62 ,AVXVNNIINT            ,avxvnniint
instructionset     ,X86   ,AvxVnniInt16_V512    ,                   ,63 ,AVXVNNIINT_V512       ,avxvnniint_v512

This would then have the following implications:

implication        ,X86   ,AVXVNNIINT           ,AVX2
implication        ,X86   ,AVXVNNIINT_V512      ,AVXVNNIINT
implication        ,X86   ,AVXVNNIINT_V512      ,AVX10v1
...
implication        ,X86   ,AVX10v2              ,AVXVNNIINT_V512

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will make the required changes and update the PR.

@teo-tsirpanis teo-tsirpanis added area-System.Runtime.Intrinsics and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 31, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Runtime.Intrinsics community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants