Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Expose the x86 HWIntrinsics via a set of class hierarchies matching the underlying ISA hierarchies #19186

Merged
merged 1 commit into from
Jul 31, 2018
Merged

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Jul 28, 2018

This implements https://github.com/dotnet/corefx/issues/29247 -- It was indicated that we would like to move forward with this.

@tannergooding
Copy link
Member Author

@@ -11,10 +11,10 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel AES hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Aes
public abstract class Aes : Sse2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12.13.4 Checking for AESNI Support

Before an application attempts to use AESNI instructions or PCLMULQDQ, the application should follow the steps
illustrated in Section 11.6.2, “Checking for SSE/SSE2 Support.” Next, use the additional step provided below:

Check that the processor supports AESNI (if CPUID.01H:ECX.AESNI[bit 25] = 1); check that the processor
supports PCLMULQDQ (if CPUID.01H:ECX.PCLMULQDQ[bit 1] = 1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with Intel that there is not a co-dependency between AESNI and PCLMULQDQ.

CC. @fiigii

@@ -11,10 +11,10 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel AVX hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Avx
public abstract class Avx : Sse42
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14.3 DETECTION OF AVX INSTRUCTIONS

Intel AVX instructions operate on the 256-bit YMM register state. Application detection of new instruction extensions
operating on the YMM state follows the general procedural flow in Figure 14-2.

Prior to using AVX, the application must identify that the operating system supports the XGETBV instruction, the
YMM register state, in addition to processor’s support for YMM state management using XSAVE/XRSTOR and AVX
instructions. The following simplified sequence accomplishes both and is strongly recommended.

  1. Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application use1)
  2. Issue XGETBV and verify that XCR0[2:1] = ‘11b’ (XMM state and YMM state are enabled by OS).
  3. detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).

Copy link
Member Author

@tannergooding tannergooding Jul 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with Intel that AVX has an implicit dependency on Sse42 (and prior) due to extending the existing instructions.

CC. @fiigii

@@ -11,9 +11,9 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel AVX2 hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Avx2
public abstract class Avx2 : Avx
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14.7.1 Detection of AVX2

Hardware support for AVX2 is indicated by CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]=1.

Application Software must identify that hardware supports AVX, after that it must also detect support for AVX2 by
checking CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5].

@@ -455,7 +455,7 @@ public static class Avx2
/// __m128i _mm256_extracti128_si256 (__m256i a, const int imm8)
/// VEXTRACTI128 m128, ymm, imm8
/// </summary>
public static unsafe void ExtractVector128(sbyte* address, Vector256<sbyte> value, byte index) { throw new PlatformNotSupportedException(); }
new public static unsafe void ExtractVector128(sbyte* address, Vector256<sbyte> value, byte index) { throw new PlatformNotSupportedException(); }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We expose both AVX and AVX2 versions of these functions...

The AVX version uses VEXTRACTF128 (@fiigii, does this incur any kind of stall?) while the AVX2 version uses VEXTRACTI128.

We should determine if we want to:

  • Expose as is
  • Expose only on AVX and document that it will use VEXTRACTI128 on AVX2 hardware
  • Don't expose these overloads on AVX, and require users reinterpret-cast to float or double

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this incur any kind of stall?

Yes, but just a little bit. Usually, that is just one cycle.

@@ -11,7 +11,7 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel BMI1 hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Bmi1
public abstract class Bmi1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5.1.16.1 Detection of VEX-encoded GPR Instructions, LZCNT and TZCNT, PREFETCHW

VEX-encoded general-purpose instructions do not operate on any vector registers.

There are separate feature flags for the following subsets of instructions that operate on general purpose registers,
and the detection requirements for hardware support are:

CPUID.(EAX=07H, ECX=0H):EBX.BMI1[bit 3]: if 1 indicates the processor supports the first group of advanced bit
manipulation extensions (ANDN, BEXTR, BLSI, BLSMSK, BLSR, TZCNT);

CPUID.(EAX=07H, ECX=0H):EBX.BMI2[bit 8]: if 1 indicates the processor supports the second group of advanced
bit manipulation extensions (BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX);

CPUID.EAX=80000001H:ECX.LZCNT[bit 5]: if 1 indicates the processor supports the LZCNT instruction.
CPUID.EAX=80000001H:ECX.PREFTEHCHW[bit 8]: if 1 indicates the processor supports the PREFTEHCHW instruction.

CPUID.(EAX=07H, ECX=0H):ECX.PREFTEHCHWT1[bit 0]: if 1 indicates the processor supports the
PREFTEHCHWT1 instruction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel is following up to confirm that Bmi1 and Bmi2, despite using the VEX-encoding for several instructions, does not have a dependency on Avx. The Celeron G4920 is one such processor that appears to support BMI1/BMI2, but not AVX.

CC. @fiigii

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still waiting for the hardware to experiment, sorry for the delay.

@@ -11,7 +11,7 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel BMI2 hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Bmi2
public abstract class Bmi2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5.1.16.1 Detection of VEX-encoded GPR Instructions, LZCNT and TZCNT, PREFETCHW

VEX-encoded general-purpose instructions do not operate on any vector registers.

There are separate feature flags for the following subsets of instructions that operate on general purpose registers,
and the detection requirements for hardware support are:

CPUID.(EAX=07H, ECX=0H):EBX.BMI1[bit 3]: if 1 indicates the processor supports the first group of advanced bit
manipulation extensions (ANDN, BEXTR, BLSI, BLSMSK, BLSR, TZCNT);

CPUID.(EAX=07H, ECX=0H):EBX.BMI2[bit 8]: if 1 indicates the processor supports the second group of advanced
bit manipulation extensions (BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX);

CPUID.EAX=80000001H:ECX.LZCNT[bit 5]: if 1 indicates the processor supports the LZCNT instruction.
CPUID.EAX=80000001H:ECX.PREFTEHCHW[bit 8]: if 1 indicates the processor supports the PREFTEHCHW instruction.

CPUID.(EAX=07H, ECX=0H):ECX.PREFTEHCHWT1[bit 0]: if 1 indicates the processor supports the
PREFTEHCHWT1 instruction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel is following up to confirm that Bmi1 and Bmi2, despite using the VEX-encoding for several instructions, does not have a dependency on Avx. The Celeron G4920 is one such processor that appears to support BMI1/BMI2, but not AVX.

CC. @fiigii

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the names, there is no dependency for Bmi2 on Bmi1

@@ -10,9 +10,9 @@ namespace System.Runtime.Intrinsics.X86
/// <summary>
/// This class provides access to Intel FMA hardware instructions via intrinsics
/// </summary>
public static class Fma
public abstract class Fma : Avx
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14.5.3 Detection of FMA

Hardware support for FMA is indicated by CPUID.1:ECX.FMA[bit 12]=1.

Application Software must identify that hardware supports AVX, after that it must also detect support for FMA by
CPUID.1:ECX.FMA[bit 12].

@@ -11,7 +11,7 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel LZCNT hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Lzcnt
public abstract class Lzcnt
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5.1.16.1 Detection of VEX-encoded GPR Instructions, LZCNT and TZCNT, PREFETCHW

VEX-encoded general-purpose instructions do not operate on any vector registers.

There are separate feature flags for the following subsets of instructions that operate on general purpose registers,
and the detection requirements for hardware support are:

CPUID.(EAX=07H, ECX=0H):EBX.BMI1[bit 3]: if 1 indicates the processor supports the first group of advanced bit
manipulation extensions (ANDN, BEXTR, BLSI, BLSMSK, BLSR, TZCNT);

CPUID.(EAX=07H, ECX=0H):EBX.BMI2[bit 8]: if 1 indicates the processor supports the second group of advanced
bit manipulation extensions (BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX);

CPUID.EAX=80000001H:ECX.LZCNT[bit 5]: if 1 indicates the processor supports the LZCNT instruction.
CPUID.EAX=80000001H:ECX.PREFTEHCHW[bit 8]: if 1 indicates the processor supports the PREFTEHCHW instruction.

CPUID.(EAX=07H, ECX=0H):ECX.PREFTEHCHWT1[bit 0]: if 1 indicates the processor supports the
PREFTEHCHWT1 instruction.

@@ -11,9 +11,9 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel PCLMULQDQ hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Pclmulqdq
public abstract class Pclmulqdq : Sse2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12.13.4 Checking for AESNI Support

Before an application attempts to use AESNI instructions or PCLMULQDQ, the application should follow the steps
illustrated in Section 11.6.2, “Checking for SSE/SSE2 Support.” Next, use the additional step provided below:

Check that the processor supports AESNI (if CPUID.01H:ECX.AESNI[bit 25] = 1); check that the processor
supports PCLMULQDQ (if CPUID.01H:ECX.PCLMULQDQ[bit 1] = 1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with Intel that there is not a co-dependency between AESNI and PCLMULQDQ.

CC. @fiigii

@@ -10,9 +10,9 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel POPCNT hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Popcnt
public abstract class Popcnt : Sse42
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12.12.3 Checking for SSE4.2 Support

Before an application attempts to use the following SSE4.2 instructions: PCMPESTRI/PCMPESTRM/PCMPISTRI/
PCMPISTRM, PCMPGTQ;the application should follow the steps illustrated in Section 11.6.2, “Checking for
SSE/SSE2 Support.” Next, use the additional step provided below:

Check that the processor supports SSE4.2 (if CPUID.01H:ECX.SSE4_2[bit 20] = 1), SSE4.1 (if
CPUID.01H:ECX.SSE4_1[bit 19] = 1), and SSSE3 (if CPUID.01H:ECX.SSSE3[bit 9] = 1).

Before an application attempts to use the CRC32 instruction, it must check that the processor supports SSE4.2 (if
CPUID.01H:ECX.SSE4_2[bit 20] = 1).

Before an application attempts to use the POPCNT instruction, it must check that the processor supports SSE4.2 (if
CPUID.01H:ECX.SSE4_2[bit 20] = 1) and POPCNT (if CPUID.01H:ECX.POPCNT[bit 23] = 1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with Intel that the SSE4.2 dependency includes SSE-SSE4.1.

CC. @fiigii

@@ -11,7 +11,7 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel SSE hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Sse
public abstract class Sse
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11.6.2 Checking for SSE/SSE2 Support

Before an application attempts to use the SSE and/or SSE2 extensions, it should check that they are present on the
processor:

  1. Check that the processor supports the CPUID instruction. Bit 21 of the EFLAGS register can be used to check
    processor’s support the CPUID instruction.

  2. Check that the processor supports the SSE and/or SSE2 extensions (true if CPUID.01H:EDX.SSE[bit 25] = 1
    and/or CPUID.01H:EDX.SSE2[bit 26] = 1).

Operating system must provide system level support for handling SSE state, exceptions before an application can
use the SSE and/or SSE2 extensions (see Chapter 13 in the Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 3A).

@@ -11,10 +11,10 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel SSE2 hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Sse2
public abstract class Sse2 : Sse
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11.6.2 Checking for SSE/SSE2 Support

Before an application attempts to use the SSE and/or SSE2 extensions, it should check that they are present on the
processor:

  1. Check that the processor supports the CPUID instruction. Bit 21 of the EFLAGS register can be used to check
    processor’s support the CPUID instruction.

  2. Check that the processor supports the SSE and/or SSE2 extensions (true if CPUID.01H:EDX.SSE[bit 25] = 1
    and/or CPUID.01H:EDX.SSE2[bit 26] = 1).

Operating system must provide system level support for handling SSE state, exceptions before an application can
use the SSE and/or SSE2 extensions (see Chapter 13 in the Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 3A).

@@ -11,10 +11,10 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel SSE3 hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Sse3
public abstract class Sse3 : Sse2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12.4.2 Checking for SSE3 Support

Before an application attempts to use the SIMD subset of SSE3 extensions, the application should follow the steps
illustrated in Section 11.6.2, “Checking for SSE/SSE2 Support.” Next, use the additional step provided below:

• Check that the processor supports the SIMD and x87 SSE3 extensions (if CPUID.01H:ECX.SSE3[bit 0] = 1).

An operating systems that provides application support for SSE, SSE2 also provides sufficient application support
for SSE3. To use FISTTP, software only needs to check support for SSE3.

In the initial implementation of MONITOR and MWAIT, these two instructions are available to ring 0 and conditionally
available at ring level greater than 0. Before an application attempts to use the MONITOR and MWAIT instructions,
the application should use the following steps:

  1. Check that the processor supports MONITOR and MWAIT. If CPUID.01H:ECX.MONITOR[bit 3] = 1, MONITOR
    and MWAIT are available at ring 0.
  2. Query the smallest and largest line size that MONITOR uses. Use CPUID.05H:EAX.smallest[bits
    15:0];EBX.largest[bits15:0]. Values are returned in bytes in EAX and EBX.
  3. Ensure the memory address range(s) that will be supplied to MONITOR meets memory type requirements.

MONITOR and MWAIT are targeted for system software that supports efficient thread synchronization, See Chapter
13 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A for details.

@@ -11,9 +11,9 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel SSE4.1 hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Sse41
public abstract class Sse41 : Ssse3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12.12.2 Checking for SSE4.1 Support

Before an application attempts to use SSE4.1 instructions, the application should follow the steps illustrated in
Section 11.6.2, “Checking for SSE/SSE2 Support.” Next, use the additional step provided below:

Check that the processor supports SSE4.1 (if CPUID.01H:ECX.SSE4_1[bit 19] = 1), SSE3 (if
CPUID.01H:ECX.SSE3[bit 0] = 1), and SSSE3 (if CPUID.01H:ECX.SSSE3[bit 9] = 1).

@@ -11,9 +11,9 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel SSE4.2 hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Sse42
public abstract class Sse42 : Sse41
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12.12.3 Checking for SSE4.2 Support

Before an application attempts to use the following SSE4.2 instructions: PCMPESTRI/PCMPESTRM/PCMPISTRI/
PCMPISTRM, PCMPGTQ;the application should follow the steps illustrated in Section 11.6.2, “Checking for
SSE/SSE2 Support.” Next, use the additional step provided below:

Check that the processor supports SSE4.2 (if CPUID.01H:ECX.SSE4_2[bit 20] = 1), SSE4.1 (if
CPUID.01H:ECX.SSE4_1[bit 19] = 1), and SSSE3 (if CPUID.01H:ECX.SSSE3[bit 9] = 1).

Before an application attempts to use the CRC32 instruction, it must check that the processor supports SSE4.2 (if
CPUID.01H:ECX.SSE4_2[bit 20] = 1).

Before an application attempts to use the POPCNT instruction, it must check that the processor supports SSE4.2 (if
CPUID.01H:ECX.SSE4_2[bit 20] = 1) and POPCNT (if CPUID.01H:ECX.POPCNT[bit 23] = 1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with Intel that Sse4.2 requires Ssse3, despite that it is not explicitly listed.

CC. @fiigii

@@ -11,10 +11,10 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel SSSE3 hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Ssse3
public abstract class Ssse3 : Sse2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12.7.2 Checking for SSSE3 Support

Before an application attempts to use the SSSE3 extensions, the application should follow the steps illustrated in
Section 11.6.2, “Checking for SSE/SSE2 Support.” Next, use the additional step provided below:

• Check that the processor supports SSSE3 (if CPUID.01H:ECX.SSSE3[bit 9] = 1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with Intel that Ssse3 requires Sse3, despite that it is not explicitly listed.

CC. @fiigii

Copy link
Member Author

@tannergooding tannergooding Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12.7.1 Guidelines for Using SSSE3 Extensions

The following guidelines describe how to maximize the benefits of using SSSE3 extensions:

• Check that the processor supports SSSE3 extensions.
• Ensure that your operating system supports SSE/SSE2/SSE3/SSSE3 extensions. (Operating system support
for the SSE extensions implies sufficient support for SSE2, SSE3, and SSSE3.)
• Employ the optimization and scheduling techniques described in the Intel® 64 and IA-32 Architectures Optimization
Reference Manual (see Section 1.4, “Related Literature”).

@tannergooding
Copy link
Member Author

The Checking for <ISA> Support comments were taken from the "Intel® 64 and IA-32 Architectures
Software Developer’s Manual; Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D and 4": https://software.intel.com/en-us/articles/intel-sdm

@tannergooding
Copy link
Member Author

tannergooding commented Jul 28, 2018

AVX-512 support, if/when it comes, is a good point to discuss now.

It exposes multiple "levels", where subsequent levels depend on the "base" (AVX-512F) layer.

  • I didn't see any existing levels that had a dependency on both AVX-512F and another AVX-512 layer, but documentation here seems a bit sparse.

However, AVX-512F itself depends on both AVX2 and FMA. Given that .NET doesn't have multiple inheritance today, this may pose as a problem later.

@tannergooding
Copy link
Member Author

There is some additional API cleanup we should do, but that should be done separately as it is independent of making the intrinsic classes use inheritance.

@tannergooding
Copy link
Member Author

There is some additional work in the runtime (both VM and JIT) to ensure that the ISA checks are accurate (https://github.com/dotnet/coreclr/issues/18445).

In the case of the JIT, the COMPlus_EnableAVX flag currently controls support for all VEX-encoded instructions. We should probably rename this flag to COMPlus_EnableVEX and consider adding a separate flag that controls 256-bit register support (which I believe was the original intention of COMPlus_EnableAVX).

CC. @CarolEidt, @AndyAyersMS

@4creators
Copy link

4creators commented Jul 28, 2018

;TL;DR;

This is a starting point for reasoning about the change in the proposal https://github.com/dotnet/corefx/issues/29247

We should design our HW Intrinsics classes in a similar fashion. This allows for simplicity in coding calling functions from 2 (or more) instruction sets:

if (SSE3.IsSupported)
{
    var v1 = SSE3.SetAllVector128(0.5f);   // comes from SSE
    var v2 = SSE3.HorizontalAdd(v1, v1);   // comes from SSE3
}

This means we should have our classes inherit from each other to model this inheritance in the ISAs:

public class Sse
{ ...
}
public class Sse2 : Sse
{ ...
}
public class Sse3 : Sse2
{ ...
}

IMHO we should look not from the perspective of elegant engineering but from perspective of devs using intrinsics. Having a possibility to do using static for all classes solves above problem entirely (one would need aliases for IsSupported or just use them with class name).

Otherwise the above work seems to me a bit of over engineering since assumptions made at the start of design process are not necessarily valid.

@fiigii
Copy link

fiigii commented Jul 29, 2018

@tannergooding Thanks for the work. However, I still doubt the necessity of ISA inherent design that introduces so much complexity (e.g., the runtime/compiler have to specially treat this abstract class) but solves nothing. The correct ISA hierarchy is guaranteed by hardware (or correctly designed COMPlus_EnableXXX for debugging). Meanwhile, the SetZeroVector128 issue has many simple solutions that we have talked in https://github.com/dotnet/corefx/issues/29247

@fiigii
Copy link

fiigii commented Jul 29, 2018

COMPlus_EnableAVX flag currently controls support for all VEX-encoded instructions. We should probably rename this flag to COMPlus_EnableVEX and consider adding a separate flag that controls 256-bit register support (which I believe was the original intention of COMPlus_EnableAVX).

I believe the COMPlus_EnableAVX is equivalent to COMPlus_EnableVEX.

@tannergooding
Copy link
Member Author

I believe the COMPlus_EnableAVX is equivalent to COMPlus_EnableVEX.

Yes, this is what I was saying.

The problem is that this is not obvious and it would be better to fix these flags now, before we ship 3.0

@tannergooding
Copy link
Member Author

tannergooding commented Jul 29, 2018

Meanwhile, the SetZeroVector128 issue has many simple solutions that we have talked in dotnet/corefx#29247

@fiigii, yes there were several other ideas passed around, but the API reviewers discussed the options presented and, ultimately, it was determined that we should move forward with this approach and see how it works out. This was because exposing the ISAs via the proper inheritance hierarchy addresses many of the root concerns that were raised.

This approach means that:

  • All Sse methods are available immediately from the Sse2 class
    • Likewise, Sse-Sse4.2 are avialable via Avx
  • The IsSupported check for a given ISA clearly indicates that any base ISAs are also supported (and do not need to also be checked)
  • We still need to ensure that API names are usable when all are "globalized" (such as would have been the case with using static, which is also still a valid option for users)
  • etc

@tannergooding
Copy link
Member Author

since assumptions made at the start of design process are not necessarily valid.

@4creators, could you elaborate? What assumptions would no longer be valid here?

@@ -11,10 +11,10 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel AES hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Aes
public abstract class Aes : Sse2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make private or internal constructors on all of them as well? That way no one tries creating the class, or a class inheriting from this class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

{
public static bool IsSupported { get => IsSupported; }

new public static bool IsSupported { get => IsSupported; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe public new is the "typical" order. See dotnet/machinelearning#557 (comment) for the same discussion from last week in dotnet/machinelearning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to match the IDE default.

@@ -11,7 +11,7 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel BMI2 hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Bmi2
public abstract class Bmi2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for my knowledge) Bmi2 shouldn't inherit from Bmi1?

Copy link
Member

@eerhardt eerhardt Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I see the discussion in the PNS file


In reply to: 206531480 [](ancestors = 206531480)

@@ -11,9 +11,9 @@ namespace System.Runtime.Intrinsics.X86
/// This class provides access to Intel SSE4.1 hardware instructions via intrinsics
/// </summary>
[CLSCompliant(false)]
public static class Sse41
public abstract class Sse41 : Ssse3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the place where things get a little hairy with inheritance, right?

Sse4.1 needs to check for both Sse3 and Ssse3. But Ssse3 and Sse3 have no direct releationship. So really, we would need multiple inheritance here to correctly model this hierarchy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ssse3 has a direct dependency on Sse3 and this was confirmed with Intel. This is a case where the manual is not self-consistent (12.7.1 lists the Sse3 dependency, even though 12.7.2 does not).

The only case where multiple inheritance is currently needed is for AVX-512F (which has not been reviewed/etc). Because of a dependency on both AVX2 and FMA, where they are siblings (both inherit from AVX). -- I called this bit out here: #19186 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mistake on my end. Fixed. -- I also went through the other ISAs and ensures they looked correct.

@tannergooding
Copy link
Member Author

Responded to all feedback so far.

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my end.

Thanks, @tannergooding.

@tannergooding
Copy link
Member Author

@CarolEidt, are there any concerns on your end for this?

@CarolEidt
Copy link

LGTM - thanks for all the work, and the detailed notes.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants