Skip to content

Fix Vector256.IsHardwareAccelerated in R2R binaries #65351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 18, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs
Original file line number Diff line number Diff line change
Expand Up @@ -3894,6 +3894,10 @@ private HRESULT getPgoInstrumentationResults(CORINFO_METHOD_STRUCT_* ftnHnd, ref

private bool notifyInstructionSetUsage(InstructionSet instructionSet, bool supportEnabled)
{
InstructionSet_ARM64 asArm64 = (InstructionSet_ARM64)instructionSet;
InstructionSet_X64 asX64 = (InstructionSet_X64)instructionSet;
InstructionSet_X86 asX86 = (InstructionSet_X86)instructionSet;

if (supportEnabled)
{
_actualInstructionSetSupported.AddInstructionSet(instructionSet);
Expand All @@ -3904,6 +3908,10 @@ private bool notifyInstructionSetUsage(InstructionSet instructionSet, bool suppo
// set is not a reason to not support usage of it.
if (!isMethodDefinedInCoreLib())
{
// If a vector instruction set is marked as attempted to be used, but is also explicitly unsupported
// then we need to mark as explicitly unsupported the implied instruction set associated with the vector set.
instructionSet = InstructionSetFlags.ConvertToImpliedInstructionSetForVectorInstructionSets(_compilation.TypeSystemContext.Target.Architecture, instructionSet);

_actualInstructionSetUnsupported.AddInstructionSet(instructionSet);
}
}
Expand Down
29 changes: 29 additions & 0 deletions src/coreclr/tools/Common/JitInterface/CorInfoInstructionSet.cs
Original file line number Diff line number Diff line change
Expand Up @@ -294,6 +294,35 @@ public void ExpandInstructionSetByImplication(TargetArchitecture architecture)
this = ExpandInstructionSetByImplicationHelper(architecture, this);
}

public static InstructionSet ConvertToImpliedInstructionSetForVectorInstructionSets(TargetArchitecture architecture, InstructionSet input)
{
switch(architecture)
{
case TargetArchitecture.ARM64:
switch(input)
{
case InstructionSet.ARM64_Vector64: return InstructionSet.ARM64_AdvSimd;
case InstructionSet.ARM64_Vector128: return InstructionSet.ARM64_AdvSimd;
}
break;
case TargetArchitecture.X64:
switch(input)
{
case InstructionSet.X64_Vector128: return InstructionSet.X64_SSE;
case InstructionSet.X64_Vector256: return InstructionSet.X64_AVX;
}
break;
case TargetArchitecture.X86:
switch(input)
{
case InstructionSet.X86_Vector128: return InstructionSet.X86_SSE;
case InstructionSet.X86_Vector256: return InstructionSet.X86_AVX;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should Vector<T> be normalized on the same handling?

Should there be a comment covering that:

  1. The ABI handling for Vector128/256 requires Sse/Avx, this is what's required for them to be passed around as __m128 and __m256, respectively
  2. The Vector128/256.IsHardwareAccelerated property only returns true under Sse2/Avx2 (this is when users can depend on most functions exposed to be SIMD accelerated)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abi handling is actually separate from this entirely and is handled by an abi stability protection construct which works independently of all of this. (Mostly as the ABI for both Vector128 and Vector256 doesn't match the native vector ABI on Windows, and there was a desire at one point to unify on the actual official ABI. This has not yet happened.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this check only for IsHardwareAccelerated then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to understand what these particular checks line up with.

We support efficiently passing and some acceleration for Vector128<T> under SSE and Vector256<T> under AVX.
However, IsHardwareAccelerated will only return true under SSE2 and AVX2, respectively.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lines up with the Vector128/Vector256 api usage in the JIT. Effectively, when the jit determines that it cannot use a VectorXXX api it will call up, and notify crossgen that It has determined that if VectorXXX is useable at runtime via any intrinsics the generated code cannot be used. At that stage crossgen needs to encode that one of the instruction sets is not permitted, as it cannot encode that the VectorXXX intrinsics are not available. So, what it does it encode that the minimum required instruction set for the VectorXXX intrinsics is not available. Handling of IsHardwareAccelerated is actually a side-effect of this. In the encoding there are actually more states for how this works than is pleasant to contemplate. For instance, the options available to crossgen for encoding a method which calls Vector256<T>.IsHardwareAccelerated are:

  1. Generate a method where IsHardwareAccelerated returns false, and mark that if Avx is supported, then the method cannot be used. (In this case the JIT will specify that support for the Vector256 instruction set is required to be disabled)
  2. Generate a method where IsHardwareAccelerated returns false, and mark that if Avx2 is supported, then the method cannot be used, and also mark that Avx must be supported to use the method. (In this case the JIT will specify that support for the Vector256 instruction set is required to be enabled, but Avx2 support is required to be disabled.)
  3. Generate a method where IsHardwareAccelerated returns true, and mark that if Avx2 is not supported then the method cannot be used. (In this case the JIT will specify that support for the Vector256 instruction set is required to be enabled, and Avx2 support is required to be enabled.)

The problem that this bug solves is that while the previous logic correctly handled cases 2 and 3, it did not handle case 1. (What would happen is that the JIT would notify crossgen that Vector256 could not be used, and be we would fail to record that there was an instruction set restriction on the generated code.)

}
break;
}
return input;
}

public static InstructionSetFlags ExpandInstructionSetByImplicationHelper(TargetArchitecture architecture, InstructionSetFlags input)
{
InstructionSetFlags oldflags = input;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,9 @@
; Add jit 64bit architecture specific instruction set when instruction set is available
; instructionset64bit,<architecture>,<jit instruction set name>
;
; Note that a instruction set is a "Vector" instruction set. A vector instruction set may only imply a single other instruction set
; vectorinstructionset,<architecture>,<jit instruction set name>
;
; Add an instruction set implication (i.e, if instruction set A is present, then instruction set B must be present too.)
; implication,<architecture>,<jit instruction set name>,<implied jit instruction set name>
;
Expand Down Expand Up @@ -60,6 +63,9 @@ instructionset64bit,X86 ,PCLMULQDQ
instructionset64bit,X86 ,POPCNT
instructionset64bit,X86 ,AVXVNNI

vectorinstructionset,X86 ,Vector128
vectorinstructionset,X86 ,Vector256

implication ,X86 ,SSE ,X86Base
implication ,X86 ,SSE2 ,SSE
implication ,X86 ,SSE3 ,SSE2
Expand Down Expand Up @@ -109,6 +115,9 @@ instructionset64bit,ARM64 ,Rdm
instructionset64bit,ARM64 ,Sha1
instructionset64bit,ARM64 ,Sha256

vectorinstructionset,ARM64,Vector64
vectorinstructionset,ARM64,Vector128

implication ,ARM64 ,AdvSimd ,ArmBase
implication ,ARM64 ,Aes ,ArmBase
implication ,ARM64 ,Crc32 ,ArmBase
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ public InstructionSetImplication(string architecture, InstructionSetImplication
SortedDictionary<int,string> _r2rNamesByNumber = new SortedDictionary<int,string>();
SortedSet<string> _architectures = new SortedSet<string>();
Dictionary<string,List<string>> _architectureJitNames = new Dictionary<string,List<string>>();
Dictionary<string,List<string>> _architectureVectorInstructionSetJitNames = new Dictionary<string,List<string>>();
HashSet<string> _64BitArchitectures = new HashSet<string>();
Dictionary<string,string> _64BitVariantArchitectureJitNameSuffix = new Dictionary<string,string>();

Expand All @@ -96,6 +97,8 @@ void ArchitectureEncountered(string arch)
_architectures.Add(arch);
if (!_architectureJitNames.ContainsKey(arch))
_architectureJitNames.Add(arch, new List<string>());
if (!_architectureVectorInstructionSetJitNames.ContainsKey(arch))
_architectureVectorInstructionSetJitNames.Add(arch, new List<string>());
}

void ValidateArchitectureEncountered(string arch)
Expand Down Expand Up @@ -162,6 +165,12 @@ public bool ParseInput(TextReader tr)
_architectureJitNames[command[1]].Add(command[5]);
_instructionSets.Add(new InstructionSetInfo(command[1],command[2],command[3],command[4],command[5],command[6]));
break;
case "vectorinstructionset":
if (command.Length != 3)
throw new Exception("Incorrect number of args for vectorinstructionset");
ValidateArchitectureEncountered(command[1]);
_architectureVectorInstructionSetJitNames[command[1]].Add(command[2]);
break;
case "instructionset64bit":
if (command.Length != 3)
throw new Exception("Incorrect number of args for instructionset");
Expand Down Expand Up @@ -189,6 +198,10 @@ public bool ParseInput(TextReader tr)
_instructionSets.Add(new InstructionSetInfo(targetarch, val));
_architectureJitNames[targetarch].Add(val.JitName);
}
foreach (var val in _architectureVectorInstructionSetJitNames[arch].ToArray())
{
_architectureVectorInstructionSetJitNames[targetarch].Add(val);
}
foreach (var val in _implications.ToArray())
{
if (val.Architecture != arch)
Expand Down Expand Up @@ -472,6 +485,48 @@ public void ExpandInstructionSetByImplication(TargetArchitecture architecture)
this = ExpandInstructionSetByImplicationHelper(architecture, this);
}

public static InstructionSet ConvertToImpliedInstructionSetForVectorInstructionSets(TargetArchitecture architecture, InstructionSet input)
{
switch(architecture)
{
");
foreach (string architecture in _architectures)
{
if (_architectureVectorInstructionSetJitNames[architecture].Count == 0)
continue;

tr.Write($@" case TargetArchitecture.{architecture}:
switch(input)
{{
");
foreach (var vectorInstructionSet in _architectureVectorInstructionSetJitNames[architecture])
{
string impliedInstructionSet = null;
foreach (var implication in _implications)
{
if (implication.Architecture != architecture) continue;
if (implication.JitName == vectorInstructionSet)
{
if (impliedInstructionSet != null)
{
throw new Exception($"Vector instruction set {vectorInstructionSet} implies multiple instruction sets");
}
impliedInstructionSet = implication.ImpliedJitName;
}
}
if (impliedInstructionSet != null)
{
tr.WriteLine($" case InstructionSet.{architecture}_{vectorInstructionSet}: return InstructionSet.{architecture}_{impliedInstructionSet};");
}
}
tr.WriteLine(@" }
break;");
}

tr.Write(@" }
return input;
}

public static InstructionSetFlags ExpandInstructionSetByImplicationHelper(TargetArchitecture architecture, InstructionSetFlags input)
{
InstructionSetFlags oldflags = input;
Expand Down
Loading