Skip to content

JIT: Re-enable acceleration of Vector512<long>.op_Multiply #111832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 26, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions src/coreclr/jit/hwintrinsicxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3335,7 +3335,7 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
{
// Emulate NI_AVX512DQ_VL_MultiplyLow with SSE41 for SIMD16
}
else
else if (simdSize != 64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I think this needs Avx512DQ ISA check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're treating any of the Vector512 methods other than IsSupported as intrinsic, that implies we have the full baseline AVX-512 set (F,DQ,BW,CD,VL). It's a bit confusing because some of the import paths assert or check that, but most don't. I'm actually cleaning up some of those redundant asserts in a different branch now.

Copy link
Member

@EgorBo EgorBo Jan 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I thought that Vector512.IsHardwareAccelerated only relies on AVX512F, but looks like DOTNET_EnableAVX512DQ=0 turns it off so it's ok.

Copy link
Member Author

@saucecontrol saucecontrol Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, IsHardwareAccelerated 😄

The rule is Vector512.IsHardwareAccelerated will return false unless all of the following are satisfied:

  1. Baseline AVX-512 support (F,CD,DW,BW,VL)
  2. Not throttling based on CPUID check for Skylake-X and others with severe downclocking for 512-bit vector instructions, or DOTNET_PreferredVectorBitWidth: >= 512
  3. No DOTNET_PreferredVectorBitWidth: < 512

The rule for whether Vector512 methods actually import as intrinsic is only that we have the baseline AVX-512 set, meaning IsHardwareAccelerated may return false, but all methods may actually be accelerated anyway.

So the fact that we're importing the methods for Vector512 as intrinsic in the first place means the ISA requirements have already been met.

Vector128 and Vector256 are a bit different, because the baseline ISA requirement may not be enough to accelerate all methods.

Vector256.IsHardwareAccelerated returns true only if AVX2 is supported, but we attempt to import methods as intrinsic as long as AVX is supported. Since many of the methods require AVX2 for acceleration, they have an extra check for AVX2 and then fall back to managed if it's not available. Hence all the (simdSize != 32) || compOpportunisticallyDependsOn(InstructionSet_AVX2) checks.

Similar checks are not included for Vector128, because the base requirement is SSE2, so almost all methods can be accelerated, minus a few that require SSE4.1 and check for it explicitly.

Clear as mud, I know...

{
// Software fallback
break;
Expand Down Expand Up @@ -3393,7 +3393,7 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
{
// Emulate NI_AVX512DQ_VL_MultiplyLow with SSE41 for SIMD16
}
else
else if (simdSize != 64)
{
// Software fallback
break;
Expand Down
Loading