JIT: Re-enable acceleration of Vector512<long>.op_Multiply #111832

Merged
merged 1 commit into from
Jan 26, 2025

Conversation

saucecontrol
Member

This was a regression in 9.0, from #103555

https://godbolt.org/z/11hs3Kqdd

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 25, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 25, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Member

@EgorBo EgorBo left a comment

Thanks!

@@ -3335,7 +3335,7 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
{
// Emulate NI_AVX512DQ_VL_MultiplyLow with SSE41 for SIMD16
}
-else
+else if (simdSize != 64)
Member

actually, I think this needs Avx512DQ ISA check

Member Author

If we're treating any of the Vector512 methods other than IsSupported as intrinsic, that implies we have the full baseline AVX-512 set (F,DQ,BW,CD,VL). It's a bit confusing because some of the import paths assert or check that, but most don't. I'm actually cleaning up some of those redundant asserts in a different branch now.

Member

@EgorBo EgorBo Jan 25, 2025

Ah ok, I thought that Vector512.IsHardwareAccelerated only relies on AVX512F, but looks like DOTNET_EnableAVX512DQ=0 turns it off so it's ok.

Member Author

@saucecontrol saucecontrol Jan 26, 2025

Right, IsHardwareAccelerated 😄

The rule is Vector512.IsHardwareAccelerated will return false unless all of the following are satisfied:

  1. Baseline AVX-512 support (F,CD,DQ,BW,VL)
  2. No throttling flagged by the CPUID check (Skylake-X and others with severe downclocking for 512-bit vector instructions), or DOTNET_PreferredVectorBitWidth set to >= 512
  3. DOTNET_PreferredVectorBitWidth not set to a value < 512
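
As a rough illustration, the three conditions above can be modeled as a single predicate. This is only a sketch: `CpuInfo`, its fields, and the function names are hypothetical and are not JIT or runtime types, and it assumes that a preferred width of 0 stands for "DOTNET_PreferredVectorBitWidth not set".

```cpp
// Hypothetical model of the inputs; none of these names exist in the runtime.
struct CpuInfo
{
    bool avx512f;
    bool avx512dq;
    bool avx512bw;
    bool avx512cd;
    bool avx512vl;
    bool throttles512;      // CPUID check flagged severe 512-bit downclocking
    int  preferredBitWidth; // assumed: 0 means DOTNET_PreferredVectorBitWidth is unset
};

// Rule 1: the full baseline AVX-512 set (F, CD, DQ, BW, VL).
bool hasAvx512Baseline(const CpuInfo& cpu)
{
    return cpu.avx512f && cpu.avx512cd && cpu.avx512dq && cpu.avx512bw && cpu.avx512vl;
}

bool vector512IsHardwareAccelerated(const CpuInfo& cpu)
{
    if (!hasAvx512Baseline(cpu))
    {
        return false; // rule 1 not met
    }

    // Rule 2: a throttling CPU only qualifies if the user explicitly asked for >= 512.
    bool forced512 = (cpu.preferredBitWidth >= 512);
    if (cpu.throttles512 && !forced512)
    {
        return false;
    }

    // Rule 3: an explicit preference below 512 turns it off.
    if ((cpu.preferredBitWidth != 0) && (cpu.preferredBitWidth < 512))
    {
        return false;
    }

    return true;
}
```

In this model, a throttling CPU with the baseline set reports `false` by default and `true` once the preferred width is set to 512.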

The rule for whether Vector512 methods actually import as intrinsic is only that we have the baseline AVX-512 set, meaning IsHardwareAccelerated may return false, but all methods may actually be accelerated anyway.

So the fact that we're importing the methods for Vector512 as intrinsic in the first place means the ISA requirements have already been met.

Vector128 and Vector256 are a bit different, because the baseline ISA requirement may not be enough to accelerate all methods.

Vector256.IsHardwareAccelerated returns true only if AVX2 is supported, but we attempt to import methods as intrinsic as long as AVX is supported. Since many of the methods require AVX2 for acceleration, they have an extra check for AVX2 and then fall back to managed if it's not available. Hence all the (simdSize != 32) || compOpportunisticallyDependsOn(InstructionSet_AVX2) checks.

Similar checks are not included for Vector128, because the base requirement is SSE2, so almost all methods can be accelerated, minus a few that require SSE4.1 and check for it explicitly.
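
To make the difference between the three vector sizes concrete, here is a minimal sketch of the gating described above. The `Isa` struct, the `Import` enum, and all function names are hypothetical stand-ins; only the shape of the `(simdSize != 32) || AVX2` check comes from the JIT. The SSE4.1 variant is an assumed analogue for illustration, not the actual form used in the importer.

```cpp
// Hypothetical ISA flags; not the JIT's InstructionSet_* values.
struct Isa
{
    bool sse2;
    bool sse41;
    bool avx;
    bool avx2;
    bool avx512Baseline; // F, CD, DQ, BW, VL all present
};

enum class Import
{
    Intrinsic,
    ManagedFallback
};

// Base requirement for attempting intrinsic import at each vector size.
bool sizeIsImportable(unsigned simdSize, const Isa& isa)
{
    switch (simdSize)
    {
        case 16: return isa.sse2;
        case 32: return isa.avx;
        case 64: return isa.avx512Baseline;
        default: return false;
    }
}

// A method whose 256-bit form needs AVX2.
Import importAvx2DependentMethod(unsigned simdSize, const Isa& isa)
{
    if (!sizeIsImportable(simdSize, isa))
    {
        return Import::ManagedFallback;
    }

    // Models (simdSize != 32) || compOpportunisticallyDependsOn(InstructionSet_AVX2).
    // For simdSize == 64 the base requirement already covers everything needed.
    if ((simdSize != 32) || isa.avx2)
    {
        return Import::Intrinsic;
    }

    return Import::ManagedFallback;
}

// A method whose 128-bit form needs SSE4.1 (assumed shape of the explicit check).
Import importSse41DependentMethod(unsigned simdSize, const Isa& isa)
{
    if (!sizeIsImportable(simdSize, isa))
    {
        return Import::ManagedFallback;
    }

    if ((simdSize != 16) || isa.sse41)
    {
        return Import::Intrinsic;
    }

    return Import::ManagedFallback;
}
```

In this model only the 16- and 32-byte paths carry a per-method ISA check; the 64-byte path relies entirely on the base requirement, which is why no extra AVX512DQ check appears in the change under review.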

Clear as mud, I know...

@EgorBo EgorBo merged commit d5c8265 into dotnet:main Jan 26, 2025
117 of 119 checks passed
grendello added a commit to grendello/runtime that referenced this pull request Jan 27, 2025
* main: (22 commits)
  Clean up Stopwatch a bit (dotnet#111834)
  JIT: Fix embedded broadcast simd size (dotnet#111638)
  Revert potential UB due to aliasing + more WB removals (dotnet#111733)
  re-enable acceleration of Vector512<long>.op_Multiply (dotnet#111832)
  Handle OSSL 3.4 change to SAN:othername formatting
  JIT: Fix stack allocated arrays for NativeAOT (dotnet#111827)
  JIT: enhance RBO inference for similar compares to constants (dotnet#111766)
  JIT: Don't run optSetBlockWeights when we have PGO data (dotnet#111764)
  [Android] Make sure RuntimeFlavor=CoreCLR when clr subset is specified (dotnet#111821)
  Change empty subject test certificate to include a critical SAN.
  Fix reversed code offsets in GcInfo (dotnet#111792)
  Swap some libraries areas between leads (dotnet#111816)
  Add left-handed spherical and cylindrical billboards (dotnet#109605)
  JIT: revise `optRelopImpliesRelop` to always set `reverseSense` (dotnet#111803)
  Fix Zip64ExtraField handling (dotnet#111802)
  Add build support for Android+CoreCLR (dotnet#110471)
  arm64: Add bic(s) compact encoding (dotnet#111452)
  JIT: Ensure `BBF_PROF_WEIGHT` flag is set when we have PGO data (dotnet#111780)
  Add support for AVX10.2, Add AVX10.2 API surface and template tests (dotnet#111209)
  JIT: Preliminary for enabling inlining late devirted calls (dotnet#111782)
  ...
@saucecontrol saucecontrol deleted the mullq branch January 28, 2025 01:22
@github-actions github-actions bot locked and limited conversation to collaborators Feb 27, 2025