Skip to content

Ensure the JIT view of InstructionSets is correct in the face of JitConfig switches #62420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 12, 2022

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Dec 5, 2021

This resolves #60035

Part of the problem in #60035 was that the VM wasn't tracking InstructionSet_Vector* and instead these were added by the JIT. The other problem was that EnableHWIntrinsic=0 wasn't correctly disabling the baseline ISA for the platform, which lead to an inconsistent world view on some APIs.

There are a few reasons why this isn't being done in the VM as suggested #60035 (comment).

Reading these flags in the VM is problematic for a few reasons namely that some of these flags control the "baseline" ISAs (X86Base, SSE, SSE2; ArmBase, AdvSimd) and only influence hardware intrinsics. To that end, the VM must report such ISAs as reported while the JIT still needs to support disabling them and having them correctly impact the actual codegen.

Likewise, we have cross-targeting AltJit where the VM won't be correct at all and same-target AltJit where you may want to view disassembly for ISAs your hardware doesn't support (which will become more prevalent as we try to support "latest" ISAs including AvxVnni which was added as "experimental" before hardware existed; or potential future support for SVE and AVX-512 where its unlikely devs may support it locally).

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 5, 2021
@ghost
Copy link

ghost commented Dec 5, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This resolves #60035

Part of the problem in #60035 was that the VM wasn't tracking InstructionSet_Vector* and instead these were added by the JIT. The other problem was that EnableHWIntrinsic=0 wasn't correctly disabling the baseline ISA for the platform, which lead to an inconsistent world view on some APIs.

There are a few reasons why this isn't being done in the VM as suggested #60035 (comment).

Reading these flags in the VM is problematic for a few reasons namely that some of these flags control the "baseline" ISAs (X86Base, SSE, SSE2; ArmBase, AdvSimd) and only influence hardware intrinsics. To that end, the VM must report such ISAs as reported while the JIT still needs to support disabling them and having them correctly impact the actual codegen.

Likewise, we have cross-targeting AltJit where the VM won't be correct at all and same-target AltJit where you may want to view disassembly for ISAs your hardware doesn't support (which will become more prevalent as we try to support "latest" ISAs including AvxVnni which was added as "experimental" before hardware existed; or potential future support for SVE and AVX-512 where its unlikely devs may support it locally).

Author: tannergooding
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding
Copy link
Member Author

CC. @echesakovMSFT, @jkotas

@jkotas
Copy link
Member

jkotas commented Dec 5, 2021

Are the *.IsSupported properties guaranteed to return the same value for crossgened and JITed code after this change?

@jkotas jkotas requested a review from davidwrighton December 5, 2021 18:47
@tannergooding
Copy link
Member Author

Are the *.IsSupported properties guaranteed to return the same value for crossgened and JITed code after this change?

Based on the failures, looks like no.

The VM (src/coreclr/vm) logic and JIT (src/coreclr/jit) logic are both now correct and once again match the behavior we had in .NET 5 and so compiling for the hardware the JIT is running on is correct.

I expect there is some issue with the crossgen logic here, likely the same that is causing: #61471. The crossgen VM needs to report what hardware it's targeting and likely also ensure that the relevant COMPlus_Enable* flags aren't present (or the JIT needs to know to ignore them for crossgen).

@tannergooding
Copy link
Member Author

Notably, I'd also think that the JIT and Crossgen don't need to agree here.

Crossgen is supposed to mark what ISAs are used by a method. If a method uses an ISA that the hardware doesn't support, it has to rejit at which point whatever the JIT is targeting is fine.

If the hardware does support it, then it shouldn't matter what the JIT thinks because the code will still run fine. The one notable exception here may be AVX because that also controls the VEX encoding and by extension Vector256 support.

Therefore, if AVX is turned on and a method uses any SIMD instruction; then it will be emitted using the VEX encoding and it won't be usable on hardware that doesn't support AVX.

But aside from that, if the method is crossgen'd for SSE4.1 and the hardware supports SSE4.1, then even if the JIT says "no SSE4.1", the method will still execute fine.

@jkotas
Copy link
Member

jkotas commented Dec 5, 2021

But aside from that, if the method is crossgen'd for SSE4.1 and the hardware supports SSE4.1, then even if the JIT says "no SSE4.1", the method will still execute fine.

This works fine for opportunistic ISA uses where the user code cannot observe the functional difference. It breaks down when the code can observe IsSupported property. For example, consider code like this and various interleavings of AOTed and JITed code:

void M1()
{
    if (Avx.IsSupported) M2();
}

void M2()
{
   if (!Avx.IsSupported) throw new InvalidOperationException();
}

We have seen patterns like this in real-world code.

@tannergooding
Copy link
Member Author

How would you recommend handling the AltJit case?

That is, I'm running on x64 and want to get disassembly for Arm64? Should there be a new JIT/EE interface that asks for the ISA support for a given target platform or should the JIT assume "everything is supported" and replicate the JitConfig checks?

@jkotas
Copy link
Member

jkotas commented Dec 5, 2021

I would either do nothing, or replicate the environment checks when the JIT runs as cross-compiling altjit.

I do not think it makes sense to introduce altjit specific methods on JIT/EE interface.

@tannergooding tannergooding force-pushed the fix-60035 branch 2 times, most recently from faddb15 to 1ea5ea2 Compare December 6, 2021 04:57
#define EXTERNAL_JitEnableAVX_Default 1
#else // !(defined(TARGET_AMD64) || defined(TARGET_X86)
#define EXTERNAL_JitEnableAVX_Default 0
#endif // !(defined(TARGET_AMD64) || defined(TARGET_X86)
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_FeatureSIMD, W("FeatureSIMD"), EXTERNAL_FeatureSIMD_Default, "Enable SIMD intrinsics recognition in System.Numerics.dll and/or System.Numerics.Vectors.dll")
Copy link
Member Author

@tannergooding tannergooding Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do something about FeatureSIMD. Originally it was intended to be a workaround in case the System.Numerics.Vector* logic had issues (back before hardware intrinsics were even a concept).

However, given how critical SIMD is to the ABI and even basic operations in the JIT for x64 and Arm64; I think it might be better to actually remove the public facing flag here and leave it only a compile level switch for x86 Unix and Arm32 (the two targets where FEATURE_SIMD == 0 today).

If we don't remove it, then we need to also have the VM take it into account for certain ISAs (basically everything except X86Base, Lzcnt, and ArmBase; Bmi1 and Bmi2 are "scalar" but not included because they depend on EnableAvx=1 due to the VEX encoding).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- If we are fine with removing flags, EXTERNAL_EnableSSE3_4 is also legacy and unnecessary as it has the same meaning as EnableSSE3

#endif

instructionSetFlags = EnsureInstructionSetFlagsAreValid(instructionSetFlags);
opts.setSupportedISAs(instructionSetFlags);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting here because its in compiler.cpp, but further down on https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/compiler.cpp#L5608-L5637, we have some logic for if (!info.compMatchedVM).

It'd be helpful to understand what this means for crossgen/r2r and if its better handled as simply if (info.altJit). In particular we want to support:

  1. The ability to have "correct" instruction sets for the actual target when the VM doesn't match (i.e. cross-compiling for Arm64 on an x64 box; crossgen or otherwise)
  2. To allow for an AltJit to generate disassembly for hardware you don't have (i.e. see and debug Arm.AdvSimd on x64; or even X86.Avx2 on an x64 box that only supports Avx)

@tannergooding
Copy link
Member Author

@davidwrighton, @MichalStrehovsky there is something I'm not understanding about these flags for NativeAOT.

There are failures such as Vectors HW acceleration state unexpected - expected True, got False. Is there something additional expected for NativeAOT here?

@tannergooding
Copy link
Member Author

Actually, I think I figured it out. I seem to have swapped the order of Set64BitInstructionSetVariants and EnsureValidInstructionSetSupport at some point and it was causing the flags to be incorrectly reported to the JIT.

@tannergooding
Copy link
Member Author

tannergooding commented Jan 4, 2022

After double checking the results reported by JIT\Regression\JitBlue\Runtime_34587, this is once again behaving as expected so hopefully CI passes now:

No Environment Variables

Supported x86 ISAs:
  AES:           True
  AVX:           True
  AVX2:          True
  BMI1:          True
  BMI2:          True
  FMA:           True
  LZCNT:         True
  PCLMULQDQ:     True
  POPCNT:        True
  SSE:           True
  SSE2:          True
  SSE3:          True
  SSE4.1:        True
  SSE4.2:        True
  SSSE3:         True
  X86Base:       True
Supported x64 ISAs:
  AES.X64:       True
  AVX.X64:       True
  AVX2.X64:      True
  BMI1.X64:      True
  BMI2.X64:      True
  FMA.X64:       True
  LZCNT.X64:     True
  PCLMULQDQ.X64: True
  POPCNT.X64:    True
  SSE.X64:       True
  SSE2.X64:      True
  SSE3.X64:      True
  SSE4.1.X64:    True
  SSE4.2.X64:    True
  SSSE3.X64:     True
  X86Base.X64:   True

COMPlus_EnableHWIntrinsic=0

Supported x86 ISAs:
  AES:           False
  AVX:           False
  AVX2:          False
  BMI1:          False
  BMI2:          False
  FMA:           False
  LZCNT:         False
  PCLMULQDQ:     False
  POPCNT:        False
  SSE:           False
  SSE2:          False
  SSE3:          False
  SSE4.1:        False
  SSE4.2:        False
  SSSE3:         False
  X86Base:       False
Supported x64 ISAs:
  AES.X64:       False
  AVX.X64:       False
  AVX2.X64:      False
  BMI1.X64:      False
  BMI2.X64:      False
  FMA.X64:       False
  LZCNT.X64:     False
  PCLMULQDQ.X64: False
  POPCNT.X64:    False
  SSE.X64:       False
  SSE2.X64:      False
  SSE3.X64:      False
  SSE4.1.X64:    False
  SSE4.2.X64:    False
  SSSE3.X64:     False
  X86Base.X64:   False

EnableSSE=0

Supported x86 ISAs:
  AES:           False
  AVX:           False
  AVX2:          False
  BMI1:          False
  BMI2:          False
  FMA:           False
  LZCNT:         True
  PCLMULQDQ:     False
  POPCNT:        False
  SSE:           False
  SSE2:          False
  SSE3:          False
  SSE4.1:        False
  SSE4.2:        False
  SSSE3:         False
  X86Base:       True
Supported x64 ISAs:
  AES.X64:       False
  AVX.X64:       False
  AVX2.X64:      False
  BMI1.X64:      False
  BMI2.X64:      False
  FMA.X64:       False
  LZCNT.X64:     True
  PCLMULQDQ.X64: False
  POPCNT.X64:    False
  SSE.X64:       False
  SSE2.X64:      False
  SSE3.X64:      False
  SSE4.1.X64:    False
  SSE4.2.X64:    False
  SSSE3.X64:     False
  X86Base.X64:   True

EnableSSE2=0

Supported x86 ISAs:
  AES:           False
  AVX:           False
  AVX2:          False
  BMI1:          False
  BMI2:          False
  FMA:           False
  LZCNT:         True
  PCLMULQDQ:     False
  POPCNT:        False
  SSE:           True
  SSE2:          False
  SSE3:          False
  SSE4.1:        False
  SSE4.2:        False
  SSSE3:         False
  X86Base:       True
Supported x64 ISAs:
  AES.X64:       False
  AVX.X64:       False
  AVX2.X64:      False
  BMI1.X64:      False
  BMI2.X64:      False
  FMA.X64:       False
  LZCNT.X64:     True
  PCLMULQDQ.X64: False
  POPCNT.X64:    False
  SSE.X64:       True
  SSE2.X64:      False
  SSE3.X64:      False
  SSE4.1.X64:    False
  SSE4.2.X64:    False
  SSSE3.X64:     False
  X86Base.X64:   True

etc, for the rest of the available EnableIsa=0 switches including for ARM64

@tannergooding
Copy link
Member Author

Nope. @MichalStrehovsky, I'm definitely going to need help with NativeAOT here.

Even debugging in the flags look correct, but its still faulting for reasons I can't determine.

@tannergooding
Copy link
Member Author

Ok, everything looks to be working as expected now including for CG2/ILC.

I've rebased onto main and updated JitBlue/Runtime_34587 to also display the results of IsHardwareAccelerated and validate they are correct in the right scenarios.

@tannergooding
Copy link
Member Author

Thanks to both David and Michal for assisting here!

Comment on lines +75 to +80
implication ,X86 ,LZCNT ,X86Base
implication ,X86 ,PCLMULQDQ ,SSE2
implication ,X86 ,POPCNT ,SSE42
implication ,X86 ,Vector128 ,SSE
implication ,X86 ,Vector256 ,AVX
implication ,X86 ,AVXVNNI ,AVX2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple new "implications" for x86. Nothing major, just ensuring AvxVnni and Lzcnt are correctly dependent on the ISAs they inherit from and ensure Vector128 is correctly marked as being dependent on Sse

Comment on lines +119 to +120
implication ,ARM64 ,Vector64 ,AdvSimd
implication ,ARM64 ,Vector128 ,AdvSimd
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for ARM64. Just ensuring Vector64/128 are correctly marked as being dependent on AdvSimd.


");

foreach (string architecture in _architectures)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably easier to see these changes by looking at https://github.com/dotnet/runtime/pull/62420/files#diff-47b7c76a34a29a49750b1591fcf7ed6196296eb075ca71cc160b03ff5e717f49R21.

This basically just adds InstructionSet_X64, InstructionSet_X86, and InstructionSet_ARM64 enums that contain only the ISAs for that architecture. These aren't actually used by anything, they are just returned by three new properties on InstructionSetFlags to make it simpler to debug.

Without this, you'll end up seeing the ARM64_* enum values regardless of platform due to how the debugger displays enums with overlapping values. This was making it practically impossible to debug since you'd see ARM64_AdvSimd rather than X64_SSE for example.

@tannergooding
Copy link
Member Author

CI never started, closing and reopening to try and force a refresh.

@tannergooding
Copy link
Member Author

Ah, CI is broken due to #63480

Copy link
Member

@MichalStrehovsky MichalStrehovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The crossgen2/nativeaot changes look good. Thank you for the debugging improvement!

@tannergooding tannergooding merged commit 4f89b79 into dotnet:main Jan 12, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Feb 11, 2022
@tannergooding tannergooding deleted the fix-60035 branch November 11, 2022 15:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setting COMPlus_EnableHWIntrinsic=0 causes System.PlatformNotSupportedException with optimized libraries code
4 participants