
[JIT] Enable EGPRs in JIT by adding REX2 encoding to the backend. #106557

Open: wants to merge 51 commits into main


Ruihan-Yin (Contributor)

Overview

This PR is a follow-up to #104637, which added the initial CPUID and XSAVE updates for APX.

This PR adds REX2 encoding support for legacy instructions, which enables the use of EGPRs (the APX extended general-purpose registers R16-R31) in instructions such as add and sub. Note that this PR focuses on REX2 encoding only; a follow-up PR will enable EGPR support in the register allocator.

Specification

REX2 is a 2-byte prefix whose first byte is 0xD5; its detailed format is shown below:

[Figure: REX2 prefix format]
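As a complement to the format figure, the sketch below packs and unpacks the REX2 payload byte that follows 0xD5. The bit order (M0, R4, X4, B4, W, R3, X3, B3, from most to least significant) is our reading of the Intel APX specification, and `Rex2Fields`, `packRex2Payload`, and `unpackRex2Payload` are hypothetical names, not JIT APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not the JIT's API). Assumed payload layout, MSB to LSB:
//   bit 7: M0 (0 = legacy map 0, 1 = legacy map 1)
//   bit 6: R4   bit 5: X4   bit 4: B4
//   bit 3: W    (64-bit operand size, as in REX.W)
//   bit 2: R3   bit 1: X3   bit 0: B3
constexpr uint8_t kRex2Escape = 0xD5;  // first byte of every REX2 prefix

struct Rex2Fields {
    bool m0, r4, x4, b4, w, r3, x3, b3;
};

// Packs the individual fields into the second byte of the REX2 prefix.
uint8_t packRex2Payload(const Rex2Fields& f) {
    return static_cast<uint8_t>(
        (f.m0 << 7) | (f.r4 << 6) | (f.x4 << 5) | (f.b4 << 4) |
        (f.w << 3) | (f.r3 << 2) | (f.x3 << 1) | (f.b3 << 0));
}

// Inverse of packRex2Payload; useful when checking emitted bytes.
Rex2Fields unpackRex2Payload(uint8_t p) {
    auto bit = [p](int n) { return ((p >> n) & 1) != 0; };
    return Rex2Fields{bit(7), bit(6), bit(5), bit(4), bit(3), bit(2), bit(1), bit(0)};
}
```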

Like the REX prefix, REX2 provides extension bits for the ModRM.reg field (REX2.R4/R3), the ModRM.r/m field (REX2.B4/B3), and the SIB index field (REX2.X4/X3). These bits act as bits 4 and 3 (the 5th and 4th bits) of the register number and are combined with the 3-bit field in the ModRM or SIB byte to form a 5-bit register index, which can address up to 32 registers.
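The split described above can be sketched as follows: a 5-bit register number is divided into bit 4 (REX2.R4/X4/B4), bit 3 (REX2.R3/X3/B3), and a 3-bit field placed in the ModRM or SIB byte, and a decoder recombines them. `RegSplit`, `splitReg`, `combineReg`, and `makeModRM` are illustrative names, not the JIT's.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: where each piece of a 0..31 register number goes.
struct RegSplit {
    bool bit4;     // goes into REX2.R4, X4 or B4
    bool bit3;     // goes into REX2.R3, X3 or B3
    uint8_t low3;  // goes into ModRM.reg, ModRM.r/m or SIB.index
};

RegSplit splitReg(unsigned reg) {
    assert(reg < 32);
    return RegSplit{((reg >> 4) & 1) != 0, ((reg >> 3) & 1) != 0,
                    static_cast<uint8_t>(reg & 7)};
}

// Decoder side: rebuild the 5-bit register number.
unsigned combineReg(bool bit4, bool bit3, uint8_t low3) {
    return (static_cast<unsigned>(bit4) << 4) | (static_cast<unsigned>(bit3) << 3) |
           (low3 & 7u);
}

// ModRM byte layout: mod (2 bits) | reg (3 bits) | r/m (3 bits).
uint8_t makeModRM(uint8_t mod, uint8_t regLow3, uint8_t rmLow3) {
    return static_cast<uint8_t>(((mod & 3) << 6) | ((regLow3 & 7) << 3) | (rmLow3 & 7));
}
```

For example, r17 (10001b) splits into bit4 = 1, bit3 = 0, low3 = 001, so only REX2 (not REX) can express it.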

The REX2 prefix is generally available for legacy map 0 and legacy map 1 instructions, i.e., instructions with a 1-byte opcode or a 2-byte opcode starting with the 0x0F escape byte, with some exceptions.
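Map selection can be sketched as below. This assumes (from our reading of the APX specification) that REX2.M0 selects map 1 in place of the 0x0F escape byte, which is therefore not emitted, and that the 0x0F 0x38 / 0x0F 0x3A maps are out of REX2's reach. The per-opcode exceptions mentioned above are not modeled, and `Rex2Opcode` / `selectRex2Map` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result: the REX2.M0 value and the opcode bytes that follow the prefix.
struct Rex2Opcode {
    bool m0;
    std::vector<uint8_t> opcode;
};

// Maps a legacy opcode byte sequence onto REX2, or returns nullopt if the
// opcode lives in a map REX2 cannot express (assumed: 0F 38 and 0F 3A).
std::optional<Rex2Opcode> selectRex2Map(const std::vector<uint8_t>& legacyOpcode) {
    if (legacyOpcode.empty()) return std::nullopt;
    if (legacyOpcode[0] != 0x0F) {
        return Rex2Opcode{false, legacyOpcode};  // legacy map 0: opcode unchanged
    }
    if (legacyOpcode.size() < 2) return std::nullopt;
    uint8_t second = legacyOpcode[1];
    if (second == 0x38 || second == 0x3A) return std::nullopt;  // maps 2 and 3
    // Legacy map 1: set M0 and drop the 0x0F escape byte.
    return Rex2Opcode{true, std::vector<uint8_t>(legacyOpcode.begin() + 1, legacyOpcode.end())};
}
```

Mandatory prefixes such as POPCNT's 0xF3 are legacy prefixes emitted before REX2, so they are not part of the opcode sequence here.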

Like VEX and EVEX, REX2 must be the last prefix before the opcode, so it cannot be combined with REX, VEX, or EVEX.
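The ordering rule can be illustrated with a hypothetical emitter helper: legacy prefixes (for example 0x66, 0xF2, 0xF3) go first, then 0xD5 and the payload, then the opcode, and any REX (0x40-0x4F), VEX (0xC4/0xC5), EVEX (0x62), or second REX2 byte in the prefix list is rejected. `emitWithRex2` is not a JIT function; it is a sketch of the constraint only.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

bool isRexByte(uint8_t b) { return (b & 0xF0) == 0x40; }

// Hypothetical: assemble prefixes + REX2 + opcode/operand bytes.
std::vector<uint8_t> emitWithRex2(const std::vector<uint8_t>& legacyPrefixes,
                                  uint8_t rex2Payload,
                                  const std::vector<uint8_t>& opcodeAndOperands) {
    for (uint8_t b : legacyPrefixes) {
        // REX2 cannot co-exist with REX, VEX, EVEX or another REX2.
        if (isRexByte(b) || b == 0xC4 || b == 0xC5 || b == 0x62 || b == 0xD5) {
            throw std::invalid_argument("REX2 cannot be combined with REX, VEX or EVEX");
        }
    }
    std::vector<uint8_t> out(legacyPrefixes);  // legacy prefixes come first
    out.push_back(0xD5);                       // REX2 is the last prefix...
    out.push_back(rex2Payload);
    out.insert(out.end(), opcodeAndOperands.begin(), opcodeAndOperands.end());  // ...then the opcode
    return out;
}
```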

Design

The bulk of the changes occur in the backend emitter.

Since no existing hardware supports APX yet, we added some workarounds to bypass the CPUID checks. In this PR, DOTNET_JitStressRex2Encoding forces all eligible instructions to be encoded with REX2, regardless of whether any operand is an EGPR. A second switch, DOTNET_JitBypassAPXCheck, bypasses only the APX CPUID check; with it, the JIT emits REX2 only when needed, which will be more useful once the LSRA changes land.
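The decision the two switches imply can be sketched as follows. Only the switch names come from this PR; `ApxConfig` and `shouldUseRex2` are hypothetical, how the JIT reads the switches is not modeled, and gating stress mode on APX availability (CPUID or the bypass switch) is our assumption rather than confirmed behavior.

```cpp
#include <cassert>

// Hypothetical model of the inputs to the REX2 decision.
struct ApxConfig {
    bool cpuHasApx;           // result of the APX CPUID check
    bool bypassApxCheck;      // DOTNET_JitBypassAPXCheck
    bool stressRex2Encoding;  // DOTNET_JitStressRex2Encoding
};

bool apxUsable(const ApxConfig& c) {
    return c.cpuHasApx || c.bypassApxCheck;
}

// Should this instruction be emitted with a REX2 prefix?
bool shouldUseRex2(const ApxConfig& c, bool instrIsRex2Eligible, bool operandsUseEgpr) {
    if (!instrIsRex2Eligible || !apxUsable(c)) return false;  // assumption: stress is gated too
    if (c.stressRex2Encoding) return true;  // force REX2 even when no EGPR is used
    return operandsUseEgpr;                 // otherwise only when actually needed
}
```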

Note: REX2 can also be used to address the lower 16 vector registers (XMM0~XMM15). For simplicity, this PR does not add support for that; EGPR operands in SIMD instructions can instead be encoded with EVEX. We are open to discussing this part and adjusting the design in follow-up PRs.

Testing

We followed a multi-step testing plan to verify both encoding correctness and semantic correctness.

Testing results will be presented below.

1. Emitter unit tests

In codegenxarch.cpp, similar to genAmd64EmitterUnitTestsSse2, we used the JitLateDisasm feature to insert instructions as emitter unit tests. LateDisasm invokes LLVM to disassemble the code stream, which let us cross-validate the JIT's disassembly against LLVM's. This step verifies that the emit paths generate "correct" code that neither triggers #UD nor has the wrong semantics.
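A golden-bytes check in the same spirit as these unit tests can be sketched with a tiny encoder for `ADD r/m64, r64` (opcode 0x01 /r, register form). `encodeAddRegReg64` is hypothetical, and the expected bytes in the test were derived by hand from the assumed REX2 layout (M0 R4 X4 B4 W R3 X3 B3), not taken from the PR or from LLVM output.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mini-encoder: "add dst, src" (Intel syntax, dst += src) with
// 0..31 register numbers, always using a REX2 prefix as in stress mode.
std::vector<uint8_t> encodeAddRegReg64(unsigned dst, unsigned src) {
    assert(dst < 32 && src < 32);
    uint8_t payload = static_cast<uint8_t>(
        (((src >> 4) & 1) << 6) |  // R4: ModRM.reg holds src
        (((dst >> 4) & 1) << 4) |  // B4: ModRM.r/m holds dst
        (1 << 3) |                 // W: 64-bit operand size
        (((src >> 3) & 1) << 2) |  // R3
        (((dst >> 3) & 1) << 0));  // B3
    // M0 = 0 (legacy map 0); X4/X3 = 0 because there is no SIB byte.
    uint8_t modrm = static_cast<uint8_t>(0xC0 | ((src & 7) << 3) | (dst & 7));
    return {0xD5, payload, 0x01, modrm};
}
```

The second test case shows the stress-mode situation: `add rax, rcx` gets a REX2 prefix even though no EGPR is involved.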

Note that we are using a custom coredistools.dll built on a recent LLVM version that supports APX decoding.

2. SuperPMI

In this step, we ran the SuperPMI pipeline to collect asmdiffs with REX2 on and off, using all the MCH files as input. This let us check for assertion failures or internal errors in the JIT, and because the pipeline also invokes coredistools.dll, it verifies encoding correctness at a larger scale.
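When reading REX2-on versus REX2-off asmdiffs, some code-size growth is expected in stress mode. A back-of-the-envelope sketch: REX2 is 2 bytes (0xD5 plus the payload) and replaces a 1-byte REX prefix when the instruction already had one, so each forced instruction grows by 1 or 2 bytes. The helpers below are hypothetical and not part of SuperPMI.

```cpp
#include <cassert>
#include <cstddef>

// Size change for one instruction forced into REX2.
std::size_t rex2SizeDelta(bool hadRex) {
    return hadRex ? 1 : 2;  // REX (1 byte) -> REX2 (2 bytes), or no prefix -> REX2
}

// Expected growth of a method when every eligible instruction is forced to REX2.
std::size_t expectedGrowth(std::size_t eligibleWithRex, std::size_t eligibleWithoutRex) {
    return eligibleWithRex * rex2SizeDelta(true) + eligibleWithoutRex * rex2SizeDelta(false);
}
```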

To ensure the new changes do not regress throughput on the existing code paths, we ran tpdiff with the base JIT built from the main branch the changes are based on, and the diff JIT built with all the REX2 changes.

3. JIT unit tests

The two steps above mainly verify the encoding correctness of the generated binary code. The last step examines the semantic correctness of the generated code: since we simply force all compatible instructions to be encoded with REX2, the original semantics should not change, so we expect exactly the same output with REX2 on and off.
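The reasoning behind expecting identical output can be sketched at the decoder level: for registers 0..15, REX supplies bit 3 through REX.R, and REX2 supplies it through R3 with R4 = 0, so both encodings name the same register. The helper names are hypothetical; the PR's actual check is the end-to-end test run described here.

```cpp
#include <cassert>
#include <cstdint>

// REX is 0100WRXB: R is bit 2 and extends ModRM.reg to 4 bits.
unsigned decodeRegWithRex(uint8_t rex, uint8_t modrm) {
    unsigned r = (rex >> 2) & 1;
    return (r << 3) | ((modrm >> 3) & 7);
}

// REX2 payload is M0 R4 X4 B4 W R3 X3 B3: R4 (bit 6) and R3 (bit 2) extend ModRM.reg to 5 bits.
unsigned decodeRegWithRex2(uint8_t payload, uint8_t modrm) {
    unsigned r4 = (payload >> 6) & 1;
    unsigned r3 = (payload >> 2) & 1;
    return (r4 << 4) | (r3 << 3) | ((modrm >> 3) & 7);
}

// Encode register `reg` (0..15) in ModRM.reg both ways and check both decode to it.
bool sameRegisterBothWays(unsigned reg) {
    uint8_t modrm = static_cast<uint8_t>(0xC0 | ((reg & 7) << 3));
    uint8_t rex = static_cast<uint8_t>(0x40 | (((reg >> 3) & 1) << 2));
    uint8_t rex2 = static_cast<uint8_t>(((reg >> 3) & 1) << 2);  // R4 = 0
    return decodeRegWithRex(rex, modrm) == reg && decodeRegWithRex2(rex2, modrm) == reg;
}
```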

We ran the existing CoreCLR JIT unit test set in the Intel SDE emulator.

Follow-up plans

This PR is only intended to provide REX2 encoding functionality in the JIT backend. As for how to use it properly, we are preparing another PR with LSRA updates so that the JIT allocates EGPRs only when needed and generates optimal code.

Commits

Update comments.

Merge the REX2 changes into the original legacy emit path

bug fix: Set REX2.W with correct mask code.

register encoding and prefix emitting logics.

Add REX2 prefix emit logic

bug fixes

Add Stress mode for REX2 encoding and some bug fixes

resolve comments:
1. add assertion check for UD opcodes.
2. add checks for EGPRs.

Add REX2 to emitOutputAM, and let LEA to be REX2 compatible.

Add REX2.X encoding for SIB byte

Bug fixes: add REX2 prefix on the path in RI where MOV is specially handled.

Enable REX2 encoding for `movups`

fixed bugs in REX2 prefix emitting logic when working with map 1 instructions, and enabled REX2 for POPCNT

legacy map index-er

bug fixes

some clean-up

Adding initial APX unit testing path.

Adding a coredistools dll that has LLVM APX disasm capability.

It must be copied into a CORE_ROOT manually.

clean up work for REX2

narrow the REX2 scope to `sub` only

some clean up based on the comments.

bug fix

resolve comment
 - SV path is mostly for debugging purposes

Added encoding unit tests for instructions with immediates
Code refactoring: AddX86PrefixIfNeeded.
… missing in JIT, may indicate these instructions are not being used in JIT, drop them for now.
Refactor REX2 encoding stress logics.
(this has the side effect that the estimated code size goes up and no longer matches the actual code size.)
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@anthonycanino (Contributor)

@dotnet/avx512-contrib can we reopen this as a PR ready to review?

BruceForstall reopened this on Oct 21, 2024

@BruceForstall (Member)

@anthonycanino I re-opened it (it wasn't clear to me if your question implied you did not have permission to do so). Either you or @Ruihan-Yin need to update to latest main and resolve the conflicts, then mark it ready-for-review.

Labels
apx: Related to the Intel Advanced Performance Extensions (APX)
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member

3 participants