Until Armv8, the NEON architecture let users write vectorized code using SIMD instructions, with the vector length fixed at 128 bits. To address High Performance Computing applications, newer versions of the Arm architecture, such as Armv9, introduced the Scalable Vector Extension (SVE). SVE is a new programming model whose SIMD instructions operate on a flexible vector length, ranging from 128 bits to 2048 bits. SVE2 extends SVE to cover domains such as computer vision and multimedia. More details about SVE can be found on Arm's website. In .NET 9, we want to start the foundational work in the .NET libraries and RyuJIT to add support for the SVE2 feature.
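To make "flexible vector length" concrete, here is a minimal C++ sketch of how the element count of one vector changes with the hardware vector length. It uses no SVE intrinsics, and the helper names `is_valid_sve_vl` and `lanes_per_vector` are hypothetical, not part of any .NET or Arm API.

```cpp
#include <cassert>
#include <cstddef>

// SVE vector lengths are multiples of 128 bits, from 128 up to 2048.
bool is_valid_sve_vl(std::size_t vl_bits)
{
    return vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0;
}

// Hypothetical helper: number of lanes in one vector for a given vector
// length and element width, both expressed in bits.
std::size_t lanes_per_vector(std::size_t vl_bits, std::size_t elem_bits)
{
    assert(is_valid_sve_vl(vl_bits));
    assert(elem_bits == 8 || elem_bits == 16 || elem_bits == 32 || elem_bits == 64);
    return vl_bits / elem_bits;
}
```

For 32-bit elements, `lanes_per_vector(128, 32)` is 4 and `lanes_per_vector(2048, 32)` is 64, which is why code written against a flexible-length vector type cannot assume a fixed element count.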
- Design: The instructions present in SVE2 need to be exposed to our customers through the .NET libraries. However, we want to make sure that customers don't have to rewrite their code to consume SVE2 functionality, and that the experience of using SVE2 features is seamless. Currently, .NET has `Vector<T>`, which represents a flexible vector length depending on the underlying hardware, and the idea is to expose the SVE2 "flexible vector length" functionality using `Vector<T>`. We need to think about the common instructions and validate whether they can be represented by APIs that take only `Vector<T>` as a parameter. Additionally, SVE2 introduces the "predicate register" to mark vector lanes active or inactive. We do not want to expose this concept to our consumers through the .NET libraries either, for the reason mentioned above. Hence, for every API, we need to come up with pseudocode of how the API should be implemented internally in the JIT, such that the "predicate register" concept is created and consumed inside the JIT implicitly. There was a good discussion in the past about the API proposal in [API Proposal]: Example usages of a VectorSVE API #88140.
- Implement APIs in `System.Runtime.Intrinsics.Arm.SVE2`: Once the design is finalized, we need to add all the APIs in a new `SVE2` class under `System.Runtime.Intrinsics.Arm`. They need to be plumbed through the JIT (by transforming tree nodes to represent the "predicate register" concept, if needed) all the way to code generation. See Arm64: Implement SVE APIs #99957.
  - If possible, automatically generate the test template for the various APIs
  - Update Antigen to pick up the new SVE APIs
  - SVE hardware availability in CI
- Backend support: Regardless of the API design, we need to add the instruction encodings of all the SVE2 instructions that we are planning to support. Here is a rough list of things that need to happen to add the new instructions. Covered in Arm64: Implement SVE encodings #94549.
  - Add new entries in hwintrinsiclistarm64.h for the AdvSimd.SVE2 APIs
  - Depending on the API, call the right `emitIns_*()` method
  - `emitIns_*()` methods: add support for the new instructions
  - `emitfmtsarm64.h`: needs the new instruction formats added. See Arm64: SVE/SVE2 encodings #94285.
  - `instrsarm64.h`: needs the new instruction-to-instruction-format mappings added. See Arm64: SVE/SVE2 encodings #94285.
  - Add the encoding for the new instructions in `emitOutputInstr()`
  - Add the new `Zx` registers and predicate registers. See LSRA: Add support to track more than 64 registers #99658.
  - Add a JitStressRegs mode to always allocate high Z/P registers unless the candidate specifically says to use low registers.
  - Make sure the throughput (TP) regression is minimal
  - Test that the encoding matches the displayed instruction using windbg/msvc/cordistools
  - Add entries for the new instructions in `genArm64EmitterUnitTests()`
  - Fix the `formatEncode*` data
  - Refactor and consolidate the common code/asserts, and convert some lookup methods like `insSveIsLslN`/`insSveGetLslOrModN` into table-driven lookups.
  - Possibly update the frame layout code so that `SVE_simd` and `SVE_mask` are towards the end of the frame, in order for stack offsets to take the variable VL into consideration.
- Automation for encoding: As the list above shows, it is very time consuming to add each instruction's encoding to the code base. There are 800+ instructions and, at 30 minutes per instruction, it will take 400+ human hours to add and validate the encodings. Hence, there is an experiment under way to generate the encoding data and the C++ code around it automatically. The understanding is that the generated C++ code will not be fully accurate and manual inspection will still be required.
  Edit: A working prototype of the tool that generates 2 C++ files for encoding is in Arm64: SVE/SVE2 encodings #94285.
  - Produce a JSON/XML file that contains the encoding data of all the instructions. A good resource from which this can be extracted is here.
  - Recreate `instrsarm64.h` so that it contains the existing as well as the new formats, along with the binary and hexadecimal representations of the encodings.
  - If there are new instruction formats, add their entries to the `emitfmtsarm64.h` file.
  - Based on the `mmmm`, `dddd`, etc. in the binary representation of the encoding, have the tool write logic to produce the instruction bytes. In other words, this will generate code that can be pasted into the `emitOutputInstr()` function.
  - Depending on the number of encodings in each group of instructions, they can be tied to the appropriate `INST*`, like `INST9` or `INST8`, and regenerated in sorted order.

  Note that if we need to regenerate existing files like `instrsarm64.h` and `emitfmtsarm64.h`, the existing instructions' encodings also need to be generated by the tool.
- MOVPRFX instruction support: GCStress, sanity check. Also, when emitting every single instruction, we need to make sure that if the previous instruction was `movprfx`, we verify that it follows all the rules with regard to destination registers, size, etc. See Arm64/Sve: Validation for movprfx in emitter #105514.
- Before consuming the new APIs in the libraries, make sure that Mono has support for them, or else `SVE.IsSupported` should return `false` on Mono.
- On Windows, make sure that the new SVE registers as well as the predicate registers are available during debugging, and that `Vector<T>` shows the right number of elements.
- ABI-related changes for SVE/predicate registers in the VM
- Support breakpoint patching for SVE instructions
  - `ICorDebug` needs to be updated to know which SVE instructions have relative reads/writes/jumps
  - walker.cpp is specifically the place that needs to be updated
  - Similar work has been done for AVX512: Support breakpoints on AVX-512 instructions #89705
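The Design item above asks for the "predicate register" to be created and consumed implicitly, so that callers only ever pass vectors. One way to picture that split is the scalar C++ model below. It is not RyuJIT code and not the proposed API: `Vec`, `Pred`, `add` and `add_predicated` are hypothetical names, and the merging behaviour for inactive lanes is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec  = std::vector<std::int32_t>; // stands in for a vector of 32-bit lanes
using Pred = std::vector<bool>;         // stands in for a predicate register

// Internal form: lane i is computed only where pred[i] is true. Inactive
// lanes keep the value from 'a' (merging, assumed here for illustration).
Vec add_predicated(const Pred& pred, const Vec& a, const Vec& b)
{
    assert(pred.size() == a.size() && a.size() == b.size());
    Vec result(a);
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (pred[i])
        {
            result[i] = a[i] + b[i];
        }
    }
    return result;
}

// Public form: the caller passes vectors only. The all-true predicate is
// created and consumed here, never surfacing in the signature.
Vec add(const Vec& a, const Vec& b)
{
    const Pred all_true(a.size(), true);
    return add_predicated(all_true, a, b);
}
```

Here `add` plays the role of an API that takes only vectors, while `add_predicated` plays the role of the predicated form that would exist only inside the compiler.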
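The "Automation for encoding" item describes producing instruction bytes from field letters such as `mmmm` and `dddd` in the binary representation. A minimal sketch of that idea follows, assuming a 32-character pattern in which `0`/`1` are fixed bits and each letter names an operand field. The function `encode` and the pattern used in the usage note are illustrative assumptions, not the output or input format of the prototype tool.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Builds a 32-bit instruction word from a pattern such as
// "0000 0100 xx1m mmmm 0000 00nn nnnd dddd". Letters are filled from the
// matching entry in 'fields', least significant bit first.
std::uint32_t encode(const std::string& pattern,
                     const std::map<char, std::uint32_t>& fields)
{
    std::map<char, unsigned> consumed; // bits already taken from each field
    std::uint32_t code = 0;
    unsigned bit = 0;                  // next bit position to fill, from bit 0

    // Walk the pattern right to left so the last character is bit 0.
    for (auto it = pattern.rbegin(); it != pattern.rend(); ++it)
    {
        const char c = *it;
        if (c == ' ')
        {
            continue;                  // spaces only group the bits visually
        }
        assert(bit < 32);

        std::uint32_t value = 0;
        if (c == '0' || c == '1')
        {
            value = static_cast<std::uint32_t>(c - '0');
        }
        else
        {
            const auto f = fields.find(c);
            assert(f != fields.end()); // every letter needs a field value
            value = (f->second >> consumed[c]++) & 1u;
        }
        code |= value << bit++;
    }
    assert(bit == 32);                 // the pattern must describe 32 bits
    return code;
}
```

With the pattern `"0000 0100 xx1m mmmm 0000 00nn nnnd dddd"` and the field values `x = 2`, `m = 3`, `n = 2`, `d = 1`, this returns `0x04A30041`. A generator could emit the equivalent shifts and ORs as straight-line code per instruction format instead of interpreting the pattern at run time.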
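The MOVPRFX item above calls for checking, at emit time, that the instruction following a `movprfx` obeys the pairing rules. The sketch below shows the shape such a check could take. The `Ins` record and `followsMovprfx` are hypothetical, and the three checks (same destination register, destination not reused as a source, matching element size after a predicated `movprfx`) are a simplified subset chosen for illustration; the authoritative rules are in the Arm architecture documentation.

```cpp
#include <cassert>

// Simplified, hypothetical instruction record; the real emitter's
// instruction descriptor carries far more information.
struct Ins
{
    bool isMovprfx;    // true when this instruction is movprfx
    bool isPredicated; // true when it has a governing predicate
    int  destReg;      // destination Z register number (0..31)
    int  srcReg;       // one extra source Z register, or -1 if none
    int  elemBits;     // element size in bits (8, 16, 32 or 64)
};

// Returns true when 'cur' may legally follow 'prev' under the simplified
// rules described above. If 'prev' is not movprfx there is nothing to check.
bool followsMovprfx(const Ins& prev, const Ins& cur)
{
    assert(cur.destReg >= 0 && cur.destReg < 32);

    if (!prev.isMovprfx)
    {
        return true;
    }
    if (cur.isMovprfx)
    {
        return false; // the prefix must be followed by a real operation
    }
    if (cur.destReg != prev.destReg)
    {
        return false; // destination must match the movprfx destination
    }
    if (cur.srcReg == prev.destReg)
    {
        return false; // destination must not also appear as a source
    }
    if (prev.isPredicated && cur.elemBits != prev.elemBits)
    {
        return false; // predicated movprfx requires the same element size
    }
    return true;
}
```

In an emitter, a check like this would typically run as a debug-only assertion on each emitted instruction, with the previously emitted instruction passed as `prev`.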