-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Add scalable vector type to JIT and HFA type for Vector<T> #121114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Current implementation derives a size based on the identified SIMD type, and then uses the size to derive the node type. It should instead directly derive the node type from the identified SIMD type, because some SIMD types will not have a statically known size, and this size may conflict with other SIMD types.
The function that determines the size of a local variable needs to have access to compiler state at runtime to handle variable types with sizes that depend on some runtime value, for example Vector<T> when backed by ARM64 scalable vectors.
The function that determines the size of an indirection needs to have access to compiler state at runtime to handle variable types with sizes that depend on some runtime value, for example Vector<T> when backed by ARM64 scalable vectors.
Create a new type designed to support Vector<T> with size evaluated at runtime. Adds a new HFA type to the VM to support passing Vector<T> as a scalable vector register on ARM64. Both types are experimental and locked behind the DOTNET_JitUseScalableVectorT configuration option. This first patch implements SVE codegen for Vector<T>, mainly for managing Vector<T> as a data structure that can be placed in a Z register. When DOTNET_JitUseScalableVectorT=1, the JIT will move the type around using SVE instructions operating on Z registers. It does not yet unlock longer vector lengths or implement operations from the Vector<T> API surface using SVE. This API still generates NEON code, which is functionally equivalent so long as Vector<T>.Count is limited to 128-bits. When DOTNET_JitUseScalableVectorT=0 the code generated for Vector<T> should have zero functional difference but may have some cosmetic differences as some refactoring has been done on general SIMD codegen to support the new type.
|
@dotnet/arm64-contrib @a74nh @tannergooding This is the outcome of the investigation I've been doing into supporting scalable vector types in the JIT. I've worked through a few problems and test failures but I still consider this early experimentation, feedback or suggestions on direction are appreciated. |
The length of VectorT is now a constant that is set on VM initialization, and the DAC needs to be able to replicate this state without access to the original SVE VL value.
d3b0136 to
931231d
Compare
src/coreclr/jit/compiler.h
Outdated
| return result; | ||
| } | ||
|
|
||
| unsigned getSizeOfType(var_types type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a lot of churn without good reason. In the eventual future, it won't be possible to get the size of TYP_SIMDSV as a constant anyway, and we will want to assert that TYP_SIMDSV is never passed to genTypeSize.
If the goal is to have a transitional period where TYP_SIMDSV has constant (but unknown) size then can we instead make genTypeSize get the size from a non-constant table and set the size of TYP_SIMDSV in that table from jitStartup (probably Compiler::compStartup)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the eventual future, it won't be possible to get the size of TYP_SIMDSV as a constant anyway
This is not certain yet in my mind, do we want to be able to query and make use of the vector length when we can (JIT mode)? We could use this information to save emitting rdvl, and directly embed the constant in codegen instead.
can we instead make genTypeSize get the size from a non-constant table and set the size of TYP_SIMDSV in that table from jitStartup (probably Compiler::compStartup)?
Yes I think this would reduce a lot of churn, but only if we answer 'no' to the above decision.
Otherwise, the vector length would need to be obtained from the EE via ICorJitInfo, because the compiler can't execute rdvl to obtain the length in all compilation modes. I didn't think this could be done from a static context because the info handle belongs to the Compiler instance, but correct me if there's a good way of doing this (I could use JitTls::GetCompiler?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not certain yet in my mind, do we want to be able to query and make use of the vector length when we can (JIT mode)? We could use this information to save emitting
rdvl, and directly embed the constant in codegen instead.
I think we should prefer to translate to the TYP_SIMD16, TYP_SIMD32 etc. types if we want to do something special for these cases (what Kunal was doing). Otherwise this introduces yet another combination of things to test for...
Otherwise, the vector length would need to be obtained from the EE via ICorJitInfo, because the compiler can't execute rdvl to obtain the length in all compilation modes. I didn't think this could be done from a static context because the info handle belongs to the Compiler instance, but correct me if there's a good way of doing this (I could use JitTls::GetCompiler?).
I think for the transitional period it would be fine to get it via ICorJitHost::getIntConfigValue (i.e. through JitConfig) or by introducing a new method on ICorJitHost. ICorJitHost is passed to jitStartup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the transitional period it would be fine to get it via ICorJitHost::getIntConfigValue (i.e. through JitConfig) or by introducing a new method on ICorJitHost. ICorJitHost is passed to jitStartup.
Or ask the EE for Vector<T> size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable to me too. I guess we already have Compiler::getVectorTByteLength, we would just put its value into genTypeSizes table as soon as possible (and validate on subsequent JITs that the value is consistent)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should prefer to translate to the TYP_SIMD16, TYP_SIMD32 etc. types if we want to do something special for these cases (what Kunal was doing). Otherwise this introduces yet another combination of things to test for...
👍. I would prefer we keep things generally the same between NAOT and JIT scenarios.
A whole lot of stuff gets simpler if we only have to consider that TYP_SIMDSV means "unknown size" and TYP_SIMD16 means 16-bytes. It also reduces risk of bugs or other breaks due to mishandling of TYP_SIMDSV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAOT and JIT scenarios.
Nit: I expect we will use the unknown size for R2R too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you all for the comments.
I'll split out the refactoring related to getSizeOfType and revert it, and implement the genTypeSizes table changes instead.
I can leave implementing the actual query to the EE for the length until later, but I'll refer back here when doing so.
| DEF_TP(SIMD32 ,"simd32" , TYP_SIMD32, 32,32, 32, 8,16, VTR_FLOAT, availableDoubleRegs, RBM_FLT_CALLEE_SAVED, RBM_FLT_CALLEE_TRASH, VTF_S|VTF_VEC) | ||
| DEF_TP(SIMD64 ,"simd64" , TYP_SIMD64, 64,64, 64, 16,16, VTR_FLOAT, availableDoubleRegs, RBM_FLT_CALLEE_SAVED, RBM_FLT_CALLEE_TRASH, VTF_S|VTF_VEC) | ||
| #elif defined(TARGET_ARM64) | ||
| DEF_TP(SIMDSV ,"simdsv" , TYP_SIMDSV, 0,EAS,EAS, 0,16, VTR_FLOAT, availableDoubleRegs, RBM_FLT_CALLEE_SAVED, RBM_FLT_CALLEE_TRASH, VTF_S|VTF_VEC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd have a preference towards just TYP_SIMD as the name.
That then fits with the mapping of Vector<T> not having a suffix but Vector64/128/256/512 having one and also doesn't tie the concept down to Arm64 specific terminology -- other CPU architectures have similar concepts of vectors or matrices whose size aren't statically known, but are rather hardware dependent. So I imagine this functionality will be repurposed and shared over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, should be easily changed at any point with search and replace. I've had in mind that this type may be repurposed as 'the best possible vector implementation for the architecture'.
| #define BRS EA_BYREF | ||
| #define EPS EA_PTRSIZE | ||
| #ifdef TARGET_ARM64 | ||
| #define EAS EA_SCALABLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I wonder if SCALABLE is the right term here.
It would seem like the existing EA_UNKNOWN might be sufficient and would trigger relevant asserts in places that aren't correctly equipped to handle it. Alternatively, a different term might be a better fit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I used this change to reveal the areas in the Emitter that needed to support SVE instructions as well as NEON instructions for the type. I agree EA_UNKNOWN would be better for making this apply more widely.
I'll need to find a way to remap EA_UNKNOWN back to EA_SCALABLE for ARM64 as EA_SCALABLE is the main marker we're using to choose SVE instructions. There should be plenty of information available to do this.
src/coreclr/inc/dacvars.h
Outdated
|
|
||
| DEFINE_DACVAR(DWORD, dac__g_gcNotificationFlags, g_gcNotificationFlags) | ||
| DEFINE_DACVAR(DWORD, dac__g_vectorTByteLength, ::g_vectorTByteLength) | ||
| DEFINE_DACVAR(BOOL, dac__g_vectorTIsScalable, ::g_vectorTIsScalable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it not enough for the DAC to know about Vector<T> size? (It knows that already.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite an unfamiliar area of the code for me, am I understanding correctly that the VM is going to serialize the Vector<T> MethodTable for the DAC rather than rely on the DAC to reconstruct it from type information? Therefore we don't need this extra variable.
The boolean flag is interesting because it encodes the value of the environment variable. Would the DAC have access to this as well? The setting is only relevant when sizeof(TYP_SIMDSV) == sizeof(TYP_SIMD16), we need to manually choose whether to emit NEON or SVE because the sizes are the same. But maybe there's no need for it unless the debugger wants to generate code itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The boolean flag is interesting because it encodes the value of the environment variable. Would the DAC have access to this as well?
Why does the DAC need to know about that?
The debugger/DAC related changes should start with explanation of the debugger feature. Also, it would be best to submit any debugger related changes separately from JIT changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the DAC need to know about that?
I thought I needed to make some of the member variables to EEJitManager that I added compatible with the DAC to fix some compilation errors, but I'll look again. I'm not intending to introduce a feature new here.
Is CoreLibBinder::GetClass(CLASS__VECTORT) the correct way to obtain the MethodTable for Vector<T> in this situation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix some compilation errors
You should be able to fix the compilation errors by ifdefing out the code using DACCESS_COMPILE.
Is CoreLibBinder::GetClass(CLASS__VECTORT) the correct way to obtain the MethodTable for Vector
Yes, but I do not think you need that for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up having to get the size from the MethodTable, because GetSizeOfVectorT was not DAC compatible and I used it in calling convention code, which seems to be required in the DAC. So now I set the size once when constructing the MethodTable and use this as the source of truth everywhere else.
| CORINFO_HFA_ELEM_DOUBLE, | ||
| CORINFO_HFA_ELEM_VECTOR64, | ||
| CORINFO_HFA_ELEM_VECTOR128, | ||
| CORINFO_HFA_ELEM_VECTORT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noted down to revisit this on the issue, as technically a Vector<T> would be a 'Pure Scalable Type' not a HFA/HVA. It looks very similar in principal but there may be some subtle differences.
9ba8b1a to
edb1ad7
Compare
genTypeSize is now sufficient for this purpose
| #endif | ||
|
|
||
| if (numInstanceFieldBytes != 16) | ||
| if (vectorTSize > 0 && vectorTSize != 16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (vectorTSize > 0 && vectorTSize != 16) | |
| if (vectorTSize > 0) |
Nit: vectorTSize != 16 check should not be needed.
|
@snickolls-arm I would recommend submitting more smaller PRs. Smaller PRs are easier review and make things faster to move forward. IMHO, this PR can be split into at least 3 PRs:
|
This branch introduces a new type (
TYP_SIMDSV) to the JIT for supporting scalable vectors, registers whose size is determined by hardware at runtime but remains constant for the duration of a process. For ARM64, this means we have vectors sized in powers of 2 from 128 bits up to 2048 bits depending on hardware implementation, with an instruction available to query this size for compiler use. I've also adjusted the implementation ofTYP_MASKto scale with the vector length on ARM64 in a similar manner.This builds on and borrows much of Kunal's work in: #115948.
This PR focuses on enabling scalable type awareness as a foundation for future vector length agnostic code generation. The new type is allowing the JIT to emit SVE register moves, loads and stores for
Vector<T>etc. but doesn't change the implementation of theVector<T>API surface. It still emits NEON for arithmetic operations, logical operations, floating point operations and so on.The compiler resolves the sizes for
TYP_SIMDSVandTYP_MASKearly in initialization and updates the type system data with this information. For now, the vector length is still fixed at 128 bits in order to preserve the functionality, as there is a mixed state of NEON and SVE instructions. Once full SVE codegen is implemented for these types, it will be possible to let their sizes scale with the underlying hardware vector length.With this type being passed around, we can begin implementing vector length agnostic code in subsequent work, as we can now test for a
TYP_SIMDSVand distinguish it from a fixed sizeTYP_SIMD16.Testing
Importing
Vector<T>as the new type is gated behindDOTNET_JitUseScalableVectorT, meaningTYP_SIMDSVwill not appear in compilation unless that variable is set. Likewise, the VM will not report use of the HFA typeCORINFO_HFA_ELEM_VECTORTunless this variable is set. With the variable set, testing is validating the implementation of the new type. Without the variable set, testing is validating thatTYP_SIMD16behavior remains stable under these changes.SuperPMI method contexts are currently out of sync with the updated JIT-EE interface, which causes mismatches in return values between the JIT and EE. I don't expect this to stop until
DOTNET_JitUseScalableVectorTis removed and standardized, so this feature will need to be tested with some specially generated MCH files while being developed.Future
We will be able to remove
DOTNET_JitUseScalableVectorTonceVector<T>is working in AOT compilation. This will require all phases to be aware thatTYP_SIMDSVis dynamically sized and make the choice if the pass can run or not depending on this.For a transitional approach, we could allow specifying a fixed target vector length for AOT compilation while broader vector-length agnostic support is implemented. In some cases we might be able to take advantage of knowing the vector size at compilation time, so having both approaches available might be advantageous for JIT mode.
Code Example
With
DOTNET_JitUseScalableVectorT=1, the main vector loop body compiles to:Contributing towards #120599