Add scalable vector type to JIT and HFA type for Vector<T> #121114

snickolls-arm · 2025-10-27T13:35:47Z

This branch introduces a new type (TYP_SIMDSV) to the JIT for supporting scalable vectors, registers whose size is determined by hardware at runtime but remains constant for the duration of a process. For ARM64, this means we have vectors sized in powers of 2 from 128 bits up to 2048 bits depending on hardware implementation, with an instruction available to query this size for compiler use. I've also adjusted the implementation of TYP_MASK to scale with the vector length on ARM64 in a similar manner.
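To make the size range above concrete, here is a minimal sketch of the arithmetic. The helper names `isSupportedVectorLengthBits` and `laneCount` are hypothetical and are not JIT APIs; the sketch only encodes the constraint stated above (powers of 2 from 128 to 2048 bits).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a JIT API): true if `bits` is a vector length
// in the range described above, i.e. a power of two from 128 to 2048 bits.
constexpr bool isSupportedVectorLengthBits(uint32_t bits)
{
    return bits >= 128 && bits <= 2048 && (bits & (bits - 1)) == 0;
}

// Hypothetical helper: number of elements of `elemSizeBytes` bytes that fit
// in a vector that is `bits` bits wide.
constexpr uint32_t laneCount(uint32_t bits, uint32_t elemSizeBytes)
{
    return (bits / 8) / elemSizeBytes;
}
```

With the vector length fixed at 128 bits and 4-byte floats, `laneCount(128, 4)` is 4, which matches the `#4` loop increment in the disassembly under Code Example.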

This builds on and borrows much of Kunal's work in: #115948.

This PR focuses on enabling scalable type awareness as a foundation for future vector length agnostic code generation. The new type allows the JIT to emit SVE register moves, loads and stores for Vector<T>, but it doesn't change the implementation of the Vector<T> API surface: arithmetic, logical and floating point operations still emit NEON.

The compiler resolves the sizes for TYP_SIMDSV and TYP_MASK early in initialization and updates the type system data with this information. For now, the vector length is still fixed at 128 bits to preserve existing functionality, because the generated code is currently a mix of NEON and SVE instructions. Once full SVE codegen is implemented for these types, it will be possible to let their sizes scale with the underlying hardware vector length.

With this type being passed around, we can begin implementing vector length agnostic code in subsequent work, as we can now test for a TYP_SIMDSV and distinguish it from a fixed size TYP_SIMD16.
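As a rough illustration of what testing for the type buys us, the sketch below picks a register prefix for a vector load based on the type alone. `SimdType` and `formatVectorLoad` are hypothetical stand-ins for illustration, not the JIT's actual `var_types` or emitter code.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the two JIT types discussed above (hypothetical enum,
// not the real var_types).
enum class SimdType { Simd16, SimdSV };

// Hypothetical helper: formats a vector load, choosing a NEON Q register
// for the fixed 16-byte type and an SVE Z register for the scalable type.
inline std::string formatVectorLoad(SimdType type, unsigned reg, const std::string& baseReg)
{
    const char prefix = (type == SimdType::SimdSV) ? 'z' : 'q';
    return std::string("ldr ") + prefix + std::to_string(reg) + ", [" + baseReg + "]";
}
```

For example, `formatVectorLoad(SimdType::SimdSV, 24, "x10")` yields `ldr z24, [x10]`, while the `Simd16` case yields `ldr q24, [x10]`, mirroring the "was ldr q24" annotations in the disassembly under Code Example.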

Testing

Importing Vector<T> as the new type is gated behind DOTNET_JitUseScalableVectorT, meaning TYP_SIMDSV will not appear in compilation unless that variable is set. Likewise, the VM will not report use of the HFA type CORINFO_HFA_ELEM_VECTORT unless this variable is set. With the variable set, testing validates the implementation of the new type. Without the variable set, testing validates that TYP_SIMD16 behavior remains stable under these changes.

SuperPMI method contexts are currently out of sync with the updated JIT-EE interface, which causes mismatches in return values between the JIT and EE. I expect this to continue until DOTNET_JitUseScalableVectorT is removed and the feature is standardized, so during development this feature will need to be tested with specially generated MCH files.

Future

We will be able to remove DOTNET_JitUseScalableVectorT once Vector<T> is working in AOT compilation. This will require every phase to be aware that TYP_SIMDSV is dynamically sized and to decide, based on that, whether it can run.

For a transitional approach, we could allow specifying a fixed target vector length for AOT compilation while broader vector-length agnostic support is implemented. In some cases we might be able to take advantage of knowing the vector size at compilation time, so having both approaches available might be advantageous for JIT mode.

Code Example

static void SimdAdd(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> c)
{
    int len = Math.Min(Math.Min(a.Length, b.Length), c.Length);
    int i = 0;

    if (Vector.IsHardwareAccelerated)
    {
        int width = Vector<float>.Count;
        for (; i <= len - width; i += width)
        {
            var va = new Vector<float>(a.Slice(i, width));
            var vb = new Vector<float>(b.Slice(i, width));
            (va + vb).CopyTo(c.Slice(i, width));
        }
    }

    // Scalar tail (or full path if no HW acceleration)
    for (; i < len; i++)
        c[i] = a[i] + b[i];
}

With DOTNET_JitUseScalableVectorT=1, the main vector loop body compiles to:

G_M23455_IG03:        ; offs=0x000020, size=0x0044, bbWeight=4, PerfScore 100.00, gcrefRegs=0000 {}, byrefRegs=0015 {x0 x2 x4}, BB03 [0002], BB20 [0018], BB32 [0033], BB44 [0048], byref, isz

IN000b: 000020      mov     w8, w7
IN000c: 000024      add     x9, x8, #4
IN000d: 000028      cmp     x9, w1, UXTW
IN000e: 00002C      bhi     G_M23455_IG10
IN000f: 000030      lsl     x8, x8, #2
IN0010: 000034      add     x10, x0, x8
IN0011: 000038      ldr     z24, [x10]                                 ;; was ldr q24, [x10]
IN0012: 00003C      cmp     x9, w3, UXTW
IN0013: 000040      bhi     G_M23455_IG10
IN0014: 000044      add     x10, x2, x8
IN0015: 000048      ldr     z25, [x10]                                 ;; was ldr q25, [x10]
IN0016: 00004C      fadd    v24.4s, v24.4s, v25.4s
IN0017: 000050      cmp     x9, w5, UXTW
IN0018: 000054      bhi     G_M23455_IG10
IN0019: 000058      add     x8, x4, x8
IN001a: 00005C      str     z24, [x8]                                  ;; was str q24, [x8]
IN001b: 000060      add     w7, w7, #4

G_M23455_IG04:        ; offs=0x000064, size=0x000C, bbWeight=8, PerfScore 16.00, gcrefRegs=0000 {}, byrefRegs=0015 {x0 x2 x4}, loop=IG03, BB04 [0003], byref, isz

IN001c: 000064      sub     w8, w6, #4
IN001d: 000068      cmp     w8, w7
IN001e: 00006C      bge     G_M23455_IG03

Contributing towards #120599

The current implementation derives a size from the identified SIMD type, and then uses that size to derive the node type. It should instead derive the node type directly from the identified SIMD type, because some SIMD types will not have a statically known size, and a size can be shared by more than one SIMD type.
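A minimal sketch of the direct mapping, under stated assumptions: `NodeType` and `nodeTypeFromClassName` are hypothetical names, and the lookup by metadata class name is only one possible way to identify the SIMD type. The point is that a 16-byte Vector<T> and Vector128<T> cannot be told apart by size, but can be by identity.

```cpp
#include <cassert>
#include <string>

// Hypothetical node types standing in for TYP_SIMD8, TYP_SIMD16 and TYP_SIMDSV.
enum class NodeType { Unknown, Simd8, Simd16, SimdSV };

// Hypothetical helper: maps the identified SIMD class straight to a node
// type. With `useScalableVectorT` set, Vector<T> becomes SimdSV even while
// its size is 16 bytes, which a size-based lookup could not express.
inline NodeType nodeTypeFromClassName(const std::string& name, bool useScalableVectorT)
{
    if (name == "Vector64`1")
        return NodeType::Simd8;
    if (name == "Vector128`1")
        return NodeType::Simd16;
    if (name == "Vector`1")
        return useScalableVectorT ? NodeType::SimdSV : NodeType::Simd16;
    return NodeType::Unknown;
}
```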

The function that determines the size of a local variable needs to have access to compiler state at runtime to handle variable types with sizes that depend on some runtime value, for example Vector<T> when backed by ARM64 scalable vectors.

The function that determines the size of an indirection needs to have access to compiler state at runtime to handle variable types with sizes that depend on some runtime value, for example Vector<T> when backed by ARM64 scalable vectors.

Creates a new type designed to support Vector<T> with its size evaluated at runtime, and adds a new HFA type to the VM to support passing Vector<T> as a scalable vector register on ARM64. Both types are experimental and locked behind the DOTNET_JitUseScalableVectorT configuration option.

This first patch implements SVE codegen for Vector<T>, mainly for managing Vector<T> as a data structure that can be placed in a Z register. When DOTNET_JitUseScalableVectorT=1, the JIT moves the type around using SVE instructions operating on Z registers. It does not yet unlock longer vector lengths or implement operations from the Vector<T> API surface using SVE. This API still generates NEON code, which is functionally equivalent so long as Vector<T> is limited to 128 bits.

When DOTNET_JitUseScalableVectorT=0, the code generated for Vector<T> should have zero functional difference but may have some cosmetic differences, as some refactoring has been done on general SIMD codegen to support the new type.

snickolls-arm · 2025-10-27T13:42:50Z

@dotnet/arm64-contrib @a74nh @tannergooding

This is the outcome of the investigation I've been doing into supporting scalable vector types in the JIT. I've worked through a few problems and test failures, but I still consider this early experimentation, so feedback or suggestions on direction are appreciated.

jakobbotsch · 2025-11-03T12:51:24Z

src/coreclr/jit/compiler.h

        return result;
    }

+    unsigned getSizeOfType(var_types type);


This seems like a lot of churn without good reason. In the eventual future, it won't be possible to get the size of TYP_SIMDSV as a constant anyway, and we will want to assert that TYP_SIMDSV is never passed to genTypeSize.

If the goal is to have a transitional period where TYP_SIMDSV has constant (but unknown) size then can we instead make genTypeSize get the size from a non-constant table and set the size of TYP_SIMDSV in that table from jitStartup (probably Compiler::compStartup)?

> In the eventual future, it won't be possible to get the size of TYP_SIMDSV as a constant anyway

This is not certain yet in my mind, do we want to be able to query and make use of the vector length when we can (JIT mode)? We could use this information to save emitting rdvl, and directly embed the constant in codegen instead.

> can we instead make genTypeSize get the size from a non-constant table and set the size of TYP_SIMDSV in that table from jitStartup (probably Compiler::compStartup)?

Yes I think this would reduce a lot of churn, but only if we answer 'no' to the above decision.

Otherwise, the vector length would need to be obtained from the EE via ICorJitInfo, because the compiler can't execute rdvl to obtain the length in all compilation modes. I didn't think this could be done from a static context because the info handle belongs to the Compiler instance, but correct me if there's a good way of doing this (I could use JitTls::GetCompiler?).

> This is not certain yet in my mind, do we want to be able to query and make use of the vector length when we can (JIT mode)? We could use this information to save emitting rdvl, and directly embed the constant in codegen instead.

I think we should prefer to translate to the TYP_SIMD16, TYP_SIMD32 etc. types if we want to do something special for these cases (what Kunal was doing). Otherwise this introduces yet another combination of things to test for...

> Otherwise, the vector length would need to be obtained from the EE via ICorJitInfo, because the compiler can't execute rdvl to obtain the length in all compilation modes. I didn't think this could be done from a static context because the info handle belongs to the Compiler instance, but correct me if there's a good way of doing this (I could use JitTls::GetCompiler?).

I think for the transitional period it would be fine to get it via ICorJitHost::getIntConfigValue (i.e. through JitConfig) or by introducing a new method on ICorJitHost. ICorJitHost is passed to jitStartup.

> I think for the transitional period it would be fine to get it via ICorJitHost::getIntConfigValue (i.e. through JitConfig) or by introducing a new method on ICorJitHost. ICorJitHost is passed to jitStartup.

Or ask the EE for Vector<T> size?

That sounds reasonable to me too. I guess we already have Compiler::getVectorTByteLength, we would just put its value into genTypeSizes table as soon as possible (and validate on subsequent JITs that the value is consistent)

> I think we should prefer to translate to the TYP_SIMD16, TYP_SIMD32 etc. types if we want to do something special for these cases (what Kunal was doing). Otherwise this introduces yet another combination of things to test for...

👍. I would prefer we keep things generally the same between NAOT and JIT scenarios.

A whole lot of stuff gets simpler if we only have to consider that TYP_SIMDSV means "unknown size" and TYP_SIMD16 means 16-bytes. It also reduces risk of bugs or other breaks due to mishandling of TYP_SIMDSV.

> NAOT and JIT scenarios.

Nit: I expect we will use the unknown size for R2R too.

Thank you all for the comments.

I'll split out the refactoring related to getSizeOfType and revert it, and implement the genTypeSizes table changes instead.

I can leave implementing the actual query to the EE for the length until later, but I'll refer back here when doing so.
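For reference, the non-constant table idea from this thread could look roughly like the sketch below. All names here (`VarType`, `g_typeSizes`, `setScalableVectorSize`, `typeSize`) are hypothetical and do not match the real genTypeSizes or jitStartup code; the sketch assumes the size is recorded once and that later compilations only validate it.

```cpp
#include <cassert>

// Hypothetical, reduced type list; the real JIT has many more entries.
enum VarType { TypInt, TypSimd16, TypSimdSV, TypCount };

// Non-constant size table. The scalable entry starts at 0, meaning
// "not yet known", and is filled in once at startup.
static unsigned g_typeSizes[TypCount] = {4, 16, 0};

// Hypothetical startup hook: records the scalable vector size on the first
// call. Later calls only validate consistency and return false on mismatch.
inline bool setScalableVectorSize(unsigned bytes)
{
    if (g_typeSizes[TypSimdSV] == 0)
    {
        g_typeSizes[TypSimdSV] = bytes;
        return true;
    }
    return g_typeSizes[TypSimdSV] == bytes;
}

// Table lookup, analogous in spirit to genTypeSize.
inline unsigned typeSize(VarType type)
{
    assert(type < TypCount);
    return g_typeSizes[type];
}
```

This keeps existing callers unchanged, since they still go through a plain lookup. Where the value passed to the hook comes from (JitConfig, a new ICorJitHost method, or the EE's Vector<T> size) is left open, as in the discussion above.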

tannergooding · 2025-11-03T16:31:19Z

src/coreclr/jit/typelist.h

 DEF_TP(SIMD32   ,"simd32" , TYP_SIMD32,  32,32, 32,   8,16, VTR_FLOAT, availableDoubleRegs, RBM_FLT_CALLEE_SAVED,    RBM_FLT_CALLEE_TRASH,    VTF_S|VTF_VEC)
 DEF_TP(SIMD64   ,"simd64" , TYP_SIMD64,  64,64, 64,  16,16, VTR_FLOAT, availableDoubleRegs, RBM_FLT_CALLEE_SAVED,    RBM_FLT_CALLEE_TRASH,    VTF_S|VTF_VEC)
+#elif defined(TARGET_ARM64)
+DEF_TP(SIMDSV   ,"simdsv" , TYP_SIMDSV,  0,EAS,EAS,   0,16, VTR_FLOAT, availableDoubleRegs, RBM_FLT_CALLEE_SAVED,    RBM_FLT_CALLEE_TRASH,    VTF_S|VTF_VEC)


nit: I'd have a preference towards just TYP_SIMD as the name.

That then fits with the mapping of Vector<T> not having a suffix but Vector64/128/256/512 having one and also doesn't tie the concept down to Arm64 specific terminology -- other CPU architectures have similar concepts of vectors or matrices whose sizes aren't statically known, but are rather hardware dependent. So I imagine this functionality will be repurposed and shared over time.

No problem, should be easily changed at any point with search and replace. I've had in mind that this type may be repurposed as 'the best possible vector implementation for the architecture'.

tannergooding · 2025-11-03T16:34:11Z

src/coreclr/jit/typelist.h

 #define BRS EA_BYREF
 #define EPS EA_PTRSIZE
+#ifdef TARGET_ARM64
+#define EAS EA_SCALABLE


Similarly, I wonder if SCALABLE is the right term here.

It would seem like the existing EA_UNKNOWN might be sufficient and would trigger relevant asserts in places that aren't correctly equipped to handle it. Alternatively, a different term might be a better fit.

Yes, I used this change to reveal the areas in the Emitter that needed to support SVE instructions as well as NEON instructions for the type. I agree EA_UNKNOWN would be better for making this apply more widely.

I'll need to find a way to remap EA_UNKNOWN back to EA_SCALABLE for ARM64 as EA_SCALABLE is the main marker we're using to choose SVE instructions. There should be plenty of information available to do this.
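One possible shape for that remapping, as a sketch only: the enums and the functions `typeAttr` and `emitterAttr` below are hypothetical and much simpler than the real emitAttr machinery. The assumption is that the type system reports an architecture-neutral "unknown" attribute and the ARM64 backend translates it back to the scalable marker before choosing SVE instructions.

```cpp
#include <cassert>

// Hypothetical stand-ins for EA_UNKNOWN, EA_16BYTE and EA_SCALABLE.
enum class EmitAttr { Unknown, Size16, Scalable };
enum class TargetArch { Arm64, Other };
enum class VecType { Simd16, SimdSV };

// Architecture-neutral view: the scalable type has no statically known size.
inline EmitAttr typeAttr(VecType type)
{
    return (type == VecType::SimdSV) ? EmitAttr::Unknown : EmitAttr::Size16;
}

// Backend view: on ARM64 the unknown-size vector is remapped to the
// scalable attribute, which is what selects SVE instructions.
inline EmitAttr emitterAttr(VecType type, TargetArch arch)
{
    const EmitAttr attr = typeAttr(type);
    if (attr == EmitAttr::Unknown && type == VecType::SimdSV && arch == TargetArch::Arm64)
    {
        return EmitAttr::Scalable;
    }
    return attr;
}
```

On other targets the attribute stays unknown, so code that is not equipped to handle it would still hit the relevant asserts.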

jkotas · 2025-11-04T05:20:17Z

src/coreclr/inc/dacvars.h


 DEFINE_DACVAR(DWORD, dac__g_gcNotificationFlags, g_gcNotificationFlags)
+DEFINE_DACVAR(DWORD, dac__g_vectorTByteLength, ::g_vectorTByteLength)
+DEFINE_DACVAR(BOOL,  dac__g_vectorTIsScalable, ::g_vectorTIsScalable)


Why is it not enough for the DAC to know about Vector<T> size? (It knows that already.)

This is quite an unfamiliar area of the code for me. Am I understanding correctly that the VM is going to serialize the Vector<T> MethodTable for the DAC rather than rely on the DAC to reconstruct it from type information? If so, we don't need this extra variable.

The boolean flag is interesting because it encodes the value of the environment variable. Would the DAC have access to this as well? The setting is only relevant when sizeof(TYP_SIMDSV) == sizeof(TYP_SIMD16), we need to manually choose whether to emit NEON or SVE because the sizes are the same. But maybe there's no need for it unless the debugger wants to generate code itself.

> The boolean flag is interesting because it encodes the value of the environment variable. Would the DAC have access to this as well?

Why does the DAC need to know about that?

The debugger/DAC related changes should start with explanation of the debugger feature. Also, it would be best to submit any debugger related changes separately from JIT changes.

> Why does the DAC need to know about that?

I thought I needed to make some of the member variables I added to EEJitManager compatible with the DAC to fix some compilation errors, but I'll look again. I'm not intending to introduce a new feature here.

Is CoreLibBinder::GetClass(CLASS__VECTORT) the correct way to obtain the MethodTable for Vector<T> in this situation?

> fix some compilation errors

You should be able to fix the compilation errors by ifdefing out the code using DACCESS_COMPILE.

> Is CoreLibBinder::GetClass(CLASS__VECTORT) the correct way to obtain the MethodTable for Vector<T>

Yes, but I do not think you need that for this PR.

I ended up having to get the size from the MethodTable, because GetSizeOfVectorT was not DAC compatible and I used it in calling convention code, which seems to be required in the DAC. So now I set the size once when constructing the MethodTable and use this as the source of truth everywhere else.

snickolls-arm · 2025-11-06T10:23:11Z

src/coreclr/inc/corhdr.h

    CORINFO_HFA_ELEM_DOUBLE,
    CORINFO_HFA_ELEM_VECTOR64,
    CORINFO_HFA_ELEM_VECTOR128,
+    CORINFO_HFA_ELEM_VECTORT,


I've made a note on the issue to revisit this, as technically a Vector<T> would be a 'Pure Scalable Type', not an HFA/HVA. It looks very similar in principle but there may be some subtle differences.

jkotas · 2025-11-07T02:57:55Z

src/coreclr/vm/methodtablebuilder.cpp

+#endif

-    if (numInstanceFieldBytes != 16)
+    if (vectorTSize > 0 && vectorTSize != 16)


Suggested change

-    if (vectorTSize > 0 && vectorTSize != 16)
+    if (vectorTSize > 0)

Nit: the vectorTSize != 16 check should not be needed.

jkotas · 2025-11-07T03:04:49Z

@snickolls-arm I would recommend submitting more, smaller PRs. Smaller PRs are easier to review and make things move forward faster. IMHO, this PR can be split into at least 3 PRs:

- Add TYP_SIMDSV in the JIT
- Add JitUseScalableVectorT config and respect it in the type loader (methodtablebuilder.cpp)
- Recognize HFAs (PSTs) composed from scalable vectors

snickolls-arm added 4 commits October 24, 2025 12:56

github-actions bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Oct 27, 2025

dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) Oct 27, 2025

jkotas added the arm-sve label (Work related to arm64 SVE/SVE2 support) Oct 27, 2025

snickolls-arm added 4 commits October 28, 2025 10:35

8ae80c3 Fix typo in immediate comparison in Compiler::instGen_Set_Reg_To_Base_Plus_Imm

5558a08 Fix conditional compilation for config values

79ba8ba Fix macro expansion for MSVC

931231d Make the length of VectorT accessible to the DAC

The length of VectorT is now a constant that is set on VM initialization, and the DAC needs to be able to replicate this state without access to the original SVE VL value.

snickolls-arm force-pushed the scalable-vector-type branch from d3b0136 to 931231d November 3, 2025 12:46

jakobbotsch reviewed Nov 3, 2025

This was referenced Nov 3, 2025

Failed to install runtime_python_requirements #114924 (open)

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008 (open)

tannergooding reviewed Nov 3, 2025

jkotas reviewed Nov 4, 2025

ab63432 Fix DAC compilation and implement edit to genTypeSize table for TYP_SIMDSV

build-analysis bot mentioned this pull request Nov 5, 2025

error : HttpRequestException: The SSL connection could not be established, see inner exception. dotnet/dnceng#5015 (open, 3 tasks)

b950fa5 Fix merge issues, replayed commits

snickolls-arm commented Nov 6, 2025

snickolls-arm mentioned this pull request Nov 6, 2025

Accelerating Vector<T> with SVE on ARM64 #120599 (open, 8 tasks)

edb1ad7 Move type initialization earlier and fix cast issue on MSVC

snickolls-arm force-pushed the scalable-vector-type branch from 9ba8b1a to edb1ad7 November 6, 2025 11:44

af6fd80 Remove getSizeOfSIMDType

genTypeSize is now sufficient for this purpose

build-analysis bot mentioned this pull request Nov 6, 2025

[android-arm64] The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#6408 (open, 3 tasks)

jkoritzinsky mentioned this pull request Nov 6, 2025

Certificate issues when Sending To Helix from Apple platforms dotnet/dnceng#6410 (open)

jkotas reviewed Nov 7, 2025

snickolls-arm mentioned this pull request Nov 10, 2025

Map SIMD types directly from class names #121489 (open)
