HFA/Vector calling convention representation in the Jit on arm64. #37341

Closed
@sandreenko

Hello, I could use some help with a new model for our return/call SIMD* typing that I am implementing, but first a few examples of what is happening now.

1 example:
    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector4 ReturnVector4()
    {
        return new Vector4(1);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector4 ReturnVector4UsingCall()
    {
        return ReturnVector4();
    }

The IL for ReturnVector4UsingCall is very simple: call ReturnVector4; ret.
The IR would be ASG(LCL_VAR, call); return LCL_VAR.
The complexity is that Arm64 supports both vector and HFA calling conventions. In this case
Vector4 is an HFA value, so we have to return it as v0.s[0], v1.s[0], v2.s[0], v3.s[0].
Now let's see how we import this call and with which type:

  1. create it as TYP_STRUCT, using callRetTyp = JITtype2varType(calliSig.retType) in impImportCall;
  2. change it to TYP_SIMD16 in impImportCall: callRetTyp = impNormStructType(actualMethodRetTypeSigClass); call->gtType = callRetTyp;
  3. change it back to TYP_STRUCT in impAssignStructPtr: src->gtType = genActualType(returnType); that is the final value of the type.
    A fun side effect: even if the call result is not used, we still create ASG(LCL_VAR, call), change the call type to struct, and only later delete the ASG, leaving the call with the correct struct type.

Note for !compDoOldStructRetyping(): I don't do steps 2 and 3, so the call is created as TYP_STRUCT and stays that way.

and the return in this case is STRUCT, so we end up with nice IR:

***** BB01
STMT00000 (IL 0x000...0x005)
N005 ( 15,  4) [000003] -ACXG---R---              *  ASG       simd16 (copy)
N004 (  1,  1) [000001] D------N----              +--*  LCL_VAR   simd16<System.Numerics.Vector4> V01 tmp1         d:1
N003 ( 15,  4) [000000] --CXG-------              \--*  CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 (  1,  1) [000006] ------------ arg0 in x11     \--*  CNS_INT(h) long   0x29e89a04b90 ftn REG x11

***** BB01
STMT00001 (IL   ???...  ???)
N002 (  2,  2) [000005] ------------              *  RETURN    struct
N001 (  1,  1) [000004] -------N----              \--*  LCL_VAR   simd16<System.Numerics.Vector4> V01 tmp1         u:1 (last use)

Note/todo/fun fact: if we did not set the LCL_VAR type to SIMD16 and kept it as a struct, then copy prop would optimize it to:

N002 (  2,  2) [000005] ------------              *  RETURN    struct
N003 ( 15,  4) [000000] --CXG-------              \--*  CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 (  1,  1) [000006] ------------ arg0 in x11     \--*  CNS_INT(h) long   0x29e89a04b90 ftn REG x11

Summary 1: in the HFA case we type both the call and the return as TYP_STRUCT, with some confusing transformations in the middle.
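To make the HFA condition from this example concrete, here is a minimal sketch of the classification rule, assuming the simplified AAPCS64 definition (1 to 4 members, all of the same floating-point type). `FieldKind` and `isHfa` are hypothetical names for illustration only; this is not how the JIT or the VM actually classifies types.

```cpp
#include <cstddef>

// Hypothetical, simplified field kinds; the real JIT/VM carry richer type info.
enum class FieldKind { Float, Double, Int32, Bool };

constexpr bool isFloatingPoint(FieldKind k)
{
    return k == FieldKind::Float || k == FieldKind::Double;
}

// Simplified AAPCS64 rule (an assumption of this sketch): an HFA has
// 1 to 4 members, all of the same floating-point type.
constexpr bool isHfa(const FieldKind* fields, std::size_t count)
{
    if (count < 1 || count > 4 || !isFloatingPoint(fields[0]))
    {
        return false;
    }
    for (std::size_t i = 1; i < count; ++i)
    {
        if (fields[i] != fields[0])
        {
            return false;
        }
    }
    return true;
}

// Vector4 is four floats (X, Y, Z, W), so it qualifies as an HFA and is
// returned in four separate FP registers rather than in one vector register.
constexpr FieldKind kVector4[] = {FieldKind::Float, FieldKind::Float,
                                  FieldKind::Float, FieldKind::Float};
constexpr FieldKind kMixed[]   = {FieldKind::Float, FieldKind::Double};
constexpr FieldKind kInts[]    = {FieldKind::Int32, FieldKind::Int32};

static_assert(isHfa(kVector4, 4), "four floats form an HFA");
static_assert(!isHfa(kMixed, 2), "mixed float/double is not an HFA");
static_assert(!isHfa(kInts, 2), "integer fields are not an HFA");
```

The checks are compile-time only, so the snippet builds without a main function when compiled as a translation unit.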

2 example:
    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<int> ReturnVectorInt()
    {
        return new Vector<int>();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<int> ReturnVectorIntUsingCall()
    {
        return ReturnVectorInt();
    }

The import steps for this call:

  1. create it as TYP_STRUCT, using callRetTyp = JITtype2varType(calliSig.retType) in impImportCall;
  2. change it to TYP_SIMD16 in impImportCall: callRetTyp = impNormStructType(actualMethodRetTypeSigClass); call->gtType = callRetTyp;
  3. keep it as TYP_SIMD16 in impAssignStructPtr: src->gtType = genActualType(returnType);

and the IR looks good:
***** BB01
STMT00000 (IL 0x000...0x005)
N005 ( 15,  4) [000003] -ACXG---R---              *  ASG       simd16 (copy)
N004 (  1,  1) [000001] D------N----              +--*  LCL_VAR   simd16<System.Numerics.Vector4> V01 tmp1         d:1
N003 ( 15,  4) [000000] --CXG-------              \--*  CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 (  1,  1) [000006] ------------ arg0 in x11     \--*  CNS_INT(h) long   0x29e89a04b90 ftn REG x11

***** BB01
STMT00001 (IL   ???...  ???)
N002 (  2,  2) [000005] ------------              *  RETURN    struct
N001 (  1,  1) [000004] -------N----              \--*  LCL_VAR   simd16<System.Numerics.Vector4> V01 tmp1         u:1 (last use)
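
The three import steps listed for examples 1 and 2 can be replayed in a toy model. This is only a sketch of the sequence as described in this issue: `Typ`, `RetInfo`, `CallTypeTrace` and `importCallType` are hypothetical names, and the assumption that step 3 simply copies the method's return type is a simplification, not the importer's real logic.

```cpp
// Toy model only: these types are hypothetical stand-ins,
// not the JIT's var_types or importer data structures.
enum class Typ { Struct, Simd16 };

struct RetInfo
{
    bool recognizedAsSimd; // step 2 can normalize the class to a SIMD type
    Typ  methodReturnType; // the type used for the method's RETURN
};

struct CallTypeTrace
{
    Typ afterCreate;    // step 1
    Typ afterNormalize; // step 2
    Typ afterAssign;    // step 3
};

constexpr CallTypeTrace importCallType(RetInfo info, bool oldStructRetyping)
{
    // Step 1: the call is always created as a struct.
    CallTypeTrace trace{Typ::Struct, Typ::Struct, Typ::Struct};
    if (!oldStructRetyping)
    {
        return trace; // steps 2 and 3 are skipped, the call stays a struct
    }
    // Step 2: retype to simd16 when the class is recognized as SIMD.
    if (info.recognizedAsSimd)
    {
        trace.afterNormalize = Typ::Simd16;
    }
    // Step 3 (assumed): the call takes the method's return type.
    trace.afterAssign = info.methodReturnType;
    return trace;
}

// Example 1: Vector4 is an HFA, so its return type is a struct.
constexpr RetInfo kVector4{true, Typ::Struct};
// Example 2: Vector<int> is returned as simd16.
constexpr RetInfo kVectorInt{true, Typ::Simd16};

static_assert(importCallType(kVector4, true).afterNormalize == Typ::Simd16, "struct -> simd16");
static_assert(importCallType(kVector4, true).afterAssign == Typ::Struct, "and back to struct");
static_assert(importCallType(kVectorInt, true).afterAssign == Typ::Simd16, "stays simd16");
```

With `oldStructRetyping == false` the model keeps the call as a struct throughout, which matches the note under example 1.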

Summary 1,2: based on these 2 examples we could think that TYP_SIMD16 on a call or a return means the value is passed in a single vector register, that a call or return has TYP_STRUCT when the value is an HFA,
and that TYP_STRUCT can be assigned to TYP_SIMD16, but...

3 example:
    struct A
    {
        bool a;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<A> ReturnVectorNotKnown()
    {
        return new Vector<A>();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<A> ReturnVectorNotKnownUsingCall()
    {
        return ReturnVectorNotKnown();
    }

guess which type Jit will use for it before you read the answer :-)

The IR after importation will be:

               [000001] --C-G-------              *  RETURN    simd16
               [000000] --C-G-------              \--*  CALL r2r_ind struct TestHFAandHVA.ReturnVectorNotKnown

This is because for the RETURN type we ask the VM, while for the call type we use getBaseTypeAndSizeOfSIMDType, which can only parse known primitive types, so we get a nice mistyping out of nowhere.
It does not look like a problem so far: the JIT can handle it using morph::fgFixupStructReturn, which sets the call type to simd16.
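
The mismatch can be pictured as two independent queries that disagree for an unrecognized element type. The sketch below is a toy model covering only the element kinds from examples 2 and 3; `returnTypeFromVm`, `callTypeFromImporter` and `isMistyped` are hypothetical names, not JIT or VM functions.

```cpp
// Toy model only: hypothetical stand-ins, not JIT or VM code.
enum class Typ { Struct, Simd16 };
enum class ElemKind { Int32, Float, UserStruct };

// Stand-in for the VM's answer for the RETURN type. In the examples of
// this issue, both Vector<int> and Vector<A> come back as simd16.
constexpr Typ returnTypeFromVm(ElemKind)
{
    return Typ::Simd16;
}

// Stand-in for the importer-side check on the call, which only recognizes
// known primitive element types and otherwise leaves the call as a struct.
constexpr Typ callTypeFromImporter(ElemKind elem)
{
    return (elem == ElemKind::Int32 || elem == ElemKind::Float) ? Typ::Simd16
                                                                : Typ::Struct;
}

// The two queries disagree exactly when the element type is not recognized.
constexpr bool isMistyped(ElemKind elem)
{
    return returnTypeFromVm(elem) != callTypeFromImporter(elem);
}

static_assert(!isMistyped(ElemKind::Int32), "Vector<int>: both sides agree");
static_assert(isMistyped(ElemKind::UserStruct), "Vector<A>: RETURN simd16, CALL struct");
```

In this picture a later fixup has to reconcile the two answers, which is the role the text above gives to morph::fgFixupStructReturn.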

3.1. example:

add a temp local var to the last example:

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<A> ReturnVectorNotKnownUsingCallAndTemp()
    {
        var a = ReturnVectorNotKnown();
        return a;
    }

and we have IR that we want right after importation, thanks to impAssignStructPtr from the first example:

***** BB01
STMT00000 (IL 0x000...0x005)
               [000003] -AC-G-------              *  ASG       simd16 (copy)
               [000001] D------N----              +--*  LCL_FLD   simd16 V00 loc0         [+0]
               [000000] --C-G-------              \--*  CALL r2r_ind simd16 TestHFAandHVA.ReturnVectorNotKnown

***** BB01
STMT00001 (IL 0x006...0x007)
               [000005] ------------              *  RETURN    simd16
               [000004] ------------              \--*  LCL_FLD   simd16 V00 loc0         [+0]

but V00 is created as STRUCT, so we can't put it in a register, sad:

Generating: N009 ( 15,  4) [000000] --CXG-------         t0 = *  CALL r2r_ind simd16 TestHFAandHVA.ReturnVectorNotKnown REG d0 $140
IN0003:                           ldr     x0, [x11]
Call: GCvars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}
IN0004:                           blr     x0
                                                              /--*  t0     simd16 
Generating: N011 ( 19,  9) [000003] DA-XG-------              *  STORE_LCL_FLD simd16 V00 loc0         d:2[+0] NA REG NA
IN0005:                           str     q0, [fp,#16]
                            Live vars: {} => {V00}
Added IP mapping: 0x0006 STACK_EMPTY (G_M38418_IG02,ins#5,ofs#20)
Generating: N013 (???,???) [000010] ------------                 IL_OFFSET void   IL offset: 0x6 REG NA
Generating: N015 (  3,  4) [000004] ------------         t4 =    LCL_FLD   simd16 V00 loc0         u:2[+0] d16 (last use) REG d16 $141
IN0006:                           ldr     q16, [fp,#16]
                            Live vars: {V00} => {}
                                                              /--*  t4     simd16 
Generating: N017 (  4,  5) [000005] ------------              *  RETURN    simd16 REG NA $142
IN0007:                           mov     v0.16b, v16.16b

Note for !compDoOldStructRetyping(): we do not want all this retyping to happen in random places, so we want types not to change from the moment we create them during importation until they reach lowering.
Question: but what type should we use in the last example? TYP_STRUCT works much better, because then we don't need to access LCL_VAR as LCL_FLD; they have exactly the same types and the Jit knows that!
For now, I am sticking with TYP_STRUCT in all cases for all call types and keeping RETURN as the VM sees it, but it causes asserts that I can't avoid without implementing #11413, because we start getting IND SIMD16(ADDR byref(call STRUCT)) for such calls, and ADDR(call) is not valid IR (we sometimes create it, but we are lucky in those examples and I am not lucky in mine).

Summary 1, 2, 3: do not try to guess Jit TYP looking at C# code.

4 example:
    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<T> ReturnVectorTWithMerge<T>(int v, T init1, T init2, T init3, T init4) where T : struct
    {
        if (v == 0)
        {
            return new Vector<T>();
        }
        else if (v == 1)
        {
            return new Vector<T>(init1);
        }
        else if (v == 2)
        {
            return new Vector<T>(init2);
        }
        else if (v == 3)
        {
            return new Vector<T>(init3);
        }
        else
        {
            return new Vector<T>(init4);
        }
    }
    
    struct A
    {
        bool a;
    }
    
    ReturnVectorTWithMerge<A>(v, a, b, c, d);

so we know that return->gtType == TYP_SIMD and the call types would be TYP_STRUCT; we know that it somehow works fine, and after morph we change the call types to TYP_SIMD, and it is great.

But here comes my favorite thing: return merging. We create one local var where we put all return results, and this happens before global morph, during PHASE Morph - Add internal blocks.
Can you guess the type of this LCL_VAR?

lvaGrabTemp returning 12 (V12 tmp5) called for Single return block return value.
SIMD Candidate Type System.Numerics.Vector`1[System.__Canon]
  Unknown SIMD Vector<T>

mergeReturns statement tree [000071] added to genReturnBB BB10 [0009]
               [000071] ------------              *  RETURN    struct
               [000070] -------N----              \--*  LCL_VAR   struct<System.Numerics.Vector`1[__Canon], 16> V12 tmp5 

and the return knows it is a struct somehow... But maybe morph will fix it like it fixes calls? Nope... it will bail out with an assert that you can easily repro in the current master, see #37247:

Assert failure(PID 198612 [0x000307d4], Thread: 221228 [0x3602c]): Assertion failed '!"Incompatible types for gtNewTempAssign"' in 'TestHFAandHVA:ReturnVectorTWithMerge(int,System.__Canon,System.__Canon,System.__Canon,System.__Canon):System.Numerics.Vector`1[__Canon]' during 'Morph - Global' (IL size 54)

    File: F:\git\runtime\src\coreclr\src\jit\gentree.cpp Line: 15159
    Image: F:\git\runtime\artifacts\bin\coreclr\Windows_NT.arm64.Checked\x64\crossgen.exe

when we try to do ASG(LCL_VAR struct (our merge lclVar), LCL_FLD SIMD16 (from our calls)).

Nowadays this does not lead to bad codegen in release: asserts are ignored there, and lowering has handling for it (under compDoOldStructRetyping() == false) that does the right thing and sets the RETURN type back to SIMD16. I do not have an older version of the runtime to check what was happening before compDoOldStructRetyping.

Summary 1, 2, 3, 4: with compDoOldStructRetyping == true the old system is very unpredictable and fragile, with failures in simple cases.
compDoOldStructRetyping == false, which I am trying to support on arm64, has the same difficulties, and I would like to hear @CarolEidt, @tannergooding, @dotnet/jit-contrib opinions about the types that we should choose in each case. I have tried many options and none of them was good enough.

I have started working on #11413, so I could always keep calls as TYP_STRUCT, ignoring SIMD and avoiding creating IND(ADDR(CALL)) when we assign their results to LCL_VAR/FLD SIMD*. What do you think?

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
design-discussion: Ongoing discussion about design without consensus
