Description
Hello, I could use some help with a new model for our return/call SIMD*
typing that I am implementing, but first a few examples of what is happening now.
1 example:
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector4 ReturnVector4()
{
return new Vector4(1);
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector4 ReturnVector4UsingCall()
{
return ReturnVector4();
}
IL for ReturnVector4UsingCall is very simple: call ReturnVector4; ret, and the IR would be ASG(LCL_VAR, call); return LCL_VAR.
The complexity is that Arm64 supports both vector and HFA calling conventions. In this case Vector4 is an HFA value, so we have to return it as v0.s[0], v1.s[0], v2.s[0], v3.s[0].
Now let's see how we import this call and with which type:
1. create it as TYP_STRUCT, using callRetTyp = JITtype2varType(calliSig.retType) in impImportCall;
2. change it to TYP_SIMD16 in impImportCall: callRetTyp = impNormStructType(actualMethodRetTypeSigClass); call->gtType = callRetTyp;
3. change it back to TYP_STRUCT in impAssignStructPtr: src->gtType = genActualType(returnType); and that is the final value of the type.
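The three import steps above can be sketched as a toy model. This is not the real importer code: the enum and every function name below are hypothetical, and the rule in the last step (STRUCT for an HFA return, SIMD16 for a value returned in a single vector register such as Vector<int>) is an assumption read off the examples in this issue.

```cpp
// Toy model only: VarType mirrors the JIT's TYP_STRUCT / TYP_SIMD16 names,
// but none of these functions exist in the JIT.
enum class VarType { Struct, Simd16 };

// Step 1: the call node is created with the type taken from the signature.
constexpr VarType afterCreate() { return VarType::Struct; }

// Step 2: the struct type is normalized to a SIMD type.
constexpr VarType afterNormalize(VarType) { return VarType::Simd16; }

// Step 3: the call takes the type of the method's return value.
// Assumption: that is STRUCT for an HFA and SIMD16 for a single vector register.
constexpr VarType afterAssign(VarType, bool returnedAsHfa)
{
    return returnedAsHfa ? VarType::Struct : VarType::Simd16;
}

// The type the call node ends up with after all three steps.
constexpr VarType finalCallType(bool returnedAsHfa)
{
    return afterAssign(afterNormalize(afterCreate()), returnedAsHfa);
}
```

In this model finalCallType(true) is STRUCT (the Vector4 HFA case) and finalCallType(false) is SIMD16, so the call type changes up to twice before it settles.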
A fun side effect: even if the call result is not used, we still create ASG(LCL_VAR, call), change the call type to struct, and only later delete the ASG, leaving the call with the correct struct type.
Note for !compDoOldStructRetyping(): I don't do steps 2 and 3, so the call is created as TYP_STRUCT and stays that way.
The return in this case is STRUCT, so we end up with nice IR:
***** BB01
STMT00000 (IL 0x000...0x005)
N005 ( 15, 4) [000003] -ACXG---R--- * ASG simd16 (copy)
N004 ( 1, 1) [000001] D------N---- +--* LCL_VAR simd16<System.Numerics.Vector4> V01 tmp1 d:1
N003 ( 15, 4) [000000] --CXG------- \--* CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 ( 1, 1) [000006] ------------ arg0 in x11 \--* CNS_INT(h) long 0x29e89a04b90 ftn REG x11
***** BB01
STMT00001 (IL ???... ???)
N002 ( 2, 2) [000005] ------------ * RETURN struct
N001 ( 1, 1) [000004] -------N---- \--* LCL_VAR simd16<System.Numerics.Vector4> V01 tmp1 u:1 (last use)
Note/todo/fun fact: if we did not set the LCL_VAR type to SIMD16 and kept it as a struct, then copy prop would optimize it to:
N002 ( 2, 2) [000005] ------------ * RETURN struct
N003 ( 15, 4) [000000] --CXG------- \--* CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 ( 1, 1) [000006] ------------ arg0 in x11 \--* CNS_INT(h) long 0x29e89a04b90 ftn REG x11
Summary 1: in the HFA case we type both call and return as TYP_STRUCT, with some confusing transformations in the middle.
2 example:
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<int> ReturnVectorInt()
{
return new Vector<int>();
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<int> ReturnVectorIntUsingCall()
{
return ReturnVectorInt();
}
1. create it as TYP_STRUCT, using callRetTyp = JITtype2varType(calliSig.retType) in impImportCall;
2. change it to TYP_SIMD16 in impImportCall: callRetTyp = impNormStructType(actualMethodRetTypeSigClass); call->gtType = callRetTyp;
3. keep it as TYP_SIMD16 in impAssignStructPtr: src->gtType = genActualType(returnType);
and the IR looks good:
***** BB01
STMT00000 (IL 0x000...0x005)
N005 ( 15, 4) [000003] -ACXG---R--- * ASG simd16 (copy)
N004 ( 1, 1) [000001] D------N---- +--* LCL_VAR simd16<System.Numerics.Vector4> V01 tmp1 d:1
N003 ( 15, 4) [000000] --CXG------- \--* CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 ( 1, 1) [000006] ------------ arg0 in x11 \--* CNS_INT(h) long 0x29e89a04b90 ftn REG x11
***** BB01
STMT00001 (IL ???... ???)
N002 ( 2, 2) [000005] ------------ * RETURN struct
N001 ( 1, 1) [000004] -------N---- \--* LCL_VAR simd16<System.Numerics.Vector4> V01 tmp1 u:1 (last use)
Summary 1, 2: based on these two examples we could think that TYP_SIMD16 on a call or a return means the value is passed in a single vector register, that it will be TYP_STRUCT when it is an HFA, and that TYP_STRUCT can be assigned to TYP_SIMD16, but...
3 example:
struct A
{
bool a;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<A> ReturnVectorNotKnown()
{
return new Vector<A>();
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<A> ReturnVectorNotKnownUsingCall()
{
return ReturnVectorNotKnown();
}
Guess which type the Jit will use for it before you read the answer :-)
The IR after importation will be:
[000001] --C-G------- * RETURN simd16
[000000] --C-G------- \--* CALL r2r_ind struct TestHFAandHVA.ReturnVectorNotKnown
That is because for the return type we ask the VM, while for the call type we use getBaseTypeAndSizeOfSIMDType, which can only parse known primitive types, so we get a nice mistyping out of nowhere.
It does not look like a problem so far: the JIT can handle it using morph::fgFixupStructReturn, which sets the call type to simd16.
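The mismatch can be illustrated with a small toy model. Everything here is hypothetical: the enums and functions are stand-ins for the two type sources described above (the VM for the return, a primitive-only lookup for the call) and do not reproduce the real JIT logic.

```cpp
// Toy model only; none of these names exist in the JIT.
enum class VarType { Struct, Simd16 };
enum class ElemKind { Int32, Float, UserStruct }; // hypothetical subset of Vector<T> element types

// Stand-in for the call-side lookup: only primitive element types are recognized.
constexpr bool isKnownPrimitive(ElemKind e) { return e != ElemKind::UserStruct; }

// Call type at import: SIMD16 only when the element type is recognized.
constexpr VarType callTypeAtImport(ElemKind e)
{
    return isKnownPrimitive(e) ? VarType::Simd16 : VarType::Struct;
}

// Return type at import. Assumption: the VM reports a 16-byte vector
// regardless of the element type.
constexpr VarType returnTypeAtImport(ElemKind) { return VarType::Simd16; }

// True when RETURN and CALL disagree right after importation.
constexpr bool mistypedAtImport(ElemKind e)
{
    return callTypeAtImport(e) != returnTypeAtImport(e);
}

// Stand-in for the later fixup in morph: the call is retyped to match the return.
constexpr VarType callTypeAfterFixup(ElemKind e) { return returnTypeAtImport(e); }
```

With ElemKind::UserStruct (the Vector<A> case) the model reports a mismatch at import that disappears after the fixup; with ElemKind::Int32 there is no mismatch at all.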
3.1. example:
add a temp local var to the last example:
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<A> ReturnVectorNotKnownUsingCallAndTemp()
{
var a = ReturnVectorNotKnown();
return a;
}
and we have the IR that we want right after importation, thanks to impAssignStructPtr from the first example:
***** BB01
STMT00000 (IL 0x000...0x005)
[000003] -AC-G------- * ASG simd16 (copy)
[000001] D------N---- +--* LCL_FLD simd16 V00 loc0 [+0]
[000000] --C-G------- \--* CALL r2r_ind simd16 TestHFAandHVA.ReturnVectorNotKnown
***** BB01
STMT00001 (IL 0x006...0x007)
[000005] ------------ * RETURN simd16
[000004] ------------ \--* LCL_FLD simd16 V00 loc0 [+0]
but V00 is created as STRUCT, so it can't be put in a register, sad:
Generating: N009 ( 15, 4) [000000] --CXG------- t0 = * CALL r2r_ind simd16 TestHFAandHVA.ReturnVectorNotKnown REG d0 $140
IN0003: ldr x0, [x11]
Call: GCvars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}
IN0004: blr x0
/--* t0 simd16
Generating: N011 ( 19, 9) [000003] DA-XG------- * STORE_LCL_FLD simd16 V00 loc0 d:2[+0] NA REG NA
IN0005: str q0, [fp,#16]
Live vars: {} => {V00}
Added IP mapping: 0x0006 STACK_EMPTY (G_M38418_IG02,ins#5,ofs#20)
Generating: N013 (???,???) [000010] ------------ IL_OFFSET void IL offset: 0x6 REG NA
Generating: N015 ( 3, 4) [000004] ------------ t4 = LCL_FLD simd16 V00 loc0 u:2[+0] d16 (last use) REG d16 $141
IN0006: ldr q16, [fp,#16]
Live vars: {V00} => {}
/--* t4 simd16
Generating: N017 ( 4, 5) [000005] ------------ * RETURN simd16 REG NA $142
IN0007: mov v0.16b, v16.16b
Note for !compDoOldStructRetyping(): we do not want all this retyping to happen in random places, so we want types not to change from the moment we create them during importation until they reach lowering.
Question: but what type should we use in the last example? TYP_STRUCT works much better, because then we don't need to access the LCL_VAR as a LCL_FLD: they have exactly the same types and the Jit knows that!
For now, I am sticking with TYP_STRUCT in all cases for all call types and keeping RETURN as the VM sees it, but this causes asserts that I can't avoid without implementing #11413, because we start getting IND SIMD16(ADDR byref(call STRUCT)) for such calls, and ADDR(call) is not valid IR (we sometimes create it, but we are lucky in those examples and I am not lucky in mine).
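The invalid shape mentioned above can be written down as a tiny check over a toy tree. The Node struct, the Oper enum, and the function names are hypothetical and far simpler than the JIT's GenTree; the sketch only shows what "an ADDR whose operand is a CALL" looks like structurally.

```cpp
// Toy IR only; this is not the JIT's GenTree.
enum class Oper { Call, Addr, Ind };

struct Node
{
    Oper        oper;
    const Node* op1; // single operand, or nullptr for a leaf
};

// True when this node is ADDR applied directly to a CALL.
constexpr bool isAddrOfCall(const Node* n)
{
    return n->oper == Oper::Addr && n->op1 != nullptr && n->op1->oper == Oper::Call;
}

// Walks down the operand chain looking for ADDR(CALL).
constexpr bool containsAddrOfCall(const Node* n)
{
    return n != nullptr && (isAddrOfCall(n) || containsAddrOfCall(n->op1));
}

// IND(ADDR(CALL)): the shape produced when a STRUCT-typed call result
// is read as SIMD16.
constexpr Node callNode{Oper::Call, nullptr};
constexpr Node addrNode{Oper::Addr, &callNode};
constexpr Node indNode{Oper::Ind, &addrNode};
```

Here containsAddrOfCall(&indNode) is true, while a bare callNode passes the check.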
Summary 1, 2, 3: do not try to guess Jit TYP looking at C# code.
4 example:
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<T> ReturnVectorTWithMerge<T>(int v, T init1, T init2, T init3, T init4) where T : struct
{
if (v == 0)
{
return new Vector<T>();
}
else if (v == 1)
{
return new Vector<T>(init1);
}
else if (v == 2)
{
return new Vector<T>(init2);
}
else if (v == 3)
{
return new Vector<T>(init3);
}
else
{
return new Vector<T>(init4);
}
}
struct A
{
bool a;
}
ReturnVectorTWithMerge<A>(int v, a, b, c, d);
So we know that return->gtType == TYP_SIMD and the call types would be TYP_STRUCT, and we know that it is working fine somehow: after morph, we change the call types to TYP_SIMD and it is great.
But here comes my favorite thing: return merging. We create a single local var where we put all return results, and this happens before global morph, during PHASE Morph - Add internal blocks.
Can you guess the type of this LCL_VAR?
lvaGrabTemp returning 12 (V12 tmp5) called for Single return block return value.
SIMD Candidate Type System.Numerics.Vector`1[System.__Canon]
Unknown SIMD Vector<T>
mergeReturns statement tree [000071] added to genReturnBB BB10 [0009]
[000071] ------------ * RETURN struct
[000070] -------N---- \--* LCL_VAR struct<System.Numerics.Vector`1[__Canon], 16> V12 tmp5
and return knows it is a struct somehow... But maybe morph will fix it like it fixes calls? Nope... it will bail out with an assert that you can easily repro in the current master, see #37247:
Assert failure(PID 198612 [0x000307d4], Thread: 221228 [0x3602c]): Assertion failed '!"Incompatible types for gtNewTempAssign"' in 'TestHFAandHVA:ReturnVectorTWithMerge(int,System.__Canon,System.__Canon,System.__Canon,System.__Canon):System.Numerics.Vector`1[__Canon]' during 'Morph - Global' (IL size 54)
File: F:\git\runtime\src\coreclr\src\jit\gentree.cpp Line: 15159
Image: F:\git\runtime\artifacts\bin\coreclr\Windows_NT.arm64.Checked\x64\crossgen.exe
when we try to do ASG(LCL_VAR struct (our merge lclVar), LCL_FLD SIMD16 (from our calls)).
Nowadays this doesn't lead to bad codegen in release, because lowering has handling for it that is actually meant for compDoOldStructRetyping() == false, so with asserts ignored we do the right thing and set the RETURN type back to SIMD16. I do not have an older version of the runtime to check what was happening before compDoOldStructRetyping.
Summary 1, 2, 3, 4: with compDoOldStructRetyping == true the old system is very unpredictable and fragile, with failures in simple cases.
compDoOldStructRetyping == false, which I am trying to support on arm64, has the same difficulties, and I would like to hear @CarolEidt, @tannergooding, @dotnet/jit-contrib opinions about the types that we should choose in each case. I have tried many options and none of them was good enough.
I have started working on #11413, so that I could always keep calls as TYP_STRUCT, ignoring SIMD and avoiding creating IND(ADDR(CALL)) when we assign their results to LCL_VAR/FLD SIMD*. What do you think?