Skip to content

[*64] Allow more fastTailCalls involving structs #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 9 additions & 7 deletions src/jit/compiler.h
Original file line number Diff line number Diff line change
Expand Up @@ -8186,13 +8186,15 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
bool compPublishStubParam : 1; // EAX captured in prolog will be available through an instrinsic
bool compRetBuffDefStack : 1; // The ret buff argument definitely points into the stack.

var_types compRetType; // Return type of the method as declared in IL
var_types compRetNativeType; // Normalized return type as per target arch ABI
unsigned compILargsCount; // Number of arguments (incl. implicit but not hidden)
unsigned compArgsCount; // Number of arguments (incl. implicit and hidden)
unsigned compRetBuffArg; // position of hidden return param var (0, 1) (BAD_VAR_NUM means not present);
int compTypeCtxtArg; // position of hidden param for type context for generic code (CORINFO_CALLCONV_PARAMTYPE)
unsigned compThisArg; // position of implicit this pointer param (not to be confused with lvaArg0Var)
var_types compRetType; // Return type of the method as declared in IL
var_types compRetNativeType; // Normalized return type as per target arch ABI
unsigned compILargsCount; // Number of arguments (incl. implicit but not hidden)
unsigned compArgsCount; // Number of arguments (incl. implicit and hidden)
unsigned compArgRegCount; // Number of integer locals
unsigned compOtherArgRegCount; // Number of multireg args

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be ifdef'd under FEATURE_MULTIREG_ARGS. Also, is it the number of integer locals or the number of integer arguments in registers?

unsigned compRetBuffArg; // position of hidden return param var (0, 1) (BAD_VAR_NUM means not present);
int compTypeCtxtArg; // position of hidden param for type context for generic code (CORINFO_CALLCONV_PARAMTYPE)
unsigned compThisArg; // position of implicit this pointer param (not to be confused with lvaArg0Var)
unsigned compILlocalsCount; // Number of vars : args + locals (incl. implicit but not hidden)
unsigned compLocalsCount; // Number of vars : args + locals (incl. implicit and hidden)
unsigned compMaxStack;
Expand Down
11 changes: 8 additions & 3 deletions src/jit/lclvars.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -239,9 +239,11 @@ void Compiler::lvaInitTypeRef()
// Finally the local variables
//-------------------------------------------------------------------------

unsigned varNum = varDscInfo.varNum;
LclVarDsc* varDsc = varDscInfo.varDsc;
CORINFO_ARG_LIST_HANDLE localsSig = info.compMethodInfo->locals.args;
unsigned argRegNum = varDscInfo.intRegArgNum;
unsigned otherArgRegNum = varDscInfo.floatRegArgNum;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing at best, and needs a comment. I gather that this handles the case where there may be a mix of int and float regs, so that this is the first of each that is used for this arg?

unsigned varNum = varDscInfo.varNum;
LclVarDsc* varDsc = varDscInfo.varDsc;
CORINFO_ARG_LIST_HANDLE localsSig = info.compMethodInfo->locals.args;

for (unsigned i = 0; i < info.compMethodInfo->locals.numArgs;
i++, varNum++, varDsc++, localsSig = info.compCompHnd->getArgNext(localsSig))
Expand All @@ -262,6 +264,9 @@ void Compiler::lvaInitTypeRef()
}
}

info.compArgRegCount = argRegNum;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. Why are you assigning a register number to a count?

info.compOtherArgRegCount = otherArgRegNum;

if ( // If there already exist unsafe buffers, don't mark more structs as unsafe
// as that will cause them to be placed along with the real unsafe buffers,
// unnecessarily exposing them to overruns. This can affect GS tests which
Expand Down
1 change: 0 additions & 1 deletion src/jit/lower.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1714,7 +1714,6 @@ void Lowering::LowerFastTailCall(GenTreeCall* call)
fgArgTabEntryPtr argTabEntry = comp->gtArgEntryByNode(call, putArgStkNode);
assert(argTabEntry);
unsigned callerArgNum = argTabEntry->argNum - calleeNonStandardArgCount;
noway_assert(callerArgNum < comp->info.compArgsCount);

unsigned callerArgLclNum = callerArgNum;
LclVarDsc* callerArgDsc = comp->lvaTable + callerArgLclNum;
Expand Down
100 changes: 81 additions & 19 deletions src/jit/morph.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6943,20 +6943,23 @@ bool Compiler::fgCanFastTailCall(GenTreeCall* callee)
// Note that callee being a vararg method is not a problem since we can account the params being passed.

// Count of caller args including implicit and hidden (i.e. thisPtr, RetBuf, GenericContext, VarargCookie)
unsigned nCallerArgs = info.compArgsCount;
unsigned callerArgRegCount = info.compArgRegCount;
unsigned callerOtherArgRegCount = info.compOtherArgRegCount;

// Count the callee args including implicit and hidden.
// Note that GenericContext and VarargCookie are added by importer while
// importing the call to gtCallArgs list along with explicit user args.
unsigned nCalleeArgs = 0;
unsigned calleeArgRegCount = 0;
unsigned calleeOtherArgRegCount = 0;

if (callee->gtCallObjp) // thisPtr
{
nCalleeArgs++;
++calleeArgRegCount;
}

if (callee->HasRetBufArg()) // RetBuf
{
nCalleeArgs++;
++calleeArgRegCount;

// If callee has RetBuf param, caller too must have it.
// Otherwise go the slow route.
Expand All @@ -6973,8 +6976,6 @@ bool Compiler::fgCanFastTailCall(GenTreeCall* callee)
bool hasMultiByteArgs = false;
for (GenTreePtr args = callee->gtCallArgs; (args != nullptr) && !hasMultiByteArgs; args = args->gtOp.gtOp2)
{
nCalleeArgs++;

assert(args->OperIsList());
GenTreePtr argx = args->gtOp.gtOp1;

Expand Down Expand Up @@ -7004,21 +7005,61 @@ bool Compiler::fgCanFastTailCall(GenTreeCall* callee)
unsigned typeSize = 0;
hasMultiByteArgs = !VarTypeIsMultiByteAndCanEnreg(argx->TypeGet(), objClass, &typeSize, false);

#if defined(FEATURE_UNIX_AMD64_STRUCT_PASSING) || defined(_TARGET_ARM64_)
// On System V/arm64 the args could be a 2 eightbyte struct that is passed in two registers.
// Account for the second eightbyte in the nCalleeArgs.
// https://github.com/dotnet/coreclr/issues/2666
// TODO-CQ-Amd64-Unix/arm64: Structs of size between 9 to 16 bytes are conservatively estimated
// as two args, since they need two registers whereas nCallerArgs is
// counting such an arg as one. This would mean we will not be optimizing
// certain calls though technically possible.
#if defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR structDesc;

if (typeSize > TARGET_POINTER_SIZE)
assert(objClass != nullptr);
eeGetSystemVAmd64PassStructInRegisterDescriptor(objClass, &structDesc);
if (structDesc.passedInRegisters)
{
unsigned extraArgRegsToAdd = (typeSize / TARGET_POINTER_SIZE);
nCalleeArgs += extraArgRegsToAdd;
for (unsigned int i = 0; i < structDesc.eightByteCount; i++)
{
if (structDesc.IsIntegralSlot(i))
{
++calleeArgRegCount;
}
else if (structDesc.IsSseSlot(i))
{
++calleeOtherArgRegCount;
}
else
{
assert(false && "Invalid eightbyte classification type.");
break;
}
}
}
#endif // FEATURE_UNIX_AMD64_STRUCT_PASSING || _TARGET_ARM64_

#else // ARM64
var_types hfaType = GetHfaType(argx);
bool isHfaArg = varTypeIsFloating(hfaType);
unsigned size = 1;

if (isHfaArg)
{
size = GetHfaCount(argx);
// HFA structs are passed by value in multiple registers
calleeArgRegCount += size;
}
else
{
// Structs are either passed in 1 or 2 (64-bit) slots
size = (unsigned)(roundUp(info.compCompHnd->getClassSize(argx->gtArgPlace.gtArgPlaceClsHnd),
TARGET_POINTER_SIZE)) /
TARGET_POINTER_SIZE;

if (size <= 2)
{
// Structs that are the size of 2 pointers are passed by value in multiple registers
calleeArgRegCount += size;
}
else if (size > 2)
{
size = 1; // Structs that are larger that 2 pointers (except for HFAs) are passed by
// reference (to a copy)
}
}
#endif // FEATURE_UNIX_AMD64_STRUCT_PASSING

#else
assert(!"Target platform ABI rules regarding passing struct type args in registers");
Expand All @@ -7030,6 +7071,10 @@ bool Compiler::fgCanFastTailCall(GenTreeCall* callee)
hasMultiByteArgs = true;
}
}
else
{
++calleeArgRegCount;
}
}

// Go the slow route, if it has multi-byte params
Expand All @@ -7038,14 +7083,31 @@ bool Compiler::fgCanFastTailCall(GenTreeCall* callee)
return false;
}

const unsigned maxRegArgs = MAX_REG_ARG;
bool hasOtherRegs = callerOtherArgRegCount > 0 || calleeOtherArgRegCount > 0;

// Logging to help determine why the callee has been marked as canFastTailCall
JITDUMP("\ncanFastTailCall:\n");
JITDUMP("calleeArgRegCount (%d) > maxRegArgs: (%d): %s\n",
calleeArgRegCount, maxRegArgs, calleeArgRegCount > MAX_REG_ARG ? "true" : "false");
JITDUMP("callerArgRegCount (%d) < calleeArgRegCount (%d): %s\n",
callerArgRegCount, calleeArgRegCount, callerArgRegCount < calleeArgRegCount ? "true" : "false");

JITDUMP("!hasOtherRegs (%d) && (callerOtherArgRegCount (%d) < calleeOtherArgRegCount (%d)): %s\n",
(int)hasOtherRegs, callerOtherArgRegCount, calleeOtherArgRegCount, !hasOtherRegs && (callerOtherArgRegCount < calleeOtherArgRegCount) ? "true" : "false");

// If we reached here means that callee has only those argument types which can be passed in
// a register and if passed on stack will occupy exactly one stack slot in out-going arg area.
// If we are passing args on stack for callee and it has more args passed on stack than
// caller, then fast tail call cannot be performed.
//
// Note that the GC'ness of on stack args need not match since the arg setup area is marked
// as non-interruptible for fast tail calls.
if ((nCalleeArgs > MAX_REG_ARG) && (nCallerArgs < nCalleeArgs))
if ((calleeArgRegCount > MAX_REG_ARG) &&
(callerArgRegCount < calleeArgRegCount) &&
(!hasOtherRegs || (callerOtherArgRegCount < calleeOtherArgRegCount)) // Make sure that we have structs passed in regs
// and avoid 0 < 0
)
{
return false;
}
Expand Down
81 changes: 81 additions & 0 deletions tests/src/JIT/opt/FastTailCall/DoNotFastTailCallWithStructs.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System;

// 10 byte struct
public struct A
{
public int a;
public int b;
public short c;
}

// 32 byte struct
public struct B
{
public long a;
public long b;
public long c;
public long d;
}

public class DoNotFastTailCallWithStructs
{
static A a;
static B b;

public static void Foo(B b, int count=1000)
{
if (count == 100)
{
return count;
}

if (count-- % 2 == 0)
{
return Foo(a, count);
}

else
{
return Foo(b, count);
}
}

public static void Foo(A a, int count=1000)
{
if (count == 100)
{
return count;
}

if (count-- % 2 == 0)
{
return Foo(a, count);
}

else
{
return Foo(b, count);
}
}

public static int Main()
{
a = new A();
b = new B();

a.a = 100;
a.b = 1000;
a.c = 10000;

b.a = 500;
b.b = 5000;
b.c = 50000;
b.d = 500000;

return Foo(a);
}
}
45 changes: 45 additions & 0 deletions tests/src/JIT/opt/FastTailCall/DoNotFastTailCallWithStructs.csproj
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="12.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<Import Project="$([MSBuild]::GetDirectoryNameOfFileAbove($(MSBuildThisFileDirectory), dir.props))\dir.props" />
<PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
<Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
<AssemblyName>$(MSBuildProjectName)</AssemblyName>
<SchemaVersion>2.0</SchemaVersion>
<ProjectGuid>{95DFC527-4DC1-495E-97D7-E94EE1F7140D}</ProjectGuid>
<OutputType>Exe</OutputType>
<ProjectTypeGuids>{786C830F-07A1-408B-BD7F-6EE04809D6DB};{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}</ProjectTypeGuids>
<SolutionDir Condition="$(SolutionDir) == '' Or $(SolutionDir) == '*Undefined*'">..\..\</SolutionDir>
<IlasmRoundTrip>true</IlasmRoundTrip>
</PropertyGroup>
<!-- Default configurations to help VS understand the configurations -->
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
</PropertyGroup>
<ItemGroup>
<CodeAnalysisDependentAssemblyPaths Condition=" '$(VS100COMNTOOLS)' != '' " Include="$(VS100COMNTOOLS)..\IDE\PrivateAssemblies">
<Visible>False</Visible>
</CodeAnalysisDependentAssemblyPaths>
</ItemGroup>
<PropertyGroup>
<!-- Set to 'Full' if the Debug? column is marked in the spreadsheet. Leave blank otherwise. -->
<DebugType>None</DebugType>
<NoLogo>True</NoLogo>
<NoStandardLib>True</NoStandardLib>
<Noconfig>True</Noconfig>
<Optimize>True</Optimize>
<JitOptimizationSensitive>True</JitOptimizationSensitive>
<AllowUnsafeBlocks>True</AllowUnsafeBlocks>
<DefineConstants>$(DefineConstants);CORECLR</DefineConstants>
</PropertyGroup>
<ItemGroup>
<Compile Include="DoNotFastTailCallWithStructs.cs" />
</ItemGroup>
<ItemGroup>
<Service Include="{82A7F48D-3B50-4B1E-B82E-3ADA8210C358}" />
</ItemGroup>
<Import Project="$([MSBuild]::GetDirectoryNameOfFileAbove($(MSBuildThisFileDirectory), dir.targets))\dir.targets" />
<PropertyGroup Condition=" '$(MsBuildProjectDirOverride)' != '' ">
</PropertyGroup>
</Project>
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
using System;

public struct A
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@
<DefineConstants>$(DefineConstants);CORECLR</DefineConstants>
</PropertyGroup>
<ItemGroup>
<Compile Include="FastTailCallStackFixup.cs" />
<Compile Include="StackFixup.cs" />
</ItemGroup>
<ItemGroup>
<Service Include="{82A7F48D-3B50-4B1E-B82E-3ADA8210C358}" />
Expand Down
38 changes: 38 additions & 0 deletions tests/src/JIT/opt/FastTailCall/StructPassingSimple.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
using System;

// 10 byte struct
public struct A
{
public int a;
public int b;
public short c;
}

class TailCallStructPassingSimple
{
// Simple tail call candidate that would be ignored on Arm64 and amd64 Unix
// due to https://github.com/dotnet/coreclr/issues/2666
public static int ImplicitTailCallTenByteStruct(A a, int count=1000)
{
if (count-- == 0)
{
return 100;
}

return ImplicitTailCallTenByteStruct(a, count);
}

public static int Main()
{
A temp = new A();
temp.a = 50;
temp.b = 500;
temp.c = 62;

int ret = ImplicitTailCallTenByteStruct(temp);
return ret;
}
}
Loading