Skip to content

Allow fasttail calls for engregesterable structs #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jashook
Copy link
Owner

@jashook jashook commented Jun 26, 2017

Note

This causes 153 new fastTailCall decisions in System.Private.CoreLib. With no new decisions to not fastTailCall. See this list for new fastTailCalled methods.

varDsc->IsFloatRegType() ? ++floatingRegCount : ++argRegCount;
};

for (unsigned argNum = 0; argNum < info.compMethodInfo->args.numArgs; argNum++, varDscTrailingPointer++)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this version of this loop used elsewhere:

for (lclNum = 0, varDsc = lvaTable; lclNum < lvaCount; lclNum++, varDsc++)

@@ -233,16 +240,55 @@ void Compiler::lvaInitTypeRef()
InitVarDscInfo varDscInfo;
varDscInfo.Init(lvaTable, hasRetBuffArg);

// Maintain a pointer to the start of the arguments.
LclVarDsc* varDscTrailingPointer = varDscInfo.varDsc;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renames as curArg or similiar

// Finally the local variables
//-------------------------------------------------------------------------

unsigned varNum = varDscInfo.varNum;
LclVarDsc* varDsc = varDscInfo.varDsc;
CORINFO_ARG_LIST_HANDLE localsSig = info.compMethodInfo->locals.args;

// Maintain a pointer to the start of the locals.
varDscTrailingPointer = varDsc;
Copy link

@briansull briansull Jun 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a dead assignment?

@@ -255,13 +301,25 @@ void Compiler::lvaInitTypeRef()
varDsc->lvPinned = ((corInfoType & CORINFO_TYPE_MOD_PINNED) != 0);
varDsc->lvOnFrame = true; // The final home for this local variable might be our local stack frame

CORINFO_CLASS_HANDLE clsHnd = info.compCompHnd->getArgClass(&info.compMethodInfo->locals, localsSig);
stackSize += roundUp(compGetTypeSize(strip(corInfoType), clsHnd), TARGET_POINTER_SIZE);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This local name is confusing: corInfoType

unsigned callerArgRegCount = info.compArgRegCount;
unsigned callerFloatArgRegCount = info.compFloatArgRegCount;

// TODO ARM64, UNIX x64
Copy link

@briansull briansull Jun 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need // TODO-ARM64/AMD64-UNIX instead

//
// hasMultiByteArgs will determine if the struct can be passed
// in registers. If it cannot we will break the loop and not
// fastTailCall.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to have each line end with a period, NOTE is implied here


if (typeSize > TARGET_POINTER_SIZE)
// TODO. Here we have made the assumption that multibyte struct
// arguments will cause a no fastTailCall decision.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// TODO-AMD64-Linux

@BruceForstall
Copy link

In general, it would be useful to have a file-level comment for your new test cases, describing what they are intending to test -- an overall "here's why this is here" kind of comment.

@jashook jashook force-pushed the allow_fasttail_calls_for_engregesterable_structs branch 2 times, most recently from ad4f1b1 to b7590e3 Compare June 26, 2017 18:47
Before this change structs on Arm64 and Amd64 Unix could
pessimize when we could fastTailCall if they were engregisterable
and took more than one register.
@jashook jashook force-pushed the allow_fasttail_calls_for_engregesterable_structs branch from b7590e3 to 3b1a01d Compare June 26, 2017 18:53
@jashook
Copy link
Owner Author

jashook commented Jun 26, 2017

Ok this Pr is in a good state again. It has changes based on @briansull's comments that are incomplete.

{
++info.compArgsCount;
++argRegCount;
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code assumes that anything using incrementArgCount (which is the four specials: 'this', retbuf, generics context, varargs cookie) are register arguments. But that doesn't appear to be true, e.g., for x86, which only has 2 register arguments. (There are also "non-standard" reg args for things like pinvoke helpers, etc.).

It seems like the only code that can count reg args correctly is fgMorphArgs, when AddRegArg() is called. I wonder if this calculation should be done then instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, ignore that; I was confusing incoming and outgoing args.

@@ -255,13 +301,25 @@ void Compiler::lvaInitTypeRef()
varDsc->lvPinned = ((corInfoType & CORINFO_TYPE_MOD_PINNED) != 0);
varDsc->lvOnFrame = true; // The final home for this local variable might be our local stack frame

CORINFO_CLASS_HANDLE clsHnd = info.compCompHnd->getArgClass(&info.compMethodInfo->locals, localsSig);
stackSize += roundUp(compGetTypeSize(strip(corInfoType), clsHnd), TARGET_POINTER_SIZE);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You hoisted this out of the if (strip(corInfoType) == CORINFO_TYPE_CLASS) condition. Is it legal (or desirable) to call getArgClass in all cases?

{
++info.compArgsCount;
++argRegCount;
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this going to double-count the "specials"? Don't they have varDsc that is going to be counted by your new loop below the call to lvaInitArgs()?

incrementRegCount(varDsc);

#ifdef (FEATURE_MULTIREG_ARGS && !WINDOWS_AMD64_ABI)
if (varDsc->lvOtherArgReg != REG_NA)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need to be written:

#if defined(FEATURE_MULTIREG_ARGS) && !defined(WINDOWS_AMD64_ABI)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of !defined(WINDOWS_AMD64_ABI) I would prefer defined(UNIX_AMD64_ABI)


In reply to: 124107782 [](ancestors = 124107782)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although it's not clear it needs any platform #ifdef. If it does, please add a comment as to why.


In reply to: 124108646 [](ancestors = 124108646,124107782)

// Calculate the argument register usage.
//
// This will later be used for fastTailCall determination
//-------------------------------------------------------------------------

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole section should probably be under #if FEATURE_FASTTAILCALL

for (LclVarDsc* varDsc, unsigned argNum = 0; argNum < info.compMethodInfo->args.numArgs; argNum++, varDsc++)
{
if (varDsc->lvRegNum != REG_STK)
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be lvArgReg? or maybe lvIsRegArg?

else
{
stackSize += varDsc->lvSize();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lvSize() can only be used for structs. I think you need a different calculation here for argument stack size.

}

#else // ARM64
var_types hfaType = GetHfaType(argx);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also Windows amd64, which I think you don't want?

}

#elif (_TARGET_AMD64_ && UNIX_AMD64_ABI) || _TARGET_ARM64_

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be (or, at least, should be) written:

#elif (defined(_TARGET_AMD64_) && defined(UNIX_AMD64_ABI)) || defined(_TARGET_ARM64_)


// For *nix Amd64 and Arm64 check to see if all arguments for the callee
// and caller are passing in registers. If not make sure that the stack
// size for the callee is at less than or equal to the caller's.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is at less [](start = 27, length = 10)

"... is at less ..."

Should the "at" be removed? I might rephrase to say:

If not, ensure that the outgoing argument stack size requirement for the callee is less than or equal to the caller's entire stack frame usage.

(I assume we care about the caller's entire stack frame usage, not the caller's argument stack space usage, which would be less.)

// x64 Windows: If we have more callee registers used than MAX_REG_ARG, then
// make sure the callee's incoming arguments is less than the caller's
if ((((calleeArgRegCount + calleeFloatArgRegCount) > maxRegArgs) &&
((callerArgRegCount + callerFloatArgRegCount) > (callerArgRegCount + callerFloatArgRegCount)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(callerArgRegCount + callerFloatArgRegCount) > (callerArgRegCount + callerFloatArgRegCount) [](start = 9, length = 91)

There's a problem here:

(callerArgRegCount + callerFloatArgRegCount) > (callerArgRegCount + callerFloatArgRegCount)

looks like a > a

@jashook jashook closed this Jun 27, 2017
#endif // FEATURE_UNIX_AMD64_STRUCT_PASSING || _TARGET_ARM64_
else
{
if (structDesc.eightByteCount > 1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads better as:

if (structDesc.eightByteCount == 2)

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with one comment

jashook pushed a commit that referenced this pull request Dec 1, 2017
# This is the 1st commit message:

Remove CoreFX runtest dependency

This change will start using build-test.sh generatelayoutonly to build
the coreoverlay directory for use with runtest.sh.

# This is the commit message #2:

Missing ./

# This is the commit message #3:

Fix untar location

# This is the commit message #4:

Use portable builds

# This is the commit message #5:

Get ci green

# This is the commit message #6:

Small changes

# This is the commit message #7:

Small changes

# This is the commit message #8:

Small changes

# This is the commit message #9:

Add generateonly builds.sh

# This is the commit message #10:

Ensure correct test location

# This is the commit message #11:

netci change

# This is the commit message #12:

netci change

# This is the commit message #13:

Working on setting up build-test

# This is the commit message #14:

Remove generatelayoutonly

# This is the commit message #15:

Undo some of the changes to use generatelayoutonly

# This is the commit message #16:

Correctly remove build-test.sh invocation

# This is the commit message #17:

Fix build-test generatelayoutonly

# This is the commit message #18:

Fix a few netci issues
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants