-
Notifications
You must be signed in to change notification settings - Fork 0
Allow fasttail calls for engregesterable structs #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/jit/lclvars.cpp
Outdated
varDsc->IsFloatRegType() ? ++floatingRegCount : ++argRegCount; | ||
}; | ||
|
||
for (unsigned argNum = 0; argNum < info.compMethodInfo->args.numArgs; argNum++, varDscTrailingPointer++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this version of this loop used elsewhere:
for (lclNum = 0, varDsc = lvaTable; lclNum < lvaCount; lclNum++, varDsc++)
src/jit/lclvars.cpp
Outdated
@@ -233,16 +240,55 @@ void Compiler::lvaInitTypeRef() | |||
InitVarDscInfo varDscInfo; | |||
varDscInfo.Init(lvaTable, hasRetBuffArg); | |||
|
|||
// Maintain a pointer to the start of the arguments. | |||
LclVarDsc* varDscTrailingPointer = varDscInfo.varDsc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renames as curArg or similiar
src/jit/lclvars.cpp
Outdated
// Finally the local variables | ||
//------------------------------------------------------------------------- | ||
|
||
unsigned varNum = varDscInfo.varNum; | ||
LclVarDsc* varDsc = varDscInfo.varDsc; | ||
CORINFO_ARG_LIST_HANDLE localsSig = info.compMethodInfo->locals.args; | ||
|
||
// Maintain a pointer to the start of the locals. | ||
varDscTrailingPointer = varDsc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a dead assignment?
src/jit/lclvars.cpp
Outdated
@@ -255,13 +301,25 @@ void Compiler::lvaInitTypeRef() | |||
varDsc->lvPinned = ((corInfoType & CORINFO_TYPE_MOD_PINNED) != 0); | |||
varDsc->lvOnFrame = true; // The final home for this local variable might be our local stack frame | |||
|
|||
CORINFO_CLASS_HANDLE clsHnd = info.compCompHnd->getArgClass(&info.compMethodInfo->locals, localsSig); | |||
stackSize += roundUp(compGetTypeSize(strip(corInfoType), clsHnd), TARGET_POINTER_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This local name is confusing: corInfoType
src/jit/morph.cpp
Outdated
unsigned callerArgRegCount = info.compArgRegCount; | ||
unsigned callerFloatArgRegCount = info.compFloatArgRegCount; | ||
|
||
// TODO ARM64, UNIX x64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need // TODO-ARM64/AMD64-UNIX
instead
src/jit/morph.cpp
Outdated
// | ||
// hasMultiByteArgs will determine if the struct can be passed | ||
// in registers. If it cannot we will break the loop and not | ||
// fastTailCall. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to have each line end with a period, NOTE is implied here
|
||
if (typeSize > TARGET_POINTER_SIZE) | ||
// TODO. Here we have made the assumption that multibyte struct | ||
// arguments will cause a no fastTailCall decision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO-AMD64-Linux
In general, it would be useful to have a file-level comment for your new test cases, describing what they are intending to test -- an overall "here's why this is here" kind of comment. |
ad4f1b1
to
b7590e3
Compare
Before this change structs on Arm64 and Amd64 Unix could pessimize when we could fastTailCall if they were engregisterable and took more than one register.
b7590e3
to
3b1a01d
Compare
Ok this Pr is in a good state again. It has changes based on @briansull's comments that are incomplete. |
src/jit/lclvars.cpp
Outdated
{ | ||
++info.compArgsCount; | ||
++argRegCount; | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code assumes that anything using incrementArgCount (which is the four specials: 'this', retbuf, generics context, varargs cookie) are register arguments. But that doesn't appear to be true, e.g., for x86, which only has 2 register arguments. (There are also "non-standard" reg args for things like pinvoke helpers, etc.).
It seems like the only code that can count reg args correctly is fgMorphArgs, when AddRegArg() is called. I wonder if this calculation should be done then instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, ignore that; I was confusing incoming and outgoing args.
src/jit/lclvars.cpp
Outdated
@@ -255,13 +301,25 @@ void Compiler::lvaInitTypeRef() | |||
varDsc->lvPinned = ((corInfoType & CORINFO_TYPE_MOD_PINNED) != 0); | |||
varDsc->lvOnFrame = true; // The final home for this local variable might be our local stack frame | |||
|
|||
CORINFO_CLASS_HANDLE clsHnd = info.compCompHnd->getArgClass(&info.compMethodInfo->locals, localsSig); | |||
stackSize += roundUp(compGetTypeSize(strip(corInfoType), clsHnd), TARGET_POINTER_SIZE); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You hoisted this out of the if (strip(corInfoType) == CORINFO_TYPE_CLASS)
condition. Is it legal (or desirable) to call getArgClass
in all cases?
src/jit/lclvars.cpp
Outdated
{ | ||
++info.compArgsCount; | ||
++argRegCount; | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this going to double-count the "specials"? Don't they have varDsc that is going to be counted by your new loop below the call to lvaInitArgs()?
src/jit/lclvars.cpp
Outdated
incrementRegCount(varDsc); | ||
|
||
#ifdef (FEATURE_MULTIREG_ARGS && !WINDOWS_AMD64_ABI) | ||
if (varDsc->lvOtherArgReg != REG_NA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need to be written:
#if defined(FEATURE_MULTIREG_ARGS) && !defined(WINDOWS_AMD64_ABI)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of !defined(WINDOWS_AMD64_ABI)
I would prefer defined(UNIX_AMD64_ABI)
In reply to: 124107782 [](ancestors = 124107782)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although it's not clear it needs any platform #ifdef
. If it does, please add a comment as to why.
In reply to: 124108646 [](ancestors = 124108646,124107782)
src/jit/lclvars.cpp
Outdated
// Calculate the argument register usage. | ||
// | ||
// This will later be used for fastTailCall determination | ||
//------------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole section should probably be under #if FEATURE_FASTTAILCALL
src/jit/lclvars.cpp
Outdated
for (LclVarDsc* varDsc, unsigned argNum = 0; argNum < info.compMethodInfo->args.numArgs; argNum++, varDsc++) | ||
{ | ||
if (varDsc->lvRegNum != REG_STK) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be lvArgReg? or maybe lvIsRegArg?
src/jit/lclvars.cpp
Outdated
else | ||
{ | ||
stackSize += varDsc->lvSize(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lvSize() can only be used for structs. I think you need a different calculation here for argument stack size.
src/jit/morph.cpp
Outdated
} | ||
|
||
#else // ARM64 | ||
var_types hfaType = GetHfaType(argx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also Windows amd64, which I think you don't want?
src/jit/morph.cpp
Outdated
} | ||
|
||
#elif (_TARGET_AMD64_ && UNIX_AMD64_ABI) || _TARGET_ARM64_ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be (or, at least, should be) written:
#elif (defined(_TARGET_AMD64_) && defined(UNIX_AMD64_ABI)) || defined(_TARGET_ARM64_)
src/jit/morph.cpp
Outdated
|
||
// For *nix Amd64 and Arm64 check to see if all arguments for the callee | ||
// and caller are passing in registers. If not make sure that the stack | ||
// size for the callee is at less than or equal to the caller's. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is at less [](start = 27, length = 10)
"... is at less ..."
Should the "at" be removed? I might rephrase to say:
If not, ensure that the outgoing argument stack size requirement for the callee is less than or equal to the caller's entire stack frame usage.
(I assume we care about the caller's entire stack frame usage, not the caller's argument stack space usage, which would be less.)
src/jit/morph.cpp
Outdated
// x64 Windows: If we have more callee registers used than MAX_REG_ARG, then | ||
// make sure the callee's incoming arguments is less than the caller's | ||
if ((((calleeArgRegCount + calleeFloatArgRegCount) > maxRegArgs) && | ||
((callerArgRegCount + callerFloatArgRegCount) > (callerArgRegCount + callerFloatArgRegCount))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(callerArgRegCount + callerFloatArgRegCount) > (callerArgRegCount + callerFloatArgRegCount) [](start = 9, length = 91)
There's a problem here:
(callerArgRegCount + callerFloatArgRegCount) > (callerArgRegCount + callerFloatArgRegCount)
looks like a > a
LowerFastTailCall assumes nCallerArgs == nCalleeArgs. This is tracked with https://github.com/dotnet/coreclr/issues/12468. It should be fixed in a seperate PR.
Now shows no diff in fastTailCall decisions on Amd64 Windows.
#endif // FEATURE_UNIX_AMD64_STRUCT_PASSING || _TARGET_ARM64_ | ||
else | ||
{ | ||
if (structDesc.eightByteCount > 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads better as:
if (structDesc.eightByteCount == 2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with one comment
# This is the 1st commit message: Remove CoreFX runtest dependency This change will start using build-test.sh generatelayoutonly to build the coreoverlay directory for use with runtest.sh. # This is the commit message #2: Missing ./ # This is the commit message #3: Fix untar location # This is the commit message #4: Use portable builds # This is the commit message #5: Get ci green # This is the commit message #6: Small changes # This is the commit message #7: Small changes # This is the commit message #8: Small changes # This is the commit message #9: Add generateonly builds.sh # This is the commit message #10: Ensure correct test location # This is the commit message #11: netci change # This is the commit message #12: netci change # This is the commit message #13: Working on setting up build-test # This is the commit message #14: Remove generatelayoutonly # This is the commit message #15: Undo some of the changes to use generatelayoutonly # This is the commit message #16: Correctly remove build-test.sh invocation # This is the commit message #17: Fix build-test generatelayoutonly # This is the commit message #18: Fix a few netci issues
Note
This causes 153 new fastTailCall decisions in System.Private.CoreLib. With no new decisions to not fastTailCall. See this list for new fastTailCalled methods.