-
Notifications
You must be signed in to change notification settings - Fork 5k
Replace FEATURE_EH_FUNCLETS in JIT with runtime switch #99191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsEverything but win-x86 already had
|
@@ -1503,10 +1503,11 @@ void CodeGen::genExitCode(BasicBlock* block) | |||
void CodeGen::genJumpToThrowHlpBlk(emitJumpKind jumpKind, SpecialCodeKind codeKind, BasicBlock* failBlk) | |||
{ | |||
bool useThrowHlpBlk = compiler->fgUseThrowHelperBlocks(); | |||
#if defined(UNIX_X86_ABI) && defined(FEATURE_EH_FUNCLETS) | |||
#if defined(UNIX_X86_ABI) | |||
// TODO: Is this really UNIX_X86_ABI specific? Should we guard with compiler->UsesFunclets() instead? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be double-checked along with
runtime/src/coreclr/jit/morph.cpp
Lines 14299 to 14304 in ce6cf4b
if (info.compXcptnsCount > 0) | |
{ | |
assert(!codeGen->isGCTypeFixed()); | |
// Enforce fully interruptible codegen for funclet unwinding | |
SetInterruptible(true); | |
} |
Is any of that relevant for win-x86 with funclets?
cc @jakobbotsch @SingleAccretion since you seemed to favor this over having a separate JIT flavor. |
Ideally, we would switch win-x86 to the funclet plan and delete !FEATURE_EH_FUNCLETS code everywhere. @janvorli Have you thought about what it would take? |
It would be good to check where the throughput cost is coming from (perhaps some of the new runtime funclet checks can be hoisted out). |
cc @dotnet/jit-contrib |
The original plan was to leave the win x86 on the old plan and not to spend time porting that to the new EH. Is there a reason for modernizing this almost obsolete platform? |
Windows x86 support is here to stay for many years. The reason for switching over to the unified model would be code base simplification. We keep spending time on dealing with the difference, as this PR demonstrates. @filipnavara has been working on enabling native AOT for Windows x86 that uses the funclet model. Our testing strategy for native AOT assumes that the codegen differences are kept to absolute minimum. Switching regular Windows x86 to the funclet model would help with that. |
I checked the methods that are contributing to the TP regression in Tier0 for benchmarks.run_pgo. The breakdown looks like: Base: 17457576750, Diff: 17528444068, +0.4059%
20569264 : +6.76% : 17.35% : +0.1178% : public: unsigned int __thiscall emitter::emitOutputInstr(struct insGroup *, struct emitter::instrDesc *, unsigned char **)
14592023 : NA : 12.31% : +0.0836% : private: struct insGroup * __thiscall emitter::emitAllocIG(void)
14141188 : NA : 11.93% : +0.0810% : public: void __thiscall CodeGen::siOpenScopesForNonTrackedVars(struct BasicBlock const *, unsigned int)
7645212 : NA : 6.45% : +0.0438% : public: void __thiscall emitter::emitBegFN(bool)
3468762 : +5.70% : 2.93% : +0.0199% : public: unsigned int __thiscall GCInfo::gcMakeRegPtrTable(unsigned char *, int, struct InfoHdr const &, unsigned int, unsigned int *)
3332472 : NA : 2.81% : +0.0191% : public: bool __thiscall Compiler::UsesFunclets(void)
3085243 : +40.50% : 2.60% : +0.0177% : public: void __thiscall emitter::emitCreatePlaceholderIG(enum insGroupPlaceholderType, struct BasicBlock *, unsigned int *const &, unsigned int, unsigned int, bool)
2923051 : +24.12% : 2.47% : +0.0167% : public: void __thiscall GCInfo::gcCountForHeader(unsigned int *, unsigned int *)
2415660 : +3.65% : 2.04% : +0.0138% : protected: void __thiscall CodeGen::genCall(struct GenTreeCall *)
2128950 : +147.06% : 1.80% : +0.0122% : private: void __thiscall emitter::emitNxtIG(bool)
1950494 : +4.12% : 1.65% : +0.0112% : private: unsigned int __thiscall GCInfo::gcInfoBlockHdrSave(unsigned char *, int, unsigned int, unsigned int, unsigned int, struct InfoHdr *, int *)
1760517 : NA : 1.48% : +0.0101% : public: void __thiscall Compiler::funSetCurrentFunc(unsigned int)
1701900 : +14.19% : 1.44% : +0.0097% : private: void * __thiscall emitter::emitAddLabel(unsigned int *const &, unsigned int, unsigned int)
1597287 : +2.00% : 1.35% : +0.0091% : private: void __thiscall LinearScan::setBlockSequence(void)
1414600 : NA : 1.19% : +0.0081% : public: struct FuncInfoDsc * __thiscall Compiler::funCurrentFunc(void)
1332289 : +3.57% : 1.12% : +0.0076% : public: void __thiscall Compiler::lvaAssignVirtualFrameOffsetsToLocals(void)
1253994 : +39.81% : 1.06% : +0.0072% : public: enum PhaseStatus __thiscall Compiler::lvaMarkLocalVars(void)
1060950 : +22.73% : 0.89% : +0.0061% : public: void __thiscall CodeGen::genEmitMachineCode(void)
1057289 : +5.74% : 0.89% : +0.0061% : public: void __thiscall CodeGen::genEmitUnwindDebugGCandEH(void)
979839 : NA : 0.83% : +0.0056% : protected: void __thiscall Compiler::impImportLeaveEHRegions(struct BasicBlock *)
926607 : +1.81% : 0.78% : +0.0053% : protected: void __thiscall CodeGen::genFnProlog(void)
919488 : +12.16% : 0.78% : +0.0053% : public: void __thiscall emitter::emitGeneratePrologEpilog(void)
535926 : +8.21% : 0.45% : +0.0031% : protected: void __thiscall CodeGen::genExitCode(struct BasicBlock *)
512478 : NA : 0.43% : +0.0029% : public: unsigned int __thiscall Compiler::bbThrowIndex(struct BasicBlock *)
391775 : +0.28% : 0.33% : +0.0022% : public: unsigned int __thiscall emitter::emitEndCodeGen(class Compiler *, bool, bool, bool, unsigned int, unsigned int *, unsigned int *, void **, void **, void **, void **, void **, void **)
390696 : +6.24% : 0.33% : +0.0022% : public: void __thiscall Compiler::fgFindBasicBlocks(void)
289133 : NA : 0.24% : +0.0017% : public: enum PhaseStatus __thiscall Compiler::fgMergeFinallyChains(void)
283556 : +0.91% : 0.24% : +0.0016% : protected: void __thiscall Compiler::compCompile(void **, unsigned int *, class JitFlags *)
267951 : +4.55% : 0.23% : +0.0015% : public: void __thiscall emitter::emitBegPrologEpilog(struct insGroup *)
240221 : +0.98% : 0.20% : +0.0014% : protected: void __thiscall CodeGen::genCheckUseBlockInit(void)
218487 : +0.84% : 0.18% : +0.0013% : public: void __thiscall Compiler::compInit(class ArenaAllocator *, struct CORINFO_METHOD_STRUCT_*, class ICorJitInfo *, struct CORINFO_METHOD_INFO *, struct InlineInfo *)
208706 : +1.19% : 0.18% : +0.0012% : protected: void __thiscall CodeGen::genInitialize(void)
182117 : +2.21% : 0.15% : +0.0010% : public: enum PhaseStatus __thiscall Compiler::fgAddInternal(void)
-141778 : -33.31% : 0.12% : -0.0008% : __scrt_stub_for_is_c_termination_complete
-141848 : -100.00% : 0.12% : -0.0008% : public: void __thiscall Compiler::fgNormalizeEH(void)
-923488 : -89.05% : 0.78% : -0.0053% : protected: void __thiscall Compiler::impImportLeave(struct BasicBlock *)
-2406826 : -20.66% : 2.03% : -0.0138% : private: void __thiscall emitter::emitGenIG(struct insGroup *)
-2627780 : -1.42% : 2.22% : -0.0151% : protected: void __thiscall CodeGen::genCodeForBBlist(void)
-4615901 : -29.00% : 3.89% : -0.0264% : public: void __thiscall CodeGen::genGenerateMachineCode(void)
-12943600 : -100.00% : 10.92% : -0.0741% : private: void __thiscall emitter::emitNewIG(void) A number of cases in the top look like different inlining decisions that we'd expect to change regardless with native PGO. |
We should update the documentation that states that Windows/x86 does not use funclets: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/clr-abi.md#funclets |
Aren't they still used in CoreCLR? I though this only disabled them for NativeAOT. |
That document covers NativeAOT as well as CoreCLR. |
I have done a bit of work towards this: https://github.com/jkotas/runtime/tree/x86funclets (compiles, does not work and needs cleanup of NYIs) |
f23fe27
to
87da528
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update this PR to fix the merge conflicts, and so the testing + asmdiffs/TP diffs can run again?
src/coreclr/jit/compiler.h
Outdated
return UsesFunclets(); | ||
} | ||
#else | ||
bool UsesFunclets() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
bool UsesFunclets() | |
static bool UsesFunclets() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is semi-intentional. The compiler will inline it for non-x86 configurations anyway, so the static
doesn't have an effect on the resulting code. The absence of static
does, however, prevent it from calling Compiler:: UsesFunclets()
by accident and only discovering it when the x86 JIT is built on CI.
src/coreclr/jit/compiler.h
Outdated
return true; | ||
} | ||
|
||
bool UsesCallfinallyThunks() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
bool UsesCallfinallyThunks() | |
static bool UsesCallFinallyThunks() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reason as #99191 (comment)
I'll do it first thing tomorrow morning along with applying your suggestions. |
36611a4
to
2584068
Compare
No asm diffs (expected). Small TP diffs on all non-x86 platforms, mostly improvements. Surprising there is any change here? Big TP regressions on x86, especially MinOpts: up to almost 0.5%. I wonder if this could be significantly improved by changing:
such that we compute the result early (maybe in
|
The TP diff on x86 are mostly inlining decisions that would like be reduced with a new PGO profile. At least that was the conclusion when @jakobbotsch looked into it. I can try some variations to see if it makes a difference but other unrelated PRs saw similar patterns on small changes. |
Our superpmi-diffs pipeline disables native PGO (otherwise we'd never get "apples-to-apples" comparison). I tried the small suggestion I made above here: (after rebasing this PR on top of upstream/main, and fixing the conflicts) and it seemed to greatly reduce the x86 TP regression. |
7d37a01
to
4671c3c
Compare
Thanks for testing it. I did another rebase and cherry-picked your change. |
The notable win-x86 TP diffs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Replace FEATURE_EH_FUNCLETS/FEATURE_EH_CALLFINALLY_THUNKS in JIT with runtime switch * Cache Native AOT ABI check to see if TP improves --------- Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
Everything but win-x86 already had
FEATURE_EH_FUNCLETS
defined unconditionally. For win-x86 there's a newFEATURE_EH_X86_FRAMES
switch defined in targetx86.h (shortcut forTARGET_X86 && !UNIX_X86_ABI
). Funclet code is always compiled in and controlled at runtime withCompiler::UsesFunclets()
. On win-x86 it's controlled by NativeAOT ABI, everywhere else it's unconditionallytrue
.