Commit 3a457cb

Improve call counting mechanism (#1457)
- Call counting through the prestub is fairly expensive and can be seen immediately after call counting begins
- Added call counting stubs. When starting call counting for a method:
  - A `CallCountingInfo` is created and initializes a remaining call count with a threshold
  - A `CallCountingStub` is created. It contains a small amount of code that decrements the remaining call count and checks for zero. When nonzero, it jumps to the code version's native code entry point. When zero, it forwards to a helper function that handles tier promotion.
  - When the call count threshold is reached, the helper call enqueues completion of call counting for background processing
  - When completing call counting, the code version is enqueued for promotion, and the call counting stub is removed from the call chain
  - Once all work queued for promotion is completed and methods transitioned to optimized tier, call counting stubs are deleted based on some heuristics and under runtime suspension
- The `CallCountingManager` is the main class with most of the logic. Its private subclasses are just simple data structures.
- Call counting is done at a `NativeCodeVersion` level (stub association is with the code version)
- The code versioning lock is used for data structures used for call counting. Since installing a call counting stub requires that we know what the currently active code version is, it made sense to use the same lock.
- Call counting stubs have hardcoded code. x64 has short and long stubs, short stubs are used when possible (often) and use IP-relative branches to the method's code and helper stub. Other platforms have only one type of stub (a short stub).
- For tiered methods that don't have a precode (virtual and interface methods), a forwarder stub (a precode) is created and it forwards to the call counting stub. This is so that the call counting stub can be safely and easily deleted. The forwarder stubs are only used when counting calls, there is one per method (not per code version), and they are not deleted. See `CallCountingManager::SetCodeEntryPoint()` for more info.
- The `OnCallCountThresholdReachedStub()` takes a "stub-identifying token". The helper call gets a stub address from it, and tells whether it's a short or long stub. From the stub, the remaining call count pointer is used to get the `CallCountingInfo`, and from it gets the `NativeCodeVersion` associated with the stub.
- The `CallCountingStubManager` traces through a call counting stub so that VS-like debuggers can step into a method through the call counting stub
- Exceptions (OOM)
  - On foreground threads, exceptions are propagated unless they can be handled without any compromise
  - On background threads, exceptions are caught and logged as before. Tried to limit scope of exception to one per method or code version such that a loop over many would not all be aborted by one exception.
- Fixed a latent race where a method is recorded for call counting and then the method's code entry point is set to tier 0 code
  - With that order, the tiering delay may expire and the method's entry point may be updated for call counting in the background before the code entry point is set by the recording thread, and that last action would disable call counting for the method and cause it to not be optimized. The only thing protecting from this happening was the delay itself, but a configured shorter delay increases the possibility of this happening.
  - Inverted the order such that the method's code entry point is set before recording it for call counting, both on first and subsequent calls
- Changed the tiered compilation lock to be an any-GC-mode lock so that it can be taken inside the code versioning lock, as some things were more naturally placed inside the code versioning lock where we know the active code version, like checking for the tiering delay to delay call counting and promoting the code version when the call count threshold is reached
  - Unfortunately, that makes code inside the lock a GC-no-trigger scope and things like scheduling a timer or queuing a work item to the thread pool could not be done inside that scope. This tradeoff seems to be better than alternatives, so refactored those pieces to occur outside that scope.
- Publishing an entry point after changing the active code version now takes call counting into account, fixes https://github.com/dotnet/coreclr/issues/22426
- After the changes:
  - Call counting overhead is much smaller and is not many orders of magnitude greater than a method call
  - Some config modes for tuning tiering are now much more reasonable and do not affect perf negatively nearly as much as before (increasing call count threshold, disabling or decreasing the tiering delay). Enables dynamic thresholds in the future, which is not feasible due to the overhead currently.
  - No change to startup or steady-state perf
- Left for later
  - Eventing work to report call counting stub code ranges and method name (also needs to be done for other stubs)
  - Some tests that consume events to verify run-time behavior in a few config modes
  - Debugger test to verify debugging while call-counting. Debugger tests also need to be fixed for tiering.
  - The call count threshold has not been changed for now. As we don't have many tests that measure the performance in-between startup and steady-state, some will need to be created maybe from existing tests, to determine the effects
- Fixes https://github.com/dotnet/coreclr/issues/23596
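The stub behavior described in the commit message (decrement the remaining call count, jump to the method when nonzero, forward to a helper at zero) can be modeled in portable C++. This is only a rough sketch under stated assumptions, not the hardcoded machine-code stub the commit adds: `CallCountingInfoSketch`, `CallThroughStub`, `OnThresholdReached`, and `MethodTier0` are hypothetical names, and the helper here only sets a flag where the runtime would enqueue background work.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-code-version counting state.
struct CallCountingInfoSketch {
    int32_t remainingCallCount;      // initialized from the call count threshold
    bool promotionRequested = false; // set once the threshold has been reached

    explicit CallCountingInfoSketch(int32_t threshold)
        : remainingCallCount(threshold) {}
};

using Method = int (*)(int);

// Illustrative tier-0 method body that the stub forwards to.
inline int MethodTier0(int x) { return x + 1; }

// Stand-in for the helper reached when the count hits zero. The commit
// message says the real helper enqueues completion of call counting for
// background processing; here we only record the request (assumption).
inline int OnThresholdReached(CallCountingInfoSketch& info, Method target, int arg) {
    info.promotionRequested = true;
    return target(arg);
}

// Stand-in for the stub: decrement, test for zero, then either "jump" to the
// method's code or forward to the helper.
inline int CallThroughStub(CallCountingInfoSketch& info, Method target, int arg) {
    if (--info.remainingCallCount != 0) {
        return target(arg);
    }
    return OnThresholdReached(info, target, arg);
}
```

With a threshold of 3, the first two calls through `CallThroughStub` go straight to `MethodTier0` and the third reaches `OnThresholdReached`. The sketch does not model what the commit message describes next, namely removing the stub from the call chain once counting completes.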
1 parent 531feac commit 3a457cb

54 files changed: +4116 -1476 lines changed

docs/design/features/code-versioning.md

Lines changed: 1 addition & 5 deletions

@@ -330,11 +330,7 @@ to update the active child at either of those levels (ReJIT uses SetActiveILCode
 In order to do step 3 the `CodeVersionManager` relies on one of three different mechanisms, a `FixupPrecode`, a `JumpStamp`, or backpatching entry point slots. In [method.hpp](https://github.com/dotnet/coreclr/blob/master/src/vm/method.hpp) these mechanisms are described in the `MethodDesc::IsVersionableWith*()` functions, and all methods have been classified to use at most one of the techniques, based on the `MethodDesc::IsVersionableWith*()` functions.

 ### Thread-safety ###
-CodeVersionManager is designed for use in a free-threaded environment, in many cases by requiring the caller to acquire a lock before calling. This lock can be acquired by constructing an instance of the
-
-```
-CodeVersionManager::TableLockHolder(CodeVersionManager*)
-```
+CodeVersionManager is designed for use in a free-threaded environment, in many cases by requiring the caller to acquire a lock before calling. This lock can be acquired by constructing an instance of `CodeVersionManager::LockHolder`.

 in some scope for the CodeVersionManager being operated on. CodeVersionManagers from different domains should not have their locks taken by the same thread with one exception, it is OK to take the shared domain manager lock and one AppDomain manager lock in that order. The lock is required to change the shape of the tree or traverse it but not to read/write configuration properties from each node. A few special cases:
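The call sites changed in this commit construct `CodeVersionManager::LockHolder` without a manager argument and assert ownership through a static `IsLockOwnedByCurrentThread()`, which suggests a single lock rather than one per manager. The scope-based pattern can be sketched as below. This is a minimal model built on `std::mutex`, not the runtime's Crst-based implementation; `CodeVersioningLockSketch` and `ReadActiveVersion` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical model of a single lock with a scope-based holder.
class CodeVersioningLockSketch {
public:
    // Acquires the lock on construction and releases it on destruction.
    class LockHolder {
    public:
        LockHolder() {
            Lock().lock();
            Owner().store(std::this_thread::get_id());
        }
        ~LockHolder() {
            Owner().store(std::thread::id()); // reset to "no owner"
            Lock().unlock();
        }
        LockHolder(const LockHolder&) = delete;
        LockHolder& operator=(const LockHolder&) = delete;
    };

    // True only while the calling thread holds the lock through a LockHolder.
    static bool IsLockOwnedByCurrentThread() {
        return Owner().load() == std::this_thread::get_id();
    }

private:
    static std::mutex& Lock() {
        static std::mutex m;
        return m;
    }
    static std::atomic<std::thread::id>& Owner() {
        static std::atomic<std::thread::id> owner{std::thread::id()};
        return owner;
    }
};

// Hypothetical callee that documents its locking precondition with an assert,
// in the spirit of the _ASSERTE calls in the diff.
inline int ReadActiveVersion(const int& activeVersion) {
    assert(CodeVersioningLockSketch::IsLockOwnedByCurrentThread());
    return activeVersion;
}
```

A caller would open a block, declare a `LockHolder` at the top, and do its lookups inside that block, so the lock is released on every exit path.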

src/coreclr/src/debug/daccess/request.cpp

Lines changed: 4 additions & 4 deletions

@@ -4281,7 +4281,7 @@ HRESULT ClrDataAccess::GetPendingReJITID(CLRDATA_ADDRESS methodDesc, int *pRejit
 PTR_MethodDesc pMD = PTR_MethodDesc(TO_TADDR(methodDesc));

 CodeVersionManager* pCodeVersionManager = pMD->GetCodeVersionManager();
-CodeVersionManager::TableLockHolder lock(pCodeVersionManager);
+CodeVersionManager::LockHolder codeVersioningLockHolder;
 ILCodeVersion ilVersion = pCodeVersionManager->GetActiveILCodeVersion(pMD);
 if (ilVersion.IsNull())
 {
@@ -4313,7 +4313,7 @@ HRESULT ClrDataAccess::GetReJITInformation(CLRDATA_ADDRESS methodDesc, int rejit
 PTR_MethodDesc pMD = PTR_MethodDesc(TO_TADDR(methodDesc));

 CodeVersionManager* pCodeVersionManager = pMD->GetCodeVersionManager();
-CodeVersionManager::TableLockHolder lock(pCodeVersionManager);
+CodeVersionManager::LockHolder codeVersioningLockHolder;
 ILCodeVersion ilVersion = pCodeVersionManager->GetILCodeVersion(pMD, rejitId);
 if (ilVersion.IsNull())
 {
@@ -4365,7 +4365,7 @@ HRESULT ClrDataAccess::GetProfilerModifiedILInformation(CLRDATA_ADDRESS methodDe
 PTR_MethodDesc pMD = PTR_MethodDesc(TO_TADDR(methodDesc));

 CodeVersionManager* pCodeVersionManager = pMD->GetCodeVersionManager();
-CodeVersionManager::TableLockHolder lock(pCodeVersionManager);
+CodeVersionManager::LockHolder codeVersioningLockHolder;
 ILCodeVersion ilVersion = pCodeVersionManager->GetActiveILCodeVersion(pMD);
 if (ilVersion.GetRejitState() != ILCodeVersion::kStateActive || !ilVersion.HasDefaultIL())
 {
@@ -4398,7 +4398,7 @@ HRESULT ClrDataAccess::GetMethodsWithProfilerModifiedIL(CLRDATA_ADDRESS mod, CLR

 PTR_Module pModule = PTR_Module(TO_TADDR(mod));
 CodeVersionManager* pCodeVersionManager = pModule->GetCodeVersionManager();
-CodeVersionManager::TableLockHolder lock(pCodeVersionManager);
+CodeVersionManager::LockHolder codeVersioningLockHolder;

 LookupMap<PTR_MethodTable>::Iterator typeIter(&pModule->m_TypeDefToMethodTableMap);
 for (int i = 0; typeIter.Next(); i++)

src/coreclr/src/debug/ee/debugger.cpp

Lines changed: 1 addition & 1 deletion

@@ -3634,7 +3634,7 @@ HRESULT Debugger::SetIP( bool fCanSetIPOnly, Thread *thread,Module *module,

 CodeVersionManager *pCodeVersionManager = module->GetCodeVersionManager();
 {
-CodeVersionManager::TableLockHolder lock(pCodeVersionManager);
+CodeVersionManager::LockHolder codeVersioningLockHolder;
 ILCodeVersion ilCodeVersion = pCodeVersionManager->GetActiveILCodeVersion(module, mdMeth);
 if (!ilCodeVersion.IsDefaultVersion())
 {

src/coreclr/src/debug/ee/functioninfo.cpp

Lines changed: 4 additions & 4 deletions

@@ -933,7 +933,7 @@ void DebuggerJitInfo::LazyInitBounds()
 LOG((LF_CORDB,LL_EVERYTHING, "DJI::LazyInitBounds: this=0x%x GetBoundariesAndVars success=0x%x\n", this, fSuccess));

 // SetBoundaries uses the CodeVersionManager, need to take it now for lock ordering reasons
-CodeVersionManager::TableLockHolder lockHolder(mdesc->GetCodeVersionManager());
+CodeVersionManager::LockHolder codeVersioningLockHolder;
 Debugger::DebuggerDataLockHolder debuggerDataLockHolder(g_pDebugger);

 if (!m_fAttemptInit)
@@ -1059,7 +1059,7 @@ void DebuggerJitInfo::SetBoundaries(ULONG32 cMap, ICorDebugInfo::OffsetMapping *
 // Pick a unique initial value (-10) so that the 1st doesn't accidentally match.
 int ilPrevOld = -10;

-_ASSERTE(m_nativeCodeVersion.GetMethodDesc()->GetCodeVersionManager()->LockOwnedByCurrentThread());
+_ASSERTE(CodeVersionManager::IsLockOwnedByCurrentThread());

 InstrumentedILOffsetMapping mapping;

@@ -1606,8 +1606,8 @@ DebuggerJitInfo *DebuggerMethodInfo::FindOrCreateInitAndAddJitInfo(MethodDesc* f
 NativeCodeVersion nativeCodeVersion;
 if (fd->IsVersionable())
 {
-CodeVersionManager::TableLockHolder lockHolder(fd->GetCodeVersionManager());
 CodeVersionManager *pCodeVersionManager = fd->GetCodeVersionManager();
+CodeVersionManager::LockHolder codeVersioningLockHolder;
 nativeCodeVersion = pCodeVersionManager->GetNativeCodeVersion(fd, startAddr);
 if (nativeCodeVersion.IsNull())
 {
@@ -2087,7 +2087,7 @@ void DebuggerMethodInfo::CreateDJIsForMethodDesc(MethodDesc * pMethodDesc)
 CodeVersionManager* pCodeVersionManager = pMethodDesc->GetCodeVersionManager();
 // grab the code version lock to iterate available versions of the code
 {
-CodeVersionManager::TableLockHolder lock(pCodeVersionManager);
+CodeVersionManager::LockHolder codeVersioningLockHolder;
 NativeCodeVersionCollection nativeCodeVersions = pCodeVersionManager->GetNativeCodeVersions(pMethodDesc);

 for (NativeCodeVersionIterator itr = nativeCodeVersions.Begin(), end = nativeCodeVersions.End(); itr != end; itr++)

src/coreclr/src/inc/CrstTypes.def

Lines changed: 5 additions & 4 deletions

@@ -280,7 +280,7 @@ Crst NativeImageCache
 End

 Crst GCCover
-AcquiredBefore LoaderHeap ReJITDomainTable
+AcquiredBefore LoaderHeap CodeVersioning
 End

 Crst GCMemoryPressure
@@ -486,7 +486,7 @@ Crst Reflection
 End

 // Used to synchronize all rejit information stored in a given AppDomain.
-Crst ReJITDomainTable
+Crst CodeVersioning
 AcquiredBefore LoaderHeap SingleUseLock DeadlockDetection JumpStubCache DebuggerController FuncPtrStubs
 AcquiredAfter ReJITGlobalRequest ThreadStore GlobalStrLiteralMap SystemDomain DebuggerMutex MethodDescBackpatchInfoTracker
 End
@@ -495,7 +495,7 @@ End
 // new functions to rejit tables, or request Reverts on existing functions in the rejit
 // tables. One of these crsts exist per runtime.
 Crst ReJITGlobalRequest
-AcquiredBefore ThreadStore ReJITDomainTable SystemDomain JitInlineTrackingMap
+AcquiredBefore ThreadStore CodeVersioning SystemDomain JitInlineTrackingMap
 End

 // ETW infrastructure uses this crst to protect a hash table of TypeHandles which is
@@ -679,7 +679,7 @@ Crst InlineTrackingMap
 End

 Crst JitInlineTrackingMap
-AcquiredBefore ReJITDomainTable ThreadStore LoaderAllocator
+AcquiredBefore CodeVersioning ThreadStore LoaderAllocator
 End

 Crst EventPipe
@@ -695,6 +695,7 @@ Crst ReadyToRunEntryPointToMethodDescMap
 End

 Crst TieredCompilation
+AcquiredAfter CodeVersioning
 AcquiredBefore ThreadpoolTimerQueue
 End
src/coreclr/src/inc/clrconfigvalues.h

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -633,10 +633,17 @@ RETAIL_CONFIG_DWORD_INFO(INTERNAL_HillClimbing_GainExponent,
633633
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_TieredCompilation, W("TieredCompilation"), 1, "Enables tiered compilation")
634634
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_TC_QuickJit, W("TC_QuickJit"), 1, "For methods that would be jitted, enable using quick JIT when appropriate.")
635635
RETAIL_CONFIG_DWORD_INFO(UNSUPPORTED_TC_QuickJitForLoops, W("TC_QuickJitForLoops"), 0, "When quick JIT is enabled, quick JIT may also be used for methods that contain loops.")
636+
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_TC_AggressiveTiering, W("TC_AggressiveTiering"), 0, "Transition through tiers aggressively.")
636637
RETAIL_CONFIG_DWORD_INFO(INTERNAL_TC_CallCountThreshold, W("TC_CallCountThreshold"), 30, "Number of times a method must be called in tier 0 after which it is promoted to the next tier.")
637638
RETAIL_CONFIG_DWORD_INFO(INTERNAL_TC_CallCountingDelayMs, W("TC_CallCountingDelayMs"), 100, "A perpetual delay in milliseconds that is applied call counting in tier 0 and jitting at higher tiers, while there is startup-like activity.")
638639
RETAIL_CONFIG_DWORD_INFO(INTERNAL_TC_DelaySingleProcMultiplier, W("TC_DelaySingleProcMultiplier"), 10, "Multiplier for TC_CallCountingDelayMs that is applied on a single-processor machine or when the process is affinitized to a single processor.")
639640
RETAIL_CONFIG_DWORD_INFO(INTERNAL_TC_CallCounting, W("TC_CallCounting"), 1, "Enabled by default (only activates when TieredCompilation is also enabled). If disabled immediately backpatches prestub, and likely prevents any promotion to higher tiers")
641+
RETAIL_CONFIG_DWORD_INFO(INTERNAL_TC_UseCallCountingStubs, W("TC_UseCallCountingStubs"), 1, "Uses call counting stubs for faster call counting.")
642+
#ifdef _DEBUG
643+
RETAIL_CONFIG_DWORD_INFO(INTERNAL_TC_DeleteCallCountingStubsAfter, W("TC_DeleteCallCountingStubsAfter"), 1, "Deletes call counting stubs after this many have completed. Zero to disable deleting.")
644+
#else
645+
RETAIL_CONFIG_DWORD_INFO(INTERNAL_TC_DeleteCallCountingStubsAfter, W("TC_DeleteCallCountingStubsAfter"), 4096, "Deletes call counting stubs after this many have completed. Zero to disable deleting.")
646+
#endif
640647
#endif
641648

642649
///
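The description of `TC_DeleteCallCountingStubsAfter` ("Deletes call counting stubs after this many have completed. Zero to disable deleting.") implies a simple threshold check, with a default of 1 in debug builds and 4096 otherwise. A minimal sketch of that rule is shown below. `StubDeletionPolicy` and `OnStubCompleted` are hypothetical names, resetting the counter after a pass is an assumption, and the commit message notes that the actual deletion is "based on some heuristics" that this does not attempt to reproduce.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the "delete after N completed stubs" setting.
struct StubDeletionPolicy {
    uint32_t deleteAfter;        // configured value; 0 disables deleting
    uint32_t completedCount = 0; // stubs completed since the last deletion pass

    explicit StubDeletionPolicy(uint32_t configuredValue)
        : deleteAfter(configuredValue) {}

    // Records one completed stub and reports whether a deletion pass is due.
    // Resetting the counter here is an assumption of this sketch.
    bool OnStubCompleted() {
        ++completedCount;
        if (deleteAfter == 0 || completedCount < deleteAfter) {
            return false;
        }
        completedCount = 0;
        return true;
    }
};
```

With the debug default of 1, every completed stub would make a deletion pass due, which presumably exercises the deletion path as often as possible in test builds. With 0, no pass is ever requested.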
