Fix PerfMap crash when Enable IPC command is sent early in startup#124055
Fix PerfMap crash when Enable IPC command is sent early in startup#124055
Conversation
AppDomain is created in SystemDomain::Attach, which runs after DiagnosticServerAdapter::Initialize. A PerfMapEnable IPC command received during PauseForDiagnosticsMonitor would crash calling IterateAssembliesEx on null. Safe to skip: no assemblies are loaded before SystemDomain::Attach.
EEJitManager is created in ExecutionManager::Init, which runs after DiagnosticServerAdapter::Initialize. A PerfMapEnable IPC command received during PauseForDiagnosticsMonitor would crash on null dereference. Safe to skip: no JIT'd code exists before ExecutionManager::Init. CodeVersionManager::LockHolder remains inside the null check intentionally - the lock is only needed while iterating code versions, not for the null check. The lock is already initialized in CodeVersionManager::StaticInitialize before DiagnosticServerAdapter::Initialize.
e94322b to
62d36a5
Compare
There was a problem hiding this comment.
Pull request overview
Fixes a CoreCLR startup crash when a PerfMapEnable IPC command arrives very early (before AppDomain and/or EEJitManager are initialized), by making PerfMap::Enable(sendExisting: true) resilient to those components being unavailable.
Changes:
- Guard assembly enumeration behind a null-check for
GetAppDomain()to avoid callingIterateAssembliesExbeforeSystemDomain::Attach(). - Guard JIT code heap enumeration behind a null-check for
ExecutionManager::GetEEJitManager()to avoid iterating heaps beforeExecutionManager::Init(). - Reuse the
EEJitManager*local when building theCodeHeapIterator.
|
Tagging subscribers to this area: @agocke |
|
|
||
| // Similarly, the EEJitManager may not exist yet if called before ExecutionManager::Init(). | ||
| // In this case, there's no JIT'd code to log anyway. | ||
| EEJitManager *pJitManager = ExecutionManager::GetEEJitManager(); |
There was a problem hiding this comment.
m_pEEJitManager that this returns is not Volatile so this still has a race conditions due to memory model issues. I do not think we want to make the statics like m_pEEJitManager Volatile.
I assume that we have a good reason to start listening to initialize perfmap very early before basic subsystems like code manager are ready. Is that correct?
If it is the case, I would introduce a new static Volatile<bool> s_perfmapReady set after we have everything in place to JIT managed code, and do an early out check using this bool variable.
There was a problem hiding this comment.
we have a good reason to start listening to initialize perfmap very early before basic subsystems like code manager are ready. Is that correct?
I was aiming for simple changes that would make the runtime resilient to early PerfMap Enable commands, rather than adding logic to queue the EnablePerfMap request dependencies are all initiailized. Currently the only known instance of early perfmap enable is record-trace, but the crash behavior would still exist, as repro'd by DiagnosticClient.EnablePerfMap on an app suspended with DOTNET_DefaultDiagnosticPortSuspend=1.
Description
Fixes #123438
PerfMap::Enable(sendExisting=true) crashes when a PerfMapEnable IPC command is received very early in startup (e.g., during PauseForDiagnosticsMonitor). At this point in the startup sequence:
Both are initialized after DiagnosticServerAdapter::Initialize, so IPC commands can arrive before these are ready.
Changes
Both are safe to skip when null - no assemblies are loaded and no code is JIT'd at that point in startup. Future methods will still be logged as they are JIT'd.
Startup Timeline
Testing
mdh1418 validated on a wsl2 instance, pausing the startup logic with
DOTNET_DefaultDiagnosticPortSuspend=1and kicking off a separate app invoking DiagnosticClient.EnablePerfMap. Before the change, it would sigsegv and crash.After the changes, its resilient and doesn't crash.
Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.