Aging stacks #102415
Conversation
```cpp
// If TRUE, GC is scheduled cooperatively with this thread.
// NOTE: This "byte" is actually a boolean - we don't allow
// recursive disables.
Volatile<ULONG> m_fPreemptiveGCDisabled;
LONG m_generation;
```
Can we fold m_generation and m_fPreemptiveGCDisabled together to avoid adding more instructions to PInvoke transitions? m_fPreemptiveGCDisabled is an underutilized field: it is 32-bit, but we only use it as a bool.
We flip m_fPreemptiveGCDisabled for various reasons, including when blocking a thread for suspension. If we reset the generation at the same time, I think we will often reset it too early.
For example, a thread was running a PInvoke and then tries to enter coop mode. It sets coop mode optimistically, checks the trap flag, sees that we are suspending, so it becomes preemptive again and blocks itself. The generation should not be reset to 0 for this thread, since it has not run any managed code yet.
To handle this case, can you snapshot the generation from m_fPreemptiveGCDisabled when starting the thread suspension, so that threads that wake up and block while the runtime is being suspended still get the benefit of the aging stacks?
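A minimal sketch of that snapshot idea; the names and structure here are hypothetical, and it assumes the generation is restored for threads that flipped modes without actually running managed code:

```cpp
#include <atomic>
#include <vector>

// Hypothetical stand-ins for the runtime's thread state.
struct Thread {
    std::atomic<int> m_fPreemptiveGCDisabled{0}; // 0 = preemptive, 1 = coop
    std::atomic<int> m_generation{0};            // age of this thread's stack
    int              m_generationSnapshot = 0;   // taken at suspension start
};

// At the start of thread suspension, remember each thread's current age.
void SnapshotGenerations(std::vector<Thread*>& threads) {
    for (Thread* t : threads) {
        t->m_generationSnapshot = t->m_generation.load(std::memory_order_relaxed);
    }
}

// A thread that woke up during the suspension, flipped to coop, and then
// blocked back in preemptive mode did not actually run managed code, so the
// GC can restore its snapshot and keep the aging-stacks benefit.
void RestoreIfBlockedDuringSuspension(Thread* t) {
    if (t->m_fPreemptiveGCDisabled.load(std::memory_order_relaxed) == 0) {
        t->m_generation.store(t->m_generationSnapshot, std::memory_order_relaxed);
    }
}
```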
The current pattern for resetting the generation on entering managed code is (see the sketch below):
- set coop mode
- check if we need to take the slow path
- if so, let the slow path deal with resetting the generation, possibly after resuming
- otherwise, reset the generation and run managed code
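A minimal sketch of that fast-path/slow-path split, loosely modeled on CoreCLR's names (g_TrapReturningThreads, RareDisablePreemptiveGC) but with simplified, hypothetical signatures:

```cpp
#include <atomic>

std::atomic<int> g_TrapReturningThreads{0};      // set while a suspension is in progress

struct Thread {
    std::atomic<int> m_fPreemptiveGCDisabled{0}; // 1 = cooperative mode
    std::atomic<int> m_generation{0};            // stack age; 0 = just ran managed code
};

// Slow-path stub: the real one blocks until the suspension completes, then
// resets the generation itself, so the age is not cleared before any managed
// code actually runs.
static void RareDisablePreemptiveGC(Thread* self) {
    /* ... block for the suspension ... */
    self->m_generation.store(0, std::memory_order_relaxed);
}

void EnterCooperativeMode(Thread* self) {
    // 1. Set coop mode optimistically.
    self->m_fPreemptiveGCDisabled.store(1, std::memory_order_seq_cst);

    // 2. Check whether we need the slow path.
    if (g_TrapReturningThreads.load(std::memory_order_seq_cst) != 0) {
        // 3. Let the slow path deal with resetting the generation.
        RareDisablePreemptiveGC(self);
        return;
    }

    // 4. Fast path: reset the age, then run managed code.
    self->m_generation.store(0, std::memory_order_relaxed);
}
```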
Yes, I understand that. I am wondering whether there is a way to do this without adding work to the PInvoke transitions. The PInvoke transitions in CoreCLR are more expensive than they need to be already.
There may also be a concern with the generation switching while we do a GC.
For example, we see a thread blocked in preemptive mode and it is old, so we do not mark it. Then the thread wakes up, tries to enter coop mode, and blocks. If it resets its age to 0 before blocking, we could report its roots at the relocation stage. It should be harmless, as the roots are too old, but it is one more scenario to think about.
I need to think about whether a thread leaving preemptive mode concurrently with us doing suspension/GC is something we should worry about.
If the actual target scenario is threads that stay blocked/preemptive for longer than one GC cycle, it might not matter if PInvoke resets the generation together with disabling preemptive mode.
Then we can just fold m_fPreemptiveGCDisabled and m_generation into one word, along the lines of the sketch below.
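A minimal sketch of what folding the two fields into one 32-bit word could look like; the bit layout and names here are hypothetical, not CoreCLR's:

```cpp
#include <atomic>
#include <cstdint>

constexpr uint32_t COOP_FLAG = 1u << 31;     // 1 = cooperative mode
constexpr uint32_t GEN_MASK  = COOP_FLAG - 1; // low 31 bits hold the stack age

struct Thread {
    std::atomic<uint32_t> m_coopAndGeneration{0};
};

// Entering coop mode and resetting the age become a single store.
inline void EnterCoopAndResetAge(Thread* t) {
    t->m_coopAndGeneration.store(COOP_FLAG, std::memory_order_seq_cst);
}

// Leaving coop mode clears the flag but keeps the age.
inline void LeaveCoopPreservingAge(Thread* t) {
    uint32_t v = t->m_coopAndGeneration.load(std::memory_order_relaxed);
    t->m_coopAndGeneration.store(v & GEN_MASK, std::memory_order_seq_cst);
}

inline bool IsCoop(const Thread* t) {
    return (t->m_coopAndGeneration.load(std::memory_order_relaxed) & COOP_FLAG) != 0;
}

inline uint32_t StackAge(const Thread* t) {
    return t->m_coopAndGeneration.load(std::memory_order_relaxed) & GEN_MASK;
}
```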
I just realized that this scheme would not work for native AOT, where we have Thread::m_pTransitionFrame, which is a full pointer. We may want to put together a design for the ideal, most efficient PInvoke transition and eventually unify all runtimes on it.
/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 2 pipeline(s).
If we reset the age unconditionally, we could fold the age and the coop flag on CoreCLR into one field. However, we should probably find a way to measure the impact first, to see what we could gain by optimizing it. Resetting the age, as it is right now, is a single instruction on all platforms, so the impact on code size is minimal. It is also on a highly predictable branch (the trap check is nearly always negative). It assigns into a location that no other thread reads or writes as long as a GC is not happening, and it is likely to be on the same cache line as the other things that are assigned: frame, state, etc. One extra such assignment could be basically free, so it would be interesting to measure.
The scheme used by NativeAOT seems to be optimal. At minimum we need to post a transition frame, and that is what we do. The presence of the frame also acts as a flag that the thread is in preemptive mode. On CoreCLR the frames chain is a continuous list, so the presence of a frame is not enough to tell whether a thread is in preemptive mode, so we also need to change the state, although that part should be cheap. I think this is worth exploring, but the current invariant goes quite deep into the runtime design, so taking it out could be fairly involved.
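A minimal sketch of that NativeAOT-style scheme, with hypothetical names and a simplified frame; the point is that a non-null transition frame pointer doubles as the preemptive-mode flag, so the fast path needs no separate state field:

```cpp
#include <atomic>

// Simplified stand-in for the real transition frame.
struct TransitionFrame { void* m_callerSP; /* saved regs, return address, ... */ };

struct Thread {
    std::atomic<TransitionFrame*> m_pTransitionFrame{nullptr};
};

// On a PInvoke call: publish the frame; the thread is now preemptive.
inline void PInvokeEnter(Thread* self, TransitionFrame* frame) {
    self->m_pTransitionFrame.store(frame, std::memory_order_release);
}

// On return: clearing the frame is the whole "enter coop mode" operation.
// (The real runtime then checks a trap flag and takes a slow path if set.)
inline void PInvokeReturn(Thread* self) {
    self->m_pTransitionFrame.store(nullptr, std::memory_order_seq_cst);
}

inline bool IsPreemptive(const Thread* self) {
    return self->m_pTransitionFrame.load(std::memory_order_acquire) != nullptr;
}
```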
The GCPROTECT macros are a separate linked list. (We made this change a few years ago. The GCPROTECT macros were part of the Frame list originally, but it caused all sorts of problems.)
/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 2 pipeline(s).
Draft Pull Request was automatically closed after 30 days of inactivity. Please let us know if you'd like to reopen it.
Stacks can be aged in a similar way as we age GC handles.
The stack of a thread that has not run managed code since the last GC that promoted everything to Gen-N cannot possibly contain anything of interest to the next GC of generation < N, since all the objects rooted by the stack would be too old.
(A stack can't obtain new roots if its thread is not running managed code.)
There are threads that do not run often (the finalizer thread, extra threadpool threads, threads waiting on something). In some scenarios the number of such threads may go into the hundreds.
Stackwalking those threads in ephemeral GCs would generally yield no actionable roots; aging allows us to skip these stacks.
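A minimal sketch of how aged stacks could be skipped during root scanning; the names and GC interface here are hypothetical, not the actual CoreCLR ones:

```cpp
#include <vector>

struct Thread {
    int stackAge = 0;  // generation this stack's roots were last promoted to
};

// Stub: the real version walks the thread's stack and reports roots to the GC.
static void ScanStackRoots(Thread* /*t*/) { /* ... */ }

void ScanStacksForGC(std::vector<Thread*>& threads, int condemnedGeneration) {
    for (Thread* t : threads) {
        // A thread that has not run managed code since a GC promoted all of
        // its roots past the condemned generation cannot root anything this
        // GC cares about, so its stack walk can be skipped entirely.
        if (t->stackAge > condemnedGeneration) {
            continue;
        }
        ScanStackRoots(t);
    }
}
```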