Skip to content

Don't suspend for debugger while holding the slot backpatching lock #40060

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 4, 2020

Conversation

kouvel
Copy link
Contributor

@kouvel kouvel commented Jul 29, 2020

  • Mostly reverted the previous workaround for the issue (commit fc06054)
  • Added a forbid-suspend-for-debugger region in preemptive GC mode
  • Added a crst holder type that acquires a lock and enters the forbid region
  • Where the slot backpatching lock would be taken where cooperative GC mode may be entered inside that lock, the new crst holder is used
  • When a suspend for debugger is requested, a thread in preemptive GC mode that is in the forbid region is considered not yet suspended and is synched later similarly to threads in cooperative GC mode

Fixes #37278

@kouvel kouvel added this to the 5.0.0 milestone Jul 29, 2020
@kouvel kouvel self-assigned this Jul 29, 2020
@kouvel
Copy link
Contributor Author

kouvel commented Jul 29, 2020

CC @janvorli @mangod9 @noahfalk

@kouvel kouvel force-pushed the TierDebugFix branch 3 times, most recently from f496527 to 1336c15 Compare July 29, 2020 05:04
@mangod9
Copy link
Member

mangod9 commented Jul 29, 2020

Was the previous fix incomplete which led to the issue re-occurring due to timing?

@kouvel
Copy link
Contributor Author

kouvel commented Jul 29, 2020

Was the previous fix incomplete which led to the issue re-occurring due to timing?

The underlying issue was introduced when we started using cooperative GC mode for the slot backpatching info data structure. It was not obvious that such an issue would arise. The workaround used in the previous fix was a partial fix to the most problematic occurrences of the issue, we did not come up with a complete fix at the time after a fair bit of discussion, and the workaround was never intended to be a complete fix for the issue. The issue appears to be more problematic in 5.0 due to differences in the ways in which the lock is used. This change should be a complete fix for the issue.

- Mostly reverted the previous workaround for the issue (commit fc06054)
- Added a forbid-suspend-for-debugger region in preemptive GC mode
- Added a crst holder type that acquires a lock and enters the forbid region
- Where the slot backpatching lock would be taken where cooperative GC mode may be entered inside that lock, the new crst holder is used
- When a suspend for debugger is requested, a thread in preemptive GC mode that is in the forbid region is considered not yet suspended and is synched later similarly to threads in cooperative GC mode

Fixes #37278
@hoyosjs
Copy link
Member

hoyosjs commented Jul 29, 2020

cc: @gregg-miskelly

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kount! This looks good to me. Given the subtleties in the area I don't have high confidence that I would spot an issue if there was one, but I am OK accepting that risk.

Long term this still feels a bit "princess and the pea" to me. The pea is that we designed a low-level data structure that requires co-op mode to access it, and then we've layered on a bunch of complex locking/suspend logic to resolve the undesirable effects (the mattresses in the analogy). As I recall the original purpose of the co-op mode datastructure was to simplify some lifetime management for state attached to collectible assemblies. My suspicion is that the amount of complexity and performance overhead we have added working around co-op mode outweighs the value we are getting from it. Of course I make no claim that I have carefully evaluated the alternatives, so it is only my hunch at this point.

@kouvel
Copy link
Contributor Author

kouvel commented Aug 4, 2020

Thanks @noahfalk! Longer-term, avoiding coop mode would avoid some complication, though there are other complications it may not avoid. I haven't seen a startup perf hit from using the coop mode data structure initially or from this change, so there may not be any improvement from using a preemptive mode data structure. The alternative data structure may have to be a separate data structure and along with other changes that would go along with using it, it would be a bit too large/risky for .NET 5 at the moment. In the interest of simplifying some things it would be good longer-term to see if we can switch to a mostly preemptive mode solution.

@kouvel kouvel closed this Aug 4, 2020
@kouvel kouvel reopened this Aug 4, 2020
@kouvel kouvel merged commit 6d98f6d into dotnet:master Aug 4, 2020
@kouvel kouvel deleted the TierDebugFix branch August 4, 2020 22:02
Jacksondr5 pushed a commit to Jacksondr5/runtime that referenced this pull request Aug 10, 2020
…otnet#40060)

- Mostly reverted the previous workaround for the issue (commit fc06054)
- Added a forbid-suspend-for-debugger region in preemptive GC mode
- Added a crst holder type that acquires a lock and enters the forbid region
- Where the slot backpatching lock would be taken where cooperative GC mode may be entered inside that lock, the new crst holder is used
- When a suspend for debugger is requested, a thread in preemptive GC mode that is in the forbid region is considered not yet suspended and is synched later similarly to threads in cooperative GC mode

Fixes dotnet#37278
kouvel added a commit that referenced this pull request Sep 23, 2020
…bid-suspend-for-debugger region (#42587)

- Followup to #40060
- In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension.
- Fixed to break the spin-wait by checking whether the thread is in the forbid region

Fix for #42375 in master
kouvel added a commit that referenced this pull request Sep 23, 2020
…s in a forbid-suspend-for-debugger region (#42630)

- Port of #42587 to 5.0 RC2
- Followup to #40060
- In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension.
- Fixed to break the spin-wait by checking whether the thread is in the forbid region

Fix for #42375
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Func eval unreliable in .NET 5 due to Tiered JIT
5 participants