Skip to content

Conversation

@noahfalk
Copy link
Member

Fixes #4826

When running dotnet-stack against an unresponsive target process, there were various points where dotnet-stack wouldn't correctly cancel when Ctrl-C was pressed. There were several underlying issues:

  • Cancellation caused EventPipeSession.Dispose() to run which attempted to send a Stop IPC command that might block indefinitely
  • Several of the async operations dotnet-stack performed did not pass a cancellation token and so ignore when Ctrl-C is pressed
  • The calls to start and stop the session were still using the synchronous API which both ignored the cancellation token and create the standard async-over-sync issues.

The change in behavior for EventPipeSession.Dispose() is strictly speaking a breaking change, although callers would need to emply some dubious code patterns to observe the difference. The most likely way code could observe the difference is if thread 1 is reading from the EventStream at the same time thread 2 called Dispose(). Previously this would have caused thread 1 to start receiving rundown events although it was also a race condition between thread 1 reading from the stream and thread 2 disposing the stream. Its possible some tool could have worked successfully if thread 1 always won the race in practice. If any code was doing that pattern then now thread 1 will observe the stream is disposed without seeing the rundown events first. The proper way to ensure seeing all the rundown events would be to explicitly call EventPipeSession.Stop(), then read all the remaining data and reach the end of stream marker, then Dispose() the session.

I looked through all the usage of EventPipeSession in our existing tools and it looked like all of them were already using Stop() properly.

Fixes dotnet#4826

When running dotnet-stack against an unresponsive target process, there were various points where dotnet-stack wouldn't correctly cancel when Ctrl-C was pressed. There were several underlying issues:
- Cancellation caused EventPipeSession.Dispose() to run which attempted to send a Stop IPC command that might block indefinitely
- Several of the async operations dotnet-stack performed did not pass a cancellation token and so ignore when Ctrl-C is pressed
- The calls to start and stop the session were still using the synchronous API which both ignored the cancellation token and create the standard async-over-sync issues.

The change in behavior for EventPipeSession.Dispose() is strictly speaking a breaking change, although callers would need to emply some dubious code patterns to observe the difference. The most likely way code could observe the difference is if thread 1 is reading from the EventStream at the same time thread 2 called Dispose(). Previously this would have caused thread 1 to start receiving rundown events although it was also a race condition between thread 1 reading from the stream and thread 2 disposing the stream. Its possible some tool could have worked successfully if thread 1 always won the race in practice. If any code was doing that pattern then now thread 1 will observe the stream is disposed without seeing the rundown events first. The proper way to ensure seeing all the rundown events would be to explicitly call EventPipeSession.Stop(), then read all the remaining data and reach the end of stream marker, then Dispose() the session.

I looked through all the usage of EventPipeSession in our existing tools and it looked like all of them were already using Stop() properly.
@noahfalk noahfalk requested a review from a team as a code owner October 11, 2024 23:38
@noahfalk
Copy link
Member Author

@dotnet/dotnet-diag PTAL
FYI @dotnet/dotnet-monitor - this PR includes a behavioral change for EventPipeSession.Dispose().

Copy link
Member

@tommcdon tommcdon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@noahfalk noahfalk merged commit 845230b into dotnet:main Oct 16, 2024
20 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Nov 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dotnet stack hangup on trying to get the stackframes of a stuck process

3 participants