Skip to content

[release/7.0] [QUIC] Root listener/connection while waiting on new connection/stream event #74740

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

ManickaP
Copy link
Member

@ManickaP ManickaP commented Aug 29, 2022

Backport of the first commit of #74450 to release/7.0

Fixes #73377.

Root cause

When awaiting reading from Channel (whether it's channel of connections in listener or channel of stream in connection) we're technically awaiting a native event from msquic. During this awaiting we were hitting another instance of the problem stated in https://devblogs.microsoft.com/pfxteam/keeping-async-methods-alive/. In other words, GC would collect listener/connection while we were waiting for NEW_CONNECTION/NEW_STREAM events.

Fix

Added rooting of listener/connection in AcceptIncoming... methods. Also added tests directly targeted at this problem (they were consistently failing before the fix, always on the first run).

Customer Impact

The error manifests as a hang. A hang about which a user is not able to do anything. And it might lead to a false suspicion that a dead-lock is happening.
Note that we did have the same problem before (in 6.0), even with more places where GC had an opportunity to collect Listener/Connection/Stream. My suspicion is timing changes due to API changes and/or other changes elsewhere just uncovered this.

Testing

Re-enable failing tests as part of fix. Also added 2 more targeted tests that were failing before the fix and now are passing.

Risk

Low since Quic is in preview mode. Should not have impact on HTTP/3.

@ManickaP ManickaP requested a review from rzikm August 29, 2022 09:34
@ghost ghost added the area-System.Net label Aug 29, 2022
@ghost ghost assigned ManickaP Aug 29, 2022
@ghost
Copy link

ghost commented Aug 29, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Backport of the first commit of #74450 to release/7.0

Fixes #73377.

Root cause

When awaiting reading from Channel (whether it's channel of connections in listener or channel of stream in connection) we're technically awaiting a native event from msquic. During this awaiting we were hitting another instance of the problem stated in https://devblogs.microsoft.com/pfxteam/keeping-async-methods-alive/. In other words, GC would collect listener/connection while we were waiting for NEW_CONNECTION/NEW_STREAM events.

Fix

Added rooting of listener/connection in AcceptIncoming... methods. Also added tests directly targeted at this problem (they were consistently failing before the fix, always on the first run).

Customer Impact

The error manifests as a hang. A hang about which a user is not able to do anything. And it might lead to a false suspicion that a dead-lock is happening.
Note that we did have the same problem before (in 6.0), even with more places where GC had an opportunity to collect Listener/Connection/Stream. My suspicion is timing changes due to API changes and/or other changes elsewhere just uncovered this.

Testing

Re-enable failing tests as part of fix. Also added 2 more targeted tests that were failing before the fix and now are passing.

Risk

Low since Quic is in preview mode.

Author: ManickaP
Assignees: -
Labels:

area-System.Net

Milestone: -

Copy link
Member

@rzikm rzikm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@karelz karelz added this to the 7.0.0 milestone Aug 29, 2022
@danmoseley
Copy link
Member

Approved. Hang, we would service for this.

@carlossanlop
Copy link
Contributor

Approved, signed off, CI green. Ready to merge. :shipit:

@carlossanlop carlossanlop merged commit bdb78f4 into dotnet:release/7.0 Aug 29, 2022
@ManickaP ManickaP deleted the mapichov/7.0-root-quic-objects branch August 30, 2022 08:12
@ghost ghost locked as resolved and limited conversation to collaborators Sep 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants