Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Http 1.1's read-ahead task management #103257

Merged
merged 2 commits into from
Jun 22, 2024

Conversation

MihaZupan
Copy link
Member

Fixes #100616

Every HTTP/1.1 connection has a ValueTask<int> _readAheadTask field that gets set either:

  • from PrepareForReuse right before the call to SendAsync, or
  • by the connection pool every N seconds as part of its scavenging logic that looks for dead connections to clean up (those that have received EOF)

As a result of the latter, if a connection is added to the connection pool at any point (the shared _http11Connections stack), we must assume that the read-ahead may have been set.

When "consuming" connections, we carefully manage who may consume the read-ahead task to ensure that we don't leak unobserved task exceptions.
To achieve this, we're using an int _readAheadTaskStatus flag to signal which thread will observe the result / whether the exception should be suppressed.

The problem with the current state (before this PR) is that a connection may be briefly added to the connection pool and taken out right after to handle a pending request from the request queue.
Because we know that the connection was freshly added and therefore still good, we don't bother calling into PrepareForReuse, which means we weren't updating the _readAheadTaskStatus as expected. This results in hitting a debug assert in HttpConnection.SendAsync that checks that someone claimed ownership of the read-ahead completion before sending the request (#100616).


Why don't we always call PrepareForReuse then?

PrepareForReuse expects its caller to immediately call into SendAsync if the check is successful. This allows it to store the stream.ReadAsync into _readAheadTask directly, without any extra wrapping logic.

_readAheadTask = _stream.ReadAsync(_readBuffer.AvailableMemory);

But at this point in the connection management, we're not 100% certain that we will have a request for the connection yet - we have waiters that may get canceled at any point.
If we did call PrepareForReuse and then had no waiter to signal, we'd have to throw away the connection.

PrepareForReuse also uses different logic based on a bool async argument which we don't yet have at this point.


Why don't we give the connection to the waiter and have the consuming code call PrepareForReuse?

While not super likely, if the connection was already bad, it would mean that we now took a request from the head of the queue and pushed it back to the tail to retry.
It would also potentially mean needing an extra flag to remember if this is the first request on the connection to match how we currently handle new connections that are immediately bad. Without that, we could end up doing more connect retries to a faulty server than we currently do (maybe that'd be fine, but it's not an obvious consequence).

It should be a possible approach, but I found the following simpler to follow:


This PR:

I changed how we handle read-ahead completion ownership, allowing a thread to return its completion ownership:

  • When we take a connection from the pool, we check if a scavenging read-ahead was started, and if so try to take ownership of its completion.
  • If we then fail to signal a request waiter, we try to hand back the completion ownership.

If TryOwnScavengingTaskCompletion fails, the read-ahead already completed and any potential exceptions got ignored => the connection is not usable and we try again.
If TryReturnScavengingTaskCompletionOwnership fails, the read-ahead completed between us taking ownership and trying to return it => the connection is not usable and we try again.

@MihaZupan MihaZupan added this to the 9.0.0 milestone Jun 10, 2024
@MihaZupan MihaZupan requested a review from a team June 10, 2024 23:34
@MihaZupan MihaZupan self-assigned this Jun 10, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

@danmoseley
Copy link
Member

Not my area but I want to recognize that the change description here is really great 😀.

@MihaZupan
Copy link
Member Author

/azp run runtime-libraries-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@wfurt wfurt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Is there chance this can also be related to #31254?
It seems like there may be mix of problems but some looked similar enough to me.

@@ -133,7 +137,9 @@ public bool PrepareForReuse(bool async)
// If the read-ahead task is completed, then we've received either EOF or erroneous data the connection, so it's not usable.
if (ReadAheadTaskHasStarted)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could still this be race condition? e.g. It did not start it but it will before we get to the next statement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the time you get to PrepareForReuse, the caller is the only thing with a reference to this connection.

It's possible that the read was already started and is transitioning to Completed right under you, but it can't transition from NotStarted => something else, or from something to => NotStarted.

@MihaZupan
Copy link
Member Author

Is there chance this can also be related to #31254?

I don't think so. #31254 is happening because our HTTP/1.1 logic isn't flowing the cancellation token through to the underlying stream, instead using dispose to force pending operations to abort.

@MihaZupan MihaZupan merged commit 3526759 into dotnet:main Jun 22, 2024
80 of 83 checks passed
rzikm pushed a commit to rzikm/dotnet-runtime that referenced this pull request Jun 24, 2024
* Fix Http 1.1's read-ahead task management

* Fix comment typo
@github-actions github-actions bot locked and limited conversation to collaborators Jul 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Debug assertion failure in System.Net.Http.HttpConnection.SendAsync
3 participants