Skip to content

Conversation

@JamesNK
Copy link
Member

@JamesNK JamesNK commented Jun 26, 2023

Addresses #2161
Addresses #2146

The socket transports regularly checks that a socket is still alive. This PR changes its logic to not close the socket if there is pending data. The socket is now marked that there was unexpected data and left open. When it comes time to use the socket to create a SocketsHttpHandler stream, the socket is recreated then.

Pending data can happen because some servers (C-core, Java) eagerly send the SETTINGS frame to the client.

Pending data is now cached (up to a limit) and then replayed back to the client when the real connection is established. Fixes the problem and avoids the overhead of having to recreate the socket.

@JamesNK JamesNK changed the title Don't close socket on available data Socket transport manager: Don't close socket on available data Jun 26, 2023
@JamesNK JamesNK marked this pull request as ready for review June 26, 2023 06:08
@JamesNK JamesNK changed the title Socket transport manager: Don't close socket on available data Socket transport: Don't close socket on available data Jun 26, 2023
//
// If a socket with data is closed immediately then that might happen continuously.
// Chill, leave the socket alone, and be happy that it can be read from. We can create a new one later.
do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to check my understanding; it looks like this only happens during initial connect, and we're intentionally burning the data (and triggering a failure condition) because this is unexpected, but we want to be able to poll later, and thus want an empty state - is that right?

quick question: why do we explicitly not expect data initially? is that a protocol specification thing, that the server always talks first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue arose because I didn't completely understand the HTTP/2 handshake process.

I thought a client always initializes an HTTP/2 connection by sending the HTTP/2 preface, and then the server responds with a SETTINGS frame. That's the way Kestrel/IIS/HTTP.sys work.

It turns out that some HTTP/2 server implementations don't wait for the preface. They preemptively send the SETTINGS frame. There isn't a rule against doing this.

See https://datatracker.ietf.org/doc/html/rfc7540#section-3.5

Copy link
Contributor

@mgravell mgravell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going to mark approved, but: would still love an answer on that question, just to help my understanding

@kalduzov
Copy link

Hello.

We really need this fix.
We balance on GrpcChannel an average of 20 ip addresses (max about 70). In services that create a small load on GrpcCnannel, this bug causes a constant re-creation of the picker and a request for address resolution.
As a balancing policy, we use "weighted round robin" and some of the subchannels, depending on its weight, may not receive traffic at all.
On average, every 4-5 seconds a picker is recreated.

We have added this fix now separately to version 2.55 and made an internal fork of the lib - this fixes the problem. But we do not want to support the fork on our side - we would like to use the original packages.

@JamesNK JamesNK requested a review from mgravell July 20, 2023 06:57
@JamesNK
Copy link
Member Author

JamesNK commented Jul 20, 2023

going to mark approved, but: would still love an answer on that question, just to help my understanding

@mgravell I made big changes. Data is cached and replayed so a socket doesn't need to be recreated. Please take another look.

_inner.ReadAsync(buffer, cancellationToken);
public override async ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default)
{
if (_initialSocketData != null && _initialSocketData.Count > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably fine since we usually expect at most one initial buffer fragment; however, I wonder if this would be a lot simpler if ReadOnlySequence<byte> was used instead of List<ReadOnlyMemory<byte>> - they're semantically comparable, except a ROS can be sliced more simply

however, if you agree that this isn't high volume, might be better to leave it simple

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't use a ReadOnlySequence because they're a pain to create. But you're right that it would reading easier.

I expect only a small amount of data from the SETTINGS frame. It's likely the list will have one array in it.

Copy link
Contributor

@mgravell mgravell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking approved, but with a few notes - the "lock" one seeming the biggest, but please also consider the timer question

@JamesNK JamesNK merged commit 03d9b01 into grpc:master Jul 21, 2023
@JamesNK JamesNK deleted the jamesnk/allow-socket-data-pending-connection branch July 21, 2023 01:26
@JamesNK
Copy link
Member Author

JamesNK commented Jul 21, 2023

@kalduzov A package with this change will be available on the nightly feed in a day or two. You can temporarily use it https://github.com/grpc/grpc-dotnet#grpc-nuget-feed

It would be great to check whether your issue is resolved with the latest changes. That would avoid potential months of delay from releasing a new version of grpc-dotnet with a fix, finding out it didn't work in the real world, and having to do follow-up work and waiting for the next release.

@kalduzov
Copy link

kalduzov commented Jul 21, 2023

@JamesNK Thank you very much. We'll check it out soon.

@jojo-Fish
Copy link

@JamesNK hello,use this package something wrong like :
"message": "Convert error from exception handle middleware", "exception": "Grpc.Core.RpcException: Status(StatusCode="Internal", Detail="Error starting gRPC call. HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.", DebugException="System.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.")\n ---> System.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.\n at System.Net.Http.HttpConnectionPool.ReturnHttp2Connection(Http2Connection connection, Boolean isNewConnection)\n at System.Net.Http.HttpConnectionPool.AddHttp2ConnectionAsync(HttpRequestMessage request)\n at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)\n at System.Net.Http.HttpConnectionPool.AddHttp2ConnectionAsync(HttpRequestMessage request)\n at System.Threading.Tasks.Task1.InnerInvoke()\n at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)\n at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)\n at System.Threading.ThreadPoolWorkQueue.Dispatch()\n at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()\n--- End of stack trace from previous location ---\n at System.Threading.Tasks.TaskCompletionSourceWithCancellation1.WaitWithCancellationAsync(CancellationToken cancellationToken)\n at System.Net.Http.HttpConnectionPool.GetHttp2ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)\n at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)\n at System.Net.Http.HttpMessageInvoker.g__SendAsyncWithTelemetry|6_0(HttpMessageHandler handler, HttpRequestMessage request, CancellationToken cancellationToken)\n at Grpc.Net.Client.Internal.GrpcCall2.RunCall(HttpRequestMessage request, Nullable1 timeout)\n --- End of inner exception stack trace ---\n at

System.Private.CoreLib\nSystem.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.\n at System.Net.Http.HttpConnectionPool.ReturnHttp2Connection(Http2Connection connection, Boolean isNewConnection)\n at System.Net.Http.HttpConnectionPool.AddHttp2ConnectionAsync(HttpRequestMessage request)\n at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)\n at System.Net.Http.HttpConnectionPool.AddHttp2ConnectionAsync(HttpRequestMessage request)\n at System.Threading.Tasks.Task1.InnerInvoke()\n at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)\n at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)\n at System.Threading.ThreadPoolWorkQueue.Dispatch()\n at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()\n--- End of stack trace from previous location ---\n at System.Threading.Tasks.TaskCompletionSourceWithCancellation1.WaitWithCancellationAsync(CancellationToken cancellationToken)\n at System.Net.Http.HttpConnectionPool.GetHttp2ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)\n at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)\n at System.Net.Http.HttpMessageInvoker.g__SendAsyncWithTelemetry|6_0(HttpMessageHandler handler, HttpRequestMessage request, CancellationToken cancellationToken)\n at Grpc.Net.Client.Internal.GrpcCall2.RunCall(HttpRequestMessage request, Nullable1 timeout) at System.Net.Http.HttpConnectionPool.ReturnHttp2Connection(Http2Connection connection, Boolean isNewConnection)\n at System.Net.Http.HttpConnectionPool.AddHttp2ConnectionAsync(HttpRequestMessage request)\n at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)\n at System.Net.Http.HttpConnectionPool.AddHttp2ConnectionAsync(HttpRequestMessage request)\n at System.Threading.Tasks.Task1.InnerInvoke()\n at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)\n at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)\n at System.Threading.ThreadPoolWorkQueue.Dispatch()\n at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()\n--- End of stack trace from previous location ---\n at System.Threading.Tasks.TaskCompletionSourceWithCancellation1.WaitWithCancellationAsync(CancellationToken cancellationToken)\n at System.Net.Http.HttpConnectionPool.GetHttp2ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)\n at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)\n at System.Net.Http.HttpMessageInvoker.g__SendAsyncWithTelemetry|6_0(HttpMessageHandler handler, HttpRequestMessage request, CancellationToken cancellationToken)\n at Grpc.Net.Client.Internal.GrpcCall2.RunCall(HttpRequestMessage request, Nullable1 timeout) System.Private.CoreLib" }

@JamesNK
Copy link
Member Author

JamesNK commented Jul 26, 2023

Please create a new issue.

I haven't seen that error during development. I need a minimal reproduction so I can see it for myself and fix it. Upload a reproduction to GitHub.

@jojo-Fish
Copy link

@JamesNK #2210

@kalduzov
Copy link

Hi @JamesNK.

We rolled out a nightly assembly with a fix in production. Everything is stable.

When is 2.56.0 scheduled to be released?

@JamesNK
Copy link
Member Author

JamesNK commented Jul 27, 2023

Great. Probably be a preview end of this week or next week, and a full version a couple of weeks later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants