Skip to content

"Lost connection to socket" every 5 seconds #2146

@giulianob

Description

@giulianob

What version of gRPC and what language are you using?

C# 2.53.0

What operating system (Linux, Windows,...) and version?

Issue observed both Windows 11 and Linux

What runtime / compiler are you using (e.g. .NET Core SDK version dotnet --info)

.NET 6.0.408

What did you do/see/expect?

I have a .NET client application talking to a Java gRPC server. My custom resolver was being unexpectedly called every 5 seconds so after enabling trace level logs on grpc-dotnet and adding logging for HTTP I see the following message occurring every 5 seconds is there is no traffic between the socket connecting and the first time SocketConnectivitySubchannelTransport.OnCheckSocketConnection is called:

frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459206138-07:00 trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[4]
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459255541-07:00       Subchannel id '1' checking socket 10.240.7.121:5001.
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459264942-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: Socket#43624646, memberName: Poll, message: Poll returns socketCount:0
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459271942-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: Socket#43624646, memberName: Dispose, message: timeout = -1
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459278243-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: Socket#43624646, memberName: Dispose, message: disposing:True Disposed:False
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459289243-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: Socket#43624646, memberName: Dispose, message: Calling _handle.CloseAsIs()
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459330846-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: SafeSocketHandle#46732916, memberName: CloseAsIs, message: shouldClose=True
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459337746-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: SafeSocketHandle#46732916, memberName: ReleaseHandle, message: shouldClose=False
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459343647-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: SafeSocketHandle#46732916, memberName: CloseHandle, message: handle:0x16B
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459349747-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: SafeSocketHandle#46732916, memberName: DoCloseHandle, message: handle:0x16B Following 'non-abortive' branch.
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459390249-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: SafeSocketHandle#46732916, memberName: CloseHandle, message: handle:363, close():SUCCESS
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.460124293-07:00 ############## [EventSource(System.Net.NameResolution, 4b326142-bfb5-5ed3-8585-7714181d14b0)] [ResolutionStart] hostNameOrAddress: fe-e2e-d42b40067b4f4c779181362-az22db33c3-search-svc.gbarberi3
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.460150295-07:00 dbug: Grpc.Net.Client.Balancer.Subchannel[11]
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.460158196-07:00       Subchannel id '1' state changed to Idle. Detail: 'Lost connection to socket.'.

My channel is configured like this:

                var channel = GrpcChannel.ForAddress(
                    address,
                    channelOptions: new GrpcChannelOptions
                    {
                        ThrowOperationCanceledOnCancellation = true,
                        LoggerFactory = services.GetRequiredService<ILoggerFactory>(),
                        Credentials = ChannelCredentials.Insecure,
                        MaxReconnectBackoff = serviceOptions.MaxReconnectBackoff,
                        MaxReceiveMessageSize = serviceOptions.MaxReceiveMessageSize,
                        MaxSendMessageSize = serviceOptions.MaxSendMessageSize,
                        ServiceConfig = new ServiceConfig { LoadBalancingConfigs = { new RoundRobinConfig() } },
                        HttpHandler = new SocketsHttpHandler
                        {
                            ConnectTimeout = serviceOptions.ConnectTimeout, // 30 sec by default
                            EnableMultipleHttp2Connections = true
                        },
                        // Required to dispose of manually created SocketsHttpHandler above
                        DisposeHttpClient = true,
                        // Create a service provider to inject the activation manager into the resolver factory
                        ServiceProvider = new ServiceCollection()
                            .AddSingleton<ResolverFactory, IndexServiceReaderResolverFactory>()
                            .AddSingleton<ResolverFactory, IndexServiceWriterResolverFactory>()
                            .BuildServiceProvider()
                    }
                );

I added a breakpoint to SocketConnectivitySubchannelTransport and see that IsSocketInBadState returns true when the transport is polling for status every 5 seconds which causes the transport to then disconnect.

I also added a breakpoint right after the connection is established in SocketConnectivitySubchannelTransport.TryConnectAsync here:

SocketConnectivitySubchannelTransportLog.ConnectingSocket(_logger, _subchannel.Id, currentAddress);
await _socketConnect(socket, currentAddress.EndPoint, context.CancellationToken).ConfigureAwait(false);
SocketConnectivitySubchannelTransportLog.ConnectedSocket(_logger, _subchannel.Id, currentAddress);

Calling IsSocketInBadState(socket) in the breakpoint above is returning true. The socket is healthy but there is data pending (socket.Available==40):

  {System.Net.Sockets.Socket}
    AddressFamily: InterNetworkV6
    Available: 40
    Blocking: true
    Connected: true
    DontFragment: Exception of type 'System.NotSupportedException' was thrown
    DualMode: true
    EnableBroadcast: Exception of type 'System.Net.Sockets.SocketException' was thrown
    ExclusiveAddressUse: false
    Handle: 0xd38
    IsBound: true
    LingerState: {System.Net.Sockets.LingerOption}
    LocalEndPoint: {[::ffff:127.0.0.1]:64546}
    MulticastLoopback: Exception of type 'System.Net.Sockets.SocketException' was thrown
    NoDelay: true
    ProtocolType: Tcp
    ReceiveBufferSize: 65536
    ReceiveTimeout: 0
    RemoteEndPoint: null
    SafeHandle: {System.Net.Sockets.SafeSocketHandle}
    SendBufferSize: 65536
    SendTimeout: 0
    SocketType: Stream
    Ttl: 128
    UseOnlyOverlappedIO: false

The IsSocketInBadState function will return true if there is data pending according to the comments and socket.Poll docs:

    private static bool IsSocketInBadState(Socket socket)
    {
        try
        {
            // Will return true if closed or there is pending data for some reason.
            return socket.Poll(0, SelectMode.SelectRead);
        }
        catch (Exception e) when (e is SocketException || e is ObjectDisposedException)
        {
            return false;
        }
    }

The 40 bytes in the socket are 00001204000000000000037FFFFFFF000400100000000600002000000004080000000000000F0001 which I don't know what they mean but searching for those bytes I see references to java gRPC service.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions