connection hangs for 10+ minutes when network errors occur in some situations #516

Closed
@nocnokneo

Description

Using the following:

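        # Excerpt from an async generator method. Assumed imports (not shown in the report):
        #   from gql import Client as GqlClient, gql
        #   from gql.transport.websockets import WebsocketsTransport
        transport = WebsocketsTransport(
            url=self._webSocketUrl,
            init_payload=await self.authHeaders(),
            ping_interval=60,  # seconds between application-level pings
            pong_timeout=10,   # seconds to wait for the matching pong
        )
        async with GqlClient(transport=transport, schema=schema, serialize_variables=True) as client:
            async for eventDict in client.subscribe(
                document=gql(OnResourceEvent.Meta.document),
                variable_values={"parent": parent, "resourceKinds": resourceKinds, "offset": offset},
            ):
                event = OnResourceEvent.parse_obj(eventDict)
                yield event.onResourceEvent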

This can result in the following error under some network connection failures (debug logging enabled for gql.transport.websockets):

2025-01-17 13:44:17,348 - gql.transport.websockets:DEBUG: _fail: starting with exception: ConnectionClosedError(None, Close(code=1011, reason='keepalive ping timeout'), None)
2025-01-17 13:44:17,348 - gql.transport.websockets:DEBUG: Exiting _receive_data_loop()
2025-01-17 13:44:17,348 - gql.transport.websockets:DEBUG: _fail: starting with exception: ConnectionClosedError(None, Close(code=1011, reason='keepalive ping timeout'), None)
2025-01-17 13:44:17,348 - gql.transport.websockets:DEBUG: close_task is not None in _fail. Previous exception is: None New exception is: ConnectionClosedError(None, Close(code=1011, reason='keepalive ping timeout'), None)
2025-01-17 13:44:17,348 - gql.transport.websockets:DEBUG: _close_coro: starting
2025-01-17 13:44:17,349 - gql.transport.websockets:WARNING: Exception catched in _close_coro: ConnectionClosedError(None, Close(code=1011, reason='keepalive ping timeout'), None)
2025-01-17 13:44:17,349 - gql.transport.websockets:DEBUG: _close_coro: start cleanup
2025-01-17 13:44:17,349 - gql.transport.websockets:DEBUG: _close_coro: exiting

# hangs for more than 10 minutes

2025-01-17 13:55:16,988 - gql.transport.websockets:DEBUG: Exception in subscribe: CancelledError()
2025-01-17 13:55:16,989 - gql.transport.websockets:DEBUG: stop listener 1
2025-01-17 13:55:16,989 - gql.transport.websockets:DEBUG: In subscribe finally for query_id 1
2025-01-17 13:55:16,990 - gql.transport.websockets:DEBUG: listener 1 deleted, 0 remaining
2025-01-17 13:55:16,990 - gql.transport.websockets:DEBUG: close: starting
2025-01-17 13:55:16,990 - gql.transport.websockets:DEBUG: _fail: starting with exception: TransportClosed('Websocket GraphQL transport closed by user')
2025-01-17 13:55:16,990 - gql.transport.websockets:DEBUG: _fail started with self.websocket == None -> already closed
2025-01-17 13:55:16,991 - gql.transport.websockets:DEBUG: wait_close: starting
2025-01-17 13:55:16,991 - gql.transport.websockets:DEBUG: wait_close: done
2025-01-17 13:55:16,991 - gql.transport.websockets:DEBUG: close: done
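
For anyone trying to capture the same output, here is a minimal sketch of enabling debug logging for the gql.transport.websockets logger with the standard library. The format string is an assumption, inferred from the log lines above; it is not taken from the reporter's actual configuration.

```python
import logging

# Assumed format, chosen to resemble the log lines in this report.
logging.basicConfig(
    format="%(asctime)s - %(name)s:%(levelname)s: %(message)s",
    level=logging.WARNING,  # keep other loggers quiet
)

# Only the websockets transport logger is switched to DEBUG.
logging.getLogger("gql.transport.websockets").setLevel(logging.DEBUG)
```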

To Reproduce
Steps to reproduce the behavior:

  1. Start the client and wait for the "ping" / "pong" messages to be logged.
  2. Immediately afterward, kill the connection to the GQL server (e.g. using tc).
  3. Wait ~50s for the above error to occur.
  4. Immediately restore the network connection and observe that gql hangs for more than 10 minutes.
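
Steps 2 and 4 could be scripted roughly as below. This is only a sketch: the report does not give the exact tc invocation, so the netem rule (100% packet loss on the whole interface) and the interface name `eth0` are assumptions, and the helper names are hypothetical. Running the commands requires root, and the rule drops all outgoing traffic on that interface, not just traffic to the GQL server.

```python
import subprocess
from typing import List


def netem_drop_all_cmd(interface: str) -> List[str]:
    """Build a tc command that drops all outgoing packets on `interface` (assumed rule)."""
    return ["tc", "qdisc", "add", "dev", interface, "root", "netem", "loss", "100%"]


def netem_restore_cmd(interface: str) -> List[str]:
    """Build a tc command that removes the root qdisc, restoring normal traffic."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]


def run(cmd: List[str]) -> None:
    """Run a command and raise if it fails (needs root for tc)."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    iface = "eth0"  # assumption: replace with the interface that reaches the server
    run(netem_drop_all_cmd(iface))   # step 2: cut the connection
    input("Press Enter once the keepalive error has been logged...")
    run(netem_restore_cmd(iface))    # step 4: restore the connection
```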

Expected behavior
gql should raise the exception to the calling code within seconds, not minutes, so that the calling application can retry the connection or perform other error handling.
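
Until that happens, a possible application-side workaround is to put an idle timeout around the subscription iterator. The sketch below uses only asyncio and is not part of gql; `with_idle_timeout` is a hypothetical helper name. Note the limitation: it cannot tell a hung transport from a subscription that is legitimately quiet, so the timeout has to be longer than the longest expected gap between events.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def with_idle_timeout(source: AsyncIterator[T], idle_timeout: float) -> AsyncIterator[T]:
    """Yield items from `source`; raise asyncio.TimeoutError if none arrives in time."""
    iterator = source.__aiter__()
    try:
        while True:
            try:
                # wait_for cancels the pending __anext__() when the timeout expires.
                item = await asyncio.wait_for(iterator.__anext__(), idle_timeout)
            except StopAsyncIteration:
                return
            yield item
    finally:
        # Close the underlying async generator, if it supports aclose().
        aclose = getattr(iterator, "aclose", None)
        if aclose is not None:
            await aclose()
```

In the snippet from the description, this would wrap the `client.subscribe(...)` call, with the caller catching `asyncio.TimeoutError` and reconnecting. Whether cancelling the pending read returns promptly in the hung state has not been verified here.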

System info:

  • OS: Ubuntu 24.04.1 LTS
  • Python version: Python 3.10.16
  • gql version: 3.5.0
  • graphql-core version: 3.2.5

Labels: type: bug (an issue or pull request relating to a bug)