Heartbeat for azure.messaging.webpubsubclient #42162

@djwessel

Is your feature request related to a problem? Please describe.

When connected with either the sync or the async WebPubSubClient, I get the following log messages roughly every 20 seconds when there is no traffic going over the websocket.

Sync

2025-07-22 15:06:10,767 - azure.messaging.webpubsubclient._client - WARNING - An error occurred when trying to connect: Connection to remote host was lost.
2025-07-22 15:06:10,767 - azure.messaging.webpubsubclient._client - INFO - WebSocket connection closed. Code: None, Reason: None
2025-07-22 15:06:10,987 - websocket - INFO - Websocket connected
2025-07-22 15:06:10,987 - azure.messaging.webpubsubclient._client - INFO - connected successfully
...
2025-07-22 15:06:31,290 - azure.messaging.webpubsubclient._client - WARNING - An error occurred when trying to connect: Connection to remote host was lost.
2025-07-22 15:06:31,290 - azure.messaging.webpubsubclient._client - INFO - WebSocket connection closed. Code: None, Reason: None
2025-07-22 15:06:31,637 - websocket - INFO - Websocket connected
2025-07-22 15:06:31,637 - azure.messaging.webpubsubclient._client - INFO - connected successfully

Async

2025-07-22 15:08:46,937 - azure.messaging.webpubsubclient.aio._client - INFO - WebSocket connection closed. Code: None, Reason: None
2025-07-22 15:08:47,085 - azure.messaging.webpubsubclient.aio._client - INFO - connected successfully
...
2025-07-22 15:09:07,259 - azure.messaging.webpubsubclient.aio._client - INFO - WebSocket connection closed. Code: None, Reason: None
2025-07-22 15:09:08,228 - azure.messaging.webpubsubclient.aio._client - INFO - connected successfully

I assume that the socket is closed after ~20 seconds of inactivity and then automatically reopened.
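To put a number on the "~20 seconds", here is a minimal sketch that measures the gap between consecutive "WebSocket connection closed" entries in logs formatted like the ones above. The helper name close_intervals and the regex are my own and not part of the SDK; the sample lines are copied from the sync log above.

```python
import re
from datetime import datetime

# Matches the timestamp of an INFO "WebSocket connection closed" log entry.
CLOSE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - \S+ - INFO - WebSocket connection closed"
)


def close_intervals(log_lines):
    """Return the seconds between consecutive 'connection closed' entries."""
    stamps = []
    for line in log_lines:
        match = CLOSE_RE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


# The two close events from the sync log in this issue.
SAMPLE = [
    "2025-07-22 15:06:10,767 - azure.messaging.webpubsubclient._client - INFO - WebSocket connection closed. Code: None, Reason: None",
    "2025-07-22 15:06:10,987 - websocket - INFO - Websocket connected",
    "2025-07-22 15:06:31,290 - azure.messaging.webpubsubclient._client - INFO - WebSocket connection closed. Code: None, Reason: None",
]

if __name__ == "__main__":
    print(close_intervals(SAMPLE))
```

For the sync excerpt this gives a single interval of about 20.5 seconds, which matches the idle timeout I am assuming.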

I dug into the codebase, and it looks like the websocket connections opened under the hood do not use the heartbeat functionality built into the underlying libraries: aiohttp.ClientSession.ws_connect(..., heartbeat=5) on the async side and websocket.WebSocketApp.run_forever(ping_interval=5) on the sync side.

Describe the solution you'd like

Adapt both the async and sync WebPubSubClients to properly handle the heartbeat.

I played around with setting heartbeat=5 on the call to ws_connect. I no longer got the disconnects every 20 seconds, but I still get the following disconnect messages every 5-35 minutes (with no fixed frequency, unlike the messages above).

azure.messaging.webpubsubclient.aio._client - WARNING - WebSocket error: 
2025-07-22 16:38:29,686 - azure.messaging.webpubsubclient.aio._client - WARNING - WebSocket error: No PONG received after 2.5 seconds
2025-07-22 16:38:29,687 - azure.messaging.webpubsubclient.aio._client - INFO - WebSocket connection closed. Code: 1008, Reason: No PONG received after 2.5 seconds
2025-07-22 16:38:29,687 - azure.messaging.webpubsubclient.aio._client - INFO - Staring a new connection

This indicates that the server doesn't always respond with a pong within 2.5 seconds (aiohttp's ws_connect hardcodes the pong timeout as heartbeat / 2). Even if we bump the heartbeat to 10 seconds, we still hit the pong timeout whenever the response takes longer than 5 seconds.
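For reference, a rough sketch of what enabling the heartbeat looks like with plain aiohttp, outside the SDK. The function names pong_timeout and listen are hypothetical, the URL is whatever client access URL you use, and the real client passes more options (subprotocols etc.) than shown here. The heartbeat / 2 relationship is the one described above.

```python
def pong_timeout(heartbeat: float) -> float:
    """Pong deadline as described in this issue: half the heartbeat interval."""
    return heartbeat / 2.0


async def listen(url: str, heartbeat: float = 5.0) -> None:
    # aiohttp is third-party; imported here so pong_timeout() works without it.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        # heartbeat=N makes aiohttp send a PING every N seconds and close the
        # socket if the matching PONG does not arrive in time.
        async with session.ws_connect(url, heartbeat=heartbeat) as ws:
            async for msg in ws:
                print(msg.type, msg.data)


if __name__ == "__main__":
    for hb in (5.0, 10.0):
        print(f"heartbeat={hb}s -> pong timeout {pong_timeout(hb)}s")
```

To actually connect, run asyncio.run(listen(url)) with your own URL. Since the pong deadline is tied to the heartbeat, the only way to tolerate slow pongs with this parameter alone is to make the heartbeat itself longer.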

On the sync side, I also tried setting self._thread = threading.Thread(target=self._ws.run_forever, kwargs={"ping_interval": 5}, daemon=True). This seems to work as expected, and the socket never disconnects. That is because, by default, run_forever has no ping_timeout: if the server doesn't respond with a pong, the connection stays open unless you also pass ping_timeout to run_forever.
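A sketch of how the sync side could pass these options through, assuming plain websocket-client rather than the SDK's internals. ping_kwargs and start_in_background are hypothetical helpers of mine, not existing SDK functions. websocket-client requires ping_interval to be larger than ping_timeout when both are set, so the helper checks that up front.

```python
import threading
from typing import Optional


def ping_kwargs(ping_interval: float, ping_timeout: Optional[float] = None) -> dict:
    """Build keyword arguments for WebSocketApp.run_forever()."""
    kwargs = {"ping_interval": ping_interval}
    if ping_timeout is not None:
        # websocket-client rejects a timeout that is not shorter than the interval.
        if ping_timeout >= ping_interval:
            raise ValueError("ping_timeout must be smaller than ping_interval")
        kwargs["ping_timeout"] = ping_timeout
    return kwargs


def start_in_background(url: str, **kwargs) -> threading.Thread:
    # websocket-client is third-party; imported lazily so ping_kwargs() works without it.
    import websocket

    app = websocket.WebSocketApp(url)
    # Same pattern as the thread in the SDK snippet above, with ping options added.
    thread = threading.Thread(target=app.run_forever, kwargs=kwargs, daemon=True)
    thread.start()
    return thread
```

With ping_kwargs(5) only pings are sent and a missing pong is ignored, which is the behavior I observed. With ping_kwargs(5, 3) a missing pong would close the connection, similar to the aiohttp case.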

Describe alternatives you've considered

I'm not sure there are other ways to avoid this behavior, unless the service can be prevented from closing the connection after 20 seconds of inactivity.

Additional context
python: 3.11.4

aiohappyeyeballs==2.6.1
aiohttp==3.12.14
aiosignal==1.4.0
attrs==25.3.0
azure-core==1.35.0
azure-messaging-webpubsubclient==1.1.0
azure-messaging-webpubsubservice==1.3.0
certifi==2025.7.14
charset-normalizer==3.4.2
frozenlist==1.7.0
idna==3.10
isodate==0.7.2
multidict==6.6.3
propcache==0.3.2
pyjwt==2.10.1
requests==2.32.4
six==1.17.0
typing-extensions==4.14.1
urllib3==2.5.0
websocket-client==1.8.0
yarl==1.20.1

I vibe-coded 2 evaluation scripts that capture metrics on how often these disconnects happen. But the simplest way to reproduce the messages is to just initialize and open a connection to the client and wait 20 seconds.
