-
Notifications
You must be signed in to change notification settings - Fork 848
Description
(Filling out the following details about bugs will help us solve your issue sooner.)
Reproducible in:
pip freeze | grep slack
python --version
sw_vers && uname -v # or `ver`slack-sdk==3.21.3
Python 3.11.3
Debian 10
The Slack SDK version
slack-sdk==3.21.3
Python runtime version
Python 3.11.3
OS info
Debian 10
Steps to reproduce:
(Share the commands to run, source code, and project settings (e.g., setup.py))
- Use
slack_sdk.socket_mode.websocket_client.SocketModeClient - Let the app run for long enough
- Receive a websocket-level disconnect
Expected result:
The client should properly reconnect
Actual result:
The client repeatedly tries to reconnect without re-issuing a wss url, causing all connections to fail in a loop forever
The logs are roughly the same as described in slackapi/bolt-python#470. It repeatedly gets WebSocketConnectionClosedException. Specifically, websocket logs Connection to remote host was lost. - goodbye and slack_sdk.socket_mode.websocket_client logs on_error invoked (error: WebSocketConnectionClosedException, message: Connection to remote host was lost.)
I believe the problem is that SocketModeClient.is_connected() just checks whether self.current_session is not None (https://github.com/slackapi/python-slack-sdk/blob/v3.21.3/slack_sdk/socket_mode/websocket_client/__init__.py#L139). If you compare that with the builtin client, it also checks whether connection.is_active() (https://github.com/slackapi/python-slack-sdk/blob/v3.21.3/slack_sdk/socket_mode/builtin/client.py#L152) which checks if self.sock is not None (https://github.com/slackapi/python-slack-sdk/blob/v3.21.3/slack_sdk/socket_mode/builtin/connection.py#L198). I think what happens is that in a certain type of disconnect scenario, BaseSocketModeClient.connect_to_new_endpoint() doesn't call issue_new_wss_url() when it should, because it thinks we're still connected (https://github.com/slackapi/python-slack-sdk/blob/v3.21.3/slack_sdk/socket_mode/client.py#L75).
I'm not sure how reliable that is in theory, there might be race conditions still. However, empirically, it seems that websocket-client's connection class does a reasonable job of setting its own .sock attribute to None when handling disconnects. I subclassed slack_sdk.socket_mode.websocket_client.SocketModeClient and overrode is_connected() with a version that also checks if self.current_session.sock is falsey, and ran that in prod for a couple weeks, and it seems like when this happens it does correctly reconnect. Running the app in a HA configuration (at least two instances running) means that, as long as they both don't disconnect simultaneously, I think we retain at least one active connection consistently and don't drop messages.
I think the fix is probably simple: just check that relying on checking websocket._app.WebSocketApp.sock is something you're comfortable doing, and then add that condition to slack_sdk.socket_mode.websocket_client.SocketModeClient.is_connected().