Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

socket-mode 1.x: explicit server disconnect during 'connecting' state intermittently not handled #1787

Open
filmaj opened this issue May 6, 2024 · 5 comments
Labels
auto-triage-skip enhancement M-T: A feature request for new functionality pkg:socket-mode applies to `@slack/socket-mode` semver:patch

Comments

@filmaj
Copy link
Contributor

filmaj commented May 6, 2024

          @filmaj I seem to be hitting the following now on 1.3.5:
[INFO]  socket-mode:SocketModeClient:0 Going to establish a new connection to Slack ...
/home/patrick/slack/node_modules/finity/lib/core/StateMachine.js:76
      throw new Error('Unhandled event \'' + event + '\' in state \'' + this.currentState + '\'.');
            ^

Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
    at StateMachine.handleUnhandledEvent (/home/patrick/slack/node_modules/finity/lib/core/StateMachine.js:76:13)
    at /home/patrick/slack/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
    at TaskScheduler.execute (/home/patrick/slack/node_modules/finity/lib/core/TaskScheduler.js:29:7)
    at TaskScheduler.enqueue (/home/patrick/slack/node_modules/finity/lib/core/TaskScheduler.js:19:12)
    at HierarchicalStateMachine.handle (/home/patrick/slack/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
    at SocketModeClient.onWebSocketMessage (/home/patrick/slack/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
    at WebSocket.onMessage (/home/patrick/slack/node_modules/ws/lib/event-target.js:132:16)
    at WebSocket.emit (node:events:518:28)
    at Receiver.receiverOnMessage (/home/patrick/slack/node_modules/ws/lib/websocket.js:1068:20)
    at Receiver.emit (node:events:518:28)

With debug logging it shows:

[DEBUG]  socket-mode:SocketModeClient:0 Received a message on the WebSocket: {"type":"disconnect","reason":"too_many_websockets","debug_info":{"host":"applink-2"}}
[DEBUG]  socket-mode:SocketModeClient:0 Received "disconnect" (reason: too_many_websockets) message - will attempt reconnect

Originally posted by @MadrMan in #1654 (comment)


Any PR to address this should target the socket-mode-1.x branch as this would be a 'maintenance' fix for the now-outdated 1.x line of socket-mode.

@filmaj filmaj added semver:patch good first issue enhancement M-T: A feature request for new functionality pkg:socket-mode applies to `@slack/socket-mode` auto-triage-skip labels May 6, 2024
@filmaj filmaj changed the title socket-mode: explicit server disconnect during 'connecting' state not handled socket-mode 1.x: explicit server disconnect during 'connecting' state not handled May 6, 2024
@filmaj filmaj changed the title socket-mode 1.x: explicit server disconnect during 'connecting' state not handled socket-mode 1.x: explicit server disconnect during 'connecting' state intermittently not handled May 29, 2024
@filmaj
Copy link
Contributor Author

filmaj commented May 29, 2024

This is a tricky issue to handle, because the 1.x line of socket-mode's architecture is such that there is a single state machine to handle transitions, but potentially, during a reconnect, two WebSocket connections are briefly active/overlapping as a connection is re-established. The underlying problem is, during this reconnect when we have two WS connections open, each WebSocket connection pumps events into the single state machine, causing unexpected state transitions.

Consider the case where, at more or less the same time, one WS connection sends a "disconnect" event to the state machine and another sends a "open" or "connected" event. The single state machine will get confused.

This issue is resolved in socket-mode 2.x, as the architecture has completely changed. Socket-mode 2.0 is available in bolt 4.0.0 rc1, and has been used successfully by some customers who have adopted it early.

@okovpashko
Copy link

okovpashko commented Sep 3, 2024

We recently stumbled upon this issue on multiple projects and here are the debug logs for the typical occurrence.

2024-09-03 02:39:12.801	Transitioning to state: connected:ready
2024-09-03 02:39:12.801	Started running a new heart beat job
2024-09-03 02:39:12.801	Terminated the old connection
2024-09-03 02:39:12.801	Switched to the secondary connection
2024-09-03 02:39:12.801	Switching to the secondary connection ...
2024-09-03 02:39:12.801	Transitioning to state: connected:preparing
2024-09-03 02:39:12.801	Now connected to Slack
2024-09-03 02:39:12.801	Transitioning to state: connected
2024-09-03 02:39:12.800	Received a message on the WebSocket: {\"type\":\"hello\",\"num_connections\":10,\"debug_info\":{\"host\":\"applink-9\",\"build_number\":97,\"approximate_connection_time\":18060},\"connection_info\":{\"app_id\":\"A03UYB9U6U9\}
2024-09-03 02:39:12.788	Transitioning to state: connecting:handshaking
2024-09-03 02:39:12.788	Secondary WebSocket open event received (connection established)
2024-09-03 02:39:12.693	Transitioning to state: connecting:authenticated
2024-09-03 02:39:12.693	apiCall('apps.connections.open') end
2024-09-03 02:39:12.693	http request result: {\"ok\":true,\"url\":\"wss://wss-primary.slack.com/link/?ticket=TICKET&app_id=APP_ID\",\"response_metadata\":{\"scopes\":[\"connections:write\"],\"acceptedScopes\":[\"connections:write\"]}}
2024-09-03 02:39:12.692	http response received
2024-09-03 02:39:12.596	http request headers: {}
2024-09-03 02:39:12.596	http request body: {}
2024-09-03 02:39:12.596	http request url: https://slack.com/api/apps.connections.open
2024-09-03 02:39:12.596	apiCall('apps.connections.open') start
2024-09-03 02:39:12.596	Going to retrieve a new WSS URL ...
2024-09-03 02:39:12.596	Transitioning to state: connecting:authenticating
2024-09-03 02:39:12.596	Going to establish a new connection to Slack ...
2024-09-03 02:39:12.596	Transitioning to state: connecting
2024-09-03 02:39:12.596	Reconnecting to Slack ...
2024-09-03 02:39:12.596	Transitioning to state: reconnecting
2024-09-03 02:39:12.593	Secondary WebSocket close event received (code: 1006, reason: )
2024-09-03 02:39:02.594	Received \"disconnect\" (reason: too_many_websockets) message - will attempt reconnect
2024-09-03 02:39:02.590	    at Receiver.emit (node:events:519:28)
2024-09-03 02:39:02.590	    at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:1070:20)
2024-09-03 02:39:02.590	    at WebSocket.emit (node:events:519:28)
2024-09-03 02:39:02.590	    at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:132:16)
2024-09-03 02:39:02.590	    at SocketModeClient.onWebSocketMessage (/app/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
2024-09-03 02:39:02.590	    at HierarchicalStateMachine.handle (/app/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
2024-09-03 02:39:02.590	    at TaskScheduler.enqueue (/app/node_modules/finity/lib/core/TaskScheduler.js:19:12)
2024-09-03 02:39:02.590	    at TaskScheduler.execute (/app/node_modules/finity/lib/core/TaskScheduler.js:29:7)
2024-09-03 02:39:02.590	    at /app/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
2024-09-03 02:39:02.590	    at StateMachine.handleUnhandledEvent (/app/node_modules/finity/lib/core/StateMachine.js:76:13)
2024-09-03 02:39:02.590	Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
2024-09-03 02:39:02.590	This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:
2024-09-03 02:39:02.587	Received a message on the WebSocket: {\"type\":\"disconnect\",\"reason\":\"too_many_websockets\",\"debug_info\":{\"host\":\"applink-14\}
2024-09-03 02:39:02.579	Transitioning to state: connecting:handshaking
2024-09-03 02:39:02.578	Secondary WebSocket open event received (connection established)
2024-09-03 02:39:02.449	Transitioning to state: connecting:authenticated
2024-09-03 02:39:02.449	apiCall('apps.connections.open') end
2024-09-03 02:39:02.449	http request result: {\"ok\":true,\"url\":\"wss://wss-primary.slack.com/link/?ticket=TICKET&app_id=APP_ID\",\"response_metadata\":{\"scopes\":[\"connections:write\"],\"acceptedScopes\":[\"connections:write\"]}}
2024-09-03 02:39:02.448	http response received
2024-09-03 02:39:02.320	http request headers: {}
2024-09-03 02:39:02.320	http request body: {}
2024-09-03 02:39:02.320	http request url: https://slack.com/api/apps.connections.open
2024-09-03 02:39:02.320	apiCall('apps.connections.open') start
2024-09-03 02:39:02.320	Going to retrieve a new WSS URL ...
2024-09-03 02:39:02.320	Transitioning to state: connecting:authenticating
2024-09-03 02:39:02.320	Going to establish a new connection to Slack ...
2024-09-03 02:39:02.320	Transitioning to state: connecting
2024-09-03 02:39:02.320	Reconnecting to Slack ...
2024-09-03 02:39:02.320	Transitioning to state: reconnecting
2024-09-03 02:39:02.318	Secondary WebSocket close event received (code: 1006, reason: )
2024-09-03 02:38:52.319	Received \"disconnect\" (reason: too_many_websockets) message - will attempt reconnect
2024-09-03 02:38:52.316	    at Receiver.emit (node:events:519:28)
2024-09-03 02:38:52.316	    at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:1070:20)
2024-09-03 02:38:52.316	    at WebSocket.emit (node:events:519:28)
2024-09-03 02:38:52.316	    at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:132:16)
2024-09-03 02:38:52.316	    at SocketModeClient.onWebSocketMessage (/app/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
2024-09-03 02:38:52.316	    at HierarchicalStateMachine.handle (/app/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
2024-09-03 02:38:52.316	    at TaskScheduler.enqueue (/app/node_modules/finity/lib/core/TaskScheduler.js:19:12)
2024-09-03 02:38:52.316	    at TaskScheduler.execute (/app/node_modules/finity/lib/core/TaskScheduler.js:29:7)
2024-09-03 02:38:52.316	    at /app/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
2024-09-03 02:38:52.316	    at StateMachine.handleUnhandledEvent (/app/node_modules/finity/lib/core/StateMachine.js:76:13)
2024-09-03 02:38:52.316	Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
2024-09-03 02:38:52.316	This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:
2024-09-03 02:38:52.315	Received a message on the WebSocket: {\"type\":\"disconnect\",\"reason\":\"too_many_websockets\",\"debug_info\":{\"host\":\"applink-4\}
2024-09-03 02:38:52.305	Transitioning to state: connecting:handshaking
2024-09-03 02:38:52.305	Secondary WebSocket open event received (connection established)
2024-09-03 02:38:52.198	Transitioning to state: connecting:authenticated
2024-09-03 02:38:52.198	apiCall('apps.connections.open') end
2024-09-03 02:38:52.198	http request result: {\"ok\":true,\"url\":\"wss://wss-primary.slack.com/link/?ticket=TICKET&app_id=APP_ID\",\"response_metadata\":{\"scopes\":[\"connections:write\"],\"acceptedScopes\":[\"connections:write\"]}}
2024-09-03 02:38:52.195	http response received
2024-09-03 02:38:52.097	http request headers: {}
2024-09-03 02:38:52.097	http request body: {}
2024-09-03 02:38:52.097	http request url: https://slack.com/api/apps.connections.open
2024-09-03 02:38:52.097	apiCall('apps.connections.open') start
2024-09-03 02:38:52.097	Going to retrieve a new WSS URL ...
2024-09-03 02:38:52.097	Transitioning to state: connecting:authenticating
2024-09-03 02:38:52.097	Going to establish a new connection to Slack ...
2024-09-03 02:38:52.097	Transitioning to state: connecting
2024-09-03 02:38:52.097	Reconnecting to Slack ...
2024-09-03 02:38:52.097	Transitioning to state: reconnecting
2024-09-03 02:38:52.094	Secondary WebSocket close event received (code: 1006, reason: )
2024-09-03 02:38:42.093	Received \"disconnect\" (reason: too_many_websockets) message - will attempt reconnect
2024-09-03 02:38:42.089	    at Receiver.emit (node:events:519:28)
2024-09-03 02:38:42.089	    at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:1070:20)
2024-09-03 02:38:42.089	    at WebSocket.emit (node:events:519:28)
2024-09-03 02:38:42.089	    at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:132:16)
2024-09-03 02:38:42.089	    at SocketModeClient.onWebSocketMessage (/app/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
2024-09-03 02:38:42.089	    at HierarchicalStateMachine.handle (/app/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
2024-09-03 02:38:42.089	    at TaskScheduler.enqueue (/app/node_modules/finity/lib/core/TaskScheduler.js:19:12)
2024-09-03 02:38:42.089	    at TaskScheduler.execute (/app/node_modules/finity/lib/core/TaskScheduler.js:29:7)
2024-09-03 02:38:42.089	    at /app/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
2024-09-03 02:38:42.089	    at StateMachine.handleUnhandledEvent (/app/node_modules/finity/lib/core/StateMachine.js:76:13)
2024-09-03 02:38:42.089	Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
2024-09-03 02:38:42.089	This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:
2024-09-03 02:38:42.087	Received a message on the WebSocket: {\"type\":\"disconnect\",\"reason\":\"too_many_websockets\",\"debug_info\":{\"host\":\"applink-12\}
2024-09-03 02:38:42.080	Transitioning to state: connecting:handshaking
2024-09-03 02:38:42.078	Secondary WebSocket open event received (connection established)
2024-09-03 02:38:41.976	Transitioning to state: connecting:authenticated
2024-09-03 02:38:41.976	apiCall('apps.connections.open') end
2024-09-03 02:38:41.976	http request result: {\"ok\":true,\"url\":\"wss://wss-primary.slack.com/link/?ticket=TICKET&app_id=APP_ID\",\"response_metadata\":{\"scopes\":[\"connections:write\"],\"acceptedScopes\":[\"connections:write\"]}}
2024-09-03 02:38:41.974	http response received
2024-09-03 02:38:41.877	http request headers: {}
2024-09-03 02:38:41.877	http request body: {}
2024-09-03 02:38:41.877	http request url: https://slack.com/api/apps.connections.open
2024-09-03 02:38:41.877	apiCall('apps.connections.open') start
2024-09-03 02:38:41.877	Going to retrieve a new WSS URL ...
2024-09-03 02:38:41.877	Transitioning to state: connecting:authenticating
2024-09-03 02:38:41.877	Going to establish a new connection to Slack ...
2024-09-03 02:38:41.877	Transitioning to state: connecting
2024-09-03 02:38:41.877	Reconnecting to Slack ...
2024-09-03 02:38:41.877	Transitioning to state: reconnecting
2024-09-03 02:38:41.874	Secondary WebSocket close event received (code: 1006, reason: )
2024-09-03 02:38:31.878	Received \"disconnect\" (reason: too_many_websockets) message - will attempt reconnect
2024-09-03 02:38:31.872	    at Receiver.emit (node:events:519:28)
2024-09-03 02:38:31.872	    at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:1070:20)
2024-09-03 02:38:31.872	    at WebSocket.emit (node:events:519:28)
2024-09-03 02:38:31.872	    at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:132:16)
2024-09-03 02:38:31.872	    at SocketModeClient.onWebSocketMessage (/app/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
2024-09-03 02:38:31.872	    at HierarchicalStateMachine.handle (/app/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
2024-09-03 02:38:31.872	    at TaskScheduler.enqueue (/app/node_modules/finity/lib/core/TaskScheduler.js:19:12)
2024-09-03 02:38:31.872	    at TaskScheduler.execute (/app/node_modules/finity/lib/core/TaskScheduler.js:29:7)
2024-09-03 02:38:31.872	    at /app/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
2024-09-03 02:38:31.872	    at StateMachine.handleUnhandledEvent (/app/node_modules/finity/lib/core/StateMachine.js:76:13)
2024-09-03 02:38:31.872	Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
2024-09-03 02:38:31.872	This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:
2024-09-03 02:38:31.870	Received a message on the WebSocket: {\"type\":\"disconnect\",\"reason\":\"too_many_websockets\",\"debug_info\":{\"host\":\"applink-1\}
2024-09-03 02:38:31.862	Transitioning to state: connecting:handshaking
2024-09-03 02:38:31.862	Secondary WebSocket open event received (connection established)
2024-09-03 02:38:31.743	Transitioning to state: connecting:authenticated
2024-09-03 02:38:31.743	apiCall('apps.connections.open') end
2024-09-03 02:38:31.743	http request result: {\"ok\":true,\"url\":\"wss://wss-primary.slack.com/link/?ticket=TICKET&app_id=APP_ID\",\"response_metadata\":{\"scopes\":[\"connections:write\"],\"acceptedScopes\":[\"connections:write\"]}}
2024-09-03 02:38:31.740	http response received
2024-09-03 02:38:31.610	http request headers: {}
2024-09-03 02:38:31.610	http request body: {}
2024-09-03 02:38:31.609	http request url: https://slack.com/api/apps.connections.open
2024-09-03 02:38:31.609	apiCall('apps.connections.open') start
2024-09-03 02:38:31.609	Going to retrieve a new WSS URL ...
2024-09-03 02:38:31.609	Transitioning to state: connecting:authenticating
2024-09-03 02:38:31.609	Going to establish a new connection to Slack ...
2024-09-03 02:38:31.609	Transitioning to state: connecting
2024-09-03 02:38:31.609	Reconnecting to Slack ...
2024-09-03 02:38:31.609	Transitioning to state: reconnecting
2024-09-03 02:38:31.609	Terminated the heart beat job
2024-09-03 02:38:31.609	Cancelled the job waiting for ping from Slack
2024-09-03 02:38:31.605	A pong wasn't received from the server before the timeout of 5000ms!

@filmaj
Copy link
Contributor Author

filmaj commented Sep 3, 2024

The first log is a big clue:

2024-09-03 02:38:31.605 A pong wasn't received from the server before the timeout of 5000ms!

This is not a good sign, and may actually point to an issue in the Slack backend; under the hood, socket-mode sends ping messages to the Slack backend as a kind of heartbeat, to ensure the connection is working properly. The fact that Slack did not respond with the per-specification pong message within 5 seconds is unexpected and unusual.

@weijiany
Copy link

Hi @filmaj do you have a plan to release the stable 4.x version? I want to determine whether we upgrade to rc1 or wait the stable version.

@filmaj
Copy link
Contributor Author

filmaj commented Sep 13, 2024

@weijiany do you mean a bolt-js v4 release? We have started to put together plans for it but there is no specific timeline. If this issue is affecting you, I recommend trying the v4 rc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-triage-skip enhancement M-T: A feature request for new functionality pkg:socket-mode applies to `@slack/socket-mode` semver:patch
Projects
None yet
Development

No branches or pull requests

3 participants