Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACU: Difficulty recovering when crossbar connection drops #529

Open
BrianJKoopman opened this issue Sep 22, 2023 · 0 comments
Open

ACU: Difficulty recovering when crossbar connection drops #529

BrianJKoopman opened this issue Sep 22, 2023 · 0 comments
Labels
agent: acu bug Something isn't working

Comments

@BrianJKoopman
Copy link
Member

BrianJKoopman commented Sep 22, 2023

In a recent crash on satp2, it looks like something was causing the agent's connection to crossbar to drop. Then, when the agent reconnected to crossbar it had difficulty resuming its connection. Something weird might have been going on with the network, so maybe there's something else going on here, but that seems to be the case to me right now. Here's a log snippet from the last re-connection attempt:

2023-09-21T14:50:30Z    2023-09-21T14:50:30+0000 transport connected
2023-09-21T14:50:30Z    2023-09-21T14:50:30+0000 waiting for reconnect for 7.997567415237427 more seconds
2023-09-21T14:50:29Z    2023-09-21T14:50:29+0000 waiting for reconnect for 8.998709678649902 more seconds
2023-09-21T14:50:28Z    2023-09-21T14:50:28+0000 Scheduling retry 1 to connect <twisted.internet.endpoints.TCP4ClientEndpoint object at 0x7f8c6250f5e0> in 2.079377072774456 seconds.
2023-09-21T14:50:28Z    2023-09-21T14:50:28+0000 waiting for reconnect for 9.99999713897705 more seconds
2023-09-21T14:50:28Z    2023-09-21T14:50:28+0000 transport disconnected
2023-09-21T14:50:28Z    2023-09-21T14:50:28+0000 session left: CloseDetails(reason=<wamp.close.transport_lost>, message='WAMP transport was lost without closing the session 5996472324772386 before')
2023-09-21T14:50:28Z    2023-09-21T14:50:28+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c620726a0>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:50:18Z    2023-09-21T14:50:18+0000 Requesting UDP stream enable.
2023-09-21T14:49:22Z    2023-09-21T14:49:22+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c62081640>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:49:18Z    2023-09-21T14:49:18+0000 Requesting UDP stream enable.
2023-09-21T14:48:44Z    2023-09-21T14:48:44+0000 Data rate for "monitor" stream is now 0.017 Hz
2023-09-21T14:48:43Z    2023-09-21T14:48:43+0000 An error occurred while connecting: 113: No route to host.
2023-09-21T14:48:21Z    2023-09-21T14:48:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c62036220>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:48:18Z    2023-09-21T14:48:18+0000 Requesting UDP stream enable.
2023-09-21T14:47:21Z    2023-09-21T14:47:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c62072d60>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:47:18Z    2023-09-21T14:47:18+0000 Requesting UDP stream enable.
2023-09-21T14:46:21Z    2023-09-21T14:46:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c62185130>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:46:18Z    2023-09-21T14:46:18+0000 Requesting UDP stream enable.
2023-09-21T14:45:21Z    2023-09-21T14:45:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c6215be50>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:45:18Z    2023-09-21T14:45:18+0000 Requesting UDP stream enable.
2023-09-21T14:44:21Z    2023-09-21T14:44:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c6212be80>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:44:18Z    2023-09-21T14:44:18+0000 Requesting UDP stream enable.
2023-09-21T14:44:11Z    2023-09-21T14:44:11+0000 generate_scan:48 Status is now "done".
2023-09-21T14:44:11Z    2023-09-21T14:44:11+0000 generate_scan:48 Motion blocked; problem with "broadcast" data acq process.
2023-09-21T14:44:11Z    2023-09-21T14:44:11+0000 broadcast daq stream has old data (657.123274564743 seconds)
2023-09-21T14:44:11Z    2023-09-21T14:44:11+0000 generate_scan:48 Status is now "running".
2023-09-21T14:44:11Z    2023-09-21T14:44:11+0000 generate_scan:48 Status is now "starting".
2023-09-21T14:44:11Z    2023-09-21T14:44:11+0000 start called for generate_scan
2023-09-21T14:43:21Z    2023-09-21T14:43:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c621e7d00>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:43:18Z    2023-09-21T14:43:18+0000 Requesting UDP stream enable.
2023-09-21T14:42:21Z    2023-09-21T14:42:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c62225f40>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:42:18Z    2023-09-21T14:42:18+0000 Requesting UDP stream enable.
2023-09-21T14:41:21Z    2023-09-21T14:41:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c620852e0>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:41:18Z    2023-09-21T14:41:18+0000 Requesting UDP stream enable.
2023-09-21T14:40:21Z    2023-09-21T14:40:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c621853d0>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:40:18Z    2023-09-21T14:40:18+0000 Requesting UDP stream enable.
2023-09-21T14:39:21Z    2023-09-21T14:39:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c621e5640>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:39:18Z    2023-09-21T14:39:18+0000 Requesting UDP stream enable.
2023-09-21T14:38:21Z    2023-09-21T14:38:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c620e51f0>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:38:18Z    2023-09-21T14:38:18+0000 Requesting UDP stream enable.
2023-09-21T14:37:21Z    2023-09-21T14:37:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c620fcd90>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:37:18Z    2023-09-21T14:37:18+0000 Requesting UDP stream enable.
2023-09-21T14:36:21Z    2023-09-21T14:36:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c620816d0>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:36:18Z    2023-09-21T14:36:18+0000 Requesting UDP stream enable.
2023-09-21T14:35:21Z    2023-09-21T14:35:21+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c621e7700>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:35:18Z    2023-09-21T14:35:18+0000 Requesting UDP stream enable.
2023-09-21T14:34:18Z    2023-09-21T14:34:18+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c620e5490>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:34:18Z    2023-09-21T14:34:18+0000 Requesting UDP stream enable.
2023-09-21T14:33:23Z    2023-09-21T14:33:23+0000 Exception while trying to enable stream: HTTPConnectionPool(host='192.168.12.5', port=8080): Max retries exceeded with url: /?Module=Services.PositionBroadcast&Chapter=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c6213d9d0>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-09-21T14:33:17Z    2023-09-21T14:33:17+0000 Requesting UDP stream enable.
2023-09-21T14:33:17Z    2023-09-21T14:33:17+0000 No UDP packets are being received.
2023-09-21T14:24:23Z    2023-09-21T14:24:23+0000 Elevation_mode has changed to Stop
2023-09-21T14:24:23Z    2023-09-21T14:24:23+0000 Azimuth_mode has changed to Stop
2023-09-21T14:22:33Z    2023-09-21T14:22:33+0000 Data rate for "monitor" stream is now 19.167 Hz
2023-09-21T14:22:13Z    2023-09-21T14:22:13+0000 Data rate for "monitor" stream is now 18.312 Hz

On the nextline frontend side we saw this:

Traceback (most recent call last):
  File "<string>", line 191, in <module>
  File "/usr/local/lib/python3.8/dist-packages/sorunlib/seq.py", line 48, in scan
    check_response(resp)
  File "/usr/local/lib/python3.8/dist-packages/sorunlib/_internal.py", line 35, in check_response
    raise RuntimeError(error)
RuntimeError: Task failed to complete successfully.
OCSReply: OK : Operation "generate_scan" is currently not running (FAILED).
  generate_scan[session=48]; status=done with ERROR 20.7 mins ago, took 0.001600 s
  messages (4 of 4):
    1695307451.340 Status is now "starting".
    1695307451.341 Status is now "running".
    1695307451.341 Motion blocked; problem with "broadcast" data acq process.
    1695307451.341 Status is now "done".
  other keys in .session: op_code, data

socs version: v0.4.3-35-g79b92b5-dev

@BrianJKoopman BrianJKoopman added bug Something isn't working agent: acu labels Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent: acu bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant