I'm not sure what the issue is here. It may lie in the gocql logic itself, but we have a problem where a rolling restart causes disconnects, and the driver then fails to reconnect until the app is restarted. What seems to happen is that the driver disconnects when a node is restarted and reconnects to another node, and this continues until the last node is restarted. Once that last node is restarted, the driver will not reconnect. It's as if the driver is holding onto a list of dead nodes and never updating it.
There are removehost and hostdown functions in the policy code that may be removing the nodes from the available list and never marking them alive again once they come back up.
Here's the log when the connection is failing:
01:58:48.338 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable to dial control conn 10.16.9.2:29042: EOF
01:58:48.341 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable to dial control conn 10.16.9.2:29042: EOF
01:58:48.342 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable to dial control conn 10.16.9.2:29042: EOF
01:58:48.640 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable setup control conn 10.16.9.2:29042: EOF
01:58:48.640 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable setup control conn 10.16.9.2:29042: EOF
01:58:48.640 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable setup control conn 10.16.9.2:29042: EOF
01:58:48.640 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable setup control conn 10.16.9.2:29042: EOF
01:58:48.640 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable setup control conn 10.16.9.2:29042: EOF
01:58:48.641 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable setup control conn 10.16.9.2:29042: EOF
01:58:48.641 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable setup control conn 10.16.9.2:29042: EOF
01:58:48.641 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable setup control conn 10.16.9.2:29042: EOF
01:58:48.642 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:58:48 gocql: unable setup control conn 10.16.9.2:29042: EOF
01:59:58.481 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:59:58 gocql: unable setup control conn 10.16.27.3:29042: Unexpected persistence error: No local service found for tenant 20116ecf-27f9-4540-8f42-db0faa162e9d
01:59:58.483 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 07:59:58 gocql: unable setup control conn 10.16.27.3:29042: Unexpected persistence error: No local service found for tenant 20116ecf-27f9-4540-8f42-db0faa162e9d
02:06:15.441 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 08:06:15 gocql: unable setup control conn 10.16.17.4:29042: Unexpected persistence error: No local service found for tenant 20116ecf-27f9-4540-8f42-db0faa162e9d
02:06:15.446 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 08:06:15 gocql: unable setup control conn 10.16.17.4:29042: Unexpected persistence error: No local service found for tenant 20116ecf-27f9-4540-8f42-db0faa162e9d
02:06:15.446 app-stg-ue4-gke flight-plan-service-worker 2023/02/22 08:06:15 gocql: unable setup control conn 10.16.17.4:29
My assumption is that the policy here is the problem:
https://github.com/gocql/gocql/blob/v1.3.1/policies.go#L824
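To make the suspicion concrete, here is a minimal sketch of the failure mode I have in mind. It is not gocql's code: `hostTracker`, `rollingRestart`, and the `notifyUp` flag are hypothetical names made up for illustration, and the only assumption is that a host marked down stays unusable unless a matching "up" event reaches the bookkeeping. The addresses are the ones from the log.

```go
package main

import "fmt"

// hostTracker is a toy stand-in for a policy's host bookkeeping.
// Hypothetical type for illustration only; this is NOT gocql's implementation.
type hostTracker struct {
	alive map[string]bool // address -> currently usable
}

func newHostTracker(addrs ...string) *hostTracker {
	t := &hostTracker{alive: make(map[string]bool)}
	for _, a := range addrs {
		t.alive[a] = true
	}
	return t
}

// HostDown marks a known host as unusable but keeps it in the map.
func (t *hostTracker) HostDown(addr string) {
	if _, ok := t.alive[addr]; ok {
		t.alive[addr] = false
	}
}

// HostUp marks a host as usable again, re-adding it if it was forgotten.
func (t *hostTracker) HostUp(addr string) {
	t.alive[addr] = true
}

// Usable counts the hosts that could still be picked for a connection.
func (t *hostTracker) Usable() int {
	n := 0
	for _, up := range t.alive {
		if up {
			n++
		}
	}
	return n
}

// rollingRestart restarts every node in turn. With notifyUp set to false it
// simulates the suspected bug: the "up" event never reaches the tracker.
func rollingRestart(t *hostTracker, addrs []string, notifyUp bool) {
	for _, a := range addrs {
		t.HostDown(a)
		if notifyUp {
			t.HostUp(a)
		}
	}
}

func main() {
	addrs := []string{"10.16.9.2", "10.16.27.3", "10.16.17.4"}

	broken := newHostTracker(addrs...)
	rollingRestart(broken, addrs, false)
	fmt.Println("usable hosts without up events:", broken.Usable())

	healthy := newHostTracker(addrs...)
	rollingRestart(healthy, addrs, true)
	fmt.Println("usable hosts with up events:", healthy.Usable())
}
```

If something like this is happening inside the driver, it would match what we see: each restart shrinks the usable set by one, and after the last node restarts there is nothing left to reconnect to.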