Nodes that form a new cluster may not cluster correctly #662

Closed
@gerhard


Describe the bug

Since #621 was introduced, nodes that form a new cluster may not cluster correctly.

In our case (with @ansd), pod quick-rabbit-server-0 formed its own cluster, while quick-rabbit-server-1 and quick-rabbit-server-2 formed a second cluster with the same name as the first. Everything looks healthy from both the Kubernetes perspective (3 ready pods) and the Erlang perspective (6 healthy distribution links), but we have 2 RabbitMQ clusters, one with 1 node and one with 2 nodes, which is clearly wrong.
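To make the symptom concrete, here is a minimal sketch of how the split could be spotted by comparing what each node reports as its cluster members. The `views` mapping and the `distinct_clusters` helper are hypothetical names used for illustration; how the membership is collected from each node is left out.

```python
def distinct_clusters(views):
    """Return the set of distinct memberships reported across all nodes.

    `views` maps a node name to the set of members that node believes
    are in its cluster. A healthy deployment yields exactly one entry.
    """
    return {frozenset(members) for members in views.values()}


# Membership as described in this issue (short pod names for readability).
views = {
    "quick-rabbit-server-0": {"quick-rabbit-server-0"},
    "quick-rabbit-server-1": {"quick-rabbit-server-1", "quick-rabbit-server-2"},
    "quick-rabbit-server-2": {"quick-rabbit-server-1", "quick-rabbit-server-2"},
}

clusters = distinct_clusters(views)
for members in sorted(clusters, key=len):
    print(len(members), sorted(members))
```

More than one distinct membership means the pods did not converge on a single cluster, even though every pod is ready.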

To Reproduce

This is difficult to reproduce because it is timing-specific. We are including all the logs; the problem can be reproduced with https://github.com/rabbitmq/observability-2021/tree/bf77efebc6760e16d8176bc0c7b750204d8b2a7e/talks/emea-tech-talk (private repo available to all maintainers of this repo) using the following steps:

make 1.k8s 2.k8s-rabbitmq 4.resolve-first-problem 5.second-problem

While 5.second-problem is not technically needed, it makes the problem very obvious:

[two screenshots, not reproduced here]

Version and environment information

Quick fix

Our quick fix was to reset quick-rabbit-server-0:

rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
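For reference, a minimal sketch of running the same three commands against the pod through `kubectl exec`. The container name `rabbitmq` is taken from the log prefix further down; the `reset_commands` and `reset_node` helpers are hypothetical and default to a dry run that only prints the commands. Note that `rabbitmqctl reset` wipes the node's data, which was acceptable here only because the cluster had just been created.

```python
import subprocess


def reset_commands(pod, container="rabbitmq"):
    """Build the kubectl invocations for the stop_app / reset / start_app sequence."""
    steps = ["stop_app", "reset", "start_app"]
    return [
        ["kubectl", "exec", pod, "-c", container, "--", "rabbitmqctl", step]
        for step in steps
    ]


def reset_node(pod, dry_run=True):
    """Print the commands, or run them in order and stop at the first failure."""
    for argv in reset_commands(pod):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


reset_node("quick-rabbit-server-0")
```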

Additional context

kubectl logs --selector app.kubernetes.io/name=quick-rabbit --prefix=true --tail=-1
...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:40.634 [info] <0.273.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default is empty. Assuming we need to join an existing cluster or initialise from scratch...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:40.634 [info] <0.273.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:40.634 [info] <0.273.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:40.634 [info] <0.273.0> Peer discovery backend does not support locking, falling back to randomized delay
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:40.634 [info] <0.273.0> Peer discovery backend rabbit_peer_discovery_k8s supports registration.
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:40.634 [info] <0.273.0> Will wait for 17965 milliseconds before proceeding with registration...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:58.615 [info] <0.273.0> All discovered existing cluster peers: rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default, rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default, rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:58.615 [info] <0.273.0> Peer nodes we can cluster with: rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default, rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:58.616 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:58.616 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:58.616 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 9 retries left...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:59.117 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:59.118 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:59.118 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 8 retries left...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:59.619 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:59.620 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:59.620 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 7 retries left...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:00.122 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:00.123 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:00.123 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 6 retries left...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:00.624 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:00.625 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:00.625 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 5 retries left...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:01.126 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:01.127 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:01.127 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 4 retries left...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:01.628 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:01.629 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:01.629 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 3 retries left...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:02.130 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:02.131 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:02.131 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 2 retries left...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:02.632 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:02.633 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:02.633 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 1 retries left...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:03.134 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:03.135 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:03.135 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 0 retries left...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:03.637 [warning] <0.273.0> Could not successfully contact any node of: rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default,rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default (as in Erlang distribution). Starting as a blank standalone node...
...
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:38:40.764 [info] <0.273.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default is empty. Assuming we need to join an existing cluster or initialise from scratch...
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:38:40.764 [info] <0.273.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:38:40.764 [info] <0.273.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:38:40.764 [info] <0.273.0> Peer discovery backend does not support locking, falling back to randomized delay
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:38:40.764 [info] <0.273.0> Peer discovery backend rabbit_peer_discovery_k8s supports registration.
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:38:40.764 [info] <0.273.0> Will wait for 21437 milliseconds before proceeding with registration...
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:02.211 [info] <0.273.0> All discovered existing cluster peers: rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default, rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default, rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:02.211 [info] <0.273.0> Peer nodes we can cluster with: rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default, rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:02.212 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:02.212 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:02.212 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 9 retries left...
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:02.714 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:02.715 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:02.715 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 8 retries left...
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:03.216 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:03.217 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:03.217 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 7 retries left...
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:03.717 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:03.733 [info] <0.273.0> Node 'rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default' selected for auto-clustering
...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:40.763 [info] <0.273.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default is empty. Assuming we need to join an existing cluster or initialise from scratch...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:40.763 [info] <0.273.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:40.763 [info] <0.273.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:40.763 [info] <0.273.0> Peer discovery backend does not support locking, falling back to randomized delay
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:40.763 [info] <0.273.0> Peer discovery backend rabbit_peer_discovery_k8s supports registration.
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:40.764 [info] <0.273.0> Will wait for 17512 milliseconds before proceeding with registration...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:58.292 [info] <0.273.0> All discovered existing cluster peers: rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default, rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default, rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:58.292 [info] <0.273.0> Peer nodes we can cluster with: rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default, rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:58.297 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:58.302 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:58.302 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 9 retries left...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:58.803 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:58.804 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:58.804 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 8 retries left...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:59.305 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:59.306 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:59.306 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 7 retries left...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:59.807 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:59.808 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:59.808 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 6 retries left...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:00.309 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:00.310 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:00.310 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 5 retries left...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:00.811 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:00.812 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:00.812 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 4 retries left...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:01.313 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:01.314 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:01.314 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 3 retries left...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:01.815 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:01.816 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:01.816 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 2 retries left...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:02.317 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:02.318 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:02.318 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 1 retries left...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:02.819 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:02.820 [warning] <0.273.0> Could not auto-cluster with node rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default: {error,tables_not_present}
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:02.820 [error] <0.273.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 0 retries left...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:03.320 [warning] <0.273.0> Could not successfully contact any node of: rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default,rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default (as in Erlang distribution). Starting as a blank standalone node...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:03.324 [info] <0.44.0> Application mnesia exited with reason: stopped
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:03.324 [info] <0.44.0> Application mnesia exited with reason: stopped
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:03.351 [info] <0.44.0> Application mnesia started on node 'rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default'
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:03.461 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:03.461 [info] <0.273.0> Successfully synced tables from a peer
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:03.494 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:03.494 [info] <0.273.0> Successfully synced tables from a peer
...

Attaching all logs as a file (1000+ lines): parallel-startup-problem.txt
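To read the timing out of these logs more easily, here is a small sketch that extracts each pod's randomized delay and its final decision. The regular expressions are assumptions based only on the line format shown above, and `summarize` is a hypothetical helper, not part of any RabbitMQ tooling.

```python
import re

# [pod/<pod>/<container>] <date> <time> [<level>] <pid> <message>
LINE = re.compile(r"^\[pod/([^/]+)/[^\]]+\] (\S+) (\S+) \[(\w+)\] <[^>]+> (.*)$")
DELAY = re.compile(r"Will wait for (\d+) milliseconds")
JOIN = re.compile(r"Node '([^']+)' selected for auto-clustering")


def summarize(log_text):
    """Map each pod to its startup delay and its clustering outcome."""
    summary = {}
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # skip "..." and anything else that is not a log line
        pod, _date, time, _level, msg = match.groups()
        entry = summary.setdefault(
            pod, {"delay_ms": None, "outcome": None, "decided_at": None}
        )
        delay = DELAY.search(msg)
        joined = JOIN.search(msg)
        if delay:
            entry["delay_ms"] = int(delay.group(1))
        elif "Starting as a blank standalone node" in msg:
            entry["outcome"], entry["decided_at"] = "standalone", time
        elif joined:
            entry["outcome"], entry["decided_at"] = "joined " + joined.group(1), time
    return summary


# Six lines copied from the excerpt above: one delay and one decision per pod.
SAMPLE = """
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:38:40.634 [info] <0.273.0> Will wait for 17965 milliseconds before proceeding with registration...
[pod/quick-rabbit-server-0/rabbitmq] 2021-04-14 14:39:03.637 [warning] <0.273.0> Could not successfully contact any node of: rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default,rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default (as in Erlang distribution). Starting as a blank standalone node...
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:38:40.764 [info] <0.273.0> Will wait for 21437 milliseconds before proceeding with registration...
[pod/quick-rabbit-server-2/rabbitmq] 2021-04-14 14:39:03.733 [info] <0.273.0> Node 'rabbit@quick-rabbit-server-1.quick-rabbit-nodes.default' selected for auto-clustering
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:38:40.764 [info] <0.273.0> Will wait for 17512 milliseconds before proceeding with registration...
[pod/quick-rabbit-server-1/rabbitmq] 2021-04-14 14:39:03.320 [warning] <0.273.0> Could not successfully contact any node of: rabbit@quick-rabbit-server-0.quick-rabbit-nodes.default,rabbit@quick-rabbit-server-2.quick-rabbit-nodes.default (as in Erlang distribution). Starting as a blank standalone node...
"""

summary = summarize(SAMPLE)
for pod in sorted(summary):
    print(pod, summary[pod])
```

Read this way, the excerpt shows quick-rabbit-server-1 and quick-rabbit-server-0 both exhausting their retries and starting as standalone nodes about 300 ms apart (14:39:03.320 and 14:39:03.637), while quick-rabbit-server-2, which drew the longest delay, selected quick-rabbit-server-1 at 14:39:03.733.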

Related improvement

In the context of alerts, we are missing metrics that would enable us to alert when the expected number of RabbitMQ nodes is not present in the cluster. In this case, we were expecting RabbitMQ to form a 3-node cluster. If that doesn't happen, we should have an alert that catches it. This is something for @ansd and me to follow up on. I'm adding it here so that we have it all in a single place.
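As a rough illustration of the alert condition, assuming each node could report how many members it sees in its cluster (the metric itself is what is missing today, so `reported_sizes` and `cluster_size_alerts` are hypothetical):

```python
def cluster_size_alerts(reported_sizes, expected_size):
    """Return one alert message per node that sees fewer members than expected."""
    return [
        f"{node} sees {size} of {expected_size} expected nodes"
        for node, size in sorted(reported_sizes.items())
        if size < expected_size
    ]


# Sizes as they were in this issue: one 1-node cluster and one 2-node cluster.
alerts = cluster_size_alerts(
    {
        "quick-rabbit-server-0": 1,
        "quick-rabbit-server-1": 2,
        "quick-rabbit-server-2": 2,
    },
    expected_size=3,
)
for message in alerts:
    print(message)
```

With the sizes from this issue, every node would trip the condition, even though all pods are ready and all distribution links are healthy.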

Labels: bug
