Describe the bug
The RabbitMQ cluster cannot recover if someone deletes all pods in the RabbitmqCluster using the kubectl CLI.
To Reproduce
Steps to reproduce the behavior:
- Create a RabbitmqCluster with 3 replicas.
- Once the cluster is healthy, delete all 3 pods with the kubectl CLI:
kubectl delete pods rabbitmq-server-0 rabbitmq-server-1 rabbitmq-server-2
- Check the RabbitMQ pod status:
kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
rabbitmq-server-0   0/1     Running   5          69m
kubectl logs -f rabbitmq-server-0
2021-02-17 14:29:41.001 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2021-02-17 14:30:11.002 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:30:11.002 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2021-02-17 14:30:41.002 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:30:41.003 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2021-02-17 14:31:11.004 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:31:11.004 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2021-02-17 14:31:41.005 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:31:41.005 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 5 retries left
2021-02-17 14:32:11.006 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:32:11.006 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 4 retries left
2021-02-17 14:32:41.007 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:32:41.007 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 3 retries left
2021-02-17 14:33:11.008 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:33:11.008 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 2 retries left
2021-02-17 14:33:41.009 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:33:41.009 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 1 retries left
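The "Waiting for Mnesia tables for 30000 ms, N retries left" lines in the log correspond to RabbitMQ's table-loading retry settings. If more headroom is needed before a booting node gives up waiting for its peers, these rabbitmq.conf keys control that behavior (a sketch; the values shown are the defaults matching the log above, and raising them only buys time, it does not resolve the underlying boot deadlock):

```
# rabbitmq.conf -- illustrative fragment
# How long each wait for Mnesia tables lasts (milliseconds).
mnesia_table_loading_retry_timeout = 30000
# How many times to retry before the node gives up booting.
mnesia_table_loading_retry_limit = 10
```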
The values.yaml below was used with the chart at https://github.com/rabbitmq/cluster-operator/tree/main/charts/rabbitmq:
labels:
  label1: foo
  label2: bar
annotations:
  annotation1: foo
  annotation2: bar
replicas: 3
imagePullSecrets:
  - name: foo
service:
  type: LoadBalancer
resources:
  requests:
    cpu: 100m
    memory: 1Gi
  limits:
    cpu: 100m
    memory: 1Gi
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "rabbitmq"
    effect: "NoSchedule"
rabbitmq:
  additionalPlugins:
    - rabbitmq_shovel
    - rabbitmq_shovel_management
  additionalConfig: |
    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
  envConfig: |
    PLUGINS_DIR=/opt/rabbitmq/plugins:/opt/rabbitmq/community-plugins
  advancedConfig: |
    [
      {ra, [
        {wal_data_dir, '/var/lib/rabbitmq/quorum-wal'}
      ]}
    ].
terminationGracePeriodSeconds: 42
skipPostDeploySteps: true
override:
  statefulSet:
    spec:
      template:
        spec:
          containers:
            - name: rabbitmq
              ports:
                - containerPort: 12345 # opens an additional port on the rabbitmq server container
                  name: additional-port
                  protocol: TCP
Expected behavior
We had seen this problem when we were using Bitnami images; the solution for it is documented here: https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq#recovering-the-cluster-from-complete-shutdown
Maybe it would be good to document the same recovery procedure for the cluster-operator as well.
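For reference, the Bitnami procedure linked above amounts to forcing one node to boot without waiting for its peers. A hedged sketch of the equivalent manual steps against the operator-managed pods (pod names assumed from the defaults above; verify before running, since force_boot can discard changes that were not synchronised to the forced node):

```
# Mark the first node to boot even though it cannot see its peers.
# rabbitmqctl force_boot only writes a marker; it takes effect on the next start.
kubectl exec rabbitmq-server-0 -- rabbitmqctl force_boot

# Restart the pod so the marker is picked up; the remaining pods
# should then rejoin the now-running node.
kubectl delete pod rabbitmq-server-0
```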
Version and environment information
- RabbitMQ: 3.8.11
- RabbitMQ Cluster Operator: 1.1.0
- Kubernetes: v1.17.8
- Cloud provider: VMware (PKS)
Additional context
https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq#recovering-the-cluster-from-complete-shutdown