
Cluster Formation not working as expected #989

Closed
@aaguilartablada

Description


Describe the bug

Sometimes the RabbitmqCluster objects form a proper cluster, but often the pods end up as standalone nodes.

To Reproduce

After installing 8 clusters as follows:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: <name>
spec:
  replicas: 3
  resources:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 250m
      memory: 1Gi
  rabbitmq:
    additionalConfig: |
      cluster_partition_handling = pause_minority
      vm_memory_high_watermark_paging_ratio = 0.99
      disk_free_limit.relative = 1.0
      collect_statistics_interval = 10000
  persistence:
    storageClassName: default
    storage: "32Gi"

Only 3 of them form 3-node clusters. The remaining 5 end up as 2-node or 1-node clusters. This behavior seems to be completely random.

For example:

[scrm-az@localhost ~]$ kubectl -n test get pod -l app.kubernetes.io/name=one
NAME           READY   STATUS    RESTARTS   AGE
one-server-0   1/1     Running   0          35m
one-server-1   1/1     Running   0          35m
one-server-2   1/1     Running   0          35m
[scrm-az@localhost ~]$ kubectl -n test exec -it one-server-0 -- rabbitmqctl cluster_status
Cluster status of node rabbit@one-server-0.one-nodes.test ...
Basics

Cluster name: one

Disk Nodes

rabbit@one-server-0.one-nodes.test
rabbit@one-server-1.one-nodes.test

Running Nodes

rabbit@one-server-0.one-nodes.test
rabbit@one-server-1.one-nodes.test
[scrm-az@localhost ~]$ kubectl -n test exec -it one-server-2 -- rabbitmqctl cluster_status
Cluster status of node rabbit@one-server-2.one-nodes.test ...
Basics

Cluster name: one

Disk Nodes

rabbit@one-server-2.one-nodes.test

Running Nodes

rabbit@one-server-2.one-nodes.test

Expected behavior
I expect to end up with eight 3-node clusters.

Version and environment information

  • RabbitMQ: 3.8.27-debian-10-r49
  • RabbitMQ Cluster Operator: 1.12.1-scratch-r0
  • Kubernetes: v1.22.6
  • Cloud provider: Azure AKS

UPDATE

Overriding the StatefulSet spec to set podManagementPolicy to OrderedReady solves the problem. I relaunched the 8 clusters several times and all of them formed successfully every time.
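
For reference, a minimal sketch of such an override, assuming the operator's spec.override.statefulSet passthrough accepts standard StatefulSet spec fields (the exact field path may differ by operator version):

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: <name>
spec:
  replicas: 3
  override:
    statefulSet:
      spec:
        # Start pods one at a time instead of in parallel,
        # so each node can join the cluster before the next one boots
        podManagementPolicy: OrderedReady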
