Switchover fails randomly in synchronous replication #1686

@stonewhitener

Description

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.7.0
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? GCP (GKE, so Kubernetes)
  • Are you running Postgres Operator in production? no
  • Type of issue? Bug report

Here is my cluster manifest:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: foo-cluster
  namespace: default
spec:
  teamId: "foo"
  numberOfInstances: 3
  volume:
    size: 1Gi
  postgresql:
    version: "13"
  patroni:
    synchronous_mode: true
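
To confirm that the setting actually reaches Patroni, the dynamic configuration can be inspected from any pod; with the manifest above it should include synchronous_mode: true:

$ kubectl exec foo-cluster-0 -- patronictl show-config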

Before performing the rolling update:

$ kubectl exec foo-cluster-0 -- patronictl list
+ Cluster: foo-cluster (7030642225746063429) --------+----+-----------+
| Member        | Host      | Role         | State   | TL | Lag in MB |
+---------------+-----------+--------------+---------+----+-----------+
| foo-cluster-0 | 10.4.0.16 | Replica      | running |  4 |         0 |
| foo-cluster-1 | 10.4.1.10 | Leader       | running |  4 |           |
| foo-cluster-2 | 10.4.0.15 | Sync Standby | running |  4 |         0 |
+---------------+-----------+--------------+---------+----+-----------+

Before performing the switchover:

$ kubectl exec foo-cluster-0 -- patronictl list
+ Cluster: foo-cluster (7030642225746063429) --------+----+-----------+
| Member        | Host      | Role         | State   | TL | Lag in MB |
+---------------+-----------+--------------+---------+----+-----------+
| foo-cluster-0 | 10.4.0.17 | Sync Standby | running |  4 |         0 |
| foo-cluster-1 | 10.4.1.10 | Leader       | running |  4 |           |
| foo-cluster-2 | 10.4.0.18 | Replica      | running |  4 |         0 |
+---------------+-----------+--------------+---------+----+-----------+

and the leader's log shows the failed switchover attempt:

$ kubectl logs foo-cluster-1
...
2021-11-15 04:22:28,454 INFO: received failover request with leader=foo-cluster-1 candidate=foo-cluster-2 scheduled_at=None
2021-11-15 04:22:28,474 INFO: Got response from foo-cluster-2 http://10.4.0.18:8008/patroni: {"state": "running", "postmaster_start_time": "2021-11-15 04:22:27.147246+00:00", "role": "replica", "server_version": 130004, "cluster_unlocked": false, "xlog": {"received_location": 134217728, "replayed_location": 135242840, "replayed_timestamp": "2021-11-15 04:18:58.350110+00:00", "paused": false}, "timeline": 4, "database_system_identifier": "7030642225746063429", "patroni": {"version": "2.1.1", "scope": "foo-cluster"}}
2021-11-15 04:22:28,604 INFO: Lock owner: foo-cluster-1; I am foo-cluster-1
2021-11-15 04:22:28,612 WARNING: Failover candidate=foo-cluster-2 does not match with sync_standbys=foo-cluster-0
2021-11-15 04:22:28,612 WARNING: manual failover: members list is empty
2021-11-15 04:22:28,612 WARNING: manual failover: no healthy members found, failover is not possible
2021-11-15 04:22:28,612 INFO: Cleaning up failover key
...
(foo-cluster-1 is then shut down and recreated...)

After completing the rolling update:

$ kubectl exec foo-cluster-0 -- patronictl list
+ Cluster: foo-cluster (7030642225746063429) --------+----+-----------+
| Member        | Host      | Role         | State   | TL | Lag in MB |
+---------------+-----------+--------------+---------+----+-----------+
| foo-cluster-0 | 10.4.0.17 | Leader       | running |  5 |           |
| foo-cluster-1 | 10.4.1.11 | Replica      | running |  5 |         0 |
| foo-cluster-2 | 10.4.0.18 | Sync Standby | running |  5 |         0 |
+---------------+-----------+--------------+---------+----+-----------+

I also got the following events:

$ kubectl get events --sort-by=.lastTimestamp
...
2m11s       Normal    Update                    postgresql/foo-cluster                          Performing rolling update
2m11s       Normal    Killing                   pod/foo-cluster-0                               Stopping container postgres
111s        Normal    Scheduled                 pod/foo-cluster-0                               Successfully assigned default/foo-cluster-0 to gke-cluster-1-default-pool-f03fb5a7-3whg
111s        Normal    SuccessfulCreate          statefulset/foo-cluster                         create Pod foo-cluster-0 in StatefulSet foo-cluster successful
101s        Normal    SuccessfulAttachVolume    pod/foo-cluster-0                               AttachVolume.Attach succeeded for volume "pvc-9f2ee3a4-e211-4d64-afd5-14beef28685d"
93s         Normal    Created                   pod/foo-cluster-0                               Created container postgres
93s         Normal    Started                   pod/foo-cluster-0                               Started container postgres
93s         Normal    Pulled                    pod/foo-cluster-0                               Container image "registry.opensource.zalan.do/acid/spilo-14:2.1-p3" already present on machine
89s         Normal    Killing                   pod/foo-cluster-2                               Stopping container postgres
61s         Normal    Scheduled                 pod/foo-cluster-2                               Successfully assigned default/foo-cluster-2 to gke-cluster-1-default-pool-f03fb5a7-3whg
61s         Normal    SuccessfulCreate          statefulset/foo-cluster                         create Pod foo-cluster-2 in StatefulSet foo-cluster successful
49s         Normal    SuccessfulAttachVolume    pod/foo-cluster-2                               AttachVolume.Attach succeeded for volume "pvc-0215d8ca-3ca7-4535-b29c-c40c203c75b2"
43s         Normal    Pulled                    pod/foo-cluster-2                               Container image "registry.opensource.zalan.do/acid/spilo-14:2.1-p3" already present on machine
43s         Normal    Started                   pod/foo-cluster-2                               Started container postgres
43s         Normal    Created                   pod/foo-cluster-2                               Created container postgres
39s         Normal    Switchover                postgresql/foo-cluster                          Switching over from "foo-cluster-1" to "default/foo-cluster-2"
38s         Normal    Switchover                postgresql/foo-cluster                          Switchover from "foo-cluster-1" to "default/foo-cluster-2" FAILED: could not switch over from "foo-cluster-1" to "default/foo-cluster-2": patroni returned 'Failover failed'
38s         Normal    Killing                   pod/foo-cluster-1                               Stopping container postgres
19s         Normal    SuccessfulCreate          statefulset/foo-cluster                         create Pod foo-cluster-1 in StatefulSet foo-cluster successful
19s         Normal    Scheduled                 pod/foo-cluster-1                               Successfully assigned default/foo-cluster-1 to gke-cluster-1-default-pool-f03fb5a7-hwqt
12s         Normal    SuccessfulAttachVolume    pod/foo-cluster-1                               AttachVolume.Attach succeeded for volume "pvc-d6dcc1ad-1653-48d4-a9df-de14c7b3f639"
9s          Normal    Created                   pod/foo-cluster-1                               Created container postgres
9s          Normal    Started                   pod/foo-cluster-1                               Started container postgres
9s          Normal    Pulled                    pod/foo-cluster-1                               Container image "registry.opensource.zalan.do/acid/spilo-14:2.1-p3" already present on machine
5s          Normal    Update                    postgresql/foo-cluster                          Rolling update done - pods have been recreated

This happens because the operator selects the switchover target at random from all replicas. In synchronous mode, Patroni only accepts a candidate that is currently a synchronous standby, so in the example above the switchover from foo-cluster-1 (the leader) to foo-cluster-2 (a plain replica) is rejected - hence the warning "Failover candidate=foo-cluster-2 does not match with sync_standbys=foo-cluster-0" in the log. With one synchronous standby among two replicas, a random pick chooses the wrong member about half the time, which is why the switchover fails "randomly".

if err := c.Switchover(masterPod, masterCandidate(replicas)); err != nil {

// masterCandidate picks the switchover target uniformly at random from all
// replicas, regardless of which one is the synchronous standby.
func masterCandidate(replicas []spec.NamespacedName) spec.NamespacedName {
    return replicas[rand.Intn(len(replicas))]
}

When synchronous replication is enabled, the switchover target should instead be selected from the synchronous standbys.
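
A minimal sketch of that change (hypothetical code, not the operator's current API - the syncCandidate name and the syncStandbys argument are made up for illustration, and obtaining the set of synchronous standbys, e.g. from Patroni's /cluster endpoint, is left out):

// Hypothetical replacement for masterCandidate: pick the target only from
// members Patroni currently reports as synchronous standbys. Assumes the
// same context as the excerpt above ("fmt", "math/rand" and the operator's
// spec package).
func syncCandidate(replicas []spec.NamespacedName, syncStandbys map[string]bool) (spec.NamespacedName, error) {
    candidates := make([]spec.NamespacedName, 0, len(replicas))
    for _, r := range replicas {
        if syncStandbys[r.Name] {
            candidates = append(candidates, r)
        }
    }
    if len(candidates) == 0 {
        return spec.NamespacedName{}, fmt.Errorf("no synchronous standby available as switchover target")
    }
    // Same random choice as before, but restricted to synchronous standbys.
    return candidates[rand.Intn(len(candidates))], nil
}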
