fix(failover): prevent double failover in case of lost connectivity by leonardoce · Pull Request #5788 · cloudnative-pg/cloudnative-pg

leonardoce · 2024-10-10T15:32:30Z

This patch ensures the operator does not trigger two failovers when a primary Pod loses connectivity and fails to recognize its role change from primary to replica.

Previously, the first failover occurred when the operator detected that the primary Pod was no longer ready or present. A second failover could be triggered if the old primary Pod recovered before the Kubelet timeout, with the operator potentially promoting it to primary again based on the Pod list.

With this patch, the operator will wait for the recovered Pod to acknowledge its new role before taking further action, preventing unnecessary failovers.
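The decision described above can be illustrated with a minimal sketch. This is not the code from this patch: `PodReport`, `Action`, and `decide` are hypothetical names invented for illustration, and the real operator works on its own status types. The sketch only assumes that each Pod reports the role it believes it has, and that the operator knows which Pod it has designated as primary.

```go
package main

// PodReport is a hypothetical summary of what an instance Pod reports about itself.
type PodReport struct {
	Name      string
	IsPrimary bool // the role the Pod currently believes it has
	Ready     bool
}

// Action is a hypothetical outcome of one reconciliation step.
type Action string

const (
	ActionNone     Action = "none"
	ActionWait     Action = "wait-for-role-acknowledgement"
	ActionFailover Action = "failover"
)

// decide returns the next step given the primary designated by the operator
// and the reports collected from the Pods (assumed logic, for illustration only).
func decide(designatedPrimary string, reports []PodReport) Action {
	primaryHealthy := false
	for _, r := range reports {
		if r.Name != designatedPrimary && r.IsPrimary {
			// A Pod that is no longer the designated primary still believes
			// it is primary: wait for it to acknowledge its new role
			// instead of failing over again.
			return ActionWait
		}
		if r.Name == designatedPrimary && r.Ready {
			primaryHealthy = true
		}
	}
	if !primaryHealthy {
		// The designated primary is missing or not ready.
		return ActionFailover
	}
	return ActionNone
}
```

In this sketch, a recovered old primary that still reports itself as primary yields `ActionWait` rather than a second failover; once it reports itself as a replica, a healthy designated primary yields `ActionNone`.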

Closes: #2513

Release notes

Prevent double failover in case of lost connectivity

github-actions · 2024-10-10T15:32:46Z

❗ By default, the pull request is configured to backport to all release branches.

To stop backporting this PR, remove the label: backport-requested ◀️ or add the label 'do not backport'
To stop backporting this PR to a certain release branch, remove the specific branch label: release-x.y

leonardoce · 2024-10-14T13:54:44Z

E2e: https://github.com/EnterpriseDB/cloudnative-pg/actions/runs/11329017284

…tivity

This patch prevents the operator from failing over twice when a Pod loses connectivity and doesn't notice the change of its current role from primary to replica.

The first failover happens when the operator notices the primary Pod is no longer ready or present. The second failover happens if the old primary comes back to life before the Kubelet timeout, with the operator potentially failing over to the first one in the Pod list.

When this happens, we will wait for the Pod to understand its current role.

Closes: cloudnative-pg#2513

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>

Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>

fcanovai · 2024-10-15T07:41:01Z

/ok-to-merge

…5788)

This patch ensures the operator does not trigger two failovers when a primary Pod loses connectivity and fails to recognize its role change from primary to replica.

Previously, the first failover occurred when the operator detected that the primary Pod was no longer ready or present. A second failover could be triggered if the old primary Pod recovered before the Kubelet timeout, with the operator potentially promoting it to primary again based on the Pod list.

With this patch, the operator will wait for the recovered Pod to acknowledge its new role before taking further action, preventing unnecessary failovers.

Closes: #2513

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Co-authored-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
(cherry picked from commit 3618164)

leonardoce requested a review from a team as a code owner October 10, 2024 15:32

cnpg-bot added the backport-requested ◀️ (This pull request should be backported to all supported releases), release-1.22, release-1.23, and release-1.24 labels Oct 10, 2024

leonardoce force-pushed the api-split branch 3 times, most recently from 4f0e04f to cc666ff Compare October 12, 2024 11:48

fcanovai force-pushed the api-split branch from cc666ff to b4beee2 Compare October 14, 2024 11:31

gbartolini changed the title ~~fix(failover): avoid failing over multiple times with unstable connectivity~~ fix(failover): prevent double failover in case of lost connectivity Oct 14, 2024

gbartolini approved these changes Oct 14, 2024

fcanovai approved these changes Oct 14, 2024

leonardoce and others added 2 commits October 15, 2024 09:17

chore: comments

92acae2

Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>

leonardoce force-pushed the api-split branch from 9f5c562 to 92acae2 Compare October 15, 2024 07:17

cnpg-bot added the ok to merge 👌 (This PR can be merged) label Oct 15, 2024

fcanovai merged commit 3618164 into cloudnative-pg:main Oct 15, 2024

ardentperf mentioned this pull request Sep 8, 2025

merge upstream geico/cloudnative-pg#2

Merged

fcanovai merged 2 commits into cloudnative-pg:main from leonardoce:api-split
