Keep synchronizing slots when others are lagging on primary. #31

rdunklau · 2023-09-29T13:30:30Z

Instead of blocking indefinitely until a replication slot becomes syncable, introduce a new GUC, pg_failover_slots.sync_timeout, after which we move on to the next slot.

To avoid waiting from scratch, we create the replication slots as temporary slots instead of ephemeral ones, allowing them to keep their state between runs. When a slot is finally synced, we persist it to disk.

Since we no longer block in the waiting state, we need to clean up the inconsistent slots after promotion.

This solves the possible issue of an inactive slot on the primary preventing every other slot from being synced to the standby.
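To make the control flow easier to discuss, here is a minimal Python sketch of the behaviour described above. It is not the extension's C code: `StandbySlot`, `sync_pass`, `cleanup_after_promotion`, the `is_syncable` callback and the polling interval are all hypothetical names chosen for illustration, and the timeout is treated as plain seconds.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class StandbySlot:
    """Hypothetical in-memory stand-in for a slot copy on the standby."""
    name: str
    persistent: bool = False  # False models a temporary, not-yet-synced slot


def sync_pass(
    primary_slots: Iterable[str],
    standby: Dict[str, StandbySlot],
    is_syncable: Callable[[str], bool],
    sync_timeout: float,
    poll_interval: float = 0.01,
) -> None:
    """Try each slot once, giving up on a slot after sync_timeout seconds."""
    for name in primary_slots:
        # Reuse the temporary slot left by an earlier pass instead of
        # starting over from scratch.
        slot = standby.setdefault(name, StandbySlot(name))
        if slot.persistent:
            # Simplification: a real worker would keep advancing synced slots.
            continue
        deadline = time.monotonic() + sync_timeout
        while True:
            if is_syncable(name):
                slot.persistent = True  # synced: persist it
                break
            if time.monotonic() >= deadline:
                break  # timed out: keep it temporary, move to the next slot
            time.sleep(poll_interval)


def cleanup_after_promotion(standby: Dict[str, StandbySlot]) -> None:
    """Drop slots that never reached a consistent (persistent) state."""
    for name in [n for n, s in standby.items() if not s.persistent]:
        del standby[name]
```

With one lagging slot and one healthy slot, a single pass persists the healthy one and leaves the lagging one temporary; calling `cleanup_after_promotion` afterwards removes the slot that never became consistent.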

kfcss · 2023-10-05T08:24:43Z

We have noticed the same problems. This results in the secondary consuming more disk space than the primary.

PJMODOS

This looks interesting. I am starting to wonder whether we need some kind of status function that reports the state of all the slots in a user-consumable way.

rdunklau · 2023-10-25T06:29:37Z

> This looks interesting. I am starting to wonder whether we need some kind of status function that reports the state of all the slots in a user-consumable way.

For monitoring purposes, that's a pretty good idea. As of now, one can rely on the persistence of the slot to infer its status, but it's not ideal.
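As a rough illustration of that workaround, the sketch below reads the `temporary` column of the standard `pg_replication_slots` view and maps it to a status. The helper name `infer_sync_status` and the labels "waiting" and "synced" are hypothetical, and the mapping only holds under the assumption that this patch keeps unsynced slots temporary and persists them once synced.

```python
from typing import Dict, Iterable, Mapping

# pg_replication_slots is a standard PostgreSQL view; `temporary` is one of
# its columns. Running the query is left to whatever client is in use.
STATUS_QUERY = "SELECT slot_name, temporary FROM pg_replication_slots"


def infer_sync_status(rows: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Map each slot to a hypothetical status label based on its persistence."""
    status: Dict[str, str] = {}
    for row in rows:
        # Assumption: temporary means still waiting, persisted means synced.
        status[str(row["slot_name"])] = "waiting" if row["temporary"] else "synced"
    return status
```

This cannot distinguish slots created locally on the standby from synced ones, which is one reason a dedicated status function would be preferable.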

rdunklau · 2023-12-11T12:51:13Z

Do you have any opinion on that design?

> Instead of blocking indefinitely until a replication slot becomes syncable, introduce a new GUC, pg_failover_slots.sync_timeout, after which we move on to the next slot. To avoid waiting from scratch, we create the replication slots as temporary slots instead of ephemeral ones, allowing them to keep their state between runs. When a slot is finally synced, we persist it to disk. Since we no longer block in the waiting state, we need to clean up the inconsistent slots after promotion.

rdunklau · 2024-09-20T12:33:09Z

I just rebased it on the current master branch.

PJMODOS reviewed Oct 23, 2023

rdunklau mentioned this pull request Jan 9, 2024

Do not wait for an inactive replication slot to catch up. #36

Open

rdunklau force-pushed the add_sync_timeout branch from 26ef7c5 to db2c9e6 Compare February 14, 2024 06:07

PJMODOS requested a review from petere August 5, 2024 21:43

rdunklau force-pushed the add_sync_timeout branch from db2c9e6 to 60644bb Compare September 20, 2024 12:32
