Description
If a user combines pg_rman, PostgreSQL 12 or higher, and the Pacemaker pgsql resource agent,
there is a case where PostgreSQL cannot start properly. The case is as follows.
(1) restore with pg_rman
(2) start the PostgreSQL server and let archive recovery complete
(3) stop the PostgreSQL server
(4) start the PostgreSQL server as a standby with the Pacemaker pgsql resource agent
(However, PostgreSQL cannot reach a consistent state and cannot accept connections.)
The reason is that in (4) PostgreSQL still treats the old "recovery_target_timeline" value as valid,
although the timeline ID was incremented in (2).
For example,
(1) restore with pg_rman
// ex. pg_rman restores with "recovery_target_timeline = 4"
(2) start the PostgreSQL server and let recovery complete
// ex. timeline ID is incremented to "5"
(3) stop the PostgreSQL server
(4) start the PostgreSQL server as a standby with the Pacemaker pgsql resource agent
// ex. timeline ID is "5", but recovery_target_timeline = "4"
// PostgreSQL cannot find the checkpoint WAL record at startup because the target timeline is not valid.
// So PostgreSQL cannot reach a consistent state.
To avoid the issue, users need to remove the "recovery_target_timeline" setting before executing (4).
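As a rough illustration of that workaround, the sketch below comments out any active "recovery_target_timeline" line in the text of a configuration file. It is only a sketch under assumptions: the function name strip_recovery_target_timeline is hypothetical, and which file actually holds the setting after a pg_rman restore depends on the environment, so the caller has to supply that file's contents.

```python
import re

# Matches an active (uncommented) recovery_target_timeline setting.
# PostgreSQL parameter names are case-insensitive and the "=" is optional.
_TIMELINE_RE = re.compile(r"^\s*recovery_target_timeline(\s|=)", re.IGNORECASE)


def strip_recovery_target_timeline(conf_text: str) -> str:
    """Return conf_text with every active recovery_target_timeline line commented out.

    Hypothetical helper for illustration; it only transforms text and
    does not touch any file on disk.
    """
    result = []
    for line in conf_text.splitlines(keepends=True):
        if _TIMELINE_RE.match(line):
            # Comment the line out instead of deleting it, so the old value stays visible.
            result.append("# " + line)
        else:
            result.append(line)
    return "".join(result)


if __name__ == "__main__":
    sample = "restore_command = 'cp /archive/%f %p'\nrecovery_target_timeline = '4'\n"
    print(strip_recovery_target_timeline(sample), end="")
```

The helper would be run between (3) and (4): read the configuration file, pass its contents through the function, and write the result back before Pacemaker starts the server. Commenting the line out rather than deleting it is a design choice that keeps a record of the value pg_rman wrote.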
In essence, the issue is caused by combining PITR that sets "recovery_target_timeline" with
the Pacemaker pgsql resource agent, not by pg_rman itself. Still, it would be better to add a note to pg_rman's documentation.
Reported-by: NTT COMWARE Corporation (Tatsuro Yamada)