
@bchrobot

For example, a Postgres 12 cluster with backups saved for 7 days may have timelines 5-8. Upon major version upgrade, the timeline is reset to 1. After a few days there may now be timelines 1-4 (PG 13) and 5-8 (PG 12) in cloud storage. A recovery attempt without a specific timeline set via recovery_target_timeline will fail as the PG 12 timeline 5 does not follow the PG 13 timeline 4.
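
For reference, the setting involved looks roughly like this (illustrative values, not our actual config; the restore_command depends on the WAL-E/WAL-G setup):

```
# postgresql.conf (PostgreSQL 12+), illustrative values only
restore_command = 'wal-g wal-fetch %f %p'   # example; the real command depends on the archiving tool
recovery_target_timeline = 'latest'         # default since PG 12; set a specific ID to pin recovery to one timeline
```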

@CyberDem0n

> Upon major version upgrade, the timeline is reset to 1

Yes, this is absolutely standard behavior. A major upgrade with pg_upgrade involves initializing the new PGDATA with initdb, which starts at timeline 1.
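
You can confirm this on the upgraded cluster with pg_controldata (output illustrative):

```
$ pg_controldata "$PGDATA" | grep TimeLineID
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
```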

> A recovery attempt without a specific timeline set via recovery_target_timeline will fail as the PG 12 timeline 5 does not follow the PG 13 timeline 4.

Sorry, but this is not true. Backups for different major versions are written to different places in the bucket:
https://github.com/zalando/spilo/blob/c91248e26e2ea910304d04a3acbeda1e965e2e42/postgres-appliance/scripts/configure_spilo.py#L763

bucket_path = '/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}'.format(**wale)

I.e., for version 12 it would be /spilo/very-long-uid/my-cluster-name/wal/12 and for version 13 it would be /spilo/very-long-uid/my-cluster-name/wal/13.
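
In other words, substituting example values into that format string (values are illustrative, chosen to match the paths above):

```python
# Illustrative only: example values plugged into the format string from configure_spilo.py
wale = {
    'WAL_BUCKET_SCOPE_PREFIX': 'very-long-uid/',   # assumed example value
    'SCOPE': 'my-cluster-name',
    'WAL_BUCKET_SCOPE_SUFFIX': '',
    'PGVERSION': '13',
}
bucket_path = '/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}'.format(**wale)
print(bucket_path)  # -> /spilo/very-long-uid/my-cluster-name/wal/13
```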

When restoring from a backup, you either have to specify the exact location to restore from, or the backup-restore script will try all possible locations until it finds something:

/spilo/very-long-uid/my-cluster-name/wal/13
/spilo/very-long-uid/my-cluster-name/wal/12
/spilo/very-long-uid/my-cluster-name/wal/11
/spilo/very-long-uid/my-cluster-name/wal/10
/spilo/very-long-uid/my-cluster-name/wal/9.6
/spilo/very-long-uid/my-cluster-name/wal/9.5
/spilo/very-long-uid/my-cluster-name/wal

And once a suitable backup is found, it sticks to that location.

If it started restoring the backup from version 13, there is no way it could jump back to 12.
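
The search is roughly equivalent to this sketch (not the actual backup-restore script; backup_exists stands in for a hypothetical cloud-storage check):

```python
# Rough sketch of the fallback described above, not the real Spilo code.
CANDIDATE_VERSIONS = ['13', '12', '11', '10', '9.6', '9.5', None]  # None = unversioned legacy path

def find_backup_location(base_prefix, backup_exists):
    """Return the first location under base_prefix that contains a backup, newest version first."""
    for version in CANDIDATE_VERSIONS:
        location = base_prefix if version is None else '{0}/{1}'.format(base_prefix, version)
        if backup_exists(location):   # hypothetical check against cloud storage
            return location           # once found, the restore sticks to this location
    return None

# Illustrative usage:
# find_backup_location('/spilo/very-long-uid/my-cluster-name/wal', lambda p: s3_has_basebackup(p))
```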

@bchrobot

bchrobot commented Apr 23, 2021

Ah, reviewing the physical backups documentation added in #1367, I see that our pod config envvars are likely to blame.

When we initially set up postgres-operator, the only fully worked example we could find and get working was this:
https://www.redpill-linpro.com/techblog/2019/09/28/postgres-in-kubernetes.html#backup-configuration

which shows WALE_*_PREFIX being defined by hand, with the version and uid stripped from the path.
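
For anyone else who copied that setup, the difference is roughly this (values illustrative):

```
# What the blog post shows: the prefix defined by hand, one flat location shared by every version
WALE_S3_PREFIX=s3://my-backup-bucket/my-cluster-name

# What Spilo builds when it is only given the bucket: a per-cluster, per-version path
WAL_S3_BUCKET=my-backup-bucket
# -> effective prefix: s3://my-backup-bucket/spilo/very-long-uid/my-cluster-name/wal/13
```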

Thank you for the explanation, @CyberDem0n!

bchrobot closed this Apr 23, 2021
bchrobot deleted the docs-major-version-upgrade branch April 23, 2021 14:35