
docs: add note about archive timeline reset on upgrade #1463


Closed

Conversation

bchrobot

For example, a Postgres 12 cluster with backups saved for 7 days may have timelines 5-8. Upon major version upgrade, the timeline is reset to 1. After a few days there may now be timelines 1-4 (PG 13) and 5-8 (PG 12) in cloud storage. A recovery attempt without a specific timeline set via recovery_target_timeline will fail, as PG 12's timeline 5 does not follow PG 13's timeline 4.
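For illustration, pinning recovery to an explicit timeline is a single setting (PostgreSQL 12+ reads it from postgresql.conf; the numeric value here is just a placeholder):

# postgresql.conf: pin recovery to a specific timeline
# instead of the default 'latest'
recovery_target_timeline = '4'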

@CyberDem0n (Contributor)

Upon major version upgrade, the timeline is reset to 1

Yes, this is absolutely standard behavior. A major upgrade with pg_upgrade involves initializing the new PGDATA with initdb, which starts at timeline 1.
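A quick way to confirm this, as a minimal sketch (the helper name is made up; it assumes pg_controldata is on PATH and parses its "Latest checkpoint's TimeLineID" line):

import subprocess

def current_timeline(pgdata: str) -> int:
    # Hypothetical helper: right after initdb (and therefore right after a
    # pg_upgrade), this returns 1.
    out = subprocess.run(["pg_controldata", pgdata],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("Latest checkpoint's TimeLineID"):
            return int(line.split(":", 1)[1])
    raise RuntimeError("TimeLineID not found in pg_controldata output")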

A recovery attempt without a specific timeline set via recovery_target_timeline will fail as the PG 12 timeline 5 does not follow the PG 13 timeline 4.

Sorry, but this is not true. Backups for different major versions are written to different places in the bucket:
https://github.com/zalando/spilo/blob/c91248e26e2ea910304d04a3acbeda1e965e2e42/postgres-appliance/scripts/configure_spilo.py#L763

bucket_path = '/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}'.format(**wale)

I.e., for version 12 it would be /spilo/very-long-uid/my-cluster-name/wal/12 and for version 13 it would be /spilo/very-long-uid/my-cluster-name/wal/13.
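For illustration, plugging placeholder values into that format string reproduces the layout (the uid and cluster name are made up):

# Sketch only: placeholder values for the template above.
wale = {
    'WAL_BUCKET_SCOPE_PREFIX': 'very-long-uid/',
    'SCOPE': 'my-cluster-name',
    'WAL_BUCKET_SCOPE_SUFFIX': '',
    'PGVERSION': '13',
}
bucket_path = '/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}'.format(**wale)
print(bucket_path)  # /spilo/very-long-uid/my-cluster-name/wal/13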

When restoring from a backup, you either have to specify the exact location to restore from, or the backup-restore script will try all possible locations until it finds something:

/spilo/very-long-uid/my-cluster-name/wal/13
/spilo/very-long-uid/my-cluster-name/wal/12
/spilo/very-long-uid/my-cluster-name/wal/11
/spilo/very-long-uid/my-cluster-name/wal/10
/spilo/very-long-uid/my-cluster-name/wal/9.6
/spilo/very-long-uid/my-cluster-name/wal/9.5
/spilo/very-long-uid/my-cluster-name/wal

And once a suitable backup is found, the script sticks to that location.

If it starts restoring a backup from version 13, there is no way it can jump back to 12.
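A sketch of that fallback order (not the actual backup-restore script; find_backup and has_backup are illustrative names):

# Try each versioned WAL location from newest to oldest, then the bare path.
BASE = '/spilo/very-long-uid/my-cluster-name/wal'
CANDIDATES = [f'{BASE}/{v}' for v in ('13', '12', '11', '10', '9.6', '9.5')] + [BASE]

def find_backup(has_backup):
    # has_backup(location) -> bool: does this location hold a usable backup?
    for location in CANDIDATES:
        if has_backup(location):
            return location  # stick to this location for the whole restore
    return None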

@bchrobot (Author) commented Apr 23, 2021

Ah, reviewing the physical backups documentation added in #1367, I see that our pod config env vars are likely to blame.

When we initially set up postgres-operator, the only fully worked example we could find and get working was this:
https://www.redpill-linpro.com/techblog/2019/09/28/postgres-in-kubernetes.html#backup-configuration

which shows defining WALE_*_PREFIX with the version and uid stripped.
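For illustration, the difference between the two styles of configuration (a sketch; the bucket and cluster names are placeholders):

# Hardcoding a full WALE_S3_PREFIX flattens every major version into one path:
env_hardcoded = {'WALE_S3_PREFIX': 's3://my-bucket/spilo/my-cluster-name/wal'}
# Letting Spilo derive the prefix from the bucket keeps majors separate,
# per the bucket_path template above: .../wal/12 vs .../wal/13.
env_derived = {'WAL_S3_BUCKET': 'my-bucket'}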

Thank you for the explanation, @CyberDem0n!

@bchrobot closed this Apr 23, 2021
@bchrobot deleted the docs-major-version-upgrade branch April 23, 2021 14:35