Point-In-Time Recovery (PITR) in PostgreSQL together with pgbackrest allows you to restore a database to a specific point in time.
This examples shows how to do an "interactive" backup, the first checkpoint is
2024-06-17 09:18:22+00
, multiple records inserted after that time, but we
want to go back to that specific time to find if there is the required data
if timezone is omitted, local time is assumed, for UTC use +00
Stop Postgresql
First you need to stop postgres:
systemctl stop postgresql.service
this varies depending on your operating system here I am using Debian 12
Remove PGDATA
Now remove the PGDATA directory, in my case:
rm -rf /db/16
Restore (--target-action=pause)
Perform the initialy recovery with pgbackrest restore, restore to the
desired point in time, but it is importan to set --target-action=pause
so
recovery pauses once the target is reached instead of promoting the node:
pgbackrest --stanza=standalone \
--delta \
--log-level-console=info \
--type=time \
--target="2024-06-17 09:18:22+00" \
--target-action=pause restore
This will output something like this:
$ pgbackrest --stanza=standalone --delta --log-level-console=info --type=time --target="2024-06-17 09:18:22+00" --target-action=pause restore
2024-06-18 16:21:02.983 P00 INFO: restore command begin 2.52: --delta --exec-id=721872-e8f6a592 --log-level-console=info --log-path=/db/pgbackrest/log --pg1-path=/db/16 --process-max=3 --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/repo1 --repo1-s3-bucket=pg16 --repo1-s3-endpoint=my-s3-endpoint --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=region --no-repo1-storage-verify-tls --repo1-type=s3 --spool-path=/db/pgbackrest/spool --stanza=standalone --target="2024-06-17 09:18:22+00" --target-action=pause --type=time
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/db/16' to confirm that this is a valid $PGDATA directory. --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2024-06-18 16:21:03.398 P00 INFO: repo1: restore backup set 20240612-190741F_20240615-101611I, recovery will start at 2024-06-15 10:16:11
2024-06-18 16:21:46.235 P00 INFO: write updated /db/16/postgresql.auto.conf
2024-06-18 16:21:46.240 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2024-06-18 16:21:46.240 P00 INFO: restore size = 29.6MB, file total = 1270
2024-06-18 16:21:46.241 P00 INFO: restore command end: completed successfully (43260ms)
pg_ctl
Start using pg_ctl -D PGDATA start
/usr/lib/postgresql/16/bin/pg_ctl -D /db/16 start
In the Postgres logs, you can check that WAL replay paused once the configured recovery target was reached:
024-06-18 16:24:19.279 P00 INFO: found 00000001000000000000008B in the archive asynchronously
2024-06-18 16:24:19.279 P00 INFO: archive-get command end: completed successfully (2ms)
time=2024-06-18 16:24:19 UTC, pid=721892 LOG: restored log file "00000001000000000000008B" from archive
time=2024-06-18 16:24:19 UTC, pid=721892 LOG: recovery stopping before commit of transaction 2996, time 2024-06-17 14:24:18.691936+00
time=2024-06-18 16:24:19 UTC, pid=721892 LOG: pausing at the end of recovery
time=2024-06-18 16:24:19 UTC, pid=721892 HINT: Execute pg_wal_replay_resume() to promote.
You can also verify WAL reply is paused by checking the output of the function pg_is_wal_replay_paused
, for example:
$ psql -c "SELECT pg_is_wal_replay_paused();"
pg_is_wal_replay_paused
-------------------------
t
(1 row)
Connect and check if your data is at the desired state, if not, set a new recovery target recovery_target_time
, then restart Postgres.
To check the current recovery_target_time:
$ psql -c "show recovery_target_time"
recovery_target_time
------------------------
2024-06-17 09:18:22+00
(1 row)
(1 row)
recovery_target_time
To set a new target:
psql -c "ALTER SYSTEM SET recovery_target_time TO '2024-06-17 09:30:00+00'"
This will modify the postgresql.auto.conf
:
$ cat /db/16/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
restore_command = 'pgbackrest --stanza=standalone archive-get %f "%p"'
recovery_target_time = '2024-06-17 09:30:00+00'
Now, restart postgres:
$ /usr/lib/postgresql/16/bin/pg_ctl -D /db/16 restart
waiting for server to shut down.... done
server stopped
waiting for server to start....time=2024-06-18 17:13:37 UTC, pid=723863 LOG: redirecting log output to logging collector process
time=2024-06-18 17:13:37 UTC, pid=723863 HINT: Future log output will appear in directory "/db/log".
. done
server started
If data is not at the desired point, set a new recovery_target_time
and repeat previous steps.
pg_promote
When data is at the desired point, promote Postgres to primary:
$ psql -c "SELECT pg_promote();"
pg_promote
------------
t
(1 row)
And you can now stop postgres with:
/usr/lib/postgresql/16/bin/pg_ctl -D /db/16 stop
Start Postgresql
Start now using:
systemctl start postgresql.service
Patroni
Stop all nodes in your cluster and follow the restore procesure in node 1 after you are at the desired point promote the server and then start Patroni so that it can take the management over.
Archive
In postgresql.conf
check this options:
archive_command = 'pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --stanza=<stanza_name> archive-push %p'
archive_mode = on
archive_timeout = 300
To archive manually switching WAL files ensures that the current WAL file is completed and archived promptly:
psql -c "select pg_switch_wal();"