“TIMELINES ARE THE DIVERGENT POINTS”
Let’s assume you did a point in time recovery after a wrong transaction, PostgreSQL branches to a new timeline and continue with the operations.
But what happens after you perform a point in time recovery, realized you made a mistake again?
That’s where recovery_target_timeline comes into picture.
In this post we are going to understand everything about recovery_target_timeline and timelines in PostgreSQL in general.
Table of Contents
PostgreSQL Timeline
Everytime you do a transaction in PostgreSQL the information is recorded in a wal file under $DATADIR/wal location.
The first file that is created is
000000010000000000000001
and when it is filled the next wal will be created with the name 000000010000000000000002 and so on.(It is a HEX notation and more information is beyond the scope of this post)
Here, the first eight digits represent PostgreSQL timeline.
In our example, the database cluster is in timeline 1.
After every point in time recovery, the timeline id will be increased by 1 and a new file called NewTimelineID.history is created.
recovery_target_timeline is a parameter which helps us to take our cluster to any timeline in the history provided a valid base backup and all the archivelogs in place.
Lets consider below example.,
I have initialized and started a new cluster with the below command
1 2 |
-bash-4.1$ initdb -D basebackup1 -bash-4.1$ pg_ctl start -D /u02/basebackup1 |
Then I created a table and inserted a record into it
1 2 3 4 5 6 7 8 9 10 11 |
postgres=# create table timeline(tid int, remarks varchar(1000)); CREATE TABLE postgres=# insert into timeline values('1','This is timeline id 1'); INSERT 0 1 postgres=# checkpoint; CHECKPOINT postgres=# select pg_switch_wal(); pg_switch_wal --------------- 0/15D4B70 (1 row) |
My record is somewhere in my wal 000000010000000000000001
After few switches, I have taken a full backup when my wal at 000000010000000000000005
1 2 3 4 5 6 7 8 9 10 |
-bash-4.1$ ls -rlt total 147460 -rw------- 1 postgres postgres 16777216 Nov 22 13:03 000000010000000000000001 -rw------- 1 postgres postgres 16777216 Nov 22 13:03 000000010000000000000002 -rw------- 1 postgres postgres 16777216 Nov 22 13:03 000000010000000000000003 -rw------- 1 postgres postgres 16777216 Nov 22 13:05 000000010000000000000004 -rw------- 1 postgres postgres 16777216 Nov 22 13:05 000000010000000000000005 -rw------- 1 postgres postgres 337 Nov 22 13:05 000000010000000000000005.00000028.backup -rw------- 1 postgres postgres 16777216 Nov 22 13:06 000000010000000000000006 -rw------- 1 postgres postgres 16777216 Nov 22 13:06 000000010000000000000007 |
Then I made a few switches and when my wal is at 000000010000000000000008
I inserted a new record.
1 2 3 4 |
postgres=# insert into timeline values('1','This is timeline id 1 after basebackup'); INSERT 0 1 postgres=# checkpoint; CHECKPOINT |
1 2 3 4 5 |
-bash-4.1$ pg_waldump 000000010000000000000008 | grep INSERT rmgr: Heap len (rec/tot): 54/ 214, tx: 487, lsn: 0/08000110, prev 0/080000D8, desc: <span style="color: #ff0000;">INSERT</span> off 2 flags 0x00, blkref #0: rel 1663/13530/16384 blk 0 FPW -bash-4.1$ |
Then I made few switches and my current archived wal location is something like this
1 2 3 4 5 6 7 8 9 10 11 12 13 |
-bash-4.1$ ls -rlt total 311308 -rw------- 1 16777216 Nov 22 13:03 000000010000000000000001 -rw------- 1 16777216 Nov 22 13:03 000000010000000000000002 -rw------- 1 16777216 Nov 22 13:03 000000010000000000000003 -rw------- 1 16777216 Nov 22 13:05 000000010000000000000004 -rw------- 1 16777216 Nov 22 13:05 000000010000000000000005 -rw------- 1 337 Nov 22 13:05 000000010000000000000005.00000028.backup -rw------- 1 16777216 Nov 22 13:06 000000010000000000000006 -rw------- 1 16777216 Nov 22 13:06 000000010000000000000007 -rw------- 1 16777216 Nov 22 13:07 000000010000000000000008 -rw------- 1 16777216 Nov 22 13:07 000000010000000000000009 -rw------- 1 16777216 Nov 22 13:09 00000001000000000000000A |
At this point of time, I have done a point in time recovery to some point in 000000010000000000000007 wal, before second insert.
so I gave recovery target lsn as ‘0/07000060’ in postgresql.conf file.
1 2 3 |
-bash-4.1$ cat postgresql.conf | grep lsn recovery_target_lsn = '0/07000060' # the WAL LSN up to which recovery will proceed -bash-4.1$ |
After successful point in time recovery, the PostgreSQL branched to a new timeline.
At the end of recovery the following things will happen
- End of recovery means the point where the the database opens up for writing
- New timeline is chosen
- A timeline history file is written
- The partial last WAL file on the previous timeline is copied with the new timeline’s ID
- A checkpoint record is written on the new timeline
alert log says,
1 2 3 4 5 6 7 8 9 10 |
LOG: starting point-in-time recovery to WAL location (LSN) "0/7000060" LOG: restored log file "000000010000000000000005" from archive LOG: redo starts at 0/5000028 LOG: consistent recovery state reached at 0/5000138 LOG: database system is ready to accept read only connections LOG: restored log file "000000010000000000000006" from archive LOG: restored log file "000000010000000000000007" from archive LOG: recovery stopping after WAL location (LSN) "0/7000060" LOG: pausing at the end of recovery HINT: Execute pg_wal_replay_resume() to promote. |
after executing pg_wal_replay_resume(), the postgresql changed to timeline 2 and the same information is recorded in archivelog location.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
-rw------- 1 postgres postgres 16777216 Nov 22 13:05 000000010000000000000005 -rw------- 1 postgres postgres 337 Nov 22 13:05 000000010000000000000005.00000028.backup -rw------- 1 postgres postgres 16777216 Nov 22 13:06 000000010000000000000006 -rw------- 1 postgres postgres 16777216 Nov 22 13:06 <span style="color: #ff0000;">000000010000000000000007</span> -rw------- 1 postgres postgres 16777216 Nov 22 13:07 <span style="color: #ff0000;">000000010000000000000008</span> -rw------- 1 postgres postgres 16777216 Nov 22 13:07 <span style="color: #ff0000;">000000010000000000000009</span> -rw------- 1 postgres postgres 16777216 Nov 22 13:09 <span style="color: #ff0000;">00000001000000000000000A</span> -rw-r----- 1 postgres postgres 33 Nov 22 13:12 <span style="color: #ff0000;">00000002.history</span> -rw-r----- 1 postgres postgres 16777216 Nov 22 13:13 <span style="color: #ff0000;">000000020000000000000007</span> -rw------- 1 postgres postgres 16777216 Nov 22 13:13 <span style="color: #ff0000;">000000020000000000000008 <span style="color: #000000;">-rw------- 1 postgres postgres 16777216 Nov 22 13:14 <span style="color: #ff0000;">000000020000000000000009</span></span> <span style="color: #000000;">-rw-r----- 1 postgres postgres 16777216 Nov 22 13:15 <span style="color: #ff0000;">00000002000000000000000A</span></span> </span> -bash-4.1$ |
Here, the PostgreSQL has branched to a new timeline at walfile 7 and started creating new walfiles with timeline id 2.
The 00000002.history file confirms that the PostgreSQL has branched to new timeline.
The history file is a small text file that read
1 2 |
-bash-4.1$ cat 00000002.history 1 0/70000D8 after LSN 0/7000060 |
Here
1<parentTLI> 0/70000D8 <switchpoint> after LSN 0/7000060<reason>
parentTLI ID of the parent timeline
switchpoint XLogRecPtr of the WAL location where the switch happened
reason human-readable explanation of why the timeline was changed
Now, I inserted one record at 00000002000000000000000A (0/A000060)
1 2 |
postgres=# insert into timeline values('2','This is timeline id 2 correct'); INSERT 0 1 |
and another record at 00000002000000000000000D (0/D000000)
1 2 |
postgres=# insert into timeline values('2','This is timeline id 2 wrong at 0/D000000'); INSERT 0 1 |
At this point of time I realized that I did a mistake at 00000002000000000000000D and has to rollback to 00000002000000000000000C of timeline 2.
This can be achieved by setting below parameters in postgresql.conf file
1 2 |
recovery_target_timeline = '2' recovery_target_lsn = '0/0C000060' |
After setting up above parameters, I started the cluster and the alert log says
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
LOG: database system was interrupted; last known up at 2020-11-22 13:05:01 IST LOG: restored log file "<span style="color: rgb(255, 0, 0);" data-mce-style="color: #ff0000;">00000002.history</span>" from archive cp: cannot stat `/u02/archivelogs/00000003.history': No such file or directory LOG: starting point-in-time recovery to WAL location (LSN) "0/C000060" LOG: restored log file "00000002.history" from archive LOG: restored log file "<span style="color: rgb(255, 0, 0);" data-mce-style="color: #ff0000;">000000010000000000000005</span>" from archive LOG: redo starts at 0/5000028 LOG: consistent recovery state reached at 0/5000138 LOG: database system is ready to accept read only connections LOG: restored log file "000000010000000000000006" from archive LOG: restored log file "000000020000000000000007" from archive LOG: restored log file "000000020000000000000008" from archive LOG: restored log file "000000020000000000000009" from archive LOG: restored log file "00000002000000000000000A" from archive LOG: restored log file "00000002000000000000000B" from archive LOG: restored log file "<span style="color: rgb(255, 0, 0);" data-mce-style="color: #ff0000;">00000002000000000000000C</span>" from archive LOG: recovery stopping after WAL location (LSN) "<span style="color: rgb(255, 0, 0);" data-mce-style="color: #ff0000;">0/C000060</span>" LOG: pausing at the end of recovery HINT: Execute pg_wal_replay_resume() to promote. .. LOG: redo done at 0/C000060 LOG: last completed transaction was at log time 2020-11-22 13:15:29.696929+05:30 <span style="color: #ff0000;">LOG: selected new timeline ID: 3</span> |
When I selected the table,
1 2 3 4 5 6 |
postgres=# select * from timeline; tid | remarks -----+------------------------------- 1 | This is timeline id 1 2 | This is timeline id 2 correct (2 rows) |
The history file will is recorded with below details
1 2 3 4 5 |
-bash-4.1$ cat 00000003.history 1 0/70000D8 after LSN 0/7000060 2 0/C0000D8 after LSN 0/C000060 |
Here
timeline 1 branched at 0/70000D8
timeline 2 branched at 0/C0000D8
and current timeline is going to be 3.
1 2 3 |
-rw-r----- 1 postgres postgres 16777216 Nov 22 13:16 00000002000000000000000F -rw-r----- 1 postgres postgres 67 Nov 22 15:59 00000003.history -rw-r----- 1 postgres postgres 16777216 Nov 22 16:05 00000003000000000000000C |
Active data guard in PostgreSQL:
Let us consider there is an active streaming replication in place (streaming replication configuration is not the part of this post)
The left hand side demonstrates primary and right hand side indicates that standby is in sync with primary.
During the cutover, the wal files diverged at wal file 16
1 2 3 4 5 6 7 8 9 |
-bash-4.1$ cd /u02/pgsql/16/pg_wal -bash-4.1$ ls -lrt total 163848 -rw------- 1 postgres postgres 16777216 Nov 27 14:35 00000002000000000000001B -rw------- 1 postgres postgres 16777216 Nov 27 14:39 00000002000000000000001E -rw------- 1 postgres postgres 16777216 Nov 27 14:40 000000010000000000000016.partial -rw------- 1 postgres postgres 42 Nov 27 14:40 00000002.history -rw------- 1 postgres postgres 16777216 Nov 27 14:42 00000002000000000000001C -rw------- 1 postgres postgres 16777216 Nov 27 14:42 000000020000000000000016 |
Here, from 16 onwards, both primary and standby started creating its own wal files.
I inserted few records in same table in both machines and have done few log switches.
Here, which indicates that I used my standby as an active data guard.
At this point of time, I have done with my testing, now I wanted to make my secondary machine as a standby for primary again.
How can I do that ?
Using divergent timeline, just rewind the standby cluster and set up replication again(create recovery.conf file).
- Stop standby cluster.
- use pg_rewind to revert the changes made after divergent.
pg_rewind –target-pgdata=/u02/pgsql/16 –source-server=”port=5432 user=postgres host=192.168.1.128″
1 2 3 4 |
-bash-4.1$ pg_rewind --target-pgdata=/u02/pgsql/16 --source-server="port=5432 user=postgres host=192.168.1.128" servers diverged at WAL location 0/16000310 on timeline 1 <span style="color: #ff0000;">rewinding from last common checkpoint at 0/16000268 on timeline 1</span> Done! |
- Create a recovery.conf file and place it in standby machine
1 2 3 4 |
standby_mode='on' primary_conninfo = 'user=postgres password=postgres host=192.168.1.128 port=5432 sslmode=disable sslcompression=1' restore_command='cp /u02/archivelogs/%f %p' trigger_file='/tmp/postgresql.trigger.5432' |
- start the standby cluster
pg_ctl start -D /u02/pgsql/16
alert log says,
1 2 3 4 5 6 7 8 9 10 11 |
2020-11-27 14:54:16 IST [15113]: [2-1] user=,db=,app=,client= HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2020-11-27 14:54:16 IST [15113]: [3-1] user=,db=,app=,client= LOG: entering standby mode 2020-11-27 14:54:16 IST [15113]: [4-1] user=,db=,app=,client= LOG: restored log file "000000010000000000000016" from archive 2020-11-27 14:54:16 IST [15113]: [5-1] user=,db=,app=,client= LOG: redo starts at 0/16000230 2020-11-27 14:54:16 IST [15113]: [6-1] user=,db=,app=,client= LOG: restored log file "000000010000000000000017" from archive 2020-11-27 14:54:16 IST [15113]: [7-1] user=,db=,app=,client= LOG: restored log file "000000010000000000000018" from archive 2020-11-27 14:54:16 IST [15113]: [8-1] user=,db=,app=,client= LOG: restored log file "000000010000000000000019" from archive cp: cannot stat `/u02/archivelogs/00000001000000000000001A': No such file or directory 2020-11-27 14:54:16 IST [15121]: [1-1] user=,db=,app=,client= LOG: started streaming WAL from primary at 0/1A000000 on timeline 1 2020-11-27 14:54:16 IST [15113]: [9-1] user=,db=,app=,client= LOG: consistent recovery state reached at 0/1A02E548 2020-11-27 14:54:16 IST [15111]: [6-1] user=,db=,app=,client= LOG: database system is ready to accept read only connections |
- Check if the recovery is getting continued from where it was stopped during divergent.
Records inserted after divergent are gone and my standby has started receiving data from primary.
How to resolve requested starting point XX on timeline x is not in this server’s history?
Let us conclude the timeline discussion with a final simulation.
From above output, it is confirmed that my standby started receiving data.
Here, I made divergence again and inserted a record
8 | simulate but in the history of the requested timeline
in old primary, at the same time inserted a record
‘8’,’simulate new record after divergent’
in new primary.
Ensured that both had different checkpoints after divergence.
Now when I tried to make old primary as standby to current primary by adding a recovery.conf file, I got below error when starting the cluster.
FATAL: could not start WAL streaming: ERROR: requested starting point 0/1D000000 on timeline 2 is not in this server’s history
DETAIL: This server’s history forked from timeline 2 at 0/1C0004C8.
The error is because, after divergence when I tried to make old primary as standby I need to ensure that all the changes made in old primary has to be reverted.
this can be achived with pg_rewind.
so I did pg_rewind and started my old primary cluster.
pg_rewind –target-pgdata=/u02/pgsql/15 –source-server=”port=5433 user=postgres host=192.168.1.128″
1 2 3 4 5 6 |
-bash-4.1$ pg_rewind --target-pgdata=/u02/pgsql/16 --source-server="port=5432 user=postgres host=192.168.1.128" <span style="color: #ff0000;">servers diverged at WAL location 0/1C0004C8</span> on timeline 2 rewinding from last common checkpoint at 0/1C000420 on timeline 2 -bash-4.1$ pg_ctl start -D /u02/pgsql/16 |
The database started with alert log
1 2 3 4 5 6 7 8 |
LOG: entering standby mode LOG: restored log file "00000003.history" from archive LOG: restored log file "00000002.history" from archive LOG: redo starts at 0/1C0003E8 LOG: invalid record length at 0/1C01C2D0: wanted 24, got 0 LOG: started streaming WAL from primary at 0/1C000000 on timeline 3 LOG: consistent recovery state reached at 0/1C01C308 LOG: database system is ready to accept read only connections |
Words from postgreshelp
Thank you for giving your valuable time to read the above information. I hope the content served your purpose in reaching out the blog.
Suggestions for improvement of the blog are highly appreciable. Please contact us for any information/suggestion/feedback.
If you want to be updated with all our articles
please follow us on Facebook | Twitter
Please subscribe to our newsletter.