PostgreSQL Checkpoint: Database blocks are cached in the database shared buffers. As blocks are read from disk, they are kept in shared buffers so that if any user accesses them later, they are already in memory and need not be read from disk again.
When we update a row, the buffer in shared buffers corresponding to the block containing that row is updated in memory, and a record of the change is kept in the WAL buffers. On commit, the changes we made are written to the WAL file on disk, thereby making them permanent. If the instance crashes right then, the shared buffers are wiped out, but on restarting the database, PostgreSQL applies the changes recorded in the WAL files to the corresponding data files.
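You can watch the WAL position advance as a change is committed; a minimal sketch, assuming a hypothetical table t1 and PostgreSQL 10 or later (on 9.x the function was pg_current_xlog_location):

-- Note the current end of WAL
SELECT pg_current_wal_lsn();

-- Make and commit a change (t1 is a hypothetical table)
BEGIN;
UPDATE t1 SET val = 'new value' WHERE id = 1;
COMMIT;

-- The WAL position has moved forward: the commit record is safely on disk,
-- even though the modified data page may still exist only in shared buffers
SELECT pg_current_wal_lsn();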
Why doesn’t PostgreSQL write the changes to data files right away when we commit the transaction?
The reason is simple. If it wrote directly to the data files, it would first have to physically locate the data block within the file and then update it, which means that after committing, the user would have to wait until that block is found and written.
Moreover, writes to data files are scattered, whereas writes to the WAL are sequential.
So the WAL writer process writes the changes to the WAL file, and later the checkpointer writes the corresponding dirty blocks to their respective data files.
Wait, what?
Checkpointer process?
Yes.
Dirty blocks are written to the data files by three different processes: the checkpointer process, the background writer process, and the user backend processes.
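These background processes are visible at the operating-system level; a quick way to list them (process titles vary across versions, e.g. "checkpointer process" on 9.x versus "postgres: checkpointer" on newer releases):

# List the PostgreSQL background processes, including the checkpointer
# and the background writer
ps -ef | grep postgres | grep -v grep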
We will limit our discussion here to CHECKPOINT only.
What is a checkpoint?
A checkpoint is a synchronization event at a specific point in time that causes some or all dirty pages in shared buffers to be written to disk.
But why?
In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the write-ahead log (WAL) from which it should start the REDO operation.
Confused?
Let’s understand the life cycle of a database change.
As we all know, PostgreSQL keeps a cache of pages in RAM (shared_buffers) to make them available to all backends for faster access.
How does a change happen in a database?
STEP 1: Whenever PostgreSQL needs to change anything in a table, it checks whether the required pages are in shared_buffers; if yes, it modifies them directly, and if not, it loads the required pages into shared_buffers and then modifies them.
STEP 2: The modified pages are termed dirty pages.
STEP 3: Upon commit, the changes we made are written to the write-ahead log, thereby making them permanent on disk.
STEP 4: The dirty pages are periodically flushed to their respective data files on disk.
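If you want to see how many dirty pages are sitting in shared buffers at any moment, the pg_buffercache contrib extension (also used in the case study below) exposes that; a minimal sketch, assuming the extension can be installed:

-- pg_buffercache is a contrib extension and must be installed once per database
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Count the pages in shared_buffers that are currently dirty
SELECT count(*) AS dirty_pages
FROM pg_buffercache
WHERE isdirty;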
But how often are dirty pages written to disk?
This is where the checkpoint comes into the picture.
At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. Any changes made to data files before that point are guaranteed to be already on disk.
If your system crashes, recovery will start from this latest checkpoint, the point at which the database knew that everything on disk was consistent.
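You can see the checkpoint record that recovery would start from with the pg_controldata utility; a quick sketch (exact field wording varies slightly between versions):

# Show the latest checkpoint record and its REDO location,
# i.e. where crash recovery would begin replaying WAL
pg_controldata "$PGDATA" | grep -i checkpoint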
A checkpoint occurs in the following cases (a quick way to observe these triggers is shown after the list):
When the administrator runs the CHECKPOINT statement.
With the interval specified in the parameter checkpoint_timeout (default: 300 seconds)
When the amount of WAL written since the last checkpoint reaches max_wal_size (default: 1 GB)
At the start of an online backup
At the execution of the pg_start_backup function
At the execution of a pg_basebackup command
At the shutdown of an instance (except for pg_ctl stop -m immediate)
At the execution of statements such as CREATE DATABASE / DROP DATABASE
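A simple way to observe these triggers is the checkpoint counters kept by the statistics views; a minimal sketch (the columns live in pg_stat_bgwriter up to PostgreSQL 16 and in pg_stat_checkpointer from 17 onwards):

-- Checkpoints triggered by checkpoint_timeout vs. those requested
-- (max_wal_size reached, manual CHECKPOINT, backups, shutdown, ...)
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;

-- Force one manually and re-run the query: checkpoints_req increases
CHECKPOINT;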
During a CHECKPOINT, the database needs to perform these three basic steps:
1. Identify all the dirty pages in shared buffers.
2. Write the dirty pages to the respective files.
3. Issue fsync() on all the modified files to disk.
NOTE: The fsync() function forces a physical write of data from the buffer cache and assures that, after a system crash or other failure, all data up to the time of the fsync() call is recorded on disk.
Checkpoint parameters in PostgreSQL:
1. checkpoint_timeout:
It is the maximum time between automatic WAL checkpoints (default is 5 minutes). Increasing this parameter can increase the amount of time needed for crash recovery.
2. max_wal_size:
Maximum size to let the WAL grow to between automatic WAL checkpoints. The default is 1 GB. Increasing this parameter can increase the amount of time needed for crash recovery.
If we set both parameters, a checkpoint is triggered by whichever limit is reached first.
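A quick check of the values in effect on your instance:

-- Current values of the two checkpoint triggers
SHOW checkpoint_timeout;
SHOW max_wal_size;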
3. min_wal_size:
As long as WAL disk usage stays below this setting, old WAL files are always recycled for future use at a checkpoint, rather than removed. This can be used to ensure that enough WAL space is reserved to handle spikes in WAL usage, for example when running large batch jobs. (default is 80 MB)
4. checkpoint_completion_target :
Since a checkpoint occurs every 5 minutes or whenever the max_wal_size threshold is reached, all the dirty pages sitting in shared buffers are flushed to disk at checkpoint time, which can cause a huge burst of I/O.
checkpoint_completion_target comes to the rescue here.
It slows down the flushing so that PostgreSQL spreads the writes over checkpoint_completion_target * checkpoint_timeout amount of time.
For example, if checkpoint_completion_target is 0.5 and checkpoint_timeout is 5 minutes, the database will throttle the writes so that the last write is done after 0.5 * 5 = 2.5 minutes.
More about the checkpoint_completion_target is discussed here
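As a rough postgresql.conf sketch, assuming the default values:

# With these (default) settings, PostgreSQL spreads the checkpoint writes over
# checkpoint_completion_target * checkpoint_timeout = 0.5 * 300 s = 150 s,
# so the last dirty page is written about 2.5 minutes after the checkpoint starts.
checkpoint_timeout = 5min
checkpoint_completion_target = 0.5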
5. wal_buffers :
The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, but not less than 64kB nor more than the size of one WAL segment, typically 16MB.
6. checkpoint_flush_after:
Whenever more than checkpoint_flush_after bytes have been written while performing a checkpoint, attempt to force the OS to issue these writes to the underlying storage. Doing so will limit the amount of dirty data in the kernel’s page cache, reducing the likelihood of stalls when an fsync is issued at the end of the checkpoint.
This setting may have no effect on some platforms.
7. checkpoint_warning: Write a message to the server log if checkpoints caused by the filling of checkpoint segment files happen closer together than this many seconds.
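All of these parameters can be inspected in one place via pg_settings; a quick sketch:

-- Show the checkpoint-related settings currently in effect
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'checkpoint%'
   OR name IN ('max_wal_size', 'min_wal_size', 'wal_buffers');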
CASE STUDY on CHECKPOINT:
Let us do a case study on a table “checkpoint_example” to understand checkpoints in practice.
We will first find the filenode for the checkpoint_example table with the following simple commands.
[postgres@asristgdb ~]$ oid2name -d testdb -t checkpoint_example
From database "testdb":
  Filenode          Table Name
------------------------------
     33428  checkpoint_example
[postgres@asristgdb ~]$ oid2name
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  16738           nhps  pg_default
  25044           pkdb  pg_default
  13269       postgres  pg_default
  13268      template0  pg_default
      1      template1  pg_default
  16749          test1  pg_default
  16386         testdb  pg_default
  25042           tkdb  pg_default
[postgres@asristgdb ~]$
Now, we will update a record in the table.
testdb=# update checkpoint_example set city='CHENNAI' where city='HYDERABAD';
UPDATE 1
testdb=# commit;
WARNING:  there is no transaction in progress
COMMIT
Querying pg_buffercache for the status of the page, we find it is dirty:
testdb=# SELECT reldatabase,relfilenode,isdirty FROM pg_buffercache WHERE relfilenode='33428';
 reldatabase | relfilenode | isdirty
-------------+-------------+---------
       16386 |       33428 | t
(1 row)

testdb=#
Here, the page became dirty after the change. Now, let’s issue a checkpoint.
testdb=# checkpoint;
CHECKPOINT
testdb=# SELECT reldatabase,relfilenode,isdirty FROM pg_buffercache WHERE relfilenode='33428';
 reldatabase | relfilenode | isdirty
-------------+-------------+---------
       16386 |       33428 | f
(1 row)

testdb=#
The server log records:
LOG:  checkpoint starting: immediate force wait
LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.016 s, sync=0.001 s, total=0.020 s; sync files=3, longest=0.001 s, average=0.000 s; distance=17 kB, estimate=14895 kB
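These messages appear in the server log only when log_checkpoints is enabled (off by default before PostgreSQL 15, on from 15 onwards); one way to turn it on without a restart, assuming PostgreSQL 9.4+ for ALTER SYSTEM:

-- Enable checkpoint logging and reload the configuration
ALTER SYSTEM SET log_checkpoints = on;
SELECT pg_reload_conf();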
Conclusion:
- The checkpointer process writes dirty buffers from shared buffers to the respective data files on a periodic basis, depending on the configuration.
- checkpoint_timeout and max_wal_size are the two important parameters one has to look at when configuring checkpoints.
Words from postgreshelp
Thank you for giving your valuable time to read the above information. I hope the content served the purpose that brought you to the blog.
Suggestions for improving the blog are highly appreciated. Please contact us with any information, suggestions, or feedback.