A Comprehensive Guide: PostgreSQL Page Layout

In this post we are going to understand PostgreSQL Page Layout in detail.

Bonus is coming up towards the end!!

Do I have ORACLE’s PCTFREE and PCTUSED kind of mechanism in PostgreSQL?

PostgreSQL Page Layout

Each data file in PostgreSQL is divided into PostgreSQL disk blocks of 8 Kb each.The blocks are numbered sequentially, 0 to 0xFFFFFFFE. A PostgreSQL disk page is a memory version of a PostgreSQL disk block.

Below image depicts the page format used within PostgreSQL tables and indexes.

Inside the data file (heap table and index, as well as the free space map and visibility map), it is divided into pages (or blocks) of fixed length, the default is 8192 byte (8 KB). Those pages within each file are numbered sequentially from 0, and such numbers are called as block numbers.

A page is full when nothing can be added between pd_lower and pd_upper. If the file has been filled up, PostgreSQL adds a new empty page to the end of the file to increase the file size.

We will keep the PageHeaderData understanding to later this post and we continue with linep1..N.

linp1..N form an ItemId (line pointer) array. ItemPointers point to a physical block number and a logical offset (line pointer number) within that block/page.

tuple1..N are added “backwards” on the page. Since an ItemPointer offset is used to access an ItemId entry rather than an actual byte-offset position, tuples can be physically shuffled on a page whenever the need arises. This indirection also keeps crash recovery relatively simple, because the low-level details of page space management can be controlled by standard buffer page code during logging, and during recovery.

pd_upper: starting position of the new record to be inserted.

pd_lower: maximum allowed location in to the page.

The final section is the “special section which can contain anything the access method wishes to store. For example, b-tree indexes store links to the page’s left and right siblings, as well as some other data relevant to the index structure. Ordinary tables do not use a special section at all.

Example.,

here, I just created a table with two records, lets understand its page structure with linux ‘od’ command.

It is confirmed that my first record ABC has gone to bottom of the page and the next DEF is coming towards pg_lower. The first couple of lines indicate PageHeaderData.

PageHeaderData Layout in PostgreSQL

pageheader information can be viewed with functions.To work with these functions we must install pageinspect extension.

Now, by querying page_header() we get page header data

lets breakdown the columns.

pd_lsn – identifies xlog record for last change to this page.
0/17F6E50 – Current WAL position

pd_checksum – page checksum, if set.

0 (because not set) If enabled, checksums are calculated for each data page. The detection of a checksum failure will cause an error when reading data and will abort the transaction currently running. So, this brings additional control for the detection of an I/O or hardware problem directly at the level of the database server.

pd_flags – flag bits.

#define PD_HAS_FREE_LINES 0x0001 /* are there any unused line pointers? */
#define PD_PAGE_FULL 0x0002 /* not enough free space for new tuple? */
#define PD_ALL_VISIBLE 0x0004 /* all tuples on page are visible to
* everyone */
#define PD_VALID_FLAG_BITS 0x0007 /* OR of all valid pd_flags bits */

pd_lower – offset to start of free space.

32

pd_upper – offset to end of free space.

8192

pd_special – offset to start of special space.

pd_pagesize_version – size in bytes and page layout version number.

since 7.3 page version and page size is clubbed.

pd_prune_xid – oldest XID among potentially prunable tuples on page.

any dead tuples?

PostgreSQL Data Structure in Page

To find the data structure in page we need to query heap_page_items()

Here

lp

A line pointer on a buffer page

lp_off

A line pointer offset

lp_flags

#define LP_UNUSED 0 /* unused (should always have lp_len=0) */

#define LP_NORMAL 1 /* used (should always have lp_len>0) */

#define LP_REDIRECT 2 /* HOT redirect (should have lp_len=0) */

#define LP_DEAD 3 /* dead, may or may not have storage */

lp_len

byte length of a tuple.

t_xmin

txid of insert command

t_xmax

txid of change command

t_field3

t_field3 is an Xvac identifier. This is an older feature that tracks the transaction id of the older VACUUM FULL command. t_field3 is blank in this case because there is not a transaction in progress.

t_ctid

A word about t_ctid: whenever a new tuple is stored on disk, its t_ctid is initialized with its own TID (location). If the tuple is ever updated,

its t_ctid is changed to point to the replacement version of the tuple.

To make minimal changes on disk PostgreSQL will write the new row, update thet_xmax of the original row, and write to the t_ctid field of the original.

t_infomask2 | t_infomask

Botht_infomask2 and t_infomask are flags and attributes which tells more about XMAX that is current state of the row.

t_hoff

The t_hoff field is the header offset – this tells us where we can find the beginning of data inside the row.

t_bits

t_bits is a variable length bitmap of null values.

Bonus

Do I have ORACLE’s PCTFREE and PCTUSED kind of mechanism in PostgreSQL?

FILLFACTOR attribute

When INSERT statement is executed, the tuple is added to the page. If executor cannot store the tuple into the working page anymore, it searches the next free page. FILLFACTOR is an attribute to indicate the percentage of the size in the page where tuples can be stored. The default value for FILLFACTOR is 100 (%). For this reason, tuples are normally stored in the page without gaps. FILLFACTOR can be also specified to the index.

The advantage of the decrease of FILL FACTOR from 100% is the performance improvement of the access in page unit, because free spaces are used in case of update using UPDATE statement. On the other hand, since the number of the pages used by the table to expand, I/O increases in case of reading entire table. For the table or index on which update is frequently performed is recommended to lower the value of the FILLFACTOR from the default value.

FILLFACTOR verification at the time of CREATE TABLE statement is executed

To set the FILLFACTOR when creating tables, it should be written to the WITH clause of the CREATE TABLE statement. To verify, refer the reloptions column of pg_class catalog.

Overall PostgreSQL Page Layout

Item Description
PageHeaderData 24 bytes long. Contains general information about the page, including free space pointers.
ItemIdData Array of item identifiers pointing to the actual items. Each entry is an (offset,length) pair. 4 bytes per item.
Free space The unallocated space. New item identifiers are allocated from the start of this area, new items from the end.
Items The actual items themselves.
Special space Index access method specific data. Different methods store different data. Empty in ordinary tables.

Overall PostgreSQL Page Header Layout

Field Type Length Description
pd_lsn PageXLogRecPtr 8 bytes LSN: next byte after last byte of WAL record for last change to this page
pd_checksum uint16 2 bytes Page checksum
pd_flags uint16 2 bytes Flag bits
pd_lower LocationIndex 2 bytes Offset to start of free space
pd_upper LocationIndex 2 bytes Offset to end of free space
pd_special LocationIndex 2 bytes Offset to start of special space
pd_pagesize_version uint16 2 bytes Page size and layout version number information
pd_prune_xid TransactionId 4 bytes Oldest unpruned XMAX on page, or zero if none

 

Thank you for giving your valuable time to read the above information. I hope the content served your purpose in reaching out to the blog.
Suggestions for improvement of the blog are highly appreciable. Please contact us for any information/suggestions/feedback.

If you want to be updated with all our articles

please follow us on Facebook Twitter
Please subscribe to our newsletter.

Leave a Reply