Table of Contents
PostgreSQL vacuum:Introduction:
PostgreSQL vacuum is an important topic in PostgreSQL database administration. Before we get into PostgreSQL vacuum we first have to understand MVCC architecture in PostgreSQL.
In a multi-user database management system, a transaction must support the following two properties
Data concurrency: means that many users can access data at the same time.
Data consistency: means that each user sees a consistent view of the data, including visible changes made by the user’s own transactions and transactions of other users.
PostgreSQL achieves the consistency with its MVCC architecture.
PostgreSQL MVCC:
MVCC stands for Multiversion Concurrency Control, which means that each SQL sees a snapshot of the data as it was some time ago, regardless of the current state of the underlying data. This prevents statements from viewing inconsistent data produced by concurrent transactions performing updates on the same data rows, providing transaction isolation for each database session.
For example., When you update/delete a record in a table, the record is multiplexed within the table and the old version will be available till the latest version of the table is either committed or rollback.
Once the record is committed or rollbacked the old version becomes obsolete or dead. The dead record or dead tuple is then cleaned by vacuuming the table by Autovacuum Daemon.
Let’s understand PostgreSQL vacuum with a case study.
Pre-requisites: To understand how vacuum works we have to install the following two extensions.
- pg_freespacemap
- pageinspect
1 2 3 4 5 6 7 8 |
testdb=# CREATE extension pg_freespacemap; CREATE EXTENSION testdb=# testdb=# testdb=# CREATE extension pageinspect; CREATE EXTENSION testdb=# testdb=# |
Case study: PostgreSQL Vacuum table example
Create an example table
1 2 3 |
testdb=# create table vac_table (id bigint, name character varying(10)); CREATE TABLE testdb=# |
We have created a table called vac_table.
Let’s insert some records into the table
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
testdb=# insert into vac_table values('1', 'record1'); INSERT 0 1 testdb=# insert into vac_table values('2', 'record2'); INSERT 0 1 testdb=# insert into vac_table values('3', 'record3'); INSERT 0 1 testdb=# insert into vac_table values('4', 'record4'); INSERT 0 1 testdb=# insert into vac_table values('5', 'record5'); INSERT 0 1 testdb=# testdb=# begin; BEGIN testdb=# insert into vac_table values('6', 'record6'); INSERT 0 1 testdb=# insert into vac_table values('7', 'record7'); INSERT 0 1 testdb=# insert into vac_table values('8', 'record8'); INSERT 0 1 testdb=# testdb=# commit; COMMIT |
Here, I have inserted records in two types, record 1 through 5 is with each insert in a separate transaction, and record 6 through 8 in a single transaction.
Let’s understand the table structure with system columns as well.
1 2 3 4 5 6 7 8 9 10 11 12 |
testdb=# SELECT xmin, xmax, * FROM vac_table; xmin | xmax | id | name ---------+------+----+--------- 2088709 | 0 | 1 | record1 2088710 | 0 | 2 | record2 2088711 | 0 | 3 | record3 2088712 | 0 | 4 | record4 2088713 | 0 | 5 | record5 2088714 | 0 | 6 | record6 2088714 | 0 | 7 | record7 2088714 | 0 | 8 | record8 (8 rows) |
Here,
XMIN is the identity (transaction ID) of the inserting transaction for this row version.
XMAX is the identity (transaction ID) of the deleting transaction or 0 for an undeleted row version
as you can see, the XMIN for first 5 records are different and for the 6,7,8th record, the transaction ID is the same.
XMAX for every record is 0 because the record has not undergone any update or delete.
Now, Let’s understand how an update or delete affects the table
a heap_page_items function is used to understand the MVCC architecture, we can get the view with pageinspect extension
So, before updating the record, we first check how our table is there
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
testdb=# select lp, t_xmin, t_xmax,t_ctid,t_data from heap_page_items(get_raw_page('vac_table', 0)); lp | t_xmin | t_xmax | t_ctid | t_data ----+---------+--------+--------+------------------------------------ 1 | 2088709 | 0 | (0,1) | \x0100000000000000117265636f726431 2 | 2088710 | 0 | (0,2) | \x0200000000000000117265636f726432 3 | 2088711 | 0 | (0,3) | \x0300000000000000117265636f726433 4 | 2088712 | 0 | (0,4) | \x0400000000000000117265636f726434 5 | 2088713 | 0 | (0,5) | \x0500000000000000117265636f726435 6 | 2088714 | 0 | (0,6) | \x0600000000000000117265636f726436 7 | 2088714 | 0 | (0,7) | \x0700000000000000117265636f726437 8 | 2088714 | 0 | (0,8) | \x0800000000000000117265636f726438 (8 rows) testdb=# |
Update an existing record in the table
1 2 3 |
testdb=# update vac_table set name='record9' where id='8'; UPDATE 1 testdb=# |
we now check the page structure with heap_page_items
1 2 3 4 5 6 7 8 9 10 11 12 13 |
testdb=# select lp, t_xmin, t_xmax,t_ctid,t_data from heap_page_items(get_raw_page('vac_table', 0)); lp | t_xmin | t_xmax | t_ctid | t_data ----+---------+---------+--------+------------------------------------ 1 | 2088709 | 0 | (0,1) | \x0100000000000000117265636f726431 2 | 2088710 | 0 | (0,2) | \x0200000000000000117265636f726432 3 | 2088711 | 0 | (0,3) | \x0300000000000000117265636f726433 4 | 2088712 | 0 | (0,4) | \x0400000000000000117265636f726434 5 | 2088713 | 0 | (0,5) | \x0500000000000000117265636f726435 6 | 2088714 | 0 | (0,6) | \x0600000000000000117265636f726436 7 | 2088714 | 0 | (0,7) | \x0700000000000000117265636f726437 8 | 2088714 | 2088715 | (0,9) | \x0800000000000000117265636f726438 9 | 2088715 | 0 | (0,9) | \x0800000000000000117265636f726439 (9 rows) |
Here, a new record is added to the table with t_xmin as “2088715” and for the previous t_xmin, a new t_xmax “2088715” is added.
t_xmax for the tuple changes only when it is deleted and dead, but I have updated, right?
why did it change?
We will discuss that here.
Let me update the record again and check the status
1 2 |
testdb=# update vac_table set name='record10' where id='8'; UPDATE 1 |
its status now is
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
testdb=# select lp, t_xmin, t_xmax,t_ctid,t_data from heap_page_items(get_raw_page('vac_table', 0)); lp | t_xmin | t_xmax | t_ctid | t_data ----+---------+---------+--------+-------------------------------------- 1 | 2088709 | 0 | (0,1) | \x0100000000000000117265636f726431 2 | 2088710 | 0 | (0,2) | \x0200000000000000117265636f726432 3 | 2088711 | 0 | (0,3) | \x0300000000000000117265636f726433 4 | 2088712 | 0 | (0,4) | \x0400000000000000117265636f726434 5 | 2088713 | 0 | (0,5) | \x0500000000000000117265636f726435 6 | 2088714 | 0 | (0,6) | \x0600000000000000117265636f726436 7 | 2088714 | 0 | (0,7) | \x0700000000000000117265636f726437 8 | 2088714 | 2088715 | (0,9) | \x0800000000000000117265636f726438 9 | 2088715 | 2088716 | (0,10) | \x0800000000000000117265636f726439 10 | 2088716 | 0 | (0,10) | \x0800000000000000137265636f72643130 (10 rows) |
Here, for every update, a new record is getting created and the old record is getting obsolete or dead.
Now, on our table, we have two dead tuples,with vacuum we can delete the dead tuples and release the space to the table.
1 2 |
testdb=# vacuum vac_table; VACUUM |
The status of the table has now changed to
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
testdb=# select lp, t_xmin, t_xmax,t_ctid,t_data from heap_page_items(get_raw_page('vac_table', 0)); lp | t_xmin | t_xmax | t_ctid | t_data ----+---------+--------+--------+-------------------------------------- 1 | 2088709 | 0 | (0,1) | \x0100000000000000117265636f726431 2 | 2088710 | 0 | (0,2) | \x0200000000000000117265636f726432 3 | 2088711 | 0 | (0,3) | \x0300000000000000117265636f726433 4 | 2088712 | 0 | (0,4) | \x0400000000000000117265636f726434 5 | 2088713 | 0 | (0,5) | \x0500000000000000117265636f726435 6 | 2088714 | 0 | (0,6) | \x0600000000000000117265636f726436 7 | 2088714 | 0 | (0,7) | \x0700000000000000117265636f726437 8 | | | | 9 | | | | 10 | 2088716 | 0 | (0,10) | \x0800000000000000137265636f72643130 (10 rows) |
Here, records 8 and 9 are deleted but are not given to the operating system and any other table cannot access this free space.
To release the space to the operating system we user vacuum full
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
testdb=# vacuum full vac_table; VACUUM testdb=# select lp, t_xmin, t_xmax,t_ctid,t_data from heap_page_items(get_raw_page('vac_table', 0)); lp | t_xmin | t_xmax | t_ctid | t_data ----+---------+--------+--------+------------------------------------ 1 | 2088709 | 0 | (0,1) | \x0100000000000000117265636f726431 2 | 2088710 | 0 | (0,2) | \x0200000000000000117265636f726432 3 | 2088711 | 0 | (0,3) | \x0300000000000000117265636f726433 4 | 2088712 | 0 | (0,4) | \x0400000000000000117265636f726434 5 | 2088713 | 0 | (0,5) | \x0500000000000000117265636f726435 6 | 2088714 | 0 | (0,6) | \x0600000000000000117265636f726436 7 | 2088714 | 0 | (0,7) | \x0700000000000000117265636f726437 8 | 2088719 | 0 | (0,8) | \x08000000000000000d7265633132 (8 rows) |
But vacuum full literally creates a new table by dropping the existing table which is quite cumbersome in production environment.
To avoid that we make use of vacuum tuning techniques.
But when does this vacuum happen?
vacuum can be automated using autovacuum and can be invoked manually.
we will discuss the autovacuum and its tuning techniques in detail in the next post.
Conclusion:
- “Vacuum table” clears the dead tuples and releases the space to the table.
- update a record in a table is nothing but delete the record and insert the record with a new value.
- t_xmax is 0 for a new insert tuple changes when you updated record.
Words from postgreshelp
Thank you for giving your valuable time to read the above information. I hope the content served your purpose in reaching out the blog.
Suggestions for improvement of the blog are highly appreciable. Please contact us for any information/suggestion/feedback.
If you want to be updated with all our articles
please follow us on Facebook | Twitter
Please subscribe to our newsletter.