PostgreSQL Vacuum – What you need to know?

PostgreSQL vacuum:Introduction:

PostgreSQL vacuum is an important topic in PostgreSQL database administration. Before we get into PostgreSQL vacuum we first have to understand MVCC architecture in PostgreSQL.

In a multi-user database management system, a transaction must support the following two properties

Data concurrency: means that many users can access data at the same time.

Data consistency: means that each user sees a consistent view of the data, including visible changes made by the user’s own transactions and transactions of other users.

PostgreSQL achieves the consistency with its MVCC architecture.

PostgreSQL MVCC:

MVCC stands for Multiversion Concurrency Control, which means that each SQL sees a snapshot of the data as it was some time ago, regardless of the current state of the underlying data. This prevents statements from viewing inconsistent data produced by concurrent transactions performing updates on the same data rows, providing transaction isolation for each database session.

For example., When you update/delete a record in a table, the record is multiplexed within the table and the old version will be available till the latest version of the table is either committed or rollback.

Once the record is committed or rollbacked the old version becomes obsolete or dead. The dead record or dead tuple is then cleaned by vacuuming the table by Autovacuum Daemon.

Let’s understand PostgreSQL vacuum with a case study.

Pre-requisites: To understand how vacuum works we have to install the following two extensions.

  1. pg_freespacemap
  2. pageinspect

Case study: PostgreSQL Vacuum table example

Create an example table

We have created a table called vac_table.

Let’s insert some records into the table

Here, I have inserted records in two types, record 1 through 5 is with each insert in a separate transaction, and record 6 through 8 in a single transaction.

Let’s understand the table structure with system columns as well.

Here,

XMIN is the identity (transaction ID) of the inserting transaction for this row version.

XMAX is the identity (transaction ID) of the deleting transaction or 0 for an undeleted row version

as you can see, the XMIN for first 5 records are different and for the 6,7,8th record, the transaction ID is the same.

XMAX for every record is 0 because the record has not undergone any update or delete.

Now, Let’s understand how an update or delete affects the table

a heap_page_items function is used to understand the MVCC architecture, we can get the view with pageinspect extension

So, before updating the record, we first check how our table is there

Update an existing record in the table

we now check the page structure with heap_page_items

Here, a new record is added to the table with t_xmin as “2088715” and for the previous t_xmin, a new t_xmax “2088715” is added.

t_xmax for the tuple changes only when it is deleted and dead, but I have updated, right?

why did it change?

We will discuss that here.

Let me update the record again and check the status

its status now is

Here, for every update, a new record is getting created and the old record is getting obsolete or dead.

Now, on our table, we have two dead tuples,with vacuum we can delete the dead tuples and release the space to the table.

The status of the table has now changed to

Here, records 8 and 9 are deleted but are not given to the operating system and any other table cannot access this free space.

vacuum table postgresql

To release the space to the operating system we user vacuum full

But vacuum full literally creates a new table by dropping the existing table which is quite cumbersome in production environment.

To avoid that we make use of vacuum tuning techniques.

But when does this vacuum happen?

vacuum can be automated using autovacuum and can be invoked manually.

we will discuss the autovacuum and its tuning techniques in detail in the next post.

Conclusion:

  1. “Vacuum table” clears the dead tuples and releases the space to the table.
  2. update a record in a table is nothing but delete the record and insert the record with a new value.
  3. t_xmax is 0 for a new insert tuple changes when you updated record.

Words from postgreshelp

Thank you for giving your valuable time to read the above information. I hope the content served your purpose in reaching out the blog.
Suggestions for improvement of the blog are highly appreciable. Please contact us for any information/suggestion/feedback.

If you want to be updated with all our articles

please follow us on Facebook Twitter
Please subscribe to our newsletter.

Leave a Reply