Case study on table conflict issues during PostgreSQL vacuum

PostgreSQL Vacuum is a vast subject. There are many things that can be discussed in vacuuming. But in this post, I am going to touch something interesting.

Before we begin, I have a question for you.

Is vacuum table_name releases space to disk?

And if your answer is NO, then you are wrong.

The answer is MAYBE (terms and conditions apply).

Example.,

postgres=# vacuum pgbench_accounts;

In this post, we are going to understand

  • What had happened in the above example?
  • What else could happen?
  • What can I do to troubleshoot the issues on vacuuming pertaining to the above activity?

By the end of this post, you will be able to understand how to resolve

“pgbench_accounts”: stopping truncate due to conflicting lock request [AND]
“pgbench_accounts”: suspending truncate due to conflicting lock request

You can directly jump into the case study here, but I would recommend you go through some basic theory first to understand it better.

What is PostgreSQL Vacuum:

VACUUM is a garbage collection mechanism in PostgreSQL. It reclaims storage occupied by dead tuples. In normal PostgreSQL operation, tuples that are modified by an update/delete are not physically removed from their table; they remain present until a VACUUM is done.

There are many phases involved in the PostgreSQL Vacuum process of which the following 4 phases(scanning heap, vacuuming indexes, vacuuming heap, truncating heap) take out the major part of your vacuum time and you can monitor the process with pg_stat_progress_vacuum view.

From a broad perspective, the vacuum can be done in two ways.

  • Plain VACUUM (without FULL) simply reclaims space and makes it available for re-use.
  • VACUUM FULL which can reclaim space and gives it to the Operating system.

This vacuum process requires locks.

From here you can understand that PLAIN VACUUM acquires ShareUpdateExclusiveLock.

And From here you can understand that VACUUM FULL acquires AccessExclusiveLock

Both ShareUpdateExclusiveLock and AccessExclusiveLock has a different locking conflict throughout the process.

As soon as you run vacuum table_name; following sequence of things will happen

  1. scanning heap: scans the table and collects the TIDs of all the dead tuples.
  2. vacuuming indexes: Then it scans the indexes to remove all entries for those TIDs from the indexes.
  3. vacuuming heap: It then removes dead tuples from the table.
  4. truncating heap: Finally, it Specifies that VACUUMshould attempt to truncate off any empty pages at the end of the table and allow the disk space for the truncated pages to be returned to the operating system.

Here, for performing the first three phases, the vacuum process acquires ShareUpdateExclusiveLock lock, and for truncating, it acquires AccessExclusiveLock.

During the first three phases, my pg_locks output showed

And during truncate phase it is

What exactly happens at each of these phases?

Phase Description Lock Acquired
scanning heap VACUUM is currently scanning the heap. It will prune and defragment each page if required, and possibly perform freezing activity. The heap_blks_scanned column can be used to monitor the progress of the scan. ShareUpdateExclusiveLock
vacuuming indexes VACUUM is currently vacuuming the indexes. If a table has any indexes, this will happen at least once per vacuum, after the heap has been completely scanned. It may happen multiple times per vacuum if maintenance_work_mem is insufficient to store the number of dead tuples found. ShareUpdateExclusiveLock
vacuuming heap VACUUM is currently vacuuming the heap. Vacuuming the heap is distinct from scanning the heap, and occurs after each instance of vacuuming indexes. If heap_blks_scanned is less than heap_blks_total, the system will return to scanning the heap after this phase is completed; otherwise, it will begin cleaning up indexes after this phase is completed. ShareUpdateExclusiveLock
truncating heap VACUUM is currently truncating the heap so as to return empty pages at the end of the relation to the operating system. This occurs after cleaning up indexes. AccessExclusiveLock

Let us consider an example of table vacuum

postgres=# vacuum pgbench_accounts;

VACUUM

We can understand the phases by querying pg_stat_progress_vacuum

Phase 1

Phase 2

Phase 3

Phase 4

Here,

heap_tuples_scanned Number of heap tuples scanned. This counter only advances when the phase is seq scanning heap, index scanning heap or writing new heap.
heap_tuples_written Number of heap tuples written. This counter only advances when the phase is seq scanning heap, index scanning heap or writing new heap.
heap_blks_total Total number of heap blocks in the table. This number is reported as of the beginning of seq scanning heap.
heap_blks_scanned Number of heap blocks scanned. This counter only advances when the phase is seq scanning heap.
index_rebuild_count Number of indexes rebuilt. This counter only advances when the phase is rebuilding index.

What happens when another session try to access the table on which vacuum is running.

PostgreSQL Vacuum Conflicts: Case Study

Example 1, When vacuum initiated on already running “select table”

In session 1, I selected the entire table for an update.

In session 2, I tried to do vacuum

20200424-19:17:12 (SESSION 1) # vacuum analyze verbose pgbench_accounts;
INFO:  vacuuming “public.pgbench_accounts”
<FIRST THREE PHASES OUTPUT TRUNCATED>
INFO:  “pgbench_accounts”: stopping truncate due to conflicting lock request
INFO:  analyzing “public.pgbench_accounts”
INFO:  “pgbench_accounts”: scanned 30000 of 131148 pages, containing 0 live rows and 488 dead rows; 0 rows in sample, 0 estimated total rows

VACUUM

Example 2, When select initiated on already running “vacuum table”

I tried the case study with other way around. I first tried to vacuum then selected the same table before my vacuum entered truncate phase.

Session 1: initiate vacuum

While my session 1 has just entered the truncate phase, I selected the table from another session.

Session 2: Select on the same table

Session 1: My Vacuum process aborted with below error during truncate phase

INFO:  “pgbench_accounts”: suspending truncate due to conflicting lock request
INFO:  “pgbench_accounts”: truncated 196722 to 19904 pages
DETAIL:  CPU: user: 0.13 s, system: 1.77 s, elapsed: 1.96 s
INFO:  “pgbench_accounts”: truncated 19904 to 1 pages
DETAIL:  CPU: user: 0.15 s, system: 1.83 s, elapsed: 2.64 s
INFO:  analyzing “public.pgbench_accounts”
INFO:  “pgbench_accounts”: scanned 1 of 1 pages, containing 4 live rows and 0 dead rows; 4 rows in sample, 4 estimated total rows

VACUUM

What is the big deal here?

During the truncate phase, even if you do “vacuum table name;” it actually enters into vacuum full mode for few seconds to minutes depending on the number of pages your vacuum process truncates.

In a busy OLTP environment, the truncate phase may produce severe effects on your database.

When VACUUM tries to truncate the trailing empty pages, it scans shared_buffers to invalidate the pages-to-truncate during holding an AccessExclusive lock on the relation.

page invalidation is nothing but releasing all buffers of the table,

  • It must ensure that no dirty buffer is left outside the start of the range to invalidate
  • No I/O is underway against any of the blocks which are outside the truncation point.  Because the process is about to free (and possibly reuse) those blocks on-disk.

So if shared_buffers is huge, other transactions need to wait for a very long time before accessing to the relation which would cause the response-time spikes.

So what can I do if I get the below error often?

INFO:  “pgbench_accounts”: suspending truncate due to conflicting lock request
INFO:  “pgbench_accounts”: truncated 196722 to 19904 pages
DETAIL:  CPU: user: 0.13 s, system: 1.77 s, elapsed: 1.96 s

You can avoid the error by tweaking the changes in your vacuum parameters which I have discussed a few here

What changes have come in PostgreSQL 12 to address this issue?

PostgreSQL 12 Vacuum has come up with an extra option for the table to avoid truncating pages on a busy OLTP system with the vacuum_truncate command.

What documentation says,

TRUNCATE

Specifies that VACUUM should attempt to truncate off any empty pages at the end of the table and allow the disk space for the truncated pages to be returned to the operating system. This is normally the desired behavior and is the default unless the vacuum_truncate option has been set to false for the table to be vacuumed. Setting this option to false may be useful to avoid ACCESS EXCLUSIVE lock on the table that the truncation requires. This option is ignored if the FULL option is used.

Conclusion: The vacuum table would actually try to truncate the trailing empty pages of each table when you initiate the vacuum table, during the truncation process it acquires the exclusive lock and doesn’t allow the other sessions to do anything on the same table, this issue can be addressed with a parameter vacuum_truncate in PostgreSQL 12.

Learned something new?  Please rate the post below.

Words from postgreshelp

Thank you for giving your valuable time to read the above information. I hope the content served your purpose in reaching out the blog.
Suggestions for improvement of the blog are highly appreciable. Please contact us for any information/suggestions/feedback.

If you want to be updated with all our articles

please follow us on Facebook Twitter
Please subscribe to our newsletter.

Leave a Reply