PostgreSQL Autovacuum – Demystified

PostgreSQL Autovacuum

In the last post, we understood that PostgreSQL Vacuum helps in clearing the dead tuples in the table and releasing the space, but how often the vacuum happens on a table?PostgreSQL Autovacuum helps here!!

Vacuum can be initiated manually and it can be automated using the autovacuum daemon.

By default, autovacuum is enabled in PostgreSQL.

The purpose of autovacuum is to automate the execution of VACUUM and ANALYZE commands. When enabled, autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples and then vacuum or analyze the table based on the threshold.

What is a threshold when autovacuum automatically analyzes or vacuums the table?

The threshold for vacuum is

The threshold for analyze is

Let’s understand the above parameters in details

Default value for

vacuum base threshold = 50

vacuum scale factor=0.2

i.e., if a table has 1000 records, the table is a candidate for autovacuum if it has 50+0.2*1000 records i.e., 250 changed.

check the autovaccum status of vac_table we query

SELECT schemaname,relname,last_autovacuum,last_autoanalyze FROM pg_stat_all_tables where schemaname =’testuser’ and relname=’vac_table’;

As my table was not a candidate for autovacuum, I made few changes and waited for sometime

its vacuumed automatically.

Log says,

Here, as mentioned earlier in the post, we will get a problem if the table size is too large, let’s say 1,00,000 records.

Its a candidate for autovacuum only if 50+0.2*100000 = 20050 records have been modified, that’s a huge right?

we can work on the issue with PostgreSQL autovacuum tuning parameters by setting them in PostgreSQL.conf file.

Query to check the tables and its dead tuples:

Reference : PostgreSQL 10 High Performance – Text

Tuning PostgreSQL Autovacuum

Most of the times Autovacuum is PostgreSQL decides when to vacuum your tables but it is equally important for us to see the tables and decide if the table needs autovacuum even if it is not a candidate for autovacuum.

Following are the parameters we need to look for when tuning autovacuum.

  • autovacuum_vacuum_threshold = 50
  • autovacuum_vacuum_scale_factor = 0.2
  • vacuum_cost_page_hit = 1
  • vacuum_cost_page_miss = 10
  • vacuum_cost_page_dirty = 20
  • autovacuum_vacuum_cost_delay = 20ms
  • autovacuum_vacuum_cost_limit = 200

As per the above example, if we change autovacuum_vacuum_scale_factor from 0.2 to 0.02 then it will become a candidate when 2050 records updated.

But these changes in PostgreSQL.conf affect all tables and it may undesirably affect cleanups of small tables, including for example system catalogs.

We can set a scale factor individually with the below commands.

How PostgreSQL autovacuum works?

Continuously autovacuum reads 8KB  pages of a table from disk and modifies/writes to the pages containing dead tuples. This is a resource(CPU and disk I/O) intensive operation and if a page read doesn’t contain any dead tuples it just ignores the page.

This is where Cost-based vacuuming comes into the picture.

The costing is based on defining cost for three basic operations:

vacuum_cost_page_hit: This means that the page is already there in the shared buffer, the default is 1.
vacuum_cost_page_miss: This means that the page has to be read from disk, default 10.
vacuum_cost_page_dirty: This means that the page read from disk has to be modified because of dead tuples, default 20.

Once the sum of costs has reached autovacuum_cost_limit (default 200 for autovacuum, disabled for manual VACUUM), the VACUUM process will sleep and do nothing for autovacuum_vacuum_cost_delay (default 20 ms).

With the default parameters, that means that autovacuum will at most write 4MB/s to disk, and read 8MB/s from disk or the OS page cache.

Takeaways:

Yeah, still there are many things to discuss, we almost covered the basics of autovacuum, we finally conclude with the takeaways from this two-part PostgreSQL: Vacuum and PostgreSQL Autovacuum.

1. Don’t disable autovacuum unless really important.
2. On high OLTP machines it is recommended to have a manual VACUUM on tables.
2. Lower autovacuum_vacuum_scale_factor on the databases with high transaction count.
3. Adjust the autovacuum_vacuum_scale_factor on tables with old, inactive data.
5. Adjust the throttling parameters depending on your hardware to increase the performance.

 

Thank you for giving your valuable time to read the above information. I hope the content served your purpose in reaching out the blog.
Suggestions for improvement of the blog are highly appreciable. Please contact us for any information/suggestion/feedback.

If you want to be updated with all our articles

please follow us on Facebook Twitter
Please subscribe to our newsletter.

 

This Post Has One Comment

  1. Ludovic

    Hello,
    In the very useful (thank you !) “Query to check the tables and its dead tuples”, I think there is a little mistake here :
    “date_trunc(‘minute’,greatest(pg_stat_get_last_analyze_time(C.oid), pg_stat_get_last_analyze_time(C.oid))) AS last_analyze”

    I think it should be “pg_stat_get_last_AUTOanalyze_time” the second time, as in the model above for vacuum/autovacuum
    “date_trunc(‘minute’,greatest(pg_stat_get_last_vacuum_time(C.oid), pg_stat_get_last_autovacuum_time(C.oid))) AS last_vacuum”

    And why date_trunc by minutes (after the greatest) ?
    In the final output, it prints with seconds anyhow : “2019-12-28 05:59:02”

Leave a Reply