Media failure is one of the critical problems a database administrator must be prepared for. A media failure is a physical problem reading from or writing to the files on the storage medium.
A typical example of media failure is a disk head crash, which causes the loss of all files on the drive. Every file associated with a database is vulnerable to a disk crash, including datafiles, WAL files, and control files.
This post focuses on disk failure in PostgreSQL and the ways you can retrieve data from a PostgreSQL database after a failure (other than restoring a backup).
We are going to do some archaeology on the error below and understand how to solve it.
WARNING: page verification failed, calculated checksum 21135 but expected 3252
ERROR: invalid page in block 0 of relation base/13455/16395
During the process, you are going to learn a whole new bunch of stuff in PostgreSQL.
Table of Contents
PostgreSQL Checksum: The definitive guide
- Chapter 1: What is a checksum?
- Chapter 2: PostgreSQL checksum: Practical implementation
- Chapter 3: How to resolve the PostgreSQL corrupted page issue?
PostgreSQL introduced data checksums in v9.3, and the feature has undergone many changes since then. PostgreSQL v12 now ships a dedicated command-line utility, pg_checksums, to check, enable, and disable them.
But what is a PostgreSQL checksum?
When the checksum is enabled, a small integer checksum is written to each “page” of data that Postgres stores on your hard drive. Upon reading that block, the checksum value is recomputed and compared to the stored one.
This detects data corruption, which (without checksums) could be silently lurking in your database for a long time.
Good: checksums, when enabled, detect data corruption.
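The stamp-on-write / verify-on-read cycle can be sketched in a few lines of Python. This is a toy model, not PostgreSQL's implementation: the real algorithm is an FNV-based hash that also mixes in the block number, and pd_checksum is only 16 bits, whereas this sketch stores a full 32-bit CRC for simplicity.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size

def stamp_checksum(page: bytearray) -> None:
    """Compute a checksum over the page, excluding the checksum field
    itself (bytes 8..11 in this toy layout), and store it in that field."""
    payload = bytes(page[:8]) + b"\x00" * 4 + bytes(page[12:])
    csum = zlib.crc32(payload)
    page[8:12] = csum.to_bytes(4, "little")

def verify_checksum(page: bytes) -> bool:
    """Recompute the checksum on read and compare it to the stored one."""
    stored = int.from_bytes(page[8:12], "little")
    payload = bytes(page[:8]) + b"\x00" * 4 + bytes(page[12:])
    return zlib.crc32(payload) == stored

page = bytearray(PAGE_SIZE)
page[100] = 0x42                  # pretend this is row data
stamp_checksum(page)              # stamped when the page is written out
assert verify_checksum(page)      # reading the clean page verifies fine

page[100] ^= 0xFF                 # simulate on-disk corruption
assert not verify_checksum(page)  # the corruption is caught on read
```

The point is only the pattern: the checksum is computed when the page leaves memory and re-verified when it comes back in, so silent bit flips on disk are caught at read time.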
How does PostgreSQL checksum work?
PostgreSQL maintains page validity primarily on the way in and out of its buffer cache.
Also read: A comprehensive guide – PostgreSQL Caching
From this we understand that a PostgreSQL page has to pass through the OS cache before it leaves or enters shared buffers. Page validity is therefore checked before a page enters shared buffers, and the checksum is stamped when a page leaves them.
When PostgreSQL copies a page into its buffer cache, it will (if possible) detect that something is wrong. It will not allow the invalid 8 kB page into shared buffers, and it will error out any query that needs that page for processing with the ERROR message
ERROR: invalid page in block 0 of relation base/13455/16395
One subtlety: if the on-disk block is invalid but a valid copy of the page still exists in shared buffers, the next checkpoint can write that buffered page back out with a freshly computed checksum, silently repairing the on-disk copy. In real-world environments, however, this rarely saves you. Confused? Bear with me.
And finally,
If the invalid byte is part of a page already in the PostgreSQL buffer cache, then PostgreSQL will quite happily assume nothing is wrong and attempt to process the data on the page. The results are unpredictable: sometimes you will get an error, and sometimes you may end up with wrong data.
How does PostgreSQL check page validity?
In a typical page, if data checksums are enabled, the checksum is stored in a 2-byte field (pd_checksum) in the page header, right after the 8-byte pd_lsn and just before the 2-byte field containing flag bits (pd_flags).
Also Read: A comprehensive guide on PostgreSQL: page header
This is followed by three 2-byte integer fields (pd_lower, pd_upper, and pd_special). These contain byte offsets from the page start to the start of unallocated space, to the end of unallocated space, and to the start of the special space.
The stored checksum value is zero while checksums are disabled. Once they are enabled, every time the block is read the checksum is recomputed and compared to the stored one. This detects data corruption.
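The header layout just described can be made concrete with Python's struct module. The field order follows the documented PageHeaderData layout, and the sample values (lower 44, upper 7552, page size 8192, layout version 4) mirror the page_header() output shown later in this post; the byte order assumes a little-endian machine.

```python
import struct

# PageHeaderData layout:
#   pd_lsn              8 bytes (WAL position of last change)
#   pd_checksum         2 bytes (0 when checksums are disabled)
#   pd_flags            2 bytes (flag bits)
#   pd_lower            2 bytes (offset to start of free space)
#   pd_upper            2 bytes (offset to end of free space)
#   pd_special          2 bytes (offset to start of special space)
#   pd_pagesize_version 2 bytes (page size | layout version)
#   pd_prune_xid        4 bytes (oldest prunable XID, or 0)
HEADER = struct.Struct("<QHHHHHHI")

# Synthetic header matching the page_header() output in this post:
# lsn 0/17EFCA0, checksum 0, flags 0, lower 44, upper 7552,
# special 8192, pagesize 8192 with layout version 4, prune_xid 0
raw = HEADER.pack(0x17EFCA0, 0, 0, 44, 7552, 8192, 8192 | 4, 0)

lsn, checksum, flags, lower, upper, special, psv, prune_xid = HEADER.unpack(raw)
print(f"lsn=0/{lsn:X} checksum={checksum} lower={lower} upper={upper}")
print(f"pagesize={psv & 0xFF00} version={psv & 0x00FF}")
```

Note how page size and layout version share one 2-byte field; masking with 0xFF00 and 0x00FF splits them apart again.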
Checksums are not maintained for blocks while they sit in shared buffers, so if you look at a buffered page with pageinspect, the checksum value you see may not be the actual one. The checksum is calculated and stamped onto the page only when the page is written out of the buffer cache into the operating system page cache.
Let’s work on the practical understanding of whatever we learned so far.
I have a table check_corruption wherein I am going to do all the garbage work.
- my table size is 8 kB.
- has 5 records.
- the version I am using is PostgreSQL v12.
postgres=# select * from check_corruption;
 aid | bid | abalance |                                filler
-----+-----+----------+----------------------------------------------------------------------
   1 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   2 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   3 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   4 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   5 |   1 |        0 | This is checksum example, checksum is for computing block corruption
(5 rows)

postgres=# SELECT * FROM page_header(get_raw_page('check_corruption',0));
    lsn    | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
 0/17EFCA0 |        0 |     0 |    44 |  7552 |    8192 |     8192 |       4 |         0
(1 row)

postgres=# \dt+ check_corruption
                        List of relations
 Schema |       Name       | Type  |  Owner   |    Size    | Description
--------+------------------+-------+----------+------------+-------------
 public | check_corruption | table | postgres | 8192 bytes |
(1 row)

postgres=# select pg_relation_filepath('check_corruption');
 pg_relation_filepath
----------------------
 base/13455/16490
(1 row)
First, check whether checksums are enabled:
[postgres@stagdb ~]$ pg_controldata -D /u01/pgsql/data | grep checksum
Data page checksum version: 0
[postgres@stagdb ~]$
It is disabled.
Let me enable the page checksum in PostgreSQL v12.
Syntax: pg_checksums -D /u01/pgsql/data --enable --progress --verbose
[postgres@stagdb ~]$ pg_checksums -D /u01/pgsql/data --enable --progress --verbose
pg_checksums: checksums enabled in file "/u01/pgsql/data/global/2847"
pg_checksums: checksums enabled in file "/u01/pgsql/data/global/1260_fsm"
pg_checksums: checksums enabled in file "/u01/pgsql/data/global/4175"
..
..
23/23 MB (100%) computed
Checksum operation completed
Files scanned: 969
Blocks scanned: 3006
pg_checksums: syncing data directory
pg_checksums: updating control file
Data checksum version: 1
Checksums enabled in cluster
Again, check the status of checksums in PostgreSQL
[postgres@stagdb ~]$
[postgres@stagdb ~]$ pg_controldata -D /u01/pgsql/data | grep checksum
Data page checksum version: 1
[postgres@stagdb ~]$
We can disable checksums with the --disable option:
[postgres@stagdb ~]$ pg_checksums -D /u01/pgsql/data --disable
pg_checksums: syncing data directory
pg_checksums: updating control file
Checksums disabled in cluster
[postgres@stagdb ~]$
Let’s first check the current data directory for errors, then play with data.
To check the PostgreSQL page errors, we use the following command.
pg_checksums -D /u01/pgsql/data --check
[postgres@stagdb ~]$ pg_checksums -D /u01/pgsql/data --check
Checksum operation completed
Files scanned: 969
Blocks scanned: 3006
Bad checksums: 0
Data checksum version: 1
[postgres@stagdb ~]$
Warning! Do not perform the below case study on a production machine.
As the data file for the check_corruption table is 16490, I am going to corrupt that file with the operating system's dd command. (With seek=1, dd writes a copy of block 0 at the block 1 position; since the checksum calculation mixes in the block number, that block now carries a checksum that cannot match.)
dd bs=8192 count=1 seek=1 of=16490 if=16490
[postgres@stagdb 13455]$ dd bs=8192 count=1 seek=1 of=16490 if=16490
Now, log in and get the result
postgres=# select * from check_corruption;
 aid | bid | abalance |                                filler
-----+-----+----------+----------------------------------------------------------------------
   1 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   2 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   3 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   4 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   5 |   1 |        0 | This is checksum example, checksum is for computing block corruption
(5 rows)
I got the result, but why?
I got the result from shared buffers. Let me restart the cluster and fetch the same.
/usr/local/pgsql/bin/pg_ctl restart -D /u01/pgsql/data
postgres=# select * from check_corruption;
 aid | bid | abalance |                                filler
-----+-----+----------+----------------------------------------------------------------------
   1 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   2 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   3 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   4 |   1 |        0 | This is checksum example, checksum is for computing block corruption
   5 |   1 |        0 | This is checksum example, checksum is for computing block corruption
(5 rows)
But again why?
As we discussed earlier, during the restart PostgreSQL wrote the page image from shared buffers back to disk, replacing the corrupted block and its bad checksum.
How can we trigger a checksum warning?
We need to get that row out of shared buffers. The quickest way to do so in this test scenario is to restart the database, then make sure we do not even look at (e.g. SELECT) the table before we make our on-disk modification. Once that is done, the checksum will fail and we will, as expected, receive a checksum error:
i.e., stop the server, corrupt the file on disk, and start it again:
- /usr/local/pgsql/bin/pg_ctl stop -D /u01/pgsql/data
- dd bs=8192 count=1 seek=1 of=16490 if=16490
- /usr/local/pgsql/bin/pg_ctl start -D /u01/pgsql/data
On the next fetch, I got the below error:
postgres=# select * from check_corruption;
2020-02-06 19:06:17.433 IST [25218] WARNING: page verification failed, calculated checksum 39428 but expected 39427
WARNING: page verification failed, calculated checksum 39428 but expected 39427
2020-02-06 19:06:17.434 IST [25218] ERROR: invalid page in block 1 of relation base/13455/16490
2020-02-06 19:06:17.434 IST [25218] STATEMENT: select * from check_corruption;
ERROR: invalid page in block 1 of relation base/13455/16490
Let us dig deeper into the issue and confirm that the block is corrupted
There are a couple of ways to confirm the issue, including Linux commands like:
- dd
- od
- hexdump
Using the dd command: dd if=16490 bs=8192 count=1 skip=1 | od -A d -t x1z -w16 | head -2
[postgres@stagdb 13455]$ dd if=16490 bs=8192 count=1 skip=1 | od -A d -t x1z -w16 | head -2
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 4.5e-05 seconds, 182 MB/s
0000000 00 00 00 00 a0 fc 7e 01 03 9a 00 00 2c 00 80 1d  >......~.....,...<
Here, the first 8 bytes (00 00 00 00 a0 fc 7e 01) are pd_lsn, and the next 2 bytes (03 9a) are the checksum, pd_checksum. Multi-byte values are stored little-endian on this machine, so the stored checksum is 0x9A03.
Using hexdump : hexdump -C 16490 | head -1
[postgres@stagdb 13455]$ hexdump -C 16490 | head -1
00000000  00 00 00 00 a0 fc 7e 01 03 9a 00 00 2c 00 80 1d  |......~.....,...|
[postgres@stagdb 13455]$
Both hexdump and dd return the same result.
Now let's see what PostgreSQL's very own pg_checksums has to say.
command: pg_checksums -D /u01/pgsql/data --check
[postgres@stagdb 13455]$ pg_checksums -D /u01/pgsql/data --check
pg_checksums: error: checksum verification failed in file "/u01/pgsql/data/base/13455/16490", block 1: calculated checksum 9A04 but block contains 9A03
Checksum operation completed
Files scanned: 968
Blocks scanned: 3013
Bad checksums: 1
Data checksum version: 1
[postgres@stagdb 13455]$
Here, the stored checksum 9A03 reported by pg_checksums matches the checksum 9A03 we extracted with hexdump.
Converting hex 9A03 to decimal gives 39427, which matches the "expected" value in the error:
2020-02-06 19:06:17.433 IST [25218] WARNING: page verification failed, calculated checksum 39428 but expected 39427
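The little-endian decoding and the hex-to-decimal conversion above are easy to double-check with a few lines of Python, using the 16 header bytes reported by hexdump:

```python
# First 16 bytes of the corrupted block, as shown by hexdump/od above
raw = bytes.fromhex("00000000a0fc7e01039a00002c00801d")

pd_lsn = raw[0:8]                                  # first 8 bytes: pd_lsn
pd_checksum = int.from_bytes(raw[8:10], "little")  # next 2 bytes: pd_checksum

print(hex(pd_checksum))   # 0x9a03
print(pd_checksum)        # 39427, the "expected" value in the WARNING
```

The bytes 03 9a, read little-endian, give 0x9A03, which is 39427 in decimal: exactly the number PostgreSQL printed as the expected checksum.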
How to resolve the PostgreSQL corrupted page issue?
Use the function below to find the exact location of the corruption.
CREATE OR REPLACE FUNCTION find_bad_row(tableName TEXT)
RETURNS tid AS $find_bad_row$
DECLARE
    result tid;
    curs REFCURSOR;
    row1 RECORD;
    row2 RECORD;
    tabName TEXT;
    count BIGINT := 0;
BEGIN
    SELECT reverse(split_part(reverse($1), '.', 1)) INTO tabName;

    OPEN curs FOR EXECUTE 'SELECT ctid FROM ' || tableName;

    count := 1;
    FETCH curs INTO row1;

    WHILE row1.ctid IS NOT NULL LOOP
        result = row1.ctid;
        count := count + 1;
        FETCH curs INTO row1;
        EXECUTE 'SELECT (each(hstore(' || tabName || '))).* FROM '
            || tableName || ' WHERE ctid = $1' INTO row2 USING row1.ctid;

        IF count % 100000 = 0 THEN
            RAISE NOTICE 'rows processed: %', count;
        END IF;
    END LOOP;

    CLOSE curs;
    RETURN row1.ctid;
EXCEPTION
    WHEN OTHERS THEN
        RAISE NOTICE 'LAST CTID: %', result;
        RAISE NOTICE '%: %', SQLSTATE, SQLERRM;
        RETURN result;
END
$find_bad_row$ LANGUAGE plpgsql;
Now, using find_bad_row(), you can find the ctid of the corrupted location.
You need the hstore extension to use this function:
postgres=# CREATE EXTENSION hstore;
CREATE EXTENSION
postgres=#
postgres=# select find_bad_row('check_corruption');
2020-02-06 19:44:24.227 IST [25929] WARNING: page verification failed, calculated checksum 39428 but expected 39427
2020-02-06 19:44:24.227 IST [25929] CONTEXT: PL/pgSQL function find_bad_row(text) line 21 at FETCH
WARNING: page verification failed, calculated checksum 39428 but expected 39427
NOTICE: LAST CTID: (0,5)
NOTICE: XX001: invalid page in block 1 of relation base/13455/16490
find_bad_row
--------------
(0,5)
(1 row)
The function reports the last ctid it could read before hitting the corruption. Deleting the affected row by ctid resolves the issue:
postgres=# delete from check_corruption where ctid='(0,6)';
DELETE 1
postgres=#
If deleting by ctid has not worked for you, there is an alternative solution: the zero_damaged_pages parameter.
Example:
postgres=# select * from master;
WARNING: page verification failed, calculated checksum 8770 but expected 8769
ERROR: invalid page in block 1 of relation base/13455/16770
postgres=#
I can't access the data from the table master because the block is corrupted.
Solution:
postgres=# SET zero_damaged_pages = on;
SET
postgres=# vacuum full master;
postgres=# select * from master;
WARNING: page verification failed, calculated checksum 8770 but expected 8769
WARNING: invalid page in block 1 of relation base/13455/16770; zeroing out page
 id |  name   |   city
----+---------+-----------
  1 | Orson   | hyderabad
  2 | Colin   | chennai
  3 | Leonard | newyork
Here, PostgreSQL zeroed out the damaged page and returned the rows from the remaining, undamaged pages.
What does the official documentation say?
zero_damaged_pages (boolean): Detection of a damaged page header normally causes PostgreSQL to report an error, aborting the current transaction. Setting zero_damaged_pages to on causes the system to instead report a warning, zero out the damaged page in memory, and continue processing. This behavior will destroy data, namely all the rows on the damaged page. However, it does allow you to get past the error and retrieve rows from any undamaged pages that might be present in the table. It is useful for recovering data if corruption has occurred due to a hardware or software error. You should generally not set this on until you have given up hope of recovering data from the damaged pages of a table. Zeroed-out pages are not forced to disk so it is recommended to recreate the table or the index before turning this parameter off again. The default setting is off, and it can only be changed by a superuser.
There are a couple of things to be aware of when using this feature. First, checksums have a performance cost, since they introduce extra calculations for each data page (8 kB by default), so weigh the tradeoff between data safety and performance.
There are many factors that influence how much slower things are when checksums are enabled, including:
- How likely things are to be read from shared_buffers, which depends on how large shared_buffers is set, and how much of your active database fits inside of it
- How fast your server is in general, and how well it (and your compiler) are able to optimize the checksum calculation
- How many data pages you have (which can be influenced by your data types)
- How often you are writing new pages (via COPY, INSERT, or UPDATE)
- How often you are reading values (via SELECT)
The more that shared buffers are used (and using them efficiently is a good general goal), the less checksumming is done, and the smaller the impact on database performance. On average, enabling checksums costs a little over 2% in performance; for inserts the average difference was 6%, and for selects it jumps to 19%. The complete benchmark can be found here.
Bonus
You can dump the contents of the file with pg_filedump before and after the test, and use the diff command to analyze the corruption:
- pg_filedump -if 16770 > before_corrupt.txt
- corrupt the disk block
- pg_filedump -if 16770 > after_corrupt.txt
- diff (or Beyond Compare) the two files.
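The final comparison step can also be scripted. This sketch uses Python's difflib in place of the diff command; the filenames and dump contents are hypothetical, standing in for real pg_filedump output:

```python
import difflib

def diff_dumps(before_text: str, after_text: str) -> str:
    """Return a unified diff of two pg_filedump text dumps."""
    diff = difflib.unified_diff(
        before_text.splitlines(), after_text.splitlines(),
        fromfile="before_corrupt.txt", tofile="after_corrupt.txt",
        lineterm="")
    return "\n".join(diff)

# Toy example: one header field changes after corruption.
before = "Block 1\n pd_checksum: 0x9A03\n"
after = "Block 1\n pd_checksum: 0x0000\n"
print(diff_dumps(before, after))
```

Lines prefixed with - come from the pre-corruption dump and lines prefixed with + from the post-corruption dump, so the changed header fields stand out immediately.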
Thank you for giving your valuable time to read the above information. I hope the content served your purpose in reaching out to the blog.
Suggestions for improvement of the blog are highly appreciable. Please contact us for any information/suggestions/feedback.