Production Digest
postgreshelp.com
~6 min read
Autovacuum doesn't tell you when it's losing. This query does. I run it every Monday morning: it takes 2 seconds and has saved me from XID wraparound emergencies twice.
SELECT psu.schemaname || '.' || psu.relname AS table_name,
       psu.n_dead_tup,
       age(pc.relfrozenxid) AS xid_age,
       2000000000 - age(pc.relfrozenxid) AS xids_until_wraparound,
       psu.last_autovacuum
FROM pg_stat_user_tables psu
JOIN pg_class pc ON pc.oid = psu.relid
WHERE age(pc.relfrozenxid) > 500000000
   OR psu.n_dead_tup > 1000000
   OR psu.last_autovacuum < now() - interval '7 days'
ORDER BY xid_age DESC;

-- xids_until_wraparound < 200M -> start watching closely
-- xids_until_wraparound < 50M  -> manual VACUUM FREEZE, now
Add it to your weekly DBA checklist. Alert at 200M XID age. XID wraparound has caused full-database emergency shutdowns at well-monitored companies, all because this one number wasn't on anyone's dashboard.
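If a table shows up in the danger zone, the fix is a freeze, not a panic. A minimal sketch of the response (the table name is illustrative):

-- Freeze the flagged table now; unlike VACUUM FULL, plain VACUUM FREEZE
-- does not take an AccessExclusiveLock, so reads and writes continue
VACUUM (FREEZE, VERBOSE) big_orders;

-- Cluster-wide sanity check: oldest unfrozen XID per database
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;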
You filter on status = 'active' AND country = 'US'. The planner estimates 48% of rows match. The actual figure is 0.75%. So it chose a sequential scan on a 200GB table, and your query runs for 4 minutes instead of 8 seconds.
This isn't a bug. It's the planner assuming column independence: multiplying selectivities as if status and country have no relationship. On most real-world data, that assumption is wrong. Active users skew toward certain regions. High-value orders cluster by country. The planner has no idea unless you tell it.
-- Tell the planner these columns are correlated
CREATE STATISTICS stat_status_country (dependencies)
    ON status, country FROM orders;

-- Recalculate stats to activate it
ANALYZE orders;

-- Verify the extended stats were created
SELECT stxname, stxkeys, stxkind FROM pg_statistic_ext;

-- Check if the planner is now using them
SELECT e.stxname, d.stxdependencies
FROM pg_statistic_ext e
JOIN pg_statistic_ext_data d ON e.oid = d.stxoid
WHERE e.stxname = 'stat_status_country';
Extended statistics are not created automatically: not by autovacuum, not by ANALYZE, not by anything. You have to identify the correlated columns yourself and create them manually. On a 6TB table with skewed status+country distribution, this single command took one of our worst queries from a 4-minute sequential scan to an 8-second index scan. The data hadn't changed. The schema hadn't changed. The planner just finally knew the truth.
→ Reproducible simulation on GitHub (PG18 output included)
Candidates to check in your schema: any column pair where one value implies another. order_status + payment_method. region + warehouse_id. user_type + subscription_plan. If you've ever wondered why the planner picked a terrible plan on a filtered query, this is often why.
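A quick way to vet a candidate pair is to compare the planner's estimate with reality. A sketch using this article's orders example (the table, filter values, and row counts shown are illustrative):

EXPLAIN (ANALYZE, TIMING OFF)
SELECT * FROM orders
WHERE status = 'active' AND country = 'US';

-- Compare the planner's guess to the measured count in the scan node:
--   Seq Scan on orders (cost=... rows=480000 ...) (actual ... rows=7500 ...)
-- A wide gap between "rows=" and "actual rows=" on a multi-column filter
-- is the signature of the independence assumption failing.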
VACUUM FULL is not "a stronger VACUUM"

Someone in our PostgreSQL community ran VACUUM FULL on a large orders table this week to "properly clean it up." They were still waiting hours later. The table was locked for reads and writes the entire time.
What VACUUM FULL actually does: acquires AccessExclusiveLock, rewrites the entire table to a new file on disk, and doubles your I/O temporarily. Duration depends on table size, disk speed, and server load; it can range from minutes to many hours. No reads. No writes. No graceful degradation while it runs.
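You can watch that lock from a second session while the rewrite runs. A minimal sketch against the system catalogs:

-- Sessions currently holding or waiting on AccessExclusiveLock
SELECT l.relation::regclass AS table_name, l.pid, l.granted, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.mode = 'AccessExclusiveLock'
  AND l.relation IS NOT NULL;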
Before reaching for VACUUM FULL, run this first:
-- Step 1: Check if you actually have a bloat problem worth fixing
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS size,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_pct DESC;

-- Real output from pgbench test environment:
--      relname      |  size  | dead_pct
-- ------------------+--------+----------
--  pgbench_accounts | 299 MB |     50.0
Context matters. A 10MB table with 50% bloat wastes 5MB; autovacuum will handle it on its next pass, nothing to do. A 300MB table with 50% bloat is worth investigating but still not necessarily a reason to run VACUUM FULL. Ask: is this causing actual query slowdowns? Is autovacuum keeping up? Only reach for aggressive tooling when bloat is impacting performance and the table is large enough that the wasted space matters operationally.
-- Step 2: If bloat IS impacting performance on a large table,
-- use pg_repack, which rebuilds online with no AccessExclusiveLock:
--   pg_repack --table orders --jobs 2 mydb

-- But try a regular VACUUM first; it is often enough on its own
VACUUM (VERBOSE, ANALYZE) orders;
The real question before any bloat operation: is autovacuum configured correctly, or is it just falling behind? Fix the root cause first; aggressive bloat removal on top of a misconfigured autovacuum is treating the symptom, not the problem.
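A starting point for that root-cause check, sketched with illustrative values rather than universal recommendations:

-- Default trigger: vacuum runs once dead tuples exceed
-- autovacuum_vacuum_threshold + 20% of the table (scale_factor 0.2),
-- which is far too lazy for very large, hot tables
SHOW autovacuum_vacuum_scale_factor;

-- Tighten it per-table instead of cluster-wide
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.02,  -- trigger at ~2% dead tuples
    autovacuum_vacuum_cost_delay   = 1      -- throttle vacuum less (ms)
);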
An interview with Masao Fujii, PostgreSQL committer at NTT Data, on why long transactions are more dangerous than most DBAs realise. The part that stuck with me: MVCC keeps dead tuple versions around as long as any transaction, even an idle one, holds an old snapshot. One forgotten connection left open overnight can prevent VACUUM from reclaiming anything on your busiest tables. His recommendation: set idle_in_transaction_session_timeout in production and treat it as a safety net, not a nice-to-have. I've added it to our standard server config after reading this. Worth 10 minutes of your time if you've ever been surprised by table bloat that autovacuum "should have caught."
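A sketch of that safety net; the timeout value is illustrative, so pick one that fits your application's transaction patterns:

-- Terminate sessions that sit idle inside an open transaction too long
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();

-- Who is holding a snapshot open right now, and for how long?
SELECT pid, usename, now() - state_change AS idle_for, left(query, 60) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY idle_for DESC;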
A rigorous head-to-head benchmark across 17 test cases using JSONB in PostgreSQL 18 against MongoDB 7. The headline: PostgreSQL won 9 of 17. It dominated single-document reads, deletes, and array containment queries on smaller documents. MongoDB won on sorted pagination (4× faster) and aggregate queries on large complex documents. The storage difference was the most surprising finding: MongoDB stores data 2× more compactly thanks to built-in compression. My take: if your team is debating whether to add MongoDB for a JSON-heavy service, read this first. The answer is almost always "stay with PostgreSQL, add a GIN index."
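That closing advice in one sketch (table and column names are hypothetical):

-- jsonb_path_ops builds a smaller, faster index for @> containment,
-- at the cost of not supporting key-existence operators like ? and ?|
CREATE INDEX idx_events_payload ON events USING GIN (payload jsonb_path_ops);

-- The kind of query this accelerates:
SELECT * FROM events WHERE payload @> '{"status": "shipped"}';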
PostgreSQL's built-in JIT uses LLVM, which has millisecond-to-second compilation startup, making it useless for transactional workloads where queries run in microseconds. pg_jitter is a new alternative JIT provider using lighter backends (sljit, AsmJit, MIR) that compile in microseconds instead. It supports PG14-18 and could make JIT practical for mixed OLTP+analytics workloads for the first time. I haven't run it in production yet; it's too new to trust on critical systems, but it's on my staging list. Worth watching if you've ever disabled JIT because the overhead wasn't worth it.
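While waiting, the stock JIT knobs are still worth knowing. A sketch (the cost threshold shown is the default; adjust to taste):

-- Is JIT on, and how expensive must a plan be before it kicks in?
SHOW jit;
SHOW jit_above_cost;   -- 100000 by default

-- Per-session escape hatch when LLVM startup cost hurts OLTP queries
SET jit = off;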
14 years · 6TB · 60TB production systems