Symptoms
You start getting complaints that queries are slower than usual. A nightly report that used to finish in 30 seconds is now taking five minutes. You log into sw-infrarunbook-01, open a psql session, and start poking around. Dead tuple counts are climbing fast. Table sizes on disk are ballooning even though row counts are stable. And when you check pg_stat_user_tables, the last_autovacuum column is either NULL or shows a timestamp from three days ago.
That's the classic signal. Autovacuum is supposed to be the silent janitor keeping your PostgreSQL tables tidy — reclaiming space from dead rows left behind by updates and deletes. When it stops working, garbage accumulates fast and it compounds: bloated tables mean longer sequential scans, longer scans mean slower queries, slower queries mean more connection stacking, and suddenly you're having an incident at 2am.
Specific symptoms you'll typically observe before things get critical:
- High n_dead_tup values in pg_stat_user_tables
- last_autovacuum is NULL or shows a stale timestamp hours or days old
- Table file sizes on disk growing even when the application isn't inserting new rows net
- Sequential scan costs increasing over time despite stable data volume
- Warnings in the PostgreSQL log about transaction ID wraparound approaching
- Bloat ratios from pgstattuple or check_postgres in the 40–80% range
Let's dig into every reason this happens and exactly what to do about each one.
Root Cause 1: Autovacuum Is Disabled
Why It Happens
This one sounds obvious, but I've seen it more times than I'd like to admit. Someone disabled autovacuum globally — usually during a bulk data load or schema migration to reduce I/O pressure — and never turned it back on. It also happens through configuration management drift: a Puppet or Ansible template gets out of sync and overwrites postgresql.conf with a stale version that has autovacuum = off. The change goes out with the next Ansible run, nobody notices, and autovacuum quietly disappears.
How to Identify It
Check the current runtime value directly, and more importantly check the source so you know where to fix it:
SELECT name, setting, source, sourcefile
FROM pg_settings
WHERE name = 'autovacuum';
Output when it's off and the source is a config file:
name | setting | source | sourcefile
------------+---------+--------------------+----------------------------
autovacuum | off | configuration file | /etc/postgresql/16/main/postgresql.conf
Also check whether it was disabled via ALTER SYSTEM, which writes to postgresql.auto.conf and overrides whatever is in postgresql.conf:
-- If sourcefile points to postgresql.auto.conf, ALTER SYSTEM was used
SELECT name, setting, source, sourcefile
FROM pg_settings
WHERE name = 'autovacuum';
How to Fix It
Enable it immediately without a full restart using ALTER SYSTEM followed by a config reload:
ALTER SYSTEM SET autovacuum = on;
SELECT pg_reload_conf();
Verify it took effect right away:
SHOW autovacuum;
-- Expected: on
If your config management tooling is responsible, update the template immediately and lock in autovacuum = on explicitly. Don't rely on the default — document it. For the tables that missed autovacuum, kick off a manual vacuum to clear the backlog while autovacuum catches up:
-- Run for each bloated table
VACUUM ANALYZE public.orders;
Root Cause 2: Cost Delay Is Throttling Workers Into Uselessness
Why It Happens
Autovacuum has a built-in I/O throttle. Workers accumulate a "cost" as they read and write pages — vacuum_cost_page_hit (1 by default) for buffer cache hits, vacuum_cost_page_miss (10) for disk reads, and vacuum_cost_page_dirty (20) for pages that need to be written. Once a worker's accumulated cost hits autovacuum_vacuum_cost_limit, it sleeps for autovacuum_vacuum_cost_delay milliseconds before continuing.
This throttle is a good idea in principle — you don't want vacuum starving your application queries for disk bandwidth. The problem is when autovacuum_vacuum_cost_delay is set too high. The default in PostgreSQL 12 and earlier was 20ms. That sounds small, but on a busy table with tens of millions of dead tuples, a worker that sleeps for 20ms after every 200 cost units ends up spending the vast majority of its time sleeping rather than working. Autovacuum technically runs. It just makes almost no progress against the accumulation rate.
How to Identify It
Check the current cost settings:
SELECT name, setting, unit
FROM pg_settings
WHERE name IN (
'autovacuum_vacuum_cost_delay',
'autovacuum_vacuum_cost_limit',
'vacuum_cost_page_hit',
'vacuum_cost_page_miss',
'vacuum_cost_page_dirty'
)
ORDER BY name;
An overly conservative configuration looks like this:
name | setting | unit
-----------------------------+---------+------
autovacuum_vacuum_cost_delay | 20 | ms
autovacuum_vacuum_cost_limit | 200 |
vacuum_cost_page_dirty | 20 |
vacuum_cost_page_hit | 1 |
vacuum_cost_page_miss | 10 |
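Before tuning anything, it's worth working out what that configuration actually allows. A minimal back-of-the-envelope sketch (plain arithmetic, no server needed) of the worst case, where every page the worker touches is dirty:

```python
# Worst-case autovacuum throughput under the cost model: every page the
# worker touches is dirty, so each page costs vacuum_cost_page_dirty.
def dirty_pages_per_second(cost_limit: int, cost_delay_ms: float,
                           page_dirty_cost: int) -> float:
    pages_per_cycle = cost_limit / page_dirty_cost  # pages before the limit trips
    cycles_per_second = 1000.0 / cost_delay_ms      # ignoring actual work time
    return pages_per_cycle * cycles_per_second

# Pre-PG13 defaults: limit 200, delay 20ms, dirty-page cost 20
print(dirty_pages_per_second(200, 20, 20))   # 500.0 dirty pages/sec ceiling
# The PG13+ default delay of 2ms lifts that ceiling tenfold
print(dirty_pages_per_second(200, 2, 20))    # 5000.0
```

The real ceiling is slightly lower still, since the cycle time includes the actual vacuuming work on top of the sleep.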
A worker with those settings hits its cost limit after dirtying roughly 10 pages (200 cost / 20 per dirty page), then sleeps for 20ms. That's a throughput ceiling of about 500 dirty pages per second — painfully slow for any seriously bloated table. Observe actual progress using pg_stat_progress_vacuum:
SELECT pid,
relid::regclass AS table,
phase,
heap_blks_scanned,
heap_blks_vacuumed,
heap_blks_total
FROM pg_stat_progress_vacuum;
Run this query twice, a minute apart. If heap_blks_vacuumed is barely moving, cost throttling is your problem.
How to Fix It
PostgreSQL 13 changed the default cost delay to 2ms — much more aggressive and appropriate for modern hardware. For older versions, or environments that haven't been updated, reduce it:
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 2;
SELECT pg_reload_conf();
Raise the cost limit so workers do more work per cycle before sleeping:
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 800;
SELECT pg_reload_conf();
On modern NVMe storage, 800 or even higher is fine. Spinning disks are more sensitive — test in staging first. For a single hot table that needs emergency relief, override at the table level without touching globals:
ALTER TABLE orders SET (
autovacuum_vacuum_cost_delay = 2,
autovacuum_vacuum_cost_limit = 800
);
Root Cause 3: The Table Is Explicitly Excluded from Autovacuum
Why It Happens
PostgreSQL allows you to disable autovacuum on a per-table basis using a storage parameter. This has legitimate uses — static lookup tables that never see writes, staging tables that get truncated and reloaded on a schedule. But it's a sharp tool. I've seen engineers set autovacuum_enabled = false on a table as a "temporary" measure during a migration, commit the DDL, and forget about it entirely. Six months later, that table is 80% dead tuples and nobody can figure out why autovacuum won't touch it despite being globally enabled.
How to Identify It
Query pg_class joined against its storage options to find excluded tables:
SELECT c.relname, opt.option_name, opt.option_value
FROM pg_class c,
pg_options_to_table(c.reloptions) AS opt
WHERE opt.option_name = 'autovacuum_enabled'
AND opt.option_value = 'false'
AND c.relkind = 'r'
ORDER BY c.relname;
Example output:
relname | option_name | option_value
--------------+--------------------+--------------
audit_log | autovacuum_enabled | false
session_data | autovacuum_enabled | false
Cross-reference these against pg_stat_user_tables to see how bad the buildup is:
SELECT relname, n_dead_tup, n_live_tup,
ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('audit_log', 'session_data');
How to Fix It
Re-enable autovacuum on the table by resetting the storage parameter:
ALTER TABLE audit_log RESET (autovacuum_enabled);
-- or explicitly set it:
ALTER TABLE audit_log SET (autovacuum_enabled = true);
Then run a manual vacuum immediately to clear the accumulated backlog rather than waiting for the next autovacuum cycle to pick it up:
VACUUM (VERBOSE, ANALYZE) audit_log;
If the table legitimately doesn't need autovacuum — say, it's purely insert-only and gets periodically truncated — document that decision in a comment on the table and ensure your operational runbooks include a manual vacuum step after each bulk reload. Never leave it as an implicit assumption.
Root Cause 4: Dead Tuples Aren't Hitting the Trigger Threshold
Why It Happens
Autovacuum doesn't fire on a table the moment it sees dead tuples. It fires when dead tuple count exceeds a threshold calculated as:
trigger = autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor × n_live_tup)
The default scale factor is 0.2, meaning autovacuum kicks in when dead tuples exceed 20% of live row count plus a base of 50 rows. For small tables this is perfectly fine. For a table with 200 million rows, you're waiting until 40 million dead tuples accumulate before autovacuum even considers touching it. By then, bloat is severe, the cleanup run takes a long time, and in the meantime query performance has been quietly degrading for weeks.
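The formula is easy to evaluate by hand. A small sketch showing how the default scale factor behaves across table sizes (plain arithmetic, using the stock defaults):

```python
def vacuum_trigger(n_live_tup: int, scale_factor: float = 0.2,
                   threshold: int = 50) -> int:
    # Dead tuples must exceed this count before autovacuum fires on the table
    return int(threshold + scale_factor * n_live_tup)

print(vacuum_trigger(10_000))        # 2050  -- reasonable for a small table
print(vacuum_trigger(200_000_000))   # 40000050  -- 40M dead rows before anything happens
# A per-table override for a large hot table triggers far earlier:
print(vacuum_trigger(50_000_000, scale_factor=0.01, threshold=1000))  # 501000
```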
The reverse also happens: someone cranks up the threshold and scale factor to reduce autovacuum frequency on a low-priority table, and forgets that the table later became high-write.
How to Identify It
Calculate the current trigger threshold for each table and compare it against actual dead tuple counts:
SELECT
schemaname,
relname,
n_dead_tup,
n_live_tup,
ROUND(
current_setting('autovacuum_vacuum_scale_factor')::numeric * n_live_tup
+ current_setting('autovacuum_vacuum_threshold')::numeric
) AS vacuum_trigger,
CASE
WHEN n_dead_tup > ROUND(
current_setting('autovacuum_vacuum_scale_factor')::numeric * n_live_tup
+ current_setting('autovacuum_vacuum_threshold')::numeric
) THEN 'NEEDS VACUUM'
ELSE 'ok'
END AS status,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
If a table shows high n_dead_tup but the status column says "ok", the threshold is simply set too high for this table's size. Also check for table-level overrides that might be making things worse:
SELECT c.relname, reloptions
FROM pg_class c
WHERE reloptions IS NOT NULL
AND c.relkind = 'r'
ORDER BY c.relname;
How to Fix It
For large high-write tables, dramatically reduce the scale factor so autovacuum triggers much earlier:
ALTER TABLE events SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 1000
);
With a scale factor of 0.01, autovacuum fires when dead tuples exceed 1% of live rows. On a 50 million row table, that's triggering at 500,000 dead tuples instead of 10 million — cleanup runs are smaller, faster, and more frequent. It's a much better operating point.
If most of your tables are large, adjust the globals rather than setting per-table overrides everywhere:
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;
ALTER SYSTEM SET autovacuum_vacuum_threshold = 500;
SELECT pg_reload_conf();
Root Cause 5: Worker Count Is Too Low
Why It Happens
The default autovacuum_max_workers is 3. On a server running a single small database with moderate write load, that's adequate. On a server running multiple databases, dozens of actively-written tables, and high throughput — three workers get saturated easily. When all worker slots are occupied, tables that need vacuuming simply wait. If you have a handful of large, slow-to-vacuum tables, they'll hold all the worker slots for extended periods while smaller tables accumulate dead tuples unattended.
This problem is particularly common after adding new databases to an instance, scaling up write load seasonally, or completing a migration that introduces new high-write tables.
How to Identify It
Check how many workers are currently active and which tables they're working on:
SELECT pid, datname, left(query, 80) AS vacuum_target, state
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%'
ORDER BY pid;
Example output showing all three default workers fully occupied:
pid | datname | vacuum_target | state
-------+----------+-----------------------------------------------+--------
21301 | appdb | autovacuum: VACUUM public.events | active
21302 | appdb | autovacuum: VACUUM public.orders | active
21303 | appdb | autovacuum: VACUUM public.sessions | active
If you consistently see exactly autovacuum_max_workers processes running and dead tuples are still climbing on other tables, you've hit the ceiling. Check the current setting and naptime together:
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('autovacuum_max_workers', 'autovacuum_naptime')
ORDER BY name;
How to Fix It
Increasing autovacuum_max_workers requires a PostgreSQL restart — it's a postmaster-level parameter that's allocated at startup. Set it via ALTER SYSTEM:
ALTER SYSTEM SET autovacuum_max_workers = 6;
Then restart the server from the shell (the service name varies by packaging):
systemctl restart postgresql-16
Each worker consumes memory for the index scans it performs during vacuuming. Cap memory per worker to keep the total footprint predictable:
ALTER SYSTEM SET autovacuum_work_mem = '128MB';
SELECT pg_reload_conf();
On a server with 32GB RAM, six workers at 128MB each commits at most 768MB to autovacuum — entirely reasonable. You can also tighten the naptime so autovacuum checks for newly-bloated tables more frequently:
ALTER SYSTEM SET autovacuum_naptime = '30s';
SELECT pg_reload_conf();
The default naptime is 60 seconds. Halving it means autovacuum notices tables that have newly crossed their thresholds twice as fast, which matters on high-write workloads where bloat can accumulate significantly in a minute.
Root Cause 6: Lock Conflicts Are Blocking Workers
Why It Happens
Autovacuum needs a ShareUpdateExclusiveLock to vacuum a table. This is a weak lock — it doesn't block reads or DML writes. But autovacuum workers won't wait indefinitely for it. They give up, log a cancellation message, and move on to the next table. If a table is constantly held under a conflicting lock by long-running transactions, autovacuum may never successfully complete a cycle on it.
The most common culprits are application bugs that leave transactions open across network calls or user interactions, idle-in-transaction connections from connection poolers that don't clean up properly, and long-running analytical queries that hold old transaction snapshots. That last one is subtle: even if a query doesn't hold an explicit lock on the table, an open transaction with an old snapshot prevents autovacuum from reclaiming rows that are visible to that snapshot.
How to Identify It
Check the PostgreSQL log for autovacuum cancellation messages:
grep -E 'autovacuum.*(cancel|lock|wait)' /var/log/postgresql/postgresql-*.log | tail -20
Typical output when a worker gets blocked:
2026-04-18 03:12:45 UTC [21301]: LOG: automatic vacuum of table "appdb.public.orders": index scans: 0
2026-04-18 03:12:45 UTC [21301]: DETAIL: canceled by autovacuum: too many dead tuples; will retry later
2026-04-18 03:14:01 UTC [21302]: LOG: autovacuum: found xmin 7893421 from client backend, not cleaning up
Find the long-running transactions responsible:
SELECT pid, usename, datname, state,
now() - xact_start AS xact_age,
now() - query_start AS query_age,
left(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start ASC;
How to Fix It
For idle-in-transaction connections that are clearly stale, terminate them:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND now() - xact_start > interval '10 minutes';
More importantly, make this automatic going forward:
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();
This kills any session that goes idle inside a transaction after five minutes — catching connection pooler bugs and application-level oversights before they accumulate into a vacuum-blocking problem. For analytical workloads that legitimately need long query times, use a separate connection pool or database role with a higher (or no) timeout, rather than letting the default apply to everything.
Root Cause 7: Autovacuum Is Running but Can't Keep Up
Why It Happens
Sometimes autovacuum is working correctly in every technical sense — it's enabled, workers are running, thresholds are being hit, no lock conflicts exist — but the write rate on a table simply outpaces what autovacuum can process. This is a capacity problem, not a misconfiguration. A single high-throughput table receiving thousands of updates per second generates dead tuples faster than one throttled autovacuum worker can reclaim them.
How to Identify It
Watch the n_dead_tup trend over time and compare it against vacuum progress:
SELECT relname,
n_dead_tup,
n_tup_upd + n_tup_del AS dead_generating_ops,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
Then sample pg_stat_progress_vacuum twice, 60 seconds apart, and calculate blocks vacuumed per second. If the table's write rate generates more dead-tuple pages per second than that throughput figure, you're losing the race even though autovacuum is technically running.
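The comparison itself is simple arithmetic. A sketch of the calculation with two hypothetical heap_blks_vacuumed samples (the numbers are illustrative, not from a real system):

```python
def vacuum_keeping_up(blks_vacuumed_t0: int, blks_vacuumed_t1: int,
                      interval_s: float, dead_pages_per_s: float) -> bool:
    # Throughput derived from two samples of heap_blks_vacuumed
    # taken from pg_stat_progress_vacuum, interval_s seconds apart.
    vacuum_rate = (blks_vacuumed_t1 - blks_vacuumed_t0) / interval_s
    return vacuum_rate >= dead_pages_per_s

# Hypothetical: 12,000 blocks vacuumed over 60s = 200 blocks/s,
# against a write load dirtying ~350 pages/s with dead tuples.
print(vacuum_keeping_up(100_000, 112_000, 60, 350))  # False -- losing the race
```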
How to Fix It
For the specific hot table, apply aggressive per-table settings that override the globals:
ALTER TABLE high_volume_events SET (
autovacuum_vacuum_cost_delay = 0,
autovacuum_vacuum_cost_limit = 2000,
autovacuum_vacuum_scale_factor = 0.005
);
Setting autovacuum_vacuum_cost_delay = 0 removes throttling entirely for this table's vacuum worker. Use this carefully — on shared I/O environments or spinning disks it can impact query latency. On dedicated NVMe storage it's typically fine. Pair this with a higher worker count and monitor whether vacuum throughput now exceeds the dead tuple generation rate.
Prevention
The most effective approach to autovacuum problems is catching them before they become incidents. Most monitoring stacks — Prometheus with postgres_exporter, Datadog's PostgreSQL integration, Telegraf — can scrape pg_stat_user_tables continuously. Set alerts on tables where n_dead_tup exceeds a threshold relative to n_live_tup, or where last_autovacuum is older than a configurable staleness window.
A practical alerting query you can run as a scheduled check or wire into a monitoring script on sw-infrarunbook-01:
SELECT schemaname || '.' || relname AS table_name,
n_dead_tup,
n_live_tup,
ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
last_autovacuum,
now() - last_autovacuum AS since_last_vacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 50000
OR (
last_autovacuum IS NOT NULL
AND now() - last_autovacuum > interval '24 hours'
AND n_live_tup > 10000
)
ORDER BY n_dead_tup DESC;
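If you'd rather evaluate the same condition in a monitoring script than in SQL, the predicate looks like this (a sketch; the thresholds mirror the query above and should be tuned to your workload):

```python
from datetime import timedelta
from typing import Optional

def needs_attention(n_dead_tup: int, n_live_tup: int,
                    since_last_vacuum: Optional[timedelta]) -> bool:
    """Mirror of the alerting query: flag heavy dead-tuple counts, or
    stale autovacuum on any table of meaningful size. since_last_vacuum
    is None when last_autovacuum is NULL, matching the SQL's NOT NULL guard."""
    if n_dead_tup > 50_000:
        return True
    if (since_last_vacuum is not None
            and since_last_vacuum > timedelta(hours=24)
            and n_live_tup > 10_000):
        return True
    return False

print(needs_attention(80_000, 1_000_000, timedelta(hours=2)))  # True: dead-tuple count
print(needs_attention(500, 50_000, timedelta(hours=30)))       # True: stale vacuum
print(needs_attention(500, 50_000, None))                      # False: never vacuumed, low count
```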
Enable autovacuum logging so you have visibility into what it's actually doing in production. Set log_autovacuum_min_duration to capture any vacuum run that takes longer than a threshold:
ALTER SYSTEM SET log_autovacuum_min_duration = '1s';
SELECT pg_reload_conf();
This is the single most useful diagnostic tool for understanding autovacuum behavior over time. You'll see which tables are getting the most attention, how long cleanup runs are taking, and whether the pace is appropriate. If you see the same table appearing in the log every few minutes with a long duration, that's your cue to tune its per-table settings.
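A quick way to answer "which tables are eating the most vacuum time" is to aggregate elapsed seconds per table from those log entries. A sketch of that aggregation — note the exact log layout varies across PostgreSQL versions and log_line_prefix settings, so treat the regexes and the sample lines below as assumptions to adjust against your own logs:

```python
import re
from collections import defaultdict

# Assumed line shapes (adjust to your version and log_line_prefix):
#   LOG:  automatic vacuum of table "appdb.public.orders": index scans: 1
#   system usage: CPU: user: 0.10 s, system: 0.05 s, elapsed: 4.52 s
TABLE_RE = re.compile(r'automatic vacuum of table "([^"]+)"')
ELAPSED_RE = re.compile(r'elapsed: ([\d.]+) s')

def summarize(log_text: str) -> dict:
    """Total elapsed vacuum seconds per table from autovacuum log entries."""
    totals = defaultdict(float)
    current = None
    for line in log_text.splitlines():
        m = TABLE_RE.search(line)
        if m:
            current = m.group(1)          # entry header names the table
        e = ELAPSED_RE.search(line)
        if e and current:
            totals[current] += float(e.group(1))  # trailing line carries duration
            current = None
    return dict(totals)

sample = '''LOG:  automatic vacuum of table "appdb.public.orders": index scans: 1
system usage: CPU: user: 0.10 s, system: 0.05 s, elapsed: 4.52 s
LOG:  automatic vacuum of table "appdb.public.orders": index scans: 1
system usage: CPU: user: 0.20 s, system: 0.08 s, elapsed: 7.10 s'''
print(summarize(sample))  # per-table totals, here ~11.6s for appdb.public.orders
```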
Make autovacuum configuration part of your table provisioning process. Any new table expected to receive high write volume should have appropriate autovacuum_vacuum_scale_factor and cost parameters set at creation time — not after bloat has already accumulated. Document these settings in your schema migrations alongside the table definition so future engineers understand the intent.
Review your autovacuum configuration any time you significantly increase write load, add new databases to an instance, or upgrade PostgreSQL major versions. Each release adjusts defaults and behavior; don't assume settings that worked well on PostgreSQL 12 are still optimal on 16. And whenever you're about to disable autovacuum for a maintenance window — which is sometimes the right call — put a calendar reminder to re-enable it immediately afterward. That single discipline eliminates the most common cause of autovacuum being silently off in production.
