InfraRunBook

    PostgreSQL Autovacuum Not Running

    Databases
    Published: Apr 18, 2026
    Updated: Apr 18, 2026

    Step-by-step troubleshooting guide for PostgreSQL autovacuum not running, covering seven root causes from disabled configuration to worker count limits with real diagnostic SQL and fixes.


    Symptoms

    You start getting complaints that queries are slower than usual. A nightly report that used to finish in 30 seconds is now taking five minutes. You log into sw-infrarunbook-01, open a psql session, and start poking around. Dead tuple counts are climbing fast. Table sizes on disk are ballooning even though row counts are stable. And when you check pg_stat_user_tables, the last_autovacuum column is either NULL or shows a timestamp from three days ago.

    That's the classic signal. Autovacuum is supposed to be the silent janitor keeping your PostgreSQL tables tidy — reclaiming space from dead rows left behind by updates and deletes. When it stops working, garbage accumulates fast and it compounds: bloated tables mean longer sequential scans, longer scans mean slower queries, slower queries mean more connection stacking, and suddenly you're having an incident at 2am.

    Specific symptoms you'll typically observe before things get critical:

    • High n_dead_tup values in pg_stat_user_tables
    • last_autovacuum is NULL or shows a stale timestamp hours or days old
    • Table file sizes on disk growing even when net row count is flat
    • Sequential scan costs increasing over time despite stable data volume
    • Warnings in the PostgreSQL log about transaction ID wraparound approaching
    • Bloat ratios from pgstattuple or check_postgres in the 40–80% range

    Let's dig into every reason this happens and exactly what to do about each one.


    Root Cause 1: Autovacuum Is Disabled

    Why It Happens

    This one sounds obvious, but I've seen it more times than I'd like to admit. Someone disabled autovacuum globally — usually during a bulk data load or schema migration to reduce I/O pressure — and never turned it back on. It also happens through configuration management drift: a Puppet or Ansible template gets out of sync and overwrites postgresql.conf with a stale version that has autovacuum = off. The change goes out with the next Ansible run, nobody notices, and autovacuum quietly disappears.

    How to Identify It

    Check the current runtime value directly, and more importantly check the source so you know where to fix it:

    SELECT name, setting, source, sourcefile
    FROM pg_settings
    WHERE name = 'autovacuum';

    Output when it's off and the source is a config file:

     name       | setting | source             | sourcefile
    ------------+---------+--------------------+-----------------------------------------
     autovacuum | off     | configuration file | /etc/postgresql/16/main/postgresql.conf

    Also check whether it was disabled via ALTER SYSTEM, which writes to postgresql.auto.conf and overrides whatever is in postgresql.conf:

    -- If sourcefile points to postgresql.auto.conf, ALTER SYSTEM was used
    SELECT name, setting, source, sourcefile
    FROM pg_settings
    WHERE name = 'autovacuum';

    How to Fix It

    Enable it immediately, without a full restart, using ALTER SYSTEM followed by a config reload:

    ALTER SYSTEM SET autovacuum = on;
    SELECT pg_reload_conf();

    Verify it took effect right away:

    SHOW autovacuum;
    -- Expected: on

    If your config management tooling is responsible, update the template immediately and lock in autovacuum = on explicitly. Don't rely on the default — document it. For the tables that missed autovacuum, kick off a manual vacuum to clear the backlog while autovacuum catches up:

    -- Run for each bloated table
    VACUUM ANALYZE public.orders;

    Root Cause 2: Cost Delay Is Throttling Workers Into Uselessness

    Why It Happens

    Autovacuum has a built-in I/O throttle. Workers accumulate a "cost" as they read and write pages — vacuum_cost_page_hit (1 by default) for buffer cache hits, vacuum_cost_page_miss (10) for disk reads, and vacuum_cost_page_dirty (20) for pages that need to be written. Once a worker's accumulated cost hits autovacuum_vacuum_cost_limit, it sleeps for autovacuum_vacuum_cost_delay milliseconds before continuing.

    This throttle is a good idea in principle — you don't want vacuum starving your application queries for disk bandwidth. The problem is when autovacuum_vacuum_cost_delay is set too high. The default in PostgreSQL 11 and earlier was 20ms. That sounds small, but on a busy table with tens of millions of dead tuples, a worker that sleeps for 20ms after every 200 cost units ends up spending the vast majority of its time sleeping rather than working. Autovacuum technically runs. It just makes almost no progress against the accumulation rate.

    How to Identify It

    Check the current cost settings:

    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN (
      'autovacuum_vacuum_cost_delay',
      'autovacuum_vacuum_cost_limit',
      'vacuum_cost_page_hit',
      'vacuum_cost_page_miss',
      'vacuum_cost_page_dirty'
    )
    ORDER BY name;

    An overly conservative configuration looks like this:

                 name             | setting | unit
    ------------------------------+---------+------
     autovacuum_vacuum_cost_delay | 20      | ms
     autovacuum_vacuum_cost_limit | 200     |
     vacuum_cost_page_dirty       | 20      |
     vacuum_cost_page_hit         | 1       |
     vacuum_cost_page_miss        | 10      |

    A worker with those settings hits its cost limit after dirtying roughly 10 pages, then sleeps for 20ms. That's a throughput ceiling of about 500 dirty pages per second — painfully slow for any seriously bloated table. Observe actual progress using pg_stat_progress_vacuum:

    SELECT pid,
           relid::regclass AS table,
           phase,
           heap_blks_scanned,
           heap_blks_vacuumed,
           heap_blks_total
    FROM pg_stat_progress_vacuum;

    Run this query twice, a minute apart. If heap_blks_vacuumed is barely moving, cost throttling is your problem.
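    You can sanity-check the ceiling those cost settings impose with a bit of arithmetic. Here's a rough model in Python — it assumes the worst case where every page vacuumed is dirtied and ignores the time spent on the work itself, so treat the numbers as upper bounds, not measurements:

```python
# Rough model of the autovacuum cost-throttle ceiling.
# Worst-case assumption: every page processed is dirtied (cost 20 each),
# and only the mandatory sleeps between cost cycles are counted.

def dirty_page_ceiling(cost_limit, cost_delay_ms, cost_page_dirty=20):
    """Approximate max dirty pages/sec one autovacuum worker can process."""
    pages_per_cycle = cost_limit / cost_page_dirty   # pages before a forced sleep
    cycles_per_sec = 1000 / cost_delay_ms            # sleeps fit into one second
    return pages_per_cycle * cycles_per_sec

# Old defaults (limit 200, delay 20ms): ~500 dirty pages/sec,
# about 4 MB/s at the default 8 kB page size.
print(dirty_page_ceiling(200, 20))   # 500.0

# Tuned (limit 800, delay 2ms): ~20,000 dirty pages/sec.
print(dirty_page_ceiling(800, 2))    # 20000.0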

    How to Fix It

    PostgreSQL 12 changed the default cost delay to 2ms — much more aggressive and appropriate for modern hardware. For older versions, or environments carrying forward old explicit settings, reduce it:

    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 2;
    SELECT pg_reload_conf();

    Raise the cost limit so workers do more work per cycle before sleeping:

    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 800;
    SELECT pg_reload_conf();

    On modern NVMe storage, 800 or even higher is fine. Spinning disks are more sensitive — test in staging first. For a single hot table that needs emergency relief, override at the table level without touching globals:

    ALTER TABLE orders SET (
      autovacuum_vacuum_cost_delay = 2,
      autovacuum_vacuum_cost_limit = 800
    );

    Root Cause 3: The Table Is Explicitly Excluded from Autovacuum

    Why It Happens

    PostgreSQL allows you to disable autovacuum on a per-table basis using a storage parameter. This has legitimate uses — static lookup tables that never see writes, staging tables that get truncated and reloaded on a schedule. But it's a sharp tool. I've seen engineers set autovacuum_enabled = false on a table as a "temporary" measure during a migration, commit the DDL, and forget about it entirely. Six months later, that table is 80% dead tuples and nobody can figure out why autovacuum won't touch it despite being globally enabled.

    How to Identify It

    Query pg_class joined against its storage options to find excluded tables:

    SELECT c.relname, opt.option_name, opt.option_value
    FROM pg_class c,
         pg_options_to_table(c.reloptions) AS opt
    WHERE opt.option_name = 'autovacuum_enabled'
      AND opt.option_value = 'false'
      AND c.relkind = 'r'
    ORDER BY c.relname;

    Example output:

       relname    | option_name        | option_value
    --------------+--------------------+--------------
     audit_log    | autovacuum_enabled | false
     session_data | autovacuum_enabled | false

    Cross-reference these against pg_stat_user_tables to see how bad the buildup is:

    SELECT relname, n_dead_tup, n_live_tup,
           ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
           last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname IN ('audit_log', 'session_data');

    How to Fix It

    Re-enable autovacuum on the table by resetting the storage parameter:

    ALTER TABLE audit_log RESET (autovacuum_enabled);
    -- or explicitly set it:
    ALTER TABLE audit_log SET (autovacuum_enabled = true);

    Then run a manual vacuum immediately to clear the accumulated backlog rather than waiting for the next autovacuum cycle to pick it up:

    VACUUM (VERBOSE, ANALYZE) audit_log;

    If the table legitimately doesn't need autovacuum — say, it's purely insert-only and gets periodically truncated — document that decision in a comment on the table and ensure your operational runbooks include a manual vacuum step after each bulk reload. Never leave it as an implicit assumption.


    Root Cause 4: Dead Tuples Aren't Hitting the Trigger Threshold

    Why It Happens

    Autovacuum doesn't fire on a table the moment it sees dead tuples. It fires when dead tuple count exceeds a threshold calculated as:

    trigger = autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor × n_live_tup)

    The default scale factor is 0.2, meaning autovacuum kicks in when dead tuples exceed 20% of live row count plus a base of 50 rows. For small tables this is perfectly fine. For a table with 200 million rows, you're waiting until 40 million dead tuples accumulate before autovacuum even considers touching it. By then, bloat is severe, the cleanup run takes a long time, and in the meantime query performance has been quietly degrading for weeks.

    The reverse also happens: someone cranks up the threshold and scale factor to reduce autovacuum frequency on a low-priority table, and forgets that the table later became high-write.
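    The trigger formula is easy to sanity-check by hand. A small sketch using the default base threshold (50) and scale factor (0.2), with a per-table override for comparison:

```python
def vacuum_trigger(n_live_tup, threshold=50, scale_factor=0.2):
    """Dead-tuple count at which autovacuum considers the table.
    Mirrors: autovacuum_vacuum_threshold + scale_factor * n_live_tup."""
    return threshold + scale_factor * n_live_tup

# Small table: triggers early, as intended.
print(vacuum_trigger(10_000))        # 2050.0

# 200M-row table: with defaults, waits for ~40 million dead tuples.
print(vacuum_trigger(200_000_000))   # 40000050.0

# Same table with per-table scale_factor 0.01 and threshold 1000:
# fires two million dead tuples in, twenty times sooner.
print(vacuum_trigger(200_000_000, threshold=1000, scale_factor=0.01))  # 2001000.0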

    How to Identify It

    Calculate the current trigger threshold for each table and compare it against actual dead tuple counts:

    SELECT
      schemaname,
      relname,
      n_dead_tup,
      n_live_tup,
      ROUND(
        current_setting('autovacuum_vacuum_scale_factor')::numeric * n_live_tup
        + current_setting('autovacuum_vacuum_threshold')::numeric
      ) AS vacuum_trigger,
      CASE
        WHEN n_dead_tup > ROUND(
          current_setting('autovacuum_vacuum_scale_factor')::numeric * n_live_tup
          + current_setting('autovacuum_vacuum_threshold')::numeric
        ) THEN 'NEEDS VACUUM'
        ELSE 'ok'
      END AS status,
      last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 20;

    If a table shows high n_dead_tup but the status column says "ok", the threshold is simply set too high for this table's size. Also check for table-level overrides that might be making things worse:

    SELECT c.relname, reloptions
    FROM pg_class c
    WHERE reloptions IS NOT NULL
      AND c.relkind = 'r'
    ORDER BY c.relname;

    How to Fix It

    For large high-write tables, dramatically reduce the scale factor so autovacuum triggers much earlier:

    ALTER TABLE events SET (
      autovacuum_vacuum_scale_factor = 0.01,
      autovacuum_vacuum_threshold = 1000
    );

    With a scale factor of 0.01, autovacuum fires when dead tuples exceed 1% of live rows. On a 50 million row table, that's triggering at 500,000 dead tuples instead of 10 million — cleanup runs are smaller, faster, and more frequent. It's a much better operating point.

    If most of your tables are large, adjust the globals rather than setting per-table overrides everywhere:

    ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;
    ALTER SYSTEM SET autovacuum_vacuum_threshold = 500;
    SELECT pg_reload_conf();

    Root Cause 5: Worker Count Is Too Low

    Why It Happens

    The default autovacuum_max_workers is 3. On a server running a single small database with moderate write load, that's adequate. On a server running multiple databases, dozens of actively written tables, and high throughput, three workers get saturated easily. When all worker slots are occupied, tables that need vacuuming simply wait. If you have a handful of large, slow-to-vacuum tables, they'll hold all the worker slots for extended periods while smaller tables accumulate dead tuples unattended.

    This problem is particularly common after adding new databases to an instance, scaling up write load seasonally, or completing a migration that introduces new high-write tables.

    How to Identify It

    Check how many workers are currently active and which tables they're working on:

    SELECT pid, datname, left(query, 80) AS vacuum_target, state
    FROM pg_stat_activity
    WHERE query LIKE 'autovacuum:%'
    ORDER BY pid;

    Example output showing all three default workers fully occupied:

      pid  | datname  |              vacuum_target                    | state
    -------+----------+-----------------------------------------------+--------
     21301 | appdb    | autovacuum: VACUUM public.events              | active
     21302 | appdb    | autovacuum: VACUUM public.orders              | active
     21303 | appdb    | autovacuum: VACUUM public.sessions            | active

    If you consistently see exactly autovacuum_max_workers processes running and dead tuples are still climbing on other tables, you've hit the ceiling. Check the current setting and naptime together:

    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('autovacuum_max_workers', 'autovacuum_naptime')
    ORDER BY name;

    How to Fix It

    Increasing autovacuum_max_workers requires a PostgreSQL restart — it's a postmaster-level parameter that's allocated at startup. Set it via ALTER SYSTEM, then restart:

    ALTER SYSTEM SET autovacuum_max_workers = 6;
    -- then restart from the shell, e.g.:
    -- systemctl restart postgresql-16

    Each worker consumes memory to hold the dead-tuple list it builds before each round of index scans. Cap memory per worker to keep the total footprint predictable:

    ALTER SYSTEM SET autovacuum_work_mem = '128MB';
    SELECT pg_reload_conf();

    On a server with 32GB RAM, six workers at 128MB each commit at most 768MB to autovacuum — entirely reasonable. You can also tighten the naptime so autovacuum checks for newly bloated tables more frequently:

    ALTER SYSTEM SET autovacuum_naptime = '30s';
    SELECT pg_reload_conf();

    The default naptime is 60 seconds. Halving it means autovacuum notices tables that newly cross their thresholds twice as fast, which matters on high-write workloads where bloat can accumulate significantly in a minute.


    Root Cause 6: Lock Conflicts Are Blocking Workers

    Why It Happens

    Autovacuum needs a ShareUpdateExclusiveLock to vacuum a table. This is a weak lock — it doesn't block reads or DML writes. But autovacuum workers yield to other lock requests rather than wait indefinitely: when a session asks for a conflicting lock, the worker is canceled, logs a cancellation message, and the table waits for a later cycle. If a table is constantly held under a conflicting lock by long-running transactions, autovacuum may never successfully complete a cycle on it.

    The most common culprits are application bugs that leave transactions open across network calls or user interactions, idle-in-transaction connections from connection poolers that don't clean up properly, and long-running analytical queries that hold old transaction snapshots. That last one is subtle: even if a query doesn't hold an explicit lock on the table, an open transaction with an old snapshot prevents autovacuum from reclaiming rows that are visible to that snapshot.

    How to Identify It

    Check the PostgreSQL log for autovacuum cancellation messages:

    grep -E 'autovacuum.*(cancel|lock|wait)' /var/log/postgresql/postgresql-*.log | tail -20

    Typical output when a worker gets blocked:

    2026-04-18 03:12:45 UTC [21301]: LOG: automatic vacuum of table "appdb.public.orders": index scans: 0
    2026-04-18 03:12:45 UTC [21301]: DETAIL: canceled by autovacuum: too many dead tuples; will retry later
    2026-04-18 03:14:01 UTC [21302]: LOG: autovacuum: found xmin 7893421 from client backend, not cleaning up

    Find the long-running transactions responsible:

    SELECT pid, usename, datname, state,
           now() - xact_start AS xact_age,
           now() - query_start AS query_age,
           left(query, 100) AS query_snippet
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
      AND now() - xact_start > interval '5 minutes'
    ORDER BY xact_start ASC;

    How to Fix It

    For idle-in-transaction connections that are clearly stale, terminate them:

    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
      AND now() - xact_start > interval '10 minutes';

    More importantly, make this automatic going forward:

    ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
    SELECT pg_reload_conf();

    This kills any session that goes idle inside a transaction after five minutes — catching connection pooler bugs and application-level oversights before they accumulate into a vacuum-blocking problem. For analytical workloads that legitimately need long query times, use a separate connection pool or database role with a higher (or no) timeout, rather than letting the default apply to everything.


    Root Cause 7: Autovacuum Is Running but Can't Keep Up

    Why It Happens

    Sometimes autovacuum is working correctly in every technical sense — it's enabled, workers are running, thresholds are being hit, no lock conflicts exist — but the write rate on a table simply outpaces what autovacuum can process. This is a capacity problem, not a misconfiguration. A single high-throughput table receiving thousands of updates per second generates dead tuples faster than one throttled autovacuum worker can reclaim them.

    How to Identify It

    Watch the n_dead_tup trend over time and compare it against vacuum progress:

    SELECT relname,
           n_dead_tup,
           n_tup_upd + n_tup_del AS dead_generating_ops,
           last_autovacuum
    FROM pg_stat_user_tables
    WHERE n_dead_tup > 10000
    ORDER BY n_dead_tup DESC;

    Then sample pg_stat_progress_vacuum twice, 60 seconds apart, and calculate blocks vacuumed per second. If the table's write rate generates more dead-tuple pages per second than that throughput figure, you're losing the race even though autovacuum is technically running.
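    That comparison can be scripted once you have the two samples. A sketch of the arithmetic — the tuples_per_block figure is an assumption here; derive the real one for your table from reltuples / relpages in pg_class:

```python
def vacuum_rate(blks_vacuumed_t0, blks_vacuumed_t1, interval_sec):
    """Blocks vacuumed per second between two samples of
    pg_stat_progress_vacuum.heap_blks_vacuumed."""
    return (blks_vacuumed_t1 - blks_vacuumed_t0) / interval_sec

def losing_the_race(vacuum_blks_per_sec, dead_tuples_per_sec, tuples_per_block=60):
    """True if dead tuples spread across more blocks/sec than vacuum reclaims.
    tuples_per_block=60 is an illustrative assumption, not a PostgreSQL constant."""
    dirtied_blks_per_sec = dead_tuples_per_sec / tuples_per_block
    return dirtied_blks_per_sec > vacuum_blks_per_sec

# Two samples 60s apart: 12,000 -> 30,000 blocks vacuumed = 300 blks/sec.
rate = vacuum_rate(12_000, 30_000, 60)
print(rate)                           # 300.0

# 30,000 dead tuples/sec lands on ~500 blocks/sec: vacuum is falling behind.
print(losing_the_race(rate, 30_000))  # True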

    How to Fix It

    For the specific hot table, apply aggressive per-table settings that override the globals:

    ALTER TABLE high_volume_events SET (
      autovacuum_vacuum_cost_delay = 0,
      autovacuum_vacuum_cost_limit = 2000,
      autovacuum_vacuum_scale_factor = 0.005
    );

    Setting autovacuum_vacuum_cost_delay = 0 removes throttling entirely for this table's vacuum worker. Use this carefully — on shared I/O environments or spinning disks it can impact query latency. On dedicated NVMe storage it's typically fine. Pair this with a higher worker count and monitor whether vacuum throughput now exceeds the dead tuple generation rate.


    Prevention

    The most effective approach to autovacuum problems is catching them before they become incidents. Most monitoring stacks — Prometheus with postgres_exporter, Datadog's PostgreSQL integration, Telegraf — can scrape pg_stat_user_tables continuously. Set alerts on tables where n_dead_tup exceeds a threshold relative to n_live_tup, or where last_autovacuum is older than a configurable staleness window.

    A practical alerting query you can run as a scheduled check or wire into a monitoring script on sw-infrarunbook-01:

    SELECT schemaname || '.' || relname AS table_name,
           n_dead_tup,
           n_live_tup,
           ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
           last_autovacuum,
           now() - last_autovacuum AS since_last_vacuum
    FROM pg_stat_user_tables
    WHERE n_dead_tup > 50000
       OR (
         last_autovacuum IS NOT NULL
         AND now() - last_autovacuum > interval '24 hours'
         AND n_live_tup > 10000
       )
    ORDER BY n_dead_tup DESC;

    Enable autovacuum logging so you have visibility into what it's actually doing in production. Set log_autovacuum_min_duration to capture any vacuum run that takes longer than a threshold:

    ALTER SYSTEM SET log_autovacuum_min_duration = '1s';
    SELECT pg_reload_conf();

    This is the single most useful diagnostic tool for understanding autovacuum behavior over time. You'll see which tables are getting the most attention, how long cleanup runs are taking, and whether the pace is appropriate. If you see the same table appearing in the log every few minutes with a long duration, that's your cue to tune its per-table settings.

    Make autovacuum configuration part of your table provisioning process. Any new table expected to receive high write volume should have appropriate autovacuum_vacuum_scale_factor and cost parameters set at creation time — not after bloat has already accumulated. Document these settings in your schema migrations alongside the table definition so future engineers understand the intent.

    Review your autovacuum configuration any time you significantly increase write load, add new databases to an instance, or upgrade PostgreSQL major versions. Each release adjusts defaults and behavior; don't assume settings that worked well on PostgreSQL 12 are still optimal on 16. And whenever you're about to disable autovacuum for a maintenance window — which is sometimes the right call — put a calendar reminder to re-enable it immediately afterward. That single discipline eliminates the most common cause of autovacuum being silently off in production.

    Frequently Asked Questions

    How do I check if autovacuum has run recently on a specific table?

    Query pg_stat_user_tables for the table name and look at the last_autovacuum column: SELECT relname, last_autovacuum, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'your_table'; A NULL value means autovacuum has never run on that table since statistics were last reset, or the table was created recently.

    Can I force autovacuum to run immediately on a specific table?

    You can't force an autovacuum worker to run on demand, but you can run a manual VACUUM: VACUUM ANALYZE tablename; This is equivalent in behavior and will immediately clean up dead tuples and update statistics. It runs in the foreground of your session rather than as a background worker.

    Why does autovacuum keep getting canceled on my table?

    The most common cause is a long-running transaction or idle-in-transaction connection holding an old snapshot. Autovacuum workers will give up and log a cancellation rather than wait indefinitely. Check pg_stat_activity for sessions where now() - xact_start is greater than a few minutes, and consider enabling idle_in_transaction_session_timeout to terminate them automatically.

    Does increasing autovacuum_max_workers require a PostgreSQL restart?

    Yes. autovacuum_max_workers is a postmaster parameter that must be set before startup. Use ALTER SYSTEM SET autovacuum_max_workers = 6; and then restart PostgreSQL. Most other autovacuum parameters — cost delay, cost limit, scale factor, naptime — take effect after a simple SELECT pg_reload_conf(); with no restart required.

    What is a healthy dead tuple percentage for a PostgreSQL table?

    There's no universal number, but a dead tuple ratio below 5–10% of total tuples (live plus dead) is generally healthy. Tables above 20% dead tuples are noticeably bloated and will show degraded query performance, particularly for sequential scans. Anything above 50% is a serious problem requiring immediate manual VACUUM and autovacuum tuning review.
