InfraRunBook
    Back to articles

    PostgreSQL Replication Lag High

    Databases
    Published: Apr 14, 2026
    Updated: Apr 14, 2026

    Step-by-step runbook for diagnosing and fixing PostgreSQL streaming replication lag, covering network saturation, write load, underpowered replicas, WAL sender slots, and hot standby feedback bloat.

    PostgreSQL Replication Lag High

    Symptoms

    You're paged at 2 AM because monitoring is screaming about replication lag. Or a developer pings you saying "the replica is showing data from 10 minutes ago." Either way, you open your terminal and start digging. Here's what you typically see.

    Querying

    pg_stat_replication
    on the primary reveals non-zero lag across every column:

    SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
           write_lag, flush_lag, replay_lag
    FROM pg_stat_replication;
    
     client_addr  |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag  | replay_lag
    --------------+-----------+-----------+-----------+-----------+------------+-----------+------------+------------
     10.10.10.52  | streaming | 0/B2A0018 | 0/B1C4020 | 0/B1C4020 | 0/B0F8800  | 00:00:04.2| 00:00:04.2 | 00:00:14.8
    (1 row)

    On the replica side,

    pg_stat_wal_receiver
    shows the connection is alive but replay is clearly behind received LSN:

    SELECT status, receive_start_lsn, received_lsn,
           last_msg_send_time, last_msg_receipt_time
    FROM pg_stat_wal_receiver;
    
      status   | receive_start_lsn | received_lsn | last_msg_send_time          | last_msg_receipt_time
    -----------+-------------------+--------------+-----------------------------+-----------------------------
     streaming | 0/A0000000        | 0/B1C4020    | 2026-04-15 02:13:45.12+00   | 2026-04-15 02:13:45.34+00
    (1 row)

    The gap between what the replica has received and what

    pg_current_wal_lsn()
    returns on the primary is your lag in bytes — translate that to seconds and it maps directly to the
    replay_lag
    interval you saw above. Other symptoms include monitoring dashboards showing replication_delay_seconds spiking past SLO thresholds, application sessions on the replica reading stale rows that were committed minutes ago on the primary, and in the worst cases the replica falling so far behind that it disconnects entirely and must re-sync from a fresh base backup.


    Root Cause 1: Network Bandwidth Saturated

    WAL streaming is a continuous, high-throughput data flow. The primary ships every WAL record to connected replicas in near real-time. When the network link between primary and replica gets saturated — by backup jobs, ETL pipelines, application traffic, or even a competing replica on the same interface — WAL data queues up faster than it can be transmitted. Lag grows. Steadily.

    In my experience, this is most common on shared 1GbE links where a nightly

    pg_dump
    over the same interface consumes 800+ Mbps. The lag climbs from midnight until the dump finishes, then corrects itself. Nobody notices until the SLO is breached.

    To identify this, check interface throughput on both ends during the lag window:

    # On sw-infrarunbook-01 (primary)
    $ sar -n DEV 1 10 | grep eth0
    
    02:14:01    eth0   rxpck/s   txpck/s    rxkB/s    txkB/s
    02:14:02    eth0    1243.00   8921.00    782.34   9842.11
    02:14:03    eth0    1198.00   9102.00    755.21   9901.45

    If

    txkB/s
    is pinned near the interface's rated capacity, you've found your culprit. Cross-reference which processes own those connections:

    $ ss -tnp | grep :5432
    ESTAB  0  0  10.10.10.51:5432  10.10.10.52:41022  users:(("postgres",pid=3821,fd=14))
    ESTAB  0  0  10.10.10.51:5432  10.10.10.20:55301  users:(("postgres",pid=4012,fd=10))

    Then check

    pg_stat_activity
    to see if backup or bulk query traffic is competing with WAL sender on the same link.

    Fix: Move backups to a dedicated network interface, or implement QoS to prioritize WAL replication traffic. If you're on a hypervisor, give database VMs their own vSwitch or dedicated trunk port. Replication traffic should never compete with application or backup flows on the same interface. A dedicated 10.10.20.0/24 replication VLAN with its own NIC eliminates this class of problem entirely.


    Root Cause 2: Primary Under Heavy Write Load

    PostgreSQL generates WAL for every committed write. Bulk inserts, large updates, VACUUM FULL runs, and index rebuilds can produce gigabytes of WAL per minute. If your primary is generating more WAL than your replica can apply — even over a fast network — lag grows as a direct function of that throughput mismatch.

    The replica's WAL replay process is single-threaded in PostgreSQL's physical streaming replication model. It applies WAL sequentially. If the primary generates 500 MB/s during a batch job and the replica can only sustain 200 MB/s replay throughput due to disk or CPU constraints, the gap widens until the load subsides.

    Diagnose it by measuring WAL generation rate on the primary directly:

    -- Run twice, 60 seconds apart, subtract to get rate
    SELECT pg_current_wal_lsn();
    
     pg_current_wal_lsn
    --------------------
     0/B2A0018
    
    -- 60 seconds later:
    SELECT pg_current_wal_lsn();
    
     pg_current_wal_lsn
    --------------------
     0/D1F4200
    
    -- Convert difference to bytes:
    SELECT ('0/D1F4200'::pg_lsn - '0/B2A0018'::pg_lsn) AS wal_bytes_per_minute;
    
     wal_bytes_per_minute
    ----------------------
               501784040
    (1 row)

    500 MB of WAL per minute is aggressive. Find what's generating it using

    pg_stat_statements
    if it's enabled, or catch active sessions in the act:

    SELECT pid, usename, application_name, state, wait_event_type, wait_event,
           query_start, left(query, 100) AS query_snippet
    FROM pg_stat_activity
    WHERE state = 'active'
    ORDER BY query_start ASC
    LIMIT 10;

    Fix: Throttle batch jobs by adding

    pg_sleep()
    between chunk commits. Reduce individual transaction sizes so WAL is generated in smaller, more digestible bursts. Schedule heavy writes during off-peak hours. For index rebuilds, use
    CREATE INDEX CONCURRENTLY
    which spreads WAL generation over time rather than dumping it all at once. Also enable WAL compression — it's one of the highest-leverage settings for reducing WAL volume with minimal downside:

    # postgresql.conf on the primary
    wal_compression = on

    On typical OLTP workloads, WAL compression reduces volume by 30-60% at the cost of modest CPU overhead. That translates directly to less data the replica needs to apply.


    Root Cause 3: Replica Underpowered

    The replica's WAL replay process is CPU and I/O bound. If the replica is on a VM with fewer CPU cores or slower storage than the primary, it simply can't keep up. This is surprisingly common in environments where replicas were sized as "just for reads" without accounting for the constant WAL replay workload running underneath those reads.

    Check the replica's resources during a lag event:

    # On the replica (10.10.10.52)
    $ vmstat 1 10
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     3  2      0 214032  18240 1842160   0    0  4821  1204  892 2241  8 12 32 48  0
     4  3      0 211840  18240 1843200   0    0  5012  1340  934 2389  9 13 28 50  0

    High

    wa
    — I/O wait — is the telltale sign. The replica is trying to apply WAL but the disk can't keep pace. Verify with iostat:

    $ iostat -x 1 5 | grep -E 'Device|sda|nvme'
    
    Device            r/s     w/s     rkB/s     wkB/s   r_await  w_await  %util
    sda              12.40   842.30    196.20   6738.40    0.82    18.94   99.8

    99.8% disk utilization on the replica while the primary's disk is barely touched tells the whole story. I've watched this exact scenario on a fleet where the primary ran NVMe and replicas were still on spinning HDDs. The replica ran perpetually 30-60 seconds behind and never caught up during business hours, no matter how much we tuned PostgreSQL parameters.

    Also rule out an intentional delay being misconfigured:

    -- On the replica
    SHOW recovery_min_apply_delay;
    
     recovery_min_apply_delay
    --------------------------
     0
    (1 row)

    Fix: Match the replica's storage performance to the primary's at minimum. The replica is doing everything the primary does — applying every write — while simultaneously serving read queries on top. If the primary has NVMe, the replica needs NVMe. Upgrading storage is the right answer when hardware is the bottleneck. No amount of PostgreSQL tuning compensates for a disk that can't keep up.


    Root Cause 4: WAL Sender Slot Blocked

    Replication slots guarantee that WAL is retained on the primary until the replica confirms it has consumed that data. This prevents the replica from falling behind due to WAL being recycled before it's been read — a real problem without slots. But it introduces a serious failure mode: if a slot becomes stale because the replica goes down or disconnects for an extended period, the primary must hold all WAL generated since the slot's

    restart_lsn
    . That can mean gigabytes or even terabytes of WAL accumulating in
    pg_wal
    .

    Check slot status on the primary:

    SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
    FROM pg_replication_slots;
    
       slot_name    | plugin | slot_type | active | restart_lsn | confirmed_flush_lsn | retained_wal
    ----------------+--------+-----------+--------+-------------+---------------------+--------------
     replica_slot_1 |        | physical  | t      | 0/B0F8800   | 0/B0F8800           | 448 MB
     replica_slot_2 |        | physical  | f      | 0/42C10000  | 0/42C10000          | 18 GB
    (2 rows)

    That second slot is inactive and retaining 18 GB of WAL. Confirm the

    pg_wal
    directory reflects this:

    $ du -sh /var/lib/postgresql/16/main/pg_wal/
    18G    /var/lib/postgresql/16/main/pg_wal/

    When

    pg_wal
    fills the filesystem, PostgreSQL panics and stops. That stops replication entirely for all connected replicas, not just the one with the stale slot. This is how one offline replica can take down an entire replication topology.

    Fix: Drop stale slots that are no longer needed:

    SELECT pg_drop_replication_slot('replica_slot_2');

    Set

    max_slot_wal_keep_size
    in
    postgresql.conf
    to cap how much WAL any slot can retain before PostgreSQL invalidates it automatically. A replica that needs a resync is far better than a primary with a full disk:

    # postgresql.conf
    max_slot_wal_keep_size = 20GB
    wal_keep_size = 2GB

    Monitor slot lag as a first-class alerting metric. Alert when

    retained_wal
    for any slot exceeds your disk headroom threshold — long before it becomes an incident.


    Root Cause 5: Hot Standby Feedback Causing Bloat

    Hot standby feedback is a mechanism that prevents query cancellations on the replica due to conflicting VACUUM operations. When

    hot_standby_feedback = on
    , the replica periodically tells the primary: "don't vacuum rows that my active queries are still reading." This sounds like a feature. It is a feature. It's also a slow-motion disaster in environments with long-running analytics queries on replicas.

    Here's what happens in practice. A long-running query on the replica runs for 45 minutes. During that entire window, the primary cannot vacuum the rows that query is reading. Dead tuples accumulate. Table bloat grows. Scans get slower because they have to physically skip more dead rows. Autovacuum eventually fires an intensive, WAL-heavy cleanup pass — and the replica has to replay all of it. The very mechanism meant to protect the replica's queries ends up generating more work for it to do.

    Identify this by correlating long-running queries on the replica with bloat accumulation on the primary:

    -- On the replica: find long-running queries
    SELECT pid, now() - query_start AS duration, state, left(query, 120) AS query
    FROM pg_stat_activity
    WHERE state != 'idle'
      AND (now() - query_start) > interval '5 minutes'
    ORDER BY duration DESC;
    
      pid  | duration | state  | query
    -------+----------+--------+----------------------------------------------
     21045 | 00:47:23 | active | SELECT * FROM orders WHERE created_at > ...
    (1 row)
    -- On the primary: check for bloated tables
    SELECT schemaname, tablename, n_dead_tup, n_live_tup,
           round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
           last_autovacuum
    FROM pg_stat_user_tables
    WHERE n_dead_tup > 100000
    ORDER BY n_dead_tup DESC
    LIMIT 10;
    
     schemaname | tablename | n_dead_tup | n_live_tup | dead_pct | last_autovacuum
    ------------+-----------+------------+------------+----------+------------------------------
     public     | orders    |    9821440 |    4201030 |    70.05 | 2026-04-14 22:01:12.123+00
    (1 row)

    70% dead tuples is severe. Autovacuum has tried and failed to clean this table because hot_standby_feedback is blocking it from removing rows the replica's 47-minute query still holds an XID interest in.

    Fix: Disable hot_standby_feedback and instead protect the replica's queries through timeout policies. Canceling a query is recoverable. Uncontrolled bloat and replication lag are not:

    # postgresql.conf on the replica
    hot_standby_feedback = off
    
    # Set in postgresql.conf or per analytics role:
    statement_timeout = '30min'
    idle_in_transaction_session_timeout = '5min'

    If you genuinely need multi-hour analytics on a replica without query cancellations, use a logical replica or a purpose-built analytics node — somewhere the bloat side-effect doesn't flow back to affect the primary's vacuum behavior. For immediate relief, terminate the blocking backend:

    -- On the replica
    SELECT pg_terminate_backend(21045);

    Root Cause 6: Long-Running Transactions on the Primary

    A forgotten open transaction on the primary prevents VACUUM from advancing the oldest transaction ID horizon, which causes the same bloat and WAL-amplification problem as hot_standby_feedback — but it originates on the primary itself. The replica then has to replay the resulting heavy autovacuum WAL. It's a quieter cause of lag but I've seen it show up enough times that it deserves its own check.

    Find it like this:

    SELECT pid, usename, state, now() - xact_start AS txn_age, left(query, 80) AS query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
      AND (now() - xact_start) > interval '10 minutes'
    ORDER BY txn_age DESC;
    
      pid  | usename           | state                | txn_age  | query
    -------+-------------------+----------------------+----------+-------
     18832 | infrarunbook-admin| idle in transaction  | 01:22:14 | BEGIN
    (1 row)

    An "idle in transaction" session open for 82 minutes is a forgotten connection — likely an application that opened a transaction, did its work, and then crashed or hung before committing. Terminate it and set

    idle_in_transaction_session_timeout
    on the primary to prevent recurrence:

    SELECT pg_terminate_backend(18832);
    
    # postgresql.conf on the primary
    idle_in_transaction_session_timeout = '10min'

    Prevention

    Reactive troubleshooting at 2 AM is expensive. Build the following into every PostgreSQL deployment and you'll catch lag before it becomes a page.

    Monitor the right metrics. Alert on

    replay_lag
    from
    pg_stat_replication
    when it exceeds your SLO — typically 30 seconds for most applications, tighter for financial or inventory systems. Also alert on
    pg_replication_slots.retained_wal
    and on
    pg_stat_user_tables.n_dead_tup / (n_live_tup + n_dead_tup)
    for bloat ratio. These three metrics catch the majority of lag causes before they escalate.

    Cap WAL slot retention unconditionally. Always set

    max_slot_wal_keep_size
    in production. Losing a replica to a required resync is a recoverable operational event. Losing your primary to a full disk is not:

    # postgresql.conf
    max_slot_wal_keep_size = 20GB
    wal_keep_size = 2GB

    Match hardware between primary and replica. The replica is doing everything the primary does plus serving read queries. If the primary has NVMe, the replica needs NVMe. Underpowering replicas is a false economy — the operational cost of chasing lag incidents far exceeds the hardware savings.

    Isolate replication traffic to its own network path. Backups, ETL, and application traffic should not share a NIC with WAL streaming. A dedicated replication subnet — even a simple second interface at 10.10.20.0/24 — eliminates an entire class of network saturation problems.

    Set session timeouts at every layer. Both

    statement_timeout
    and
    idle_in_transaction_session_timeout
    should be configured for application roles on both primary and replica. These are your safety net against long-running queries and forgotten transactions feeding bloat and lag.

    Enable WAL compression.

    wal_compression = on
    reduces WAL volume by 30-60% for typical OLTP workloads with minimal CPU cost on modern hardware. Less WAL shipped means less replay work and less lag headroom consumed.

    Test your replica's catch-up speed. Run a controlled write burst on the primary and time how long the replica takes to fully apply it. That number is your actual recovery time under load — and it should inform your SLO and RTO planning, not the other way around.

    Replication lag is rarely mysterious once you know where to look. The causes above cover the vast majority of production incidents. Instrument your stack to surface them early, keep your hardware balanced, and put hard limits on the things that grow unbounded — slots, long transactions, bloat — and you'll spend a lot less time debugging at 2 AM.

    Frequently Asked Questions

    How do I check PostgreSQL replication lag from the primary?

    Query pg_stat_replication on the primary: SELECT client_addr, write_lag, flush_lag, replay_lag FROM pg_stat_replication; The replay_lag column shows how far behind each replica is. You can also compute byte-level lag with pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn).

    What is a safe value for max_slot_wal_keep_size?

    Set it to roughly 20-50% of your available pg_wal disk space. For example, if you have 100 GB dedicated to WAL, 20GB is a reasonable cap. The goal is to prevent disk exhaustion from stale slots without setting it so low that a briefly disconnected replica immediately loses its slot.

    Should I enable hot_standby_feedback on my PostgreSQL replica?

    Only if query cancellation errors on the replica are genuinely unacceptable and you have strict controls on query duration. For most deployments, disabling it and setting statement_timeout on the replica is the better trade-off — it avoids the table bloat side-effect that hot_standby_feedback causes on the primary.

    Why does my PostgreSQL replica keep disconnecting and needing to re-sync?

    The most common reason is a stale or inactive replication slot on the primary causing pg_wal to fill the filesystem. Check pg_replication_slots for inactive slots retaining large amounts of WAL, and set max_slot_wal_keep_size to prevent unbounded WAL accumulation.

    How can I reduce WAL generation on a busy PostgreSQL primary?

    Enable wal_compression = on to reduce WAL volume by 30-60% on typical OLTP workloads. For batch jobs, reduce transaction sizes and add pg_sleep() between chunks. Use CREATE INDEX CONCURRENTLY instead of regular index rebuilds to spread WAL generation over time.

    Related Articles