What You're Building
PostgreSQL streaming replication lets your standby server receive WAL (Write-Ahead Log) records directly from the primary in near real-time. Unlike file-based log shipping, streaming doesn't wait for a WAL segment to fill up before sending — it pushes changes continuously. The result is a replica that's typically seconds behind the primary, not minutes.
In this guide, we're setting up a single primary and one hot standby using streaming replication on PostgreSQL 15. The primary is sw-infrarunbook-01 at 192.168.10.10, and the replica is sw-infrarunbook-02 at 192.168.10.11. Both run Ubuntu 22.04 LTS with PostgreSQL 15 installed from the official PGDG repository. The same steps apply to PostgreSQL 14 and 16 with only minor differences.
Prerequisites
Get these sorted before touching a single config file, otherwise you'll be chasing cryptic connection errors halfway through the setup:
- PostgreSQL 15 installed on both servers from the PGDG apt repo — don't use the distro's default package, it's usually one or two versions behind and causes subtle incompatibilities
- Both servers can reach each other on port 5432 — verify this with nc -zv 192.168.10.10 5432 from the replica before doing anything else
- NTP synchronized on both hosts — replication won't break if clocks drift slightly, but your monitoring will lie to you about replication lag in confusing ways
- Sufficient disk space on the replica to hold a full base backup of the primary's data directory, plus room for WAL replay
- The PostgreSQL service running on the primary, and stopped on the replica for now
In my experience, the connectivity check is the step people skip and then waste thirty minutes on later. Run it first. It costs you five seconds.
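If you want those pre-flight checks as one script instead of ad-hoc commands, something like this works. It's a sketch that assumes bash and coreutils; it uses bash's built-in /dev/tcp so it doesn't even need nc installed, and the IP and port match this guide's lab setup:

```shell
#!/usr/bin/env bash
# Pre-flight checks, run from the replica (sw-infrarunbook-02).
PRIMARY=192.168.10.10
PGPORT=5432

# Port reachability via bash's /dev/tcp pseudo-device; timeout caps the wait.
port_open() {
  timeout 3 bash -c ">/dev/tcp/$1/$2" 2>/dev/null
}

if port_open "$PRIMARY" "$PGPORT"; then
  echo "primary reachable on $PGPORT"
else
  echo "cannot reach $PRIMARY:$PGPORT — fix networking first" >&2
fi

# Clock sync status (systemd-timesyncd and chrony both surface here).
if command -v timedatectl >/dev/null; then
  timedatectl show --property=NTPSynchronized --value
fi

# Free space where the base backup will land.
df -h /var/lib/postgresql 2>/dev/null || df -h /
```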
Step 1 — Configure the Primary Server
SSH into sw-infrarunbook-01 and open /etc/postgresql/15/main/postgresql.conf. The stock configuration won't serve a remote standby as shipped — you need to open up network access and tune a few replication settings:
wal_level = replica
max_wal_senders = 5
wal_keep_size = 512
listen_addresses = '192.168.10.10'
hot_standby = on
wal_log_hints = on
hot_standby_feedback = on
wal_level = replica is the minimum required for streaming replication. It tells PostgreSQL to include enough information in WAL records for a standby to follow along. This has been the default since PostgreSQL 10, but verify it anyway — the minimal setting deliberately strips most of that information out to reduce WAL volume. Set it to logical if you also plan to use logical replication downstream — it's a superset of replica.
max_wal_senders controls how many concurrent WAL streaming connections the primary accepts. Five covers one or two standbys plus monitoring tools that occasionally poll replication status.
wal_keep_size tells the primary to hold at least 512 MB of WAL segments on disk so a briefly lagging replica doesn't fall off the edge and get stranded without the WAL it needs. On a busy OLTP system generating several gigabytes of WAL per hour, bump this value accordingly.
wal_log_hints isn't strictly required for basic streaming replication, but it's needed if you ever want to use pg_rewind to resync a former primary after a failover. Enable it now and you won't regret it later.
hot_standby_feedback is covered in the full configuration section — it matters for replicas that serve read traffic.
Step 2 — Create the Replication User
Don't use the postgres superuser for replication. Create a dedicated role with only the permissions it actually needs:
psql -U postgres -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'Use-A-Real-Password-Here';"
Pick a real password. The replication user has network access to your primary and can read all your WAL data. Treat it like any other privileged service credential — store it in your secrets manager, not in a sticky note on the server.
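After creating the role, you can double-check its attributes from the catalog with a quick query against the pg_roles view:

```sql
-- Expect rolcanlogin and rolreplication to be true, rolsuper false.
SELECT rolname, rolcanlogin, rolreplication, rolsuper
FROM pg_roles
WHERE rolname = 'replicator';
```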
Step 3 — Update pg_hba.conf on the Primary
Open /etc/postgresql/15/main/pg_hba.conf and add a line that allows the replicator role to connect from the replica's IP specifically for replication. This is a different entry type from regular database access:
# TYPE DATABASE USER ADDRESS METHOD
host replication replicator 192.168.10.11/32 scram-sha-256
The key detail is the word replication in the database column. This isn't a database name — it's a special keyword that covers WAL streaming connections. A line granting access to postgres or all databases does not cover replication. Without this specific entry, the standby's connection is rejected outright even though its credentials are correct — the primary logs a no pg_hba.conf entry for replication connection error. It's confusing the first time you see it, which is why this ends up in the common mistakes section below.
Now apply both config changes on the primary. A plain reload isn't enough here — parameters like max_wal_senders and listen_addresses only take effect after a full restart:
systemctl restart postgresql@15-main
Step 4 — Take the Base Backup on the Replica
SSH into sw-infrarunbook-02. Stop the local PostgreSQL instance if it's running, then clear the data directory. This is the step to feel appropriately nervous about — double-check you're on the replica before running the remove command:
systemctl stop postgresql@15-main
rm -rf /var/lib/postgresql/15/main
Now use pg_basebackup to copy the primary's data directory over the network. Run this as the postgres system user:
sudo -u postgres pg_basebackup \
-h 192.168.10.10 \
-U replicator \
-D /var/lib/postgresql/15/main \
-P -Xs -R
Breaking down the flags: -h is the primary's IP, -U replicator is the role we created, -D is where to write the data directory, -P shows a progress indicator, -Xs streams WAL during the backup rather than waiting to copy it afterward, and -R is the critical one — it writes a standby.signal file and adds the primary_conninfo connection string to postgresql.auto.conf automatically.
The standby.signal mechanism arrived when PostgreSQL 12 folded recovery.conf into the main config system. On older versions, -R instead writes a recovery.conf containing the standby_mode and primary_conninfo entries — but honestly, if you're still running PostgreSQL 11, a streaming replication guide is the least of your upgrade concerns.
The command will prompt for the replicator's password. Once it finishes, verify the standby.signal file landed in the right place:
ls -la /var/lib/postgresql/15/main/standby.signal
If the file is there, you're on track.
Step 5 — Verify and Adjust the Replica's Connection Config
The -R flag should have written a primary_conninfo entry into /var/lib/postgresql/15/main/postgresql.auto.conf. Open it and confirm it looks correct:
primary_conninfo = 'user=replicator password=Use-A-Real-Password-Here host=192.168.10.10 port=5432 sslmode=prefer'
If the password wasn't captured — and sometimes it isn't, depending on how pg_basebackup handled the auth flow — add it here or use a .pgpass file at /var/lib/postgresql/.pgpass with permissions set to 0600. The .pgpass approach is cleaner for production: you're not embedding credentials in a config file that might drift into version control or a configuration management system.
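A sketch of the .pgpass route, assuming you're running as the postgres system user (whose home directory on Ubuntu's PGDG packages is /var/lib/postgresql, so the file lands exactly where libpq looks) and using this guide's placeholder credentials:

```shell
# Run as the postgres system user (e.g. via: sudo -u postgres bash).
PGPASS="$HOME/.pgpass"

cat > "$PGPASS" <<'EOF'
# hostname:port:database:username:password
# The literal word "replication" in the database field matches
# streaming replication connections.
192.168.10.10:5432:replication:replicator:Use-A-Real-Password-Here
EOF

# libpq silently ignores the file unless it's 0600 or stricter.
chmod 600 "$PGPASS"
```

With the file in place, you can drop the password= key from primary_conninfo entirely; libpq picks the password up from .pgpass at connection time.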
Also confirm that hot_standby = on is set in /etc/postgresql/15/main/postgresql.conf on the replica. This allows the standby to accept read-only queries while it's replaying WAL. It defaults to on in recent versions, but being explicit means it's visible when someone audits the config.
Step 6 — Start the Replica
systemctl start postgresql@15-main
Tail the PostgreSQL log at /var/log/postgresql/postgresql-15-main.log immediately after starting. You want to see a sequence like this:
LOG: entering standby mode
LOG: redo starts at 0/3000028
LOG: consistent recovery state reached at 0/3000100
LOG: database system is ready to accept read-only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
That last line — started streaming WAL from primary — is the one you're waiting for. It means the replica has connected and is actively receiving WAL records in real-time. If you see connection refused errors or authentication failures instead, work through the common mistakes section below before proceeding.
Full Configuration Reference
Here's the complete set of relevant settings from both servers, consolidated so you can cross-check your own config against a known-good baseline.
Primary — postgresql.conf (replication-related settings)
# Replication
wal_level = replica
max_wal_senders = 5
wal_keep_size = 512
wal_log_hints = on
# Connections
listen_addresses = '192.168.10.10'
port = 5432
# Standby behavior
hot_standby = on
hot_standby_feedback = on
Primary — pg_hba.conf (replication entry only)
# TYPE DATABASE USER ADDRESS METHOD
host replication replicator 192.168.10.11/32 scram-sha-256
Replica — postgresql.auto.conf
primary_conninfo = 'user=replicator password=Use-A-Real-Password-Here host=192.168.10.10 port=5432 sslmode=prefer'
Replica — postgresql.conf (relevant additions)
hot_standby = on
hot_standby_feedback = on
A note on hot_standby_feedback: when enabled on the replica, it reports the oldest transaction still active on the standby back to the primary. This prevents the primary from vacuuming away row versions that a long-running read query on the replica still needs to see. Without it, those queries on the standby get cancelled with a replication conflict error, which is disorienting if you're using the replica to serve reporting workloads. In my experience, this setting is worth enabling by default on any standby that accepts read traffic. The downside is that it can delay table bloat cleanup on the primary if the standby has a query that runs for a long time — but that's a trade-off you want to make deliberately, not stumble into.
Verification Steps
Once both servers are running, connect to the primary and query pg_stat_replication:
SELECT client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
write_lag,
flush_lag,
replay_lag,
sync_state
FROM pg_stat_replication;
You should see one row for 192.168.10.11 with state = streaming. The lag columns show how far behind the replica is at each stage: write lag is when the replica acknowledged receiving the WAL, flush lag is when it hit disk, and replay lag is when it was applied to the database. Under a low-write workload, all three will be null or sub-millisecond. During a write burst they'll climb temporarily — that's normal behavior. What you don't want is any of them growing indefinitely without plateauing.
From the replica itself, check pg_stat_wal_receiver:
SELECT status,
receive_start_lsn,
received_lsn,
last_msg_send_time,
last_msg_receipt_time,
sender_host,
sender_port
FROM pg_stat_wal_receiver;
A status of streaming confirms the replica is actively receiving WAL from 192.168.10.10. If this query returns no rows, the WAL receiver process isn't running — check the replica's log for the reason.
For a quick end-to-end sanity check, create a test table on the primary and confirm it propagates:
-- On sw-infrarunbook-01 (primary):
CREATE TABLE replication_test (id serial PRIMARY KEY, created_at timestamptz DEFAULT now());
INSERT INTO replication_test DEFAULT VALUES;
-- On sw-infrarunbook-02 (replica):
SELECT * FROM replication_test;
The row should appear on the replica within a second or two. If the table doesn't exist on the replica at all, streaming has stalled before the DDL was applied. Dig into the replica log immediately — it'll usually tell you exactly what went wrong.
You can also measure replication lag in bytes using LSN arithmetic on the primary:
SELECT client_addr,
pg_current_wal_lsn() - sent_lsn AS unsent_bytes,
pg_current_wal_lsn() - replay_lsn AS total_lag_bytes
FROM pg_stat_replication
WHERE client_addr = '192.168.10.11';
Under normal conditions both values will be zero or close to it. This query is useful to run during a write-heavy maintenance window to see whether the replica is keeping up.
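If you'd rather compute lag in a monitoring script than in SQL, the arithmetic is simple: an LSN like 0/3000028 is just two 32-bit hex halves of a 64-bit byte position in the WAL stream. A small sketch in shell (the function names here are my own):

```shell
# Convert an LSN of the form HI/LO into an absolute byte position.
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))   # hi * 2^32 + lo
}

# Byte distance between two LSNs (first minus second).
lsn_lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Example with the LSNs from the startup log above:
lsn_lag_bytes 0/3000100 0/3000028   # prints 216
```

Server-side, pg_wal_lsn_diff() does the same subtraction if you want to stay in SQL.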
Common Mistakes
Wrong wal_level. If wal_level is still set to minimal, streaming can't work at all — in fact, the primary will refuse to start with max_wal_senders above zero, logging that WAL streaming requires wal_level to be replica or logical. Always verify with SHOW wal_level; on the primary before starting the replica. Changing wal_level requires a full restart of the primary, not just a reload.
Missing the replication keyword in pg_hba.conf. This is the most common stumbling block by far. You add a host line for the replicator user against a database, but the replication connection type is separate from regular client connections. The database column must contain replication — granting access to the postgres database or using all won't cover WAL streaming. The error the standby reports looks like an authentication failure even though the password is correct, which sends people down the wrong troubleshooting path. Check pg_hba.conf first when connection attempts fail.
Forgetting to restart after parameter changes. Some parameters — including max_wal_senders, wal_level, and listen_addresses — require a full server restart to take effect. A reload won't apply them. If you change these and then reload, PostgreSQL will silently continue running with the old values. You can verify the active value with SHOW max_wal_senders; after restarting.
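You don't have to memorize which parameters are restart-only; the server can tell you. The pg_settings view has a context column, and anything marked postmaster only changes on a full restart:

```sql
SELECT name, setting, context
FROM pg_settings
WHERE name IN ('wal_level', 'max_wal_senders', 'listen_addresses', 'wal_keep_size');
-- context = 'postmaster' means restart required; 'sighup' means a reload is enough.
```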
Running pg_basebackup as the wrong user. The data directory at /var/lib/postgresql/15/main is owned by the postgres system user. If you run pg_basebackup as root or your own account, the copied files will have the wrong ownership and PostgreSQL will refuse to start the replica. Always run pg_basebackup via sudo -u postgres. If you've already made this mistake, fix the ownership recursively with chown -R postgres:postgres /var/lib/postgresql/15/main.
Not setting hot_standby = on. Without this, the replica starts in recovery mode but refuses all client connections — including the monitoring queries that would tell you whether replication is working. The standby can be perfectly healthy and fully caught up while returning the database system is starting up to every connection attempt. It defaults to on in recent PostgreSQL versions, but set it explicitly in your config so it's visible.
Ignoring WAL segment retention and replication slots. The wal_keep_size setting only helps if the replica catches up before that buffer is exhausted. On a high-write primary with a slow or temporarily disconnected replica, WAL segments get recycled before the replica can consume them. When the replica reconnects, it finds those segments gone and aborts with requested WAL segment has already been removed — at which point you need to re-run pg_basebackup from scratch. Replication slots solve this by preventing WAL recycling until the replica has consumed it, but they come with a real risk: if the replica stays offline for an extended period, the slot causes WAL to accumulate on the primary until it runs out of disk. If you use replication slots, monitor slot lag as a separate alert. A slot that's growing and not draining is a disk space incident waiting to happen.
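If you do go the slot route, the moving parts look like this (the slot name replica_slot is just an example, and the retention query is one way to feed that alert):

```sql
-- On the primary: create a physical replication slot.
SELECT pg_create_physical_replication_slot('replica_slot');

-- On the replica, in postgresql.auto.conf, then restart the standby:
--   primary_slot_name = 'replica_slot'

-- On the primary: how much WAL each slot is forcing it to retain.
SELECT slot_name, active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;
```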
Streaming replication is one of PostgreSQL's most reliable built-in features once it's configured correctly. The moving parts are the WAL configuration on the primary, the pg_hba.conf replication entry, the base backup, and the standby signal file — get those four things right and the replication stream tends to be rock solid. The verification queries against pg_stat_replication and pg_stat_wal_receiver give you a clear, real-time view of what's happening on both ends of the connection. Build alerting on top of those views and you'll know the moment something drifts before it becomes an incident.
