What Is PostgreSQL Replication, and Why Does It Come in Two Flavors?
PostgreSQL gives you two fundamentally different approaches to replication, and if you've spent any time managing production databases, you've probably run into situations where picking the wrong one caused real headaches. Streaming replication and logical replication both move data from one PostgreSQL instance to another, but they operate at completely different levels of the stack. Getting that distinction right is not optional — it determines what you can do, what you can't, and where things will break when they inevitably do.
Streaming replication is physical. It works at the level of Write-Ahead Log (WAL) records — the raw binary changes PostgreSQL writes before it modifies any data page. A standby server connects to the primary, receives those WAL records in real time, and replays them exactly as they were written. The result is a byte-for-byte identical copy of the primary. Logical replication, on the other hand, works at the level of row changes. It decodes WAL records into logical operations — INSERT, UPDATE, DELETE — and ships those to subscribers, which can then apply them to tables that might look completely different from the source.
Both are built on WAL. That's the common foundation. But where they diverge tells you everything about when to use each one.
How Streaming Replication Works
The mechanics of streaming replication are worth understanding from first principles. When PostgreSQL writes a transaction, it first records the change in WAL — a sequential log on disk that acts as the source of truth for crash recovery. Streaming replication taps into this log and ships it to one or more standby servers before (or after, depending on your synchrony settings) committing.
On the primary, you set wal_level = replica (or higher). You create a replication user and configure pg_hba.conf to allow that user to connect with the replication protocol. On the standby, you configure primary_conninfo pointing back to the primary. When the standby connects, it negotiates a start LSN (Log Sequence Number) with the primary and begins receiving WAL segments. The standby is in continuous recovery mode — it's always applying WAL, never accepting writes.
Here's a minimal setup for streaming replication between two nodes at 10.10.1.10 (primary) and 10.10.1.11 (standby):
# On primary: postgresql.conf
wal_level = replica
max_wal_senders = 5
wal_keep_size = 512MB
listen_addresses = '10.10.1.10'
# On primary: pg_hba.conf
host replication replication_user 10.10.1.11/32 scram-sha-256
# On standby: postgresql.conf
primary_conninfo = 'host=10.10.1.10 port=5432 user=replication_user password=strongpassword'
hot_standby = on
# Create the standby signal file (PostgreSQL 12+)
# touch /var/lib/postgresql/data/standby.signal
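One step the snippet above glosses over: before the standby can replay anything, it needs a physical base copy of the primary's data directory. pg_basebackup is the standard tool for that, and its -R flag writes standby.signal and a primary_conninfo entry for you, making the manual steps unnecessary. A sketch reusing this section's addresses; run it only against an empty data directory, and note that -S assumes a physical replication slot with that name already exists (add -C to create it in the same run):

```shell
# On the standby host, with PostgreSQL stopped and the data directory empty.
# -R writes standby.signal plus primary_conninfo into postgresql.auto.conf;
# -X stream ships the WAL generated during the backup alongside it.
pg_basebackup \
  -h 10.10.1.10 -p 5432 -U replication_user \
  -D /var/lib/postgresql/data \
  -R -X stream \
  -S standby_sw_infrarunbook_01
```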
The standby will start replaying WAL as soon as it connects. You can check replication lag from the primary with:
SELECT
client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;
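That lag column is plain arithmetic on LSNs. An LSN such as 16/B374D848 is a 64-bit WAL byte position written as two hex halves, so you can reproduce pg_wal_lsn_diff client-side when post-processing these numbers in a monitoring script. A small Python sketch with illustrative values:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN like '16/B374D848' into an absolute byte position.
    The part before the slash is the high 32 bits of a 64-bit offset,
    the part after is the low 32 bits."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def replication_lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Client-side equivalent of pg_wal_lsn_diff(sent_lsn, replay_lsn)."""
    return lsn_to_bytes(sent_lsn) - lsn_to_bytes(replay_lsn)

# The standby has replayed everything except ~864 KiB of WAL
print(replication_lag_bytes("16/B374D848", "16/B3675848"))  # → 884736
```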
One important concept here is replication slots. Without a slot, if the standby falls behind, PostgreSQL might recycle WAL segments the standby still needs. A replication slot pins those segments on disk until the consumer acknowledges them. The trade-off is serious: if a standby with a slot goes down and stays down, WAL accumulates on the primary indefinitely — which will eventually fill your disk and crash the instance. I've seen this bite teams during unplanned outages where nobody noticed a standby had been down for two days. You need monitoring on slot lag just as much as you need the slot itself.
-- Create a physical replication slot on the primary
SELECT pg_create_physical_replication_slot('standby_sw_infrarunbook_01');
-- Then point the standby at it by setting
-- primary_slot_name = 'standby_sw_infrarunbook_01' in its postgresql.conf
-- Check slot status and lag
SELECT
slot_name,
active,
restart_lsn,
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
) AS slot_lag
FROM pg_replication_slots;
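Pulled into a monitoring job, the alerting side of that query can be a plain threshold check. A hedged Python sketch; check_slots and the thresholds are hypothetical conventions, not a PostgreSQL API, and the input mirrors (slot_name, active, lag-in-bytes) rows from the query above:

```python
WARN_BYTES = 1 * 1024 ** 3  # 1 GiB of retained WAL: warn
CRIT_BYTES = 8 * 1024 ** 3  # 8 GiB of retained WAL: page someone

def check_slots(slots):
    """slots: iterable of (slot_name, active, lag_bytes) tuples."""
    alerts = []
    for name, active, lag in slots:
        if lag is None:  # restart_lsn can be NULL for a fresh slot
            continue
        # An inactive slot never drains, so escalate it sooner
        if lag >= CRIT_BYTES or (not active and lag >= WARN_BYTES):
            alerts.append(("CRITICAL", name, lag))
        elif lag >= WARN_BYTES:
            alerts.append(("WARNING", name, lag))
    return alerts

print(check_slots([
    ("standby_sw_infrarunbook_01", True, 512 * 1024 ** 2),  # healthy
    ("cdc_pipeline_slot", False, 2 * 1024 ** 3),            # down and behind
]))  # → [('CRITICAL', 'cdc_pipeline_slot', 2147483648)]
```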
Synchronous vs asynchronous is a significant choice here. By default, streaming replication is asynchronous — the primary commits and the standby receives WAL at whatever pace it can manage. You get low commit latency at the cost of potential data loss on failover (the standby might be a few transactions behind). Synchronous replication, configured with
synchronous_standby_names, forces the primary to wait for at least one standby to confirm WAL receipt before acknowledging the commit to the client. Zero data loss, but every write now pays a network round-trip penalty. In my experience, most teams run async and accept the small RPO window rather than absorb that write latency hit across every transaction.
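For reference, a minimal synchronous configuration on the primary might look like the fragment below. The standby names are illustrative assumptions; each must match the application_name that standby sets in its primary_conninfo:

```
# postgresql.conf on the primary (illustrative names)
# Wait for confirmation from any one of the two listed standbys:
synchronous_standby_names = 'ANY 1 (standby_a, standby_b)'
# 'on' (the default) waits for the standby to flush the commit record;
# remote_apply additionally waits until it is replayed and visible to reads
synchronous_commit = remote_apply
```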
How Logical Replication Works
Logical replication takes a different path through the same WAL. Instead of shipping raw WAL bytes, it runs WAL through a logical decoding process that converts binary page-level changes into a stream of row-level operations. The primary creates a publication — a named set of tables and the operations to replicate (INSERT, UPDATE, DELETE, TRUNCATE). Subscribers connect, subscribe to a publication, and receive a decoded stream of row changes they can apply to their own tables.
The infrastructure requirement shifts too. You need
wal_level = logical, which is a superset of
replica. Critically, the subscriber doesn't need to be identical to the publisher — it can be a different PostgreSQL major version, have additional columns, different indexes, or even be a completely different schema layout as long as the replicated columns can be mapped. That flexibility is what makes logical replication so powerful for upgrade and migration scenarios.
# On publisher: postgresql.conf
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
-- Create a publication for specific tables
CREATE PUBLICATION app_data_pub
FOR TABLE users, orders, products
WITH (publish = 'insert, update, delete');
-- Or publish everything
CREATE PUBLICATION full_pub FOR ALL TABLES;
-- On subscriber: create the subscription
CREATE SUBSCRIPTION app_data_sub
CONNECTION 'host=10.10.1.10 port=5432 dbname=appdb user=replication_user password=strongpassword'
PUBLICATION app_data_pub;
-- Check subscription sync status
SELECT
subname,
pid,
received_lsn,
latest_end_lsn,
latest_end_time
FROM pg_stat_subscription;
When a subscription is created, PostgreSQL first performs an initial table sync — essentially a snapshot copy of existing rows — then switches to streaming incremental changes. The subscriber is a fully writable PostgreSQL instance, not a read-only standby. You can run DDL on it, write to tables that aren't being replicated, or use it as a staging environment while changes flow in from the publisher.
Under the hood, each subscription creates a logical replication slot on the publisher. These slots carry the exact same WAL accumulation risk as physical slots. If your subscriber goes silent, you need to know about it before it becomes a disk crisis on the publisher side.
Why It Matters: Choosing the Right Tool
The choice between streaming and logical replication is usually obvious once you're clear on your requirements, but I've seen teams default to streaming replication for everything because it's simpler to set up initially — then realize months later they've backed themselves into a corner.
Streaming replication is the right choice for high availability and disaster recovery. You want a hot standby you can promote if the primary fails? Use streaming. The standby is an exact replica — same data, same schema, same PostgreSQL major version, same everything. You can use hot standby replicas for read scaling, since they accept SELECT queries while continuously replaying WAL. Failover is clean because there's nothing to reconcile; the promoted standby is a complete, consistent database.
Logical replication is the right choice when you need flexibility that streaming can't provide. Cross-version upgrades are the canonical use case, but it goes further than that. Selective table replication means you can maintain a lightweight reporting replica that only carries the tables your analytics queries actually touch, without the overhead of replicating audit logs or session tables you don't care about. You can also use logical replication to feed external systems — Kafka, data warehouses, search indexes — via change data capture without any application-level instrumentation.
The teams that get the most mileage out of PostgreSQL replication are typically running streaming replication for their HA stack and layering logical replication on top for specific use cases — zero-downtime major version upgrades, CDC pipelines to analytics, or selective sync to read-optimized replicas with different indexing strategies.
Real-World Example: Zero-Downtime Major Version Upgrade
Here's a scenario I've walked through multiple times. You're running PostgreSQL 14 on sw-infrarunbook-01 at 10.10.1.10. You need to reach PostgreSQL 16 with minimal application downtime.
pg_upgrade is an option, but it requires a full outage window — typically 30 minutes to several hours depending on catalog size. Logical replication gives you a far better path.
Spin up a fresh PostgreSQL 16 instance at 10.10.1.12. Migrate the schema manually using
pg_dump --schema-only. Then create the publication on the old instance and the subscription on the new one, and let the initial sync run in the background while your application continues writing to the old primary.
-- PostgreSQL 14 at 10.10.1.10 — enable logical WAL and create publication
ALTER SYSTEM SET wal_level = logical;
-- wal_level cannot be changed with a reload; a full server restart is
-- required for it to take effect (e.g. systemctl restart postgresql)
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;
-- Confirm
SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
FROM pg_publication;
-- PostgreSQL 16 at 10.10.1.12 — subscribe
CREATE SUBSCRIPTION upgrade_sub
CONNECTION 'host=10.10.1.10 port=5432 dbname=appdb user=replication_user password=strongpassword'
PUBLICATION upgrade_pub;
-- Watch initial sync progress per table
SELECT
schemaname,
relname,
n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
Once the initial sync completes and received_lsn in pg_stat_subscription is tracking close to the publisher's current WAL position, your cutover window is tiny. Put the application in maintenance mode, wait for the subscriber to fully catch up, then point your connection string at 10.10.1.12 and drop the subscription. The actual outage is measured in seconds.
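The catch-up check itself is easy to script. A minimal Python sketch of the comparison only, under the assumption that you poll pg_current_wal_lsn() on the publisher and received_lsn from pg_stat_subscription on the subscriber (the polling loop and database access are omitted):

```python
def lsn_int(lsn: str) -> int:
    """Parse 'X/Y' LSN text into a 64-bit WAL byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def caught_up(publisher_lsn: str, subscriber_lsn: str,
              tolerance_bytes: int = 0) -> bool:
    """True once the subscriber is within tolerance of the publisher."""
    return lsn_int(publisher_lsn) - lsn_int(subscriber_lsn) <= tolerance_bytes

# Poll until this flips to True, then switch the connection string
print(caught_up("2A/5F000000", "2A/5F000000"))        # → True
print(caught_up("2A/5F100000", "2A/5F000000", 4096))  # → False, ~1 MiB behind
```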
One thing that catches people on their first run: sequences don't replicate via logical replication. Their current values don't flow through the publication. If you forget to sync them, your new instance will start generating IDs from wherever the sequence was initialized — which almost certainly collides with data that was copied during the initial sync. Always add a sequence sync step to your cutover runbook.
-- On old instance — capture current sequence values
SELECT sequencename, last_value
FROM pg_sequences
WHERE schemaname = 'public';
-- On new instance — advance sequences past the safe threshold before cutover
SELECT setval('users_id_seq', 5000000);
SELECT setval('orders_id_seq', 10000000);
SELECT setval('products_id_seq', 2000000);
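Rather than hardcoding the targets, the setval() statements can be generated from the values captured on the old instance. A Python sketch; sequence_setval_sql is a hypothetical helper, and the margin exists to absorb inserts that land between the capture and the cutover:

```python
def sequence_setval_sql(sequences, margin=1000):
    """Build setval() statements from (sequence_name, last_value) pairs
    captured from pg_sequences on the old instance."""
    statements = []
    for name, last_value in sequences:
        # last_value is NULL in pg_sequences until a sequence is first used
        target = (last_value or 0) + margin
        statements.append(f"SELECT setval('{name}', {target});")
    return statements

for stmt in sequence_setval_sql([("users_id_seq", 4999000),
                                 ("orders_id_seq", 9998500)]):
    print(stmt)
# → SELECT setval('users_id_seq', 5000000);
# → SELECT setval('orders_id_seq', 9999500);
```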
Real-World Example: CDC Pipeline with wal2json
Another pattern that's become increasingly common is using PostgreSQL's logical decoding framework to drive a change data capture pipeline. Instead of polling tables for changes with application-level triggers or timestamp comparisons, you subscribe directly to the WAL stream and emit every row change to downstream systems — Kafka, Kinesis, a data warehouse, whatever your stack uses.
wal2json is a PostgreSQL logical decoding output plugin that emits WAL changes as JSON. It isn't bundled with core PostgreSQL, but prebuilt packages are available in most PostgreSQL distribution repositories. You configure it on a logical replication slot and consume it from an external process:
-- Create a logical replication slot backed by wal2json
SELECT pg_create_logical_replication_slot('cdc_pipeline_slot', 'wal2json');
-- Peek at changes without consuming them (safe for testing)
SELECT * FROM pg_logical_slot_peek_changes(
'cdc_pipeline_slot',
NULL,
NULL,
'pretty-print', '1',
'include-timestamp', '1',
'include-transaction', '1'
);
The output for an INSERT into a users table looks like this:
{
"timestamp": "2026-04-08T14:22:00Z",
"change": [
{
"kind": "insert",
"schema": "public",
"table": "users",
"columnnames": ["id", "email", "created_at"],
"columnvalues": [42, "infrarunbook-admin@solvethenetwork.com", "2026-04-08T14:22:00Z"]
}
]
}
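Consuming that feed usually starts by flattening each message into per-row events. A Python sketch against the format-version-1 layout shown above; rows_from_wal2json is an illustrative helper, not part of any library:

```python
import json

def rows_from_wal2json(payload: str):
    """Flatten a wal2json message into (kind, schema.table, row) tuples."""
    message = json.loads(payload)
    events = []
    for change in message.get("change", []):
        # Zip parallel columnnames/columnvalues arrays into a row dict
        row = dict(zip(change.get("columnnames", []),
                       change.get("columnvalues", [])))
        events.append((change["kind"],
                       f"{change['schema']}.{change['table']}",
                       row))
    return events

payload = """{
  "change": [{
    "kind": "insert", "schema": "public", "table": "users",
    "columnnames": ["id", "email"],
    "columnvalues": [42, "infrarunbook-admin@solvethenetwork.com"]
  }]
}"""
print(rows_from_wal2json(payload))
# → [('insert', 'public.users', {'id': 42, 'email': 'infrarunbook-admin@solvethenetwork.com'})]
```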
This pattern completely decouples your analytics and event pipelines from application-level instrumentation. No triggers, no shadow tables, no audit columns cluttering your schema. The WAL is the source of truth, and you're just reading it. Tools like Debezium wrap this pattern with Kafka integration and automatic slot management, but understanding the raw mechanism means you can debug it when Debezium does something unexpected — and it will.
Common Misconceptions
The most damaging misconception I keep encountering is that logical replication can serve as a drop-in replacement for streaming replication. It cannot. If you configure logical replication hoping to use it for HA failover, you will run into trouble fast. Logical replication does not replicate DDL changes. Every
ALTER TABLE,
CREATE INDEX, or schema modification on the publisher must be manually applied to the subscriber before replication can continue for affected tables. Streaming replication replicates absolutely everything — schema changes, sequence updates, index builds, system catalog modifications, all of it. For HA, streaming is the only sane answer.
Another persistent confusion: people assume hot standby replicas accept writes. They don't. A streaming standby in recovery mode is strictly read-only — you can run SELECT queries against it, but any attempt to write returns an error. If you need a writable secondary, that's what logical replication gives you. Some teams use extensions like BDR or Citus for true multi-master topologies, but vanilla logical replication with writes on both sides pushes conflict resolution entirely to your application. PostgreSQL won't mediate it for you — the last write wins, and if two nodes modify the same row simultaneously, you'll get silent data divergence unless your application is explicitly designed to prevent it.
There's also a belief that logical replication is meaningfully slower than streaming replication. The WAL decoding step does consume more CPU than raw WAL shipping, and for extreme write workloads this overhead is real. But the bottleneck in most production systems is network bandwidth or subscriber I/O, not the decoding overhead on the publisher. Don't let theoretical CPU cost drive you away from logical replication when it's the right architectural fit.
Finally, slot management is underestimated almost universally. Both physical and logical slots pin WAL until the consumer acknowledges it. If a consumer disappears — standby failover, subscription dropped accidentally, CDC consumer crashing — that WAL accumulates silently. PostgreSQL 13+ gives you max_slot_wal_keep_size to cap this:
# postgresql.conf — cap WAL retention per slot to prevent disk exhaustion
max_slot_wal_keep_size = 10GB
With this configured, PostgreSQL will invalidate a slot that falls too far behind rather than allow WAL to grow unbounded. The subscriber will need a full resync, but your primary stays healthy. Pair this with an alert on the wal_status column in pg_replication_slots. When it transitions from reserved to extended to unreserved to lost, you want to know before it becomes an incident:
SELECT
slot_name,
slot_type,
active,
wal_status,
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
) AS slot_lag
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;
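Fed into monitoring, the wal_status-to-severity mapping is mechanical. A Python sketch; the severity labels are a hypothetical convention, while the four status values are PostgreSQL's own (13+):

```python
# reserved:   WAL still within normal retention
# extended:   retained beyond wal_keep_size only because of the slot
# unreserved: past max_slot_wal_keep_size, invalidation is imminent
# lost:       slot invalidated, the consumer needs a full resync
SEVERITY = {"reserved": "ok", "extended": "warning",
            "unreserved": "critical", "lost": "critical"}

def slot_alerts(slots):
    """slots: (slot_name, wal_status) pairs from pg_replication_slots."""
    return [(name, SEVERITY.get(status, "unknown"))
            for name, status in slots
            if SEVERITY.get(status, "unknown") != "ok"]

print(slot_alerts([("standby_sw_infrarunbook_01", "reserved"),
                   ("cdc_pipeline_slot", "unreserved")]))
# → [('cdc_pipeline_slot', 'critical')]
```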
PostgreSQL replication isn't something you set up once and forget. Both streaming and logical replication require active monitoring, disciplined slot management, and a clear mental model of what each approach will and won't do for you. Once you have that, the right choice for each situation becomes obvious — and so do the failure modes before they become 3am incidents.
