Symptoms
DNS propagation delays are among the most operationally disruptive issues in infrastructure work. You have updated a record on your authoritative nameserver — changed an A record, swapped an MX endpoint, removed a deprecated CNAME — and yet hours later, behavior is inconsistent. Some users reach the new destination; others are still hitting the old one. The discrepancy is hard to reproduce because it depends entirely on which resolver a client happens to be using.
Common symptoms include:
- Running dig solvethenetwork.com @8.8.8.8 returns the new IP, but dig solvethenetwork.com @1.1.1.1 still returns the old one
- Users in different geographic regions or on different ISPs receive different IP addresses for the same hostname
- After a server migration, the decommissioned host continues receiving production traffic
- TLS certificate issuance via ACME fails intermittently because the ACME challenge DNS record resolves to the wrong IP from the CA's resolver
- Email delivery bounces or defers because MX changes haven't reached the recipient's mail server resolver
- Monitoring and health checks report the new record while end-user support tickets report the old behavior
- A newly created subdomain returns NXDOMAIN from some resolvers even though it exists on the authoritative server
Propagation delays are rarely caused by a single failure. They are typically a combination of several independently-caching systems — each operating on its own TTL clock — that must all expire their cached data before the change is universally visible. Diagnosing the problem requires interrogating each layer of the resolution stack in isolation.
Root Cause 1: TTL Too High on the Old Record
Why It Happens
The Time-to-Live (TTL) field on every DNS resource record instructs recursive resolvers how long they are permitted to serve that record from cache before they must re-query the authoritative server. If a record carried a TTL of 86400 (24 hours) at the time a resolver last fetched it, that resolver is fully compliant with RFC 1035 when it continues serving the cached answer for the next 24 hours — even after you've changed the record on the authoritative server. This is not a bug; it is DNS behaving exactly as designed.
The failure mode is almost always a process failure, not a technical one. An operator makes a record change and simultaneously lowers the TTL in the same zone file edit. By the time any resolver sees the new TTL value, it has already cached the old record under the old TTL, so the lower TTL has no effect on the current cache cycle. It only benefits the next re-fetch — which still won't happen until the original high TTL has expired.
How to Identify It
Query the authoritative nameserver directly to see the TTL currently published in the zone:
dig @ns1.solvethenetwork.com solvethenetwork.com A +norecurse
;; ANSWER SECTION:
solvethenetwork.com. 86400 IN A 10.10.1.75
Then query a public recursive resolver. The TTL in the answer will count down from the value at the time of caching:
dig @8.8.8.8 solvethenetwork.com A
;; ANSWER SECTION:
solvethenetwork.com. 82341 IN A 10.10.1.50
A remaining TTL of 82341 means the resolver cached this entry approximately 4059 seconds ago (86400 − 82341) and will continue serving the stale answer for another 22.8 hours. The authoritative server has the new record (10.10.1.75) but the resolver still has the old one (10.10.1.50) locked in for nearly a full day.
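This arithmetic is easy to script when you are comparing many resolvers. A minimal sketch (the helper name and example values are illustrative, not a standard tool):

```shell
#!/bin/sh
# Given the TTL published on the authoritative server and the TTL
# observed in a recursive resolver's answer, report how long ago the
# resolver cached the record and how much longer it will serve it.
cache_age() {
  auth_ttl=$1       # authoritative TTL, e.g. 86400
  observed_ttl=$2   # TTL seen at the resolver, e.g. 82341
  age=$((auth_ttl - observed_ttl))
  echo "cached ${age}s ago; stale for another ${observed_ttl}s"
}

cache_age 86400 82341   # cached 4059s ago; stale for another 82341s
```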
How to Fix It
The only correct procedure is to lower the TTL well in advance of any planned record change — before the change window, not during it. The lead time must equal at least one full current TTL cycle so that all resolvers that were holding the record under the old TTL have had a chance to refresh and pick up the new low TTL.
; Step 1: 24+ hours before the change window, lower the TTL
solvethenetwork.com. 300 IN A 10.10.1.50
; Step 2: After one full TTL cycle, make the record change
solvethenetwork.com. 300 IN A 10.10.1.75
; Step 3: After confirming propagation, restore TTL to operational value
solvethenetwork.com. 86400 IN A 10.10.1.75
If you are already past the point of no return — the record has changed but the old high-TTL version is cached everywhere — there is no shortcut. You must wait. You can monitor progress by querying multiple public resolvers every few minutes and watching the TTL value count down toward zero. When it hits zero, that resolver will re-query and pick up the new record.
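That monitoring step can be wrapped in a small loop over several public resolvers. A sketch, assuming an illustrative resolver list; the watch_propagation and extract_ttl helpers are not standard tools:

```shell
#!/bin/sh
# Poll several public resolvers and print each answer with its remaining
# TTL, so you can watch cached entries count down toward expiry.

# Extract the TTL (second field) from a dig +noall +answer line such as:
#   solvethenetwork.com. 82341 IN A 10.10.1.50
extract_ttl() {
  echo "$1" | awk '{print $2}'
}

watch_propagation() {
  domain=$1
  for resolver in 8.8.8.8 1.1.1.1 9.9.9.9; do
    answer=$(dig "@$resolver" "$domain" A +noall +answer | head -n1)
    printf '%-12s ttl=%-8s %s\n' "$resolver" "$(extract_ttl "$answer")" "$answer"
  done
}

# Example invocation (requires network access):
# watch_propagation solvethenetwork.com
```

Run it under watch or in a cron loop to see which resolvers have refreshed and which are still counting down.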
Root Cause 2: Negative Cache Not Expired
Why It Happens
RFC 2308 defines negative caching: when a resolver queries for a record that does not exist (NXDOMAIN response) or queries for a record type that has no entries for a name (NOERROR/NODATA), the resolver caches that negative result. The duration of that negative cache is governed by the MINIMUM field of the zone's SOA record — commonly called the negative TTL.
This creates two distinct problem scenarios. First, if you create a new hostname or record type that previously did not exist, any resolver that already queried and received an NXDOMAIN response will refuse to re-query until the negative TTL expires — even though the record now exists. Second, if a record was briefly absent during a zone reload, BIND restart, or accidental deletion, resolvers may have cached an NXDOMAIN during that window. Those resolvers will continue returning NXDOMAIN until the negative cache expires, regardless of how quickly you restore the record.
How to Identify It
Inspect the SOA record to determine the negative TTL value (the last field in the SOA RDATA):
dig solvethenetwork.com SOA +short
ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. 2024040601 3600 900 604800 300
The seventh field (300) is the negative TTL in seconds. To confirm that a remote resolver has cached a negative response, query it and inspect the status and authority section:
dig @8.8.8.8 api.solvethenetwork.com A
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 44271
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; AUTHORITY SECTION:
solvethenetwork.com. 287 IN SOA ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. 2024040601 3600 900 604800 300
Status NXDOMAIN combined with a SOA record in the authority section is the definitive signature of a cached negative response. The TTL on the SOA entry (287) is the remaining seconds before this negative cache expires and the resolver will try again.
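If you want to script that check, the remaining negative-cache TTL can be pulled straight out of the authority section. A sketch; the helper names are illustrative:

```shell
#!/bin/sh
# Estimate when a resolver will retry after caching an NXDOMAIN: the
# remaining negative-cache time is the TTL on the SOA record in the
# authority section of its response.

# Parse the TTL (second field) from an authority-section SOA line.
soa_ttl() {
  echo "$1" | awk '$4 == "SOA" {print $2}'
}

# Query a resolver and report the remaining negative-cache seconds.
negative_ttl_remaining() {
  line=$(dig "@$1" "$2" A +noall +authority | head -n1)
  soa_ttl "$line"
}

# Example invocation (requires network access):
# negative_ttl_remaining 8.8.8.8 api.solvethenetwork.com
```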
How to Fix It
Reduce the negative TTL in the SOA record for zones subject to frequent changes. Edit the zone file on sw-infrarunbook-01 and set the seventh SOA field to 60 seconds:
$TTL 86400
@ IN SOA ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. (
2024040602 ; Serial — increment after every change
3600 ; Refresh
900 ; Retry
604800 ; Expire
60 ) ; Negative TTL — reduced from 300 to 60
After editing, increment the serial and reload the zone:
rndc reload solvethenetwork.com
server: reloading zone 'solvethenetwork.com/IN': success
For records that were just added and have already been negatively cached, you must wait for the negative TTL to expire on each affected resolver. You cannot remotely flush external caches. The only workaround for end users who cannot wait is to point them at a resolver with a fresher cache, or have them flush their local OS-level cache.
Root Cause 3: ISP Resolver Caching Stale Data
Why It Happens
ISP-operated recursive resolvers are shared infrastructure serving potentially hundreds of thousands of subscribers. Some of these resolvers implement aggressive caching strategies that deliberately extend TTL values beyond what the authoritative server advertises. This behavior — sometimes called TTL stretching — reduces the resolver's upstream query volume and improves perceived response times for subscribers. It is technically non-compliant with RFC 1035 but is common enough in the wild to be a regular source of propagation complaints.
Even when an ISP resolver faithfully honors TTL values, its query volume matters: a heavily used resolver may receive thousands of queries for solvethenetwork.com per minute. Those queries do not extend the cache lifetime; a compliant resolver expires the entry when its TTL countdown reaches zero, no matter how often it is served in the meantime. But busy resolvers often enable prefetching (re-fetching popular entries shortly before they expire) or serve-stale behavior (RFC 8767, answering from expired data while a refresh is in flight), and either feature can keep a popular record visible past its nominal TTL.
How to Identify It
The telltale sign is divergence between well-known public resolvers and a specific ISP's resolver. Query each vantage point and compare:
# Ground truth — authoritative server
dig @ns1.solvethenetwork.com solvethenetwork.com A +norecurse +short
10.10.1.75
# Google Public DNS
dig @8.8.8.8 solvethenetwork.com A +short
10.10.1.75
# Cloudflare DNS
dig @1.1.1.1 solvethenetwork.com A +short
10.10.1.75
# ISP-assigned resolver (RFC 1918 address seen via DHCP)
dig @192.168.100.1 solvethenetwork.com A +short
10.10.1.50 # still returning old IP
If the authoritative server and multiple public resolvers all return the new record but a specific ISP resolver returns the old one, and the record's published TTL has long since expired, ISP-side TTL stretching or aggressive caching is the cause.
How to Fix It
You have no direct mechanism to flush a third-party ISP's resolver cache. Your practical options are:
- Wait: Even ISP resolvers with aggressive caching eventually expire entries. Most will re-query within hours even if they ignore the published TTL.
- Redirect users temporarily: Instruct affected users to configure their system DNS to 8.8.8.8 or 1.1.1.1 until the ISP cache clears.
- Contact the ISP NOC: Some ISPs will flush their resolver cache for a specific domain on request. Success rate is variable, but worth attempting for critical migrations.
- Proxy old to new: If traffic volume is critical, keep the old server running with a reverse proxy or redirect rule pointing to the new server while the ISP cache drains.
The most effective long-term mitigation is strict pre-migration TTL reduction. If the record's TTL was already at 300 seconds for 24 hours before the change, even a misbehaving ISP resolver will re-fetch within 5 minutes of each subscriber's query cycle.
Root Cause 4: Authoritative Server Not Updated
Why It Happens
Production DNS deployments almost universally involve a primary authoritative server and one or more secondary servers. Changes applied to the primary do not automatically appear on secondaries. Secondaries must either receive a DNS NOTIFY message from the primary and then pull an IXFR or AXFR, or they must poll the primary periodically (every Refresh interval from the SOA) and detect a serial number change. If the NOTIFY is lost, the secondary's poll interval is long, or the transfer itself fails silently, the secondary will continue serving the old zone data indefinitely.
This is particularly insidious because a zone's NS records list all authoritative servers equally. Recursive resolvers typically round-robin across listed nameservers or select them by latency. This means some resolvers will hit the updated primary and get the correct answer, while others hit the stale secondary and get the old answer. The result is non-deterministic: the same query from two different resolvers returns different answers, and neither client can explain the discrepancy.
How to Identify It
Query each listed authoritative server directly and compare SOA serials and record values:
# Check SOA serial on primary
dig @ns1.solvethenetwork.com solvethenetwork.com SOA +short
ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. 2024040602 3600 900 604800 300
# Check SOA serial on secondary
dig @ns2.solvethenetwork.com solvethenetwork.com SOA +short
ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. 2024040601 3600 900 604800 300
The serial mismatch (2024040602 vs 2024040601) confirms the secondary is serving a stale zone version. Verify by querying the specific record that was changed on each server:
dig @ns1.solvethenetwork.com www.solvethenetwork.com A +short
10.10.1.75
dig @ns2.solvethenetwork.com www.solvethenetwork.com A +short
10.10.1.50 # old IP — secondary is stale
How to Fix It
On sw-infrarunbook-01 (the primary), force a NOTIFY to all configured secondaries:
rndc notify solvethenetwork.com
zone 'solvethenetwork.com' is now notified
Watch the BIND log for confirmation that the transfer completed cleanly:
tail -f /var/log/named/named.log
06-Apr-2024 14:22:10.341 notify: info: zone solvethenetwork.com/IN: sending notifies (serial 2024040602)
06-Apr-2024 14:22:10.502 xfer-in: info: transfer of 'solvethenetwork.com/IN' from 10.10.1.10#53: Transfer completed: 1 messages, 14 records, 476 bytes, 0.001 secs (476000 bytes/sec)
If NOTIFY fails, manually trigger a retransfer from the secondary:
# Run on the secondary nameserver
rndc retransfer solvethenetwork.com
If zone transfers are being blocked, verify the primary's named.conf allows the secondary's IP under allow-transfer, and confirm that TCP/53 is open between the servers at the firewall level. Zone transfers always use TCP; a firewall that only permits UDP/53 will silently block all transfers.
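For reference, a minimal sketch of what the relevant named.conf stanza on the primary might look like. The key name, secret, file path, and secondary address below are placeholders, not values from this runbook:

```
// named.conf on the primary: restrict zone transfers to the known
// secondary and require a TSIG key.
key "xfer-key" {
    algorithm hmac-sha256;
    secret "base64-secret-goes-here==";   // placeholder secret
};

zone "solvethenetwork.com" {
    type primary;
    file "/etc/bind/zones/solvethenetwork.com.db";
    allow-transfer { key "xfer-key"; };
    also-notify { 10.10.1.20; };          // placeholder secondary IP
};
```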
Root Cause 5: Partial Zone Transfer
Why It Happens
A zone transfer — whether a full AXFR or an incremental IXFR — can fail midway through due to a network interruption, a TCP session timeout, a firewall stateful table overflow, or a TSIG authentication failure on a single packet in a multi-packet sequence. When this occurs, the secondary may apply only part of the zone changes, leaving the zone in an internally inconsistent state: some records reflect the new version, others still carry old values. Critically, the secondary's SOA serial may or may not reflect the intended zone version depending on exactly when in the transfer process the failure occurred.
IXFR transfers are more susceptible to this failure mode than full AXFR transfers. An IXFR conveys only the diff — the sequence of deletions and additions between two zone serial numbers. If that diff is complex (many records changed in a single serial increment) or if the transfer spans multiple TCP segments, a mid-transfer reset can leave the secondary in an indeterminate state that is difficult to detect by serial comparison alone.
How to Identify It
The warning sign of a partial transfer is matching serial numbers across servers accompanied by mismatched record data. Start with serial comparison:
dig @ns1.solvethenetwork.com solvethenetwork.com SOA +short
ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. 2024040605 3600 900 604800 300
dig @ns2.solvethenetwork.com solvethenetwork.com SOA +short
ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. 2024040605 3600 900 604800 300
Serials match — but now compare specific records that were part of the update:
dig @ns1.solvethenetwork.com app.solvethenetwork.com A +short
10.10.2.20
dig @ns2.solvethenetwork.com app.solvethenetwork.com A +short
10.10.1.80 # old value — partial transfer left this record unchanged
Matching serials with divergent record data is the definitive fingerprint of a partial zone transfer. Confirm by checking the secondary's BIND log for IXFR errors:
grep -i "ixfr\|xfer\|transfer\|failed" /var/log/named/named.log
06-Apr-2024 12:14:33.110 xfer-in: error: transfer of 'solvethenetwork.com/IN' from 10.10.1.10#53: IXFR failed: unexpected end of input
06-Apr-2024 12:14:33.112 xfer-in: info: transfer of 'solvethenetwork.com/IN' from 10.10.1.10#53: retrying AXFR
06-Apr-2024 12:14:33.891 xfer-in: error: transfer of 'solvethenetwork.com/IN' from 10.10.1.10#53: transfer failed: timed out
How to Fix It
Force the secondary to discard its current zone state and perform a fresh full AXFR. The safest method is to remove the zone journal file (which tracks IXFR history) and retrigger the transfer:
# On the affected secondary nameserver
systemctl stop named
rm /var/cache/bind/solvethenetwork.com.jnl
systemctl start named
# Then explicitly request retransfer
rndc retransfer solvethenetwork.com
Confirm a clean completion in the logs:
06-Apr-2024 14:30:01.221 xfer-in: info: transfer of 'solvethenetwork.com/IN' from 10.10.1.10#53: Transfer completed: 4 messages, 52 records, 2041 bytes, 0.004 secs
To reduce the frequency of partial IXFR failures for critical zones, add request-ixfr no; to the secondary's zone configuration to always use full AXFR. This increases transfer bandwidth but eliminates incremental transfer failures as a failure mode entirely. Alternatively, deploy TSIG for authenticated zone transfers — TSIG covers the entire transfer stream at the message level, providing integrity verification and making partial transfer corruption detectable immediately.
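On the secondary side, that option lives in the zone block. A sketch with placeholder paths and primary address, and assuming a TSIG key named xfer-key has already been defined:

```
// Secondary zone block forcing full AXFR transfers.
zone "solvethenetwork.com" {
    type secondary;
    primaries { 10.10.1.10 key "xfer-key"; };  // placeholder primary IP
    file "/var/cache/bind/solvethenetwork.com.db";
    request-ixfr no;   // always pull a full AXFR, never incremental
};
```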
Root Cause 6: Multiple Authoritative Servers Running Different Zone Versions
Why It Happens
Some organizations operate DNS in a stealth primary configuration where the authoritative servers listed in the parent zone's NS delegation are all secondaries, with the actual primary hidden from public view. Changes flow: operator edits primary → primary notifies all secondaries → secondaries transfer. If a deployment pipeline applies changes to some secondaries but not others — due to a partial Ansible run, a failed configuration management job, or a network partition — the visible authoritative servers diverge silently.
How to Identify It
Enumerate all NS records and query each in turn:
dig solvethenetwork.com NS +short
ns1.solvethenetwork.com.
ns2.solvethenetwork.com.
for ns in ns1.solvethenetwork.com ns2.solvethenetwork.com; do
printf "%-35s" "$ns:"
dig @$ns solvethenetwork.com A +short
done
ns1.solvethenetwork.com: 10.10.1.75
ns2.solvethenetwork.com: 10.10.1.50
How to Fix It
Re-apply the full zone update to all out-of-sync authoritative servers. If you use a configuration management system, ensure all DNS nodes are included in the target inventory and that the run completes successfully on every node. After applying changes, always run the serial comparison check across all listed NS records as part of your post-change verification before closing the change ticket.
Root Cause 7: OS and Browser DNS Cache
Why It Happens
End-user operating systems maintain a local DNS cache independent of any upstream resolver. On Linux, systemd-resolved or nscd caches records locally. On macOS, mDNSResponder handles caching. Web browsers implement their own separate DNS cache on top of the OS cache — Chromium-based browsers cache positive responses for up to 60 seconds by default, and this is not controlled by the record's TTL.
How to Identify It
If a specific user reports stale resolution but querying the same upstream resolver from a different machine returns the correct answer, the problem is local to that user's machine:
# Check systemd-resolved cache statistics
resolvectl statistics
Current Transactions: 0
Total Transactions: 5112
Current Cache Size: 94
Cache Hits: 4803
Cache Misses: 309
How to Fix It
Flush the local OS cache and the browser cache:
# Linux — systemd-resolved
resolvectl flush-caches
# Linux — nscd
nscd -i hosts
# macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
For browser caches, navigate to the internal DNS cache management page and flush it directly:
# Chromium-based browsers
chrome://net-internals/#dns
# Firefox
about:networking#dns
Prevention
Most DNS propagation issues are preventable with disciplined operational processes. The following practices eliminate the majority of propagation-related incidents:
- Pre-lower TTLs before every record change. Reduce the target record's TTL to 300 seconds or less at least one full TTL cycle before the planned change. Never change the record value and TTL simultaneously in the same edit.
- Keep the negative TTL low. Set the SOA MINIMUM field to 60–300 seconds for all zones where new records may be added. A high negative TTL (e.g., 86400) causes newly created records to be invisible from previously-queried resolvers for an entire day.
- Alert on SOA serial mismatches. Implement monitoring that queries all authoritative nameservers for each zone's SOA serial every 5 minutes and alerts if any server lags behind the primary by more than one serial increment. This catches failed transfers before they affect users.
- Authenticate zone transfers with TSIG. Deploy TSIG keys between primary and secondary servers. TSIG authenticates every DNS message in a transfer sequence, making partial or corrupted transfers detectable immediately.
- Verify across all authoritative servers after every change. Before closing any DNS change, query every NS record listed for the zone and confirm the expected value and serial are returned by each server.
- Use a DNS change runbook. Standardize a five-step procedure: lower TTL → wait one TTL cycle → make the record change → verify across all authoritatives and multiple resolvers → restore TTL.
- Audit parent zone NS delegation regularly. Decommissioned nameservers left in parent-zone NS delegation will serve stale or empty responses to resolvers that happen to query them. Audit and clean up NS records whenever a nameserver is retired.
- Document rollback values. Before making any DNS change, record the current record value and TTL. If rollback is needed, time-to-restore matters — have the previous zone file state committed to version control and ready to re-apply immediately.
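The serial-mismatch check described above can be sketched as a small script. The zone and nameserver names are examples; the helper names are not standard tools:

```shell
#!/bin/sh
# Compare SOA serials across all authoritative servers for a zone and
# flag any mismatch.

# Return 0 if every serial in the argument list is identical, 1 otherwise.
serials_match() {
  first=$1
  for s in "$@"; do
    [ "$s" = "$first" ] || return 1
  done
  return 0
}

check_zone() {
  zone=$1; shift
  serials=""
  for ns in "$@"; do
    # The SOA serial is the third field of the +short SOA answer
    serial=$(dig "@$ns" "$zone" SOA +short | awk '{print $3}')
    echo "$ns: $serial"
    serials="$serials $serial"
  done
  if serials_match $serials; then
    echo "OK: all serials match"
  else
    echo "ALERT: serial mismatch on $zone"
  fi
}

# Example invocation (requires network access):
# check_zone solvethenetwork.com ns1.solvethenetwork.com ns2.solvethenetwork.com
```

Wire the ALERT branch into your monitoring system to catch failed transfers before they affect users.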
Frequently Asked Questions
Q: What is DNS propagation and why does it take time?
A: DNS propagation is the process by which a change made on an authoritative nameserver becomes visible to all recursive resolvers worldwide. It takes time because resolvers cache records for the duration of the record's TTL. Until that TTL expires and the resolver re-queries the authoritative server, it continues serving the cached answer. There is no broadcast mechanism in DNS — each resolver independently decides when to refresh its cache.
Q: How do I check whether a DNS change has propagated to a specific resolver?
A: Query that resolver directly using dig's @server syntax. For example: dig @8.8.8.8 solvethenetwork.com A +short. Compare the result to your authoritative server: dig @ns1.solvethenetwork.com solvethenetwork.com A +norecurse +short. If the answers differ, the resolver is still serving cached data.
Q: What TTL value should I normally set on A records?
A: For stable infrastructure records that rarely change, 3600–86400 seconds is reasonable. For records on hosts that are subject to migrations or failover, 300–600 seconds is more appropriate. As a rule: the higher the TTL, the longer propagation takes when a change is needed. Match your TTL to your operational risk tolerance for change propagation latency.
Q: Can I force external resolvers to clear their cache for my domain?
A: For most public resolvers, no — you cannot directly flush their caches on demand. Google Public DNS and Cloudflare's 1.1.1.1 each offer a public web-based cache flush tool for individual names. For ISP resolvers, you can call the NOC and request a manual flush. In all cases, the most reliable solution is to have a low TTL in place before making the change so the propagation window is short by design.
Q: What is the difference between AXFR and IXFR, and which should I use?
A: AXFR (full zone transfer) transfers the complete zone file. IXFR (incremental zone transfer) transfers only the diff between two serial numbers. IXFR is more bandwidth-efficient for large zones but is more complex and can fail in ways that leave the secondary in a partially-updated state. For small-to-medium zones or for critical zones where correctness is paramount, AXFR is the safer choice. You can force AXFR-only behavior on a secondary by setting request-ixfr no; in the zone block in named.conf.
Q: How does negative caching affect newly created DNS records?
A: If a resolver queried for a hostname before that hostname's record was created and received an NXDOMAIN response, the resolver caches that negative result for the duration of the zone's negative TTL (SOA MINIMUM field). Even after the record is created, that resolver will return NXDOMAIN until the negative cache entry expires. This is a common cause of confusion when provisioning new services — the record exists on the authoritative server but is invisible from resolvers that pre-cached the NXDOMAIN.
Q: Why do some users see the new record immediately after a change while others do not?
A: Different users use different resolvers. A user querying a resolver that hadn't previously cached the record will get the new answer immediately. A user querying a resolver that cached the old answer under a high TTL will continue seeing the old answer until that TTL expires. The geographic and ISP diversity of resolvers in use by your user base directly determines the spread of propagation lag you observe.
Q: How do I verify that my BIND secondary performed a complete and correct zone transfer?
A: Check three things: (1) compare the SOA serial on the primary and secondary using dig @primary solvethenetwork.com SOA +short and dig @secondary solvethenetwork.com SOA +short — they should match; (2) query several recently-changed records on both servers and compare the answers; (3) review the BIND transfer log at /var/log/named/named.log for the most recent xfer-in entry and confirm it completed without errors and with a reasonable record count.
Q: What does it mean when my SOA serials match but record values differ between primary and secondary?
A: This is the signature of a partial zone transfer. The secondary received enough of the transfer to update its serial to match the primary's, but the transfer was interrupted before all record changes were applied. The secondary's zone data is now internally inconsistent. The fix is to delete the zone journal file on the secondary, restart BIND, and allow a clean full AXFR to complete.
Q: How can I test DNS propagation across multiple global resolvers at once?
A: You can script a check across multiple known public resolvers using a loop:
for resolver in 8.8.8.8 1.1.1.1 9.9.9.9 208.67.222.222; do
printf "%-18s" "$resolver:"
dig @$resolver solvethenetwork.com A +short
done
This gives you a snapshot of resolver agreement at a point in time and lets you identify which resolvers are still serving stale data.
Q: Is there any way to speed up propagation after a mistake has already been made with a high TTL?
A: Once a high-TTL record is cached by a resolver, you cannot force that resolver to expire it early. Your practical options are: (1) wait for the TTL to naturally expire; (2) run the old and new services in parallel so traffic going to either destination is handled correctly; (3) instruct users to switch to a public resolver like 8.8.8.8 which may have a fresher cache or will at least refresh as soon as the current TTL expires; (4) lower the TTL immediately so that the next re-fetch cycle (whenever it occurs) will result in a short new cache duration. Option 4 shortens the propagation tail even if it does not fix the current cache cycle.
Q: How does TSIG help prevent zone transfer issues?
A: TSIG (Transaction Signature, RFC 2845) adds a cryptographic MAC to each DNS message in a zone transfer sequence. This serves two purposes: authentication (the secondary can verify the transfer is coming from an authorized source) and integrity verification (any corruption or truncation of the transfer stream is immediately detectable). Without TSIG, a partial transfer that happens to terminate cleanly at a message boundary may not generate any log error, leaving the secondary silently inconsistent. With TSIG, any message that fails MAC verification causes the transfer to be rejected and logged, triggering a retry.
