DNS Resolution Failure Troubleshooting

Symptoms

DNS resolution failures surface across every layer of your infrastructure simultaneously. When DNS breaks, applications stall mid-connection, SSH sessions hang for 30 seconds on hostname lookups before timing out, monitoring systems start firing cascade alerts, and users report that "the internet is down" when the network itself is perfectly healthy. Recognizing the symptom pattern before running any diagnostic commands saves significant time.

Common symptoms include:

Name resolution timeout: ping sw-infrarunbook-01.solvethenetwork.com hangs for several seconds, then returns ping: sw-infrarunbook-01.solvethenetwork.com: Name or service not known
SERVFAIL responses: dig returns status: SERVFAIL even for names that definitively exist in the zone
NXDOMAIN on existing records: Records that exist in the zone return NXDOMAIN, typically because the query reached a server or view that does not hold the zone
REFUSED responses: The resolver explicitly rejects queries, often because recursion is disabled or an ACL excludes the client subnet
Partial resolution failures: Internal hostnames fail while external names resolve correctly (or vice versa), pointing to a split-horizon, forwarder, or zone loading issue
Intermittent failures: Some queries succeed while others fail for the same name, which is common when a round-robin resolver pool has one unhealthy member
Application-level errors: ERR_NAME_NOT_RESOLVED in browsers, getaddrinfo ENOTFOUND in Node.js logs, java.net.UnknownHostException in JVM applications
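
As a rough first-pass aid, the symptom patterns above can be mapped to the root causes covered in the rest of this runbook. The sketch below is illustrative only: triage_hint is a hypothetical helper name, TIMEOUT is a made-up label for the case where dig gets no reply at all, and the mapping simply mirrors this article's sections rather than covering every possible cause.

```shell
# triage_hint STATUS: print which sections of this runbook to check first.
# STATUS is a dig status string, or the made-up label TIMEOUT when dig
# reports that no server could be reached.
triage_hint() {
    case "$1" in
        TIMEOUT)  echo "Root Cause 1 (resolver down) or Root Cause 3 (firewall)" ;;
        NXDOMAIN) echo "Root Cause 2 (wrong nameserver) or Root Cause 6 (stale cache)" ;;
        REFUSED)  echo "Root Cause 4 (recursion disabled or ACL too narrow)" ;;
        SERVFAIL) echo "Root Cause 5 (zone file), 7 (DNSSEC) or 8 (forwarder)" ;;
        NOERROR)  echo "DNS answered; look at the client resolver or the application" ;;
        *)        echo "Unrecognized status: $1" ;;
    esac
}
```

For example, triage_hint REFUSED points at the recursion section as the first place to look.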

Root Cause 1: Resolver Down

Why It Happens

The resolver — whether a BIND instance, Unbound, dnsmasq, or a network appliance — can crash, become unreachable, or exhaust system resources. Common triggers include OOM kills when cache memory grows unbounded, a runaway query flood that spikes CPU and causes the process watchdog to kill it, a corrupt configuration that causes the daemon to exit on reload, or a network change that isolates the resolver host from its clients. This is the first thing to rule out because it is the most complete failure mode: nothing resolves.

How to Identify It

From a client host, run a direct query against the configured resolver and observe the response:

infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 solvethenetwork.com A

; <<>> DiG 9.18.12 <<>> @192.168.1.10 solvethenetwork.com A
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

A timeout with "no servers could be reached" is the clearest possible sign that the resolver is not responding. Confirm whether the host is reachable at all before logging into it:

infrarunbook-admin@sw-infrarunbook-01:~$ ping -c 3 192.168.1.10
PING 192.168.1.10 (192.168.1.10) 56(84) bytes of data.
64 bytes from 192.168.1.10: icmp_seq=1 ttl=64 time=0.412 ms
64 bytes from 192.168.1.10: icmp_seq=2 ttl=64 time=0.388 ms
64 bytes from 192.168.1.10: icmp_seq=3 ttl=64 time=0.401 ms

The host responds to ICMP but DNS is not answering. Log into the resolver and check the service state:

infrarunbook-admin@sw-infrarunbook-01:~$ systemctl status named
● named.service - BIND Domain Name Server
     Loaded: loaded (/lib/systemd/system/named.service; enabled)
     Active: failed (Result: exit-code) since Fri 2026-04-04 09:12:44 UTC; 18min ago
    Process: 2341 ExecStart=/usr/sbin/named -f -u bind (code=exited, status=1/FAILURE)
   Main PID: 2341 (code=exited, status=1/FAILURE)

Apr 04 09:12:44 sw-infrarunbook-01 named[2341]: loading configuration from '/etc/bind/named.conf'
Apr 04 09:12:44 sw-infrarunbook-01 named[2341]: /etc/bind/named.conf.options:12: unknown option 'forwarders-policy'
Apr 04 09:12:44 sw-infrarunbook-01 named[2341]: loading configuration: unexpected token

If the process was killed by the OOM killer rather than a config error, check kernel logs:

infrarunbook-admin@sw-infrarunbook-01:~$ journalctl -k | grep -i "oom\|killed"
Apr 04 08:55:10 sw-infrarunbook-01 kernel: Out of memory: Killed process 2219 (named) score 312 total-vm:1048576kB

How to Fix It

Address the configuration error, validate it, then restart the service:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo named-checkconf /etc/bind/named.conf
/etc/bind/named.conf.options:12: unknown option 'forwarders-policy'

# Correct the typo in named.conf.options — remove or fix the invalid directive
infrarunbook-admin@sw-infrarunbook-01:~$ sudo nano /etc/bind/named.conf.options

infrarunbook-admin@sw-infrarunbook-01:~$ sudo named-checkconf /etc/bind/named.conf
# No output means clean

infrarunbook-admin@sw-infrarunbook-01:~$ sudo systemctl start named
infrarunbook-admin@sw-infrarunbook-01:~$ systemctl status named
● named.service - BIND Domain Name Server
     Active: active (running) since Fri 2026-04-04 09:31:02 UTC; 3s ago

If the OOM killer is the culprit, reduce the resolver cache size in named.conf.options:

options {
    max-cache-size 256m;
    max-cache-ttl 3600;
};
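
Because a bad directive takes the whole resolver down on restart, it can help to wrap the validate-then-restart sequence so the restart never runs against a configuration that fails named-checkconf. The following is a sketch under assumptions: safe_restart_named is a hypothetical helper name, and the CHECKCONF and SYSTEMCTL variables are overrides introduced here, not part of BIND or systemd. It needs to run as root on the resolver.

```shell
# safe_restart_named: restart named only if the configuration validates.
# CHECKCONF and SYSTEMCTL are optional overrides (assumptions of this
# sketch); by default the real named-checkconf and systemctl are used.
safe_restart_named() {
    if ! "${CHECKCONF:-named-checkconf}" /etc/bind/named.conf; then
        echo "named.conf failed validation; not restarting" >&2
        return 1
    fi
    "${SYSTEMCTL:-systemctl}" restart named
}
```

The function returns non-zero without touching the running service when validation fails, so a typo costs an error message rather than an outage.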

Root Cause 2: Wrong Nameserver Configured

Why It Happens

A client pointed at the wrong nameserver will receive incorrect or empty answers. This arises from stale DHCP leases pushing an old resolver IP, a manually misconfigured /etc/resolv.conf that survived a rebuild, a broken systemd-resolved stub listener, or a cloud metadata service returning resolver addresses from a different VPC or subnet. When the queried server has no knowledge of the internal zone, it returns NXDOMAIN (if it has no forwarder configured for the zone) or silently forwards to upstream public resolvers that also have no knowledge of your private namespace.

How to Identify It

Check what nameserver the system is actually using at this moment:

infrarunbook-admin@sw-infrarunbook-01:~$ cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8).
nameserver 172.16.0.254
search solvethenetwork.com

infrarunbook-admin@sw-infrarunbook-01:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: uplink
Current DNS Server: 172.16.0.254
       DNS Servers: 172.16.0.254

Now query that server directly and compare the result against the known authoritative resolver:

infrarunbook-admin@sw-infrarunbook-01:~$ dig @172.16.0.254 sw-infrarunbook-01.solvethenetwork.com A
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 44312

infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 sw-infrarunbook-01.solvethenetwork.com A
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19872
;; ANSWER SECTION:
sw-infrarunbook-01.solvethenetwork.com. 300 IN A 192.168.1.50

The authoritative resolver at 192.168.1.10 returns the correct answer. The wrong server at 172.16.0.254 returns NXDOMAIN because it has no knowledge of the internal zone.
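
The same comparison can be run against every nameserver the host is configured with. This is a rough sketch: list_nameservers and compare_nameservers are hypothetical helper names, the DIG variable is an override added for this example, and the script reads whichever resolv.conf-style file it is given. On hosts where /etc/resolv.conf points at the systemd-resolved stub, pass /run/systemd/resolve/resolv.conf instead to see the upstream servers.

```shell
# list_nameservers [FILE]: print the nameserver addresses from a
# resolv.conf-style file (default /etc/resolv.conf), one per line.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# compare_nameservers NAME [FILE]: query each configured nameserver for
# NAME and print "server status". TIMEOUT is a label invented here for
# the case where no status line came back.
compare_nameservers() {
    for ns in $(list_nameservers "$2"); do
        status=$("${DIG:-dig}" +time=2 +tries=1 "@$ns" "$1" A 2>&1 |
            sed -n 's/.*status: \([A-Z]*\).*/\1/p' | head -n 1)
        echo "$ns ${status:-TIMEOUT}"
    done
}
```

A server that answers NXDOMAIN for an internal name while another answers NOERROR is the one that should not be in the list.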

How to Fix It

On systemd-networkd managed hosts, update the DNS setting in the network unit file:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo nano /etc/systemd/network/10-eth0.network

[Network]
DNS=192.168.1.10
DNS=192.168.1.11

infrarunbook-admin@sw-infrarunbook-01:~$ sudo systemctl restart systemd-networkd
infrarunbook-admin@sw-infrarunbook-01:~$ resolvectl status
Current DNS Server: 192.168.1.10
       DNS Servers: 192.168.1.10 192.168.1.11

On hosts where /etc/resolv.conf is managed manually, correct the file directly (the chattr step applies only if the file was made immutable):

infrarunbook-admin@sw-infrarunbook-01:~$ sudo chattr -i /etc/resolv.conf
infrarunbook-admin@sw-infrarunbook-01:~$ sudo tee /etc/resolv.conf <<'EOF'
nameserver 192.168.1.10
nameserver 192.168.1.11
search solvethenetwork.com
EOF

Root Cause 3: Firewall Blocking UDP/TCP Port 53

Why It Happens

DNS traffic runs on port 53, using UDP for standard queries and TCP for zone transfers and for any response too large for UDP: more than 512 bytes without EDNS0, or more than the EDNS0 buffer size the client advertised (commonly 1232 or 4096 bytes). Overly restrictive firewall rules, whether applied on the resolver host itself, on an intermediate network appliance, or in a cloud security group, can silently drop DNS packets in one or both directions. A particularly common scenario is a security hardening script that locks down all UDP by default and only whitelists specific application UDP ports, forgetting to include 53. Another is a stateful firewall that permits outbound queries but blocks the return UDP datagrams because they arrive on unexpected ports or from unexpected source IPs.

How to Identify It

Test both UDP and TCP connectivity to port 53 directly, bypassing the resolver library entirely:

infrarunbook-admin@sw-infrarunbook-01:~$ nc -zuv 192.168.1.10 53
Connection to 192.168.1.10 53 port [udp/domain] succeeded!

infrarunbook-admin@sw-infrarunbook-01:~$ nc -zv 192.168.1.10 53
nc: connect to 192.168.1.10 port 53 (tcp) failed: Connection timed out

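UDP appears to succeed but TCP is blocked. Because UDP is connectionless, nc reports success whenever no ICMP error comes back, so treat the UDP result as a hint and confirm it with a real dig query. A blocked TCP path breaks large DNS responses (DNSSEC, long TXT records, large ANY responses), which are truncated over UDP and then retried over TCP. Confirm with dig over TCP: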

infrarunbook-admin@sw-infrarunbook-01:~$ dig +tcp @192.168.1.10 solvethenetwork.com ANY
;; communications error to 192.168.1.10#53: timed out

On the resolver host, inspect the active firewall rules:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo iptables -L INPUT -n -v
Chain INPUT (policy DROP)
 pkts bytes target  prot  opt in  out  source        destination
    0     0 ACCEPT  udp   --  *   *    0.0.0.0/0     192.168.1.10   udp dpt:53
    0     0 DROP    tcp   --  *   *    0.0.0.0/0     192.168.1.10   tcp dpt:53

The DROP rule on TCP port 53 is explicit. If using nftables instead:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo nft list ruleset | grep -B2 -A2 "port 53"
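
Since nc cannot tell whether a UDP datagram was actually answered, a more reliable reachability check sends a real DNS query over each transport. The sketch below assumes dig's documented behavior of exiting non-zero when no reply is received. check_transports is a hypothetical helper name and the DIG variable is an override introduced for this example.

```shell
# check_transports SERVER NAME: send one SOA query over UDP and one over
# TCP and report whether any reply came back on each transport.
# +notcp with +ignore keeps the first query on UDP even if the answer
# is truncated; +tcp forces the second query onto TCP.
check_transports() {
    if "${DIG:-dig}" +notcp +ignore +time=2 +tries=1 "@$1" "$2" SOA >/dev/null 2>&1; then
        echo "udp: reply received"
    else
        echo "udp: no reply"
    fi
    if "${DIG:-dig}" +tcp +time=2 +tries=1 "@$1" "$2" SOA >/dev/null 2>&1; then
        echo "tcp: reply received"
    else
        echo "tcp: no reply"
    fi
}
```

A reply here means only that packets made the round trip. A REFUSED or SERVFAIL answer still counts as a reply, which is the right behavior when the question is whether the firewall passes port 53.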

How to Fix It

Insert rules to permit both UDP and TCP on port 53 and persist them:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo iptables -I INPUT -p tcp --dport 53 -j ACCEPT
infrarunbook-admin@sw-infrarunbook-01:~$ sudo iptables -I INPUT -p udp --dport 53 -j ACCEPT

# Persist across reboots:
infrarunbook-admin@sw-infrarunbook-01:~$ sudo netfilter-persistent save
run-parts: executing /usr/share/netfilter-persistent/plugins.d/15-ip4tables save
run-parts: executing /usr/share/netfilter-persistent/plugins.d/25-ip6tables save

For nftables environments:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo nft insert rule inet filter input ip daddr 192.168.1.10 tcp dport 53 accept
infrarunbook-admin@sw-infrarunbook-01:~$ sudo nft insert rule inet filter input ip daddr 192.168.1.10 udp dport 53 accept

Verify the fix resolves both protocols:

infrarunbook-admin@sw-infrarunbook-01:~$ dig +tcp @192.168.1.10 solvethenetwork.com SOA
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6712
;; ANSWER SECTION:
solvethenetwork.com. 86400 IN SOA ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. 2026040401 3600 900 604800 300

Root Cause 4: Recursion Disabled

Why It Happens

BIND distinguishes between authoritative queries (the server holds the zone and answers from local data) and recursive queries (the server looks up external names on behalf of the client, following the delegation chain from root). When recursion no; is set globally, a common hardening step for authoritative-only nameservers, any client that needs the resolver to look up external names receives an explicit REFUSED. The same result occurs when recursion is enabled globally but the allow-recursion ACL does not include the querying client's subnet. This failure mode is especially disruptive because the resolver is healthy and the zone is loaded correctly; only recursive queries fail.

How to Identify It

infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 cloudflare.com A
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 55023
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

The message "WARNING: recursion requested but not available" is the definitive indicator. Internal zone names may still work if the server is authoritative for them, but all external lookups will fail. Confirm by checking the BIND configuration:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo grep -n "recursion\|allow-recursion" /etc/bind/named.conf.options
8:  recursion no;
9:  allow-recursion { none; };

Or, if recursion is enabled but the ACL is too narrow:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo grep -n "allow-recursion" /etc/bind/named.conf.options
8:  allow-recursion { 10.10.10.0/24; };

# A client on 192.168.1.0/24 is not in this ACL and will receive REFUSED
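
When the ACL contains several prefixes, checking membership by eye is error-prone. The following sketch tests whether an IPv4 address falls inside a CIDR prefix using plain shell arithmetic. ip_to_int and in_cidr are hypothetical helper names, and the code handles IPv4 only and assumes octets are written without leading zeros.

```shell
# ip_to_int ADDR: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr ADDR NET/PREFIX: succeed if ADDR lies inside the given prefix.
in_cidr() {
    net=${2%/*}
    bits=${2#*/}
    # Build the netmask: the top $bits bits set, remainder clear.
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}
```

With the ACL shown above, in_cidr 192.168.1.50 10.10.10.0/24 fails, which matches the REFUSED the client receives, while in_cidr 192.168.1.50 192.168.0.0/16 succeeds.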

How to Fix It

Edit /etc/bind/named.conf.options to enable recursion and specify the authorized client subnets:

options {
    directory "/var/cache/bind";

    recursion yes;
    allow-recursion {
        192.168.0.0/16;
        10.0.0.0/8;
        172.16.0.0/12;
        127.0.0.1;
    };

    forwarders {
        8.8.8.8;
        1.1.1.1;
    };
    forward only;
};

Validate the configuration and reload without restarting:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo named-checkconf
# No output = clean

infrarunbook-admin@sw-infrarunbook-01:~$ sudo rndc reload
server reload successful

infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 cloudflare.com A
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31881
;; ANSWER SECTION:
cloudflare.com. 299 IN A 104.16.132.229

Security note: Never set allow-recursion { any; }; on a resolver with a public IP. Open resolvers are abused for DNS amplification DDoS attacks. Always restrict recursion to known RFC 1918 ranges or trusted client prefixes.

Root Cause 5: Broken Zone File

Why It Happens

Zone file syntax errors are one of the most common causes of resolution failure for internal names. A missing trailing dot on a fully qualified domain name, an incorrect or non-monotonic serial number, a syntax error introduced during a manual edit (missing parenthesis in the SOA record, extra whitespace in a field), a record type used incorrectly, or a corrupted dynamic DNS journal file can all prevent BIND from loading the zone. When a zone fails to load, all queries for names within that zone return SERVFAIL — not NXDOMAIN, because the server knows it should be authoritative but cannot serve the data.

How to Identify It

infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 sw-infrarunbook-01.solvethenetwork.com A
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 23017

infrarunbook-admin@sw-infrarunbook-01:~$ sudo journalctl -u named --since "10 min ago"
Apr 04 09:45:12 sw-infrarunbook-01 named[3104]: zone solvethenetwork.com/IN: loading from master file /etc/bind/zones/db.solvethenetwork.com failed: not at top of zone
Apr 04 09:45:12 sw-infrarunbook-01 named[3104]: zone solvethenetwork.com/IN: not loaded due to errors.

Use named-checkzone for a detailed report with line numbers:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo named-checkzone solvethenetwork.com /etc/bind/zones/db.solvethenetwork.com
/etc/bind/zones/db.solvethenetwork.com:22: solvethenetwork.com: not at top of zone
zone solvethenetwork.com/IN: not loaded due to errors.

Inspect the zone file at and around line 22:

infrarunbook-admin@sw-infrarunbook-01:~$ sed -n '12,22p' /etc/bind/zones/db.solvethenetwork.com
$ORIGIN solvethenetwork.com.
$TTL 300
@   IN  SOA  ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. (
                2026040401  ; serial
                3600        ; refresh
                900         ; retry
                604800      ; expire
                300 )       ; minimum TTL
@   IN  NS   ns1.solvethenetwork.com.
; Missing trailing dot on the CNAME target:
www IN  CNAME  solvethenetwork.com

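The CNAME target solvethenetwork.com has no trailing dot, so BIND treats it as a relative name and appends the origin, expanding it to solvethenetwork.com.solvethenetwork.com. rather than the intended target. BIND typically reports "not at top of zone" when a record that must sit at the zone apex (such as the SOA) ends up with a different owner name, so also confirm that $ORIGIN and the zone name passed to named-checkzone match the zone declared in named.conf.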
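
Relative targets like the one above are easy to miss in review. As a rough sketch, the heuristic below flags CNAME, NS, MX, and PTR records whose target contains a dot but does not end with one. find_undotted_targets is a hypothetical helper, not a BIND tool, and it will also flag deliberately relative multi-label names, so treat its output as a prompt for review rather than a verdict.

```shell
# find_undotted_targets ZONEFILE: print "line-number: record" for records
# whose target looks fully qualified but lacks the trailing dot.
find_undotted_targets() {
    awk '
        {
            sub(/;.*/, "")                     # strip zone-file comments
            for (i = 1; i < NF; i++) {
                if ($i ~ /^(CNAME|NS|MX|PTR)$/) {
                    target = $NF
                    if (target ~ /\./ && target !~ /\.$/)
                        print NR ": " $0
                    break
                }
            }
        }
    ' "$1"
}
```

Running it against the zone file before named-checkzone gives a quick list of lines worth a second look.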

How to Fix It

Add the trailing dot and increment the serial number:

; Before:
www IN  CNAME  solvethenetwork.com

; After:
www IN  CNAME  solvethenetwork.com.

; Also increment serial from 2026040401 to 2026040402

Validate then reload the zone without a full named restart:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo named-checkzone solvethenetwork.com /etc/bind/zones/db.solvethenetwork.com
zone solvethenetwork.com/IN: loaded serial 2026040402
OK

infrarunbook-admin@sw-infrarunbook-01:~$ sudo rndc reload solvethenetwork.com
zone reload queued

infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 sw-infrarunbook-01.solvethenetwork.com A
;; ANSWER SECTION:
sw-infrarunbook-01.solvethenetwork.com. 300 IN A 192.168.1.50

Root Cause 6: Stale Cache or Long Negative TTL

Why It Happens

DNS resolvers cache both positive responses (the record exists) and negative responses (NXDOMAIN, meaning the record does not exist). If a record was deleted or changed but the old TTL has not expired, clients continue receiving the stale answer. Conversely, if a name was queried while its record was temporarily missing, the NXDOMAIN is cached for the negative TTL (the lower of the SOA record's own TTL and its minimum field, per RFC 2308), and clients keep receiving NXDOMAIN long after the record has been restored.
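
To see how long a negative answer can linger, the negative TTL can be computed from the zone's SOA record. This is a small sketch of the RFC 2308 rule (the lower of the SOA TTL and the SOA minimum field). negative_ttl is a hypothetical helper name, and a resolver may apply its own lower cap on top (max-ncache-ttl in BIND).

```shell
# negative_ttl: read dig answer lines on stdin and print the negative
# caching TTL for the first SOA record found: the smaller of the
# record TTL (field 2) and the SOA minimum (last field).
negative_ttl() {
    awk '$4 == "SOA" {
        ttl = $2 + 0
        min = $NF + 0
        print (ttl < min ? ttl : min)
        exit
    }'
}
```

For example, dig +noall +answer @192.168.1.10 solvethenetwork.com SOA | negative_ttl prints the number of seconds an NXDOMAIN for that zone may be cached.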

How to Identify It

infrarunbook-admin@sw-infrarunbook-01:~$ dig sw-infrarunbook-01.solvethenetwork.com A
;; ANSWER SECTION:
sw-infrarunbook-01.solvethenetwork.com. 287 IN A 192.168.1.50

# TTL of 287 (counting down from 300) means this is a cached response
# Query the authoritative server directly with +norecurse to compare:
infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 sw-infrarunbook-01.solvethenetwork.com A +norecurse
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 41204

The resolver has a positive cached answer but the authoritative server now returns NXDOMAIN, confirming a stale positive cache entry.

How to Fix It

# Flush a specific name from the BIND cache:
infrarunbook-admin@sw-infrarunbook-01:~$ sudo rndc flushname sw-infrarunbook-01.solvethenetwork.com

# Flush a name and everything below it (for example, a whole domain):
infrarunbook-admin@sw-infrarunbook-01:~$ sudo rndc flushtree solvethenetwork.com

# Nuclear option: flush the entire resolver cache:
infrarunbook-admin@sw-infrarunbook-01:~$ sudo rndc flush

# For systemd-resolved clients:
infrarunbook-admin@sw-infrarunbook-01:~$ sudo resolvectl flush-caches

Root Cause 7: DNSSEC Validation Failure

Why It Happens

When DNSSEC validation is enabled on a resolver, any mismatch between the zone's RRSIG signatures and the DS records published in the parent zone causes the resolver to return SERVFAIL instead of the actual answer. This occurs after a KSK/ZSK rollover that was not properly coordinated with the parent zone, after zone signing configuration changes, or when the resolver host's system clock is skewed enough to fall outside a signature's validity window.

How to Identify It

infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 solvethenetwork.com A +dnssec
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 9901

# Re-query with +cd (checking disabled) to bypass validation:
infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 solvethenetwork.com A +dnssec +cd
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2213
;; ANSWER SECTION:
solvethenetwork.com. 300 IN A 192.168.1.100

The +cd flag bypasses DNSSEC validation and returns a result, confirming the data exists but validation is failing. Check resolver logs for the specific validation error:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo journalctl -u named | grep -i "dnssec\|bogus\|validation"
Apr 04 10:02:31 sw-infrarunbook-01 named[3104]: validating solvethenetwork.com/DNSKEY: no valid signature found (DS)
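
The with-and-without +cd comparison reduces to a simple decision once both statuses are known. The sketch below only encodes that decision. dnssec_verdict is a hypothetical helper name and the wording of its output is illustrative.

```shell
# dnssec_verdict NORMAL_STATUS CD_STATUS: interpret the status of a normal
# query and of the same query repeated with +cd (checking disabled).
dnssec_verdict() {
    if [ "$1" = "SERVFAIL" ] && [ "$2" = "NOERROR" ]; then
        echo "DNSSEC validation failure: data exists but signatures are rejected"
    elif [ "$1" = "SERVFAIL" ]; then
        echo "Not DNSSEC-specific: check zone loading and forwarders"
    else
        echo "No DNSSEC problem indicated"
    fi
}
```

With the outputs shown above, dnssec_verdict SERVFAIL NOERROR reports a validation failure, while SERVFAIL on both queries points back to the zone file and forwarder sections.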

How to Fix It

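For internal zones where DNSSEC is not operationally required, exempt that domain from validation on the resolver. dnssec-validation is an options-level or view-level setting and is not accepted inside a zone block, so list the domain under validate-except (available in current BIND 9 releases, including the 9.18 shown earlier) and keep the forward zone as is:

options {
    validate-except { "solvethenetwork.com"; };
};

zone "solvethenetwork.com" IN {
    type forward;
    forwarders { 192.168.1.10; };
    forward only;
};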

For production zones, re-sign the zone and re-publish the DS record in the parent to restore the chain of trust. Check the system clock skew first:

infrarunbook-admin@sw-infrarunbook-01:~$ timedatectl status
               Local time: Fri 2026-04-04 10:05:11 UTC
           Universal time: Fri 2026-04-04 10:05:11 UTC
              NTP service: active
         NTP synchronized: yes
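
If the clock looks correct, the next question is whether the signatures themselves are inside their validity window. The sketch below reads RRSIG records in dig's presentation format, where the expiration timestamp comes before the inception timestamp, and compares both against a supplied UTC time. rrsig_window_ok is a hypothetical helper name.

```shell
# rrsig_window_ok NOW: read dig answer lines on stdin and fail if NOW
# (UTC, formatted YYYYMMDDHHMMSS) is outside the validity window of any
# RRSIG record. Field 9 is the expiration, field 10 the inception.
rrsig_window_ok() {
    awk -v now="$1" '
        $4 == "RRSIG" {
            expires = $9 + 0
            incept  = $10 + 0
            if (now + 0 < incept || now + 0 > expires) {
                print "outside window: " $1 " " $5
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

A typical invocation would be dig +dnssec +cd @192.168.1.10 solvethenetwork.com A | rrsig_window_ok "$(date -u +%Y%m%d%H%M%S)", using +cd so the records are returned even while validation is failing.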

Root Cause 8: Forwarder Misconfiguration

Why It Happens

Most internal resolvers are configured to forward queries for unknown zones to upstream servers. If the forwarder IP is wrong, the upstream is unreachable, the upstream is returning errors, or the forward only; directive prevents fallback to root hints when the forwarder fails, all recursive lookups for external names will return SERVFAIL, even though the resolver daemon itself is running correctly and internal zones are loading fine.

How to Identify It

infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 cloudflare.com A
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 27231

infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 sw-infrarunbook-01.solvethenetwork.com A
;; ANSWER SECTION:
sw-infrarunbook-01.solvethenetwork.com. 300 IN A 192.168.1.50

Internal names resolve; external names return SERVFAIL. This asymmetry almost always points to a broken forwarder. Check the configuration:

infrarunbook-admin@sw-infrarunbook-01:~$ sudo grep -A5 "forwarders" /etc/bind/named.conf.options
forwarders {
    172.16.99.99;   # This host does not exist on the network
};
forward only;

infrarunbook-admin@sw-infrarunbook-01:~$ dig @172.16.99.99 cloudflare.com A
;; connection timed out; no servers could be reached
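
With more than one forwarder configured, the same direct test can be looped over every address in the forwarders block. This is a sketch under assumptions: list_forwarders and check_forwarders are hypothetical helper names, the DIG variable is an override added for this example, and the parser handles IPv4 addresses only.

```shell
# list_forwarders FILE: print each IPv4 address found inside a
# "forwarders { ... };" block of a BIND configuration file.
list_forwarders() {
    awk '
        /forwarders[ \t]*[{]/ { inside = 1 }
        inside {
            line = $0
            sub(/(#|\/\/).*/, "", line)        # drop trailing comments
            while (match(line, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
                print substr(line, RSTART, RLENGTH)
                line = substr(line, RSTART + RLENGTH)
            }
            if (line ~ /[}]/) inside = 0       # end of the block
        }
    ' "$1"
}

# check_forwarders FILE NAME: query each forwarder directly for NAME and
# report whether any reply came back.
check_forwarders() {
    for fwd in $(list_forwarders "$1"); do
        if "${DIG:-dig}" +time=2 +tries=1 "@$fwd" "$2" A >/dev/null 2>&1; then
            echo "$fwd ok"
        else
            echo "$fwd unreachable"
        fi
    done
}
```

Running check_forwarders /etc/bind/named.conf.options cloudflare.com on the resolver shows at a glance which upstream is dead.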

How to Fix It

infrarunbook-admin@sw-infrarunbook-01:~$ sudo nano /etc/bind/named.conf.options

forwarders {
    8.8.8.8;
    8.8.4.4;
};
forward only;

infrarunbook-admin@sw-infrarunbook-01:~$ sudo named-checkconf && sudo rndc reload
server reload successful

infrarunbook-admin@sw-infrarunbook-01:~$ dig @192.168.1.10 cloudflare.com A
;; ANSWER SECTION:
cloudflare.com. 299 IN A 104.16.132.229

Prevention

The majority of DNS resolution failures are preventable through configuration discipline, pre-deployment validation, redundancy, and active monitoring. Adopt the following practices to avoid outages before they happen:

Automate zone file validation in CI/CD. Run named-checkconf and named-checkzone as blocking gates in every pipeline that touches DNS configuration. A zone file error that passes peer review but fails named-checkzone should never reach production.
Deploy a minimum of two resolvers per site. A single-resolver environment means any service restart, host reboot, or configuration reload failure causes a total DNS outage. Configure both IPs in DHCP scope options and in each host's /etc/resolv.conf or network unit file.
Monitor resolver health with synthetic probes. Use Prometheus's blackbox_exporter DNS probe module to query a known-good name every 30 seconds and alert on SERVFAIL, NXDOMAIN, or response time exceeding 200 ms.
Keep negative TTLs short. Set the SOA minimum field to 300 to 600 seconds. A value of 3600 means a misconfiguration that causes NXDOMAIN caching takes a full hour to self-heal after the fix is applied.
Restrict recursion to named subnets. Always use allow-recursion with explicit RFC 1918 ranges. Never use allow-recursion { any; }; on any server reachable from outside your network perimeter.
Version-control all zone files. Store zone files in git with a commit per change. Each commit message should include the change reason and the new serial number, enabling rapid rollback and providing an audit trail.
Test both UDP and TCP port 53 after every firewall change. Large DNSSEC responses, zone transfers, and any response truncated over UDP all require TCP. A post-change test of both protocols should be a mandatory step in every firewall change procedure.
Synchronize resolver system clocks with NTP. DNSSEC signatures have validity windows. Alert if clock drift on resolver hosts exceeds five seconds, and never let resolvers run without an NTP source.
Use rndc zonestatus as a post-deploy check. After any zone reload, sudo rndc zonestatus solvethenetwork.com immediately confirms the loaded serial and zone state without log scraping.
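
The CI/CD validation gate described in the first practice could look roughly like the sketch below. It assumes zone files are named db.<zone> inside a single directory, as in the /etc/bind/zones/db.solvethenetwork.com path used earlier. validate_zones is a hypothetical helper name and CHECKZONE is an override introduced for this example.

```shell
# validate_zones [DIR]: run named-checkzone on every db.<zone> file in DIR
# (default /etc/bind/zones) and return non-zero if any zone fails.
validate_zones() {
    dir=${1:-/etc/bind/zones}
    rc=0
    for f in "$dir"/db.*; do
        [ -e "$f" ] || continue            # no matching files at all
        zone=${f##*/db.}                   # db.example.com -> example.com
        if ! "${CHECKZONE:-named-checkzone}" "$zone" "$f" >/dev/null; then
            echo "FAIL $zone ($f)" >&2
            rc=1
        fi
    done
    return $rc
}
```

In a pipeline, a step such as named-checkconf && validate_zones /etc/bind/zones fails the build on the first configuration or zone error, before anything is deployed.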

