Why DNS Failures Are Uniquely Destructive
Most infrastructure components fail in isolation. A web server crash takes down a web server. A database timeout affects only queries against that database. DNS is fundamentally different. DNS is a shared dependency that sits beneath every layer of your stack—application connectivity, service discovery, certificate validation, authentication, logging, monitoring, and alerting. When DNS fails, it does not fail in isolation. It fails everywhere, simultaneously, and the symptoms look different in every system that depends on it.
At solvethenetwork.com, an internal DNS failure during a routine resolver migration triggered a cascade that took down LDAP authentication, broke the internal certificate authority's OCSP responder, silenced Prometheus scraping, and prevented SSH from resolving jump host names—all within 90 seconds of the resolver at 10.0.1.10 going dark. The root cause was a single misconfigured named.conf include path. The blast radius looked like a multi-system catastrophe.
Understanding DNS failure modes is not optional for infrastructure engineers. It is a prerequisite for incident response competency. This article dissects the specific ways DNS can fail, how each failure type propagates through dependent systems, and how to diagnose and recover from each scenario systematically.
The DNS Resolution Chain: Where Things Break
To understand DNS failure, you must first internalize the full resolution chain. When an application on sw-infrarunbook-01 wants to reach
api.solvethenetwork.com, the following sequence executes:
- The stub resolver on sw-infrarunbook-01 checks /etc/hosts for a static entry
- The stub resolver sends a query to the configured recursive resolver at 10.0.1.10
- The recursive resolver checks its local cache for a valid, non-expired answer
- On a cache miss, the recursive resolver begins iterative resolution, querying a root nameserver
- The root refers the resolver to the .com TLD nameservers
- The TLD nameservers refer the resolver to solvethenetwork.com's authoritative nameservers
- The authoritative nameserver returns the A or AAAA record for the query
- The recursive resolver caches the result for the record's TTL duration and returns it to sw-infrarunbook-01
- The application receives the IP address and initiates a TCP connection
A failure at any link in this chain produces a resolution failure at the application layer. The error message, timing, and observable behavior differ dramatically depending on where in the chain the break occurs. This is why DNS incidents are notoriously difficult to triage: the failure source is often three or four hops removed from the symptom.
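Step 3 of the chain—serving a cached answer only while its TTL holds—is worth modeling explicitly, because so many of the failure modes below hinge on it. A minimal illustrative sketch (hypothetical; real resolvers also handle negative answers, prefetching, and per-record-type caches):

```python
class ResolverCache:
    """Minimal sketch of a recursive resolver's cache: serve an answer
    only while its TTL has not expired; otherwise signal a cache miss."""
    def __init__(self):
        self._store = {}  # name -> (ip, expiry_timestamp)

    def put(self, name: str, ip: str, ttl: int, now: float) -> None:
        self._store[name] = (ip, now + ttl)

    def get(self, name: str, now: float):
        entry = self._store.get(name)
        if entry is None or now >= entry[1]:
            return None  # cache miss: iterative resolution would begin here
        return entry[0]

cache = ResolverCache()
cache.put("api.solvethenetwork.com", "10.0.2.50", ttl=300, now=0.0)
assert cache.get("api.solvethenetwork.com", now=299.0) == "10.0.2.50"
assert cache.get("api.solvethenetwork.com", now=300.0) is None
```

The expiry boundary in `get` is why authoritative failures surface gradually: each cached name fails independently, exactly when its own TTL runs out.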
Type 1: Recursive Resolver Unavailability
This is the most impactful single-point failure mode and the one that produces the most immediate, widespread symptoms. If the recursive resolver at 10.0.1.10 goes offline—whether due to a process crash, a network partition, an ACL change blocking port 53, or a firewall rule update—every host pointing to it for name resolution loses the ability to resolve any hostname entirely.
The stub resolver on an affected host will attempt to reach the resolver, wait for the configured timeout (typically 2–5 seconds per attempt), retry the configured number of times, and then return either a timeout error or SERVFAIL to the calling application. Most Linux systems configure this behavior in /etc/resolv.conf. A standard configuration at solvethenetwork.com:
nameserver 10.0.1.10
nameserver 10.0.1.11
search solvethenetwork.com
options timeout:2 attempts:3 rotate
With this configuration, if 10.0.1.10 is down, each query waits out the two-second timeout against it before the stub resolver moves on to 10.0.1.11 (glibc walks the nameserver list in order on each attempt; the rotate option spreads initial queries across both servers, so roughly half of all queries pay the penalty). That adds seconds of latency to affected DNS queries until the primary is restored, and a name that no server can answer blocks for up to timeout × attempts × nameservers—twelve seconds here—before failing. For high-frequency service discovery systems—Kubernetes, Consul, or any microservice mesh—this latency budget is catastrophic. Health checks time out. Circuit breakers trip. Cascading failures begin within seconds.
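The worst-case blocking time follows directly from the options line. A rough upper-bound sketch, assuming a fixed per-query timeout (real glibc resolvers adjust timeouts between retries, so treat the number as approximate):

```python
def worst_case_failure_seconds(timeout: int, attempts: int, nameservers: int) -> int:
    """Upper bound on how long a stub resolver blocks before giving up
    entirely: each attempt walks the full nameserver list, waiting
    `timeout` seconds for every server that never answers."""
    return timeout * attempts * nameservers

# options timeout:2 attempts:3 with two nameservers
assert worst_case_failure_seconds(2, 3, 2) == 12
```

Twelve seconds of blocking per unresolvable name is far beyond most health-check and connect-timeout budgets, which is why the cascade starts before the DNS outage is even noticed.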
On systemd-based hosts, systemd-resolved manages DNS and must itself be operational. It exposes a local stub at 127.0.0.53. If the process crashes, DNS fails even if your external resolvers are fully healthy, because the local listener is gone. Inspect its state with:
systemctl status systemd-resolved
resolvectl status
resolvectl query api.solvethenetwork.com
Type 2: Authoritative Nameserver Failure
When the authoritative nameservers for solvethenetwork.com go offline, recursive resolvers can initially continue serving answers—but only from their cache. Once cached records reach their TTL boundary and expire, any resolver that attempts to refresh the answer receives no response from the authoritative server. Without a valid answer to cache, the resolver returns SERVFAIL to all clients asking for that name.
This failure mode is delayed and gradual, which makes it deeply confusing during an active incident. Records with long TTLs (3600s or 86400s) continue resolving for hours or even days. Records with short TTLs (60s or 300s) begin failing within minutes. The result is an incident where some services work and others do not, with no obvious pattern—sending engineers down false trails while the actual authoritative failure sits waiting to be found.
You can check the current remaining TTL of any record to estimate how long cached answers will continue to be served before failures begin:
dig @10.0.1.10 api.solvethenetwork.com A +noall +answer
;; ANSWER SECTION:
api.solvethenetwork.com. 287 IN A 10.0.2.50
The 287 is the remaining TTL in seconds. When it hits zero on a given resolver, any subsequent query from a client served by that resolver will fail until authoritative service is restored. Multiple resolvers cache records independently, so the failure onset will be staggered across your infrastructure.
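Turning that dig output into a countdown is a one-line parse. A hypothetical helper for a monitoring script (assumes the standard five-field presentation format shown above):

```python
def remaining_ttl(answer_line: str) -> int:
    """Extract the remaining TTL (seconds) from a `dig +noall +answer`
    line, whose fields are: name, TTL, class, type, rdata."""
    fields = answer_line.split()
    return int(fields[1])

line = "api.solvethenetwork.com. 287 IN A 10.0.2.50"
assert remaining_ttl(line) == 287  # seconds until this resolver must refresh
```

Running this against the same name on every resolver in the fleet shows you how staggered the failure onset will be.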
Type 3: Zone Expiry on Secondary Nameservers
Secondary (replica) authoritative nameservers hold zone data copied from the primary via zone transfers. Each zone's SOA record contains an expire value—the maximum time a secondary will continue to serve zone data after losing contact with the primary. When this timer elapses without a successful zone transfer completing, the secondary stops answering authoritatively for that zone entirely and returns SERVFAIL for all queries it receives.
A typical SOA record for the solvethenetwork.com zone on sw-infrarunbook-01:
$TTL 3600
@ IN SOA ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. (
2024040201 ; Serial
3600 ; Refresh - how often secondary checks for updates
900 ; Retry - how often secondary retries a failed refresh
604800 ; Expire - max time to serve data without contact
300 ) ; Negative cache TTL
With an expire value of 604800 (one week), a secondary will serve potentially stale data for up to seven days before going silent. This is intentional—trading freshness for availability during extended primary outages. But if the primary is unreachable for longer than the expire window and the secondary's zone data ages out, that nameserver stops responding for your zone entirely. Any resolver that selects it from your NS record set will receive SERVFAIL.
Monitor zone transfer health proactively using rndc:
rndc zonestatus solvethenetwork.com
zone solvethenetwork.com/IN: type secondary; serial 2024040201;
next refresh: Thu, 03 Apr 2026 14:30:00 GMT
expires: Thu, 10 Apr 2026 14:00:00 GMT
last refresh: successful
Compare serial numbers across all nameservers to detect silent replication failures before they become outages:
for ns in ns1.solvethenetwork.com ns2.solvethenetwork.com; do
echo -n "$ns serial: "
dig @$ns solvethenetwork.com SOA +short | awk '{print $3}'
done
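The expire arithmetic is simple enough to encode directly in monitoring. A sketch, using the values from the SOA record and rndc output above:

```python
from datetime import datetime, timedelta

def secondary_goes_dark(last_successful_refresh: datetime, expire_seconds: int) -> datetime:
    """A secondary serves (possibly stale) zone data until the SOA expire
    timer elapses without a successful transfer, then returns SERVFAIL."""
    return last_successful_refresh + timedelta(seconds=expire_seconds)

last_refresh = datetime(2026, 4, 3, 14, 0)
deadline = secondary_goes_dark(last_refresh, expire_seconds=604800)  # one week
assert deadline == datetime(2026, 4, 10, 14, 0)
```

Alerting when `deadline - now` drops below a day turns a silent replication failure into an actionable warning long before the secondary stops answering.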
Type 4: DNSSEC Validation Failure
DNSSEC adds cryptographic signatures to DNS responses and establishes a verifiable chain of trust from the root zone down to individual records. When a validating resolver receives a signed response, it must verify every signature in the chain. If any link is broken—an expired RRSIG, a missing DS record after a zone delegation change, a key rollover that was not coordinated with the parent zone, or a misconfigured NSEC chain—the validating resolver returns SERVFAIL to the client even though the authoritative server is healthy and returning correct data.
DNSSEC failures are among the most confusing DNS incidents because:
- Non-validating resolvers continue working normally—internal resolvers may succeed while external public resolvers fail
- The authoritative server appears completely healthy when queried directly
- Standard dig queries without +dnssec may show clean answers that mask the validation failure
- Users and monitoring systems see SERVFAIL with no obvious indication that DNSSEC is the root cause
To check whether DNSSEC validation is succeeding on the internal resolver at 10.0.1.10:
dig @10.0.1.10 solvethenetwork.com A +dnssec
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
The ad (Authenticated Data) flag in the response confirms DNSSEC validation succeeded end-to-end. If you get SERVFAIL and the ad flag is absent, validation is failing. Check RRSIG record expiry directly:
dig @10.0.1.10 solvethenetwork.com RRSIG +dnssec +noall +answer
solvethenetwork.com. 3600 IN RRSIG A 13 2 3600 (
20260410120000 20260326120000 12345 solvethenetwork.com.
[base64 signature data] )
The two timestamps are signature expiry and inception respectively. If the current timestamp is past the expiry value, every validating resolver worldwide will reject responses for your zone. This is a full-zone outage for all clients using validating resolvers—which includes most modern public resolvers and ISP resolvers.
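RRSIG timestamps use a fixed YYYYMMDDHHMMSS format in UTC, so checking for imminent expiry is straightforward to automate. A hypothetical checker suitable for a monitoring script:

```python
from datetime import datetime, timedelta, timezone

def rrsig_time_left(expiry_field: str, now: datetime) -> timedelta:
    """Time remaining before an RRSIG expiry timestamp (YYYYMMDDHHMMSS,
    always UTC). A negative result means every validating resolver is
    already rejecting responses for the zone."""
    expiry = datetime.strptime(expiry_field, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return expiry - now

now = datetime(2026, 4, 3, 12, 0, tzinfo=timezone.utc)
assert rrsig_time_left("20260410120000", now) == timedelta(days=7)
assert rrsig_time_left("20260402120000", now) < timedelta(0)  # expired: zone-wide SERVFAIL
```

Alerting when the result drops below seven days (as recommended in the prevention section below) leaves time for re-signing without emergency pressure.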
Type 5: Negative Caching and NXDOMAIN Propagation
When a resolver queries for a name that does not exist, the authoritative server returns
NXDOMAIN. Resolvers cache this negative response for the duration specified in the SOA record's minimum TTL field (the last value in the SOA—typically 300 to 3600 seconds). This negative caching is valuable under normal operations but becomes a failure mode in specific scenarios.
Common negative caching failure patterns at solvethenetwork.com:
- A new DNS record is added to the zone but a resolver already holds a cached NXDOMAIN that has not expired—clients on that resolver continue to receive NXDOMAIN even after the record is published
- A deployment automation script briefly deletes and recreates a record during a zone update—any resolver that queried during the deletion window caches NXDOMAIN for up to the SOA minimum TTL
- A service is deployed before its DNS record is created—clients fail, resolvers cache the failure, and even after the record is added clients must wait for the negative cache to expire
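All three patterns reduce to the same mechanic: a cached NXDOMAIN outlives the event that produced it. A toy model (illustrative only; real resolvers follow RFC 2308 and cap the negative TTL):

```python
class NegativeCache:
    """Toy model of negative caching: an NXDOMAIN answer is served from
    cache until the SOA minimum TTL elapses, even if the record has
    since been published on the authoritative server."""
    def __init__(self, soa_minimum_ttl: int):
        self.ttl = soa_minimum_ttl
        self._expiry = {}  # name -> timestamp when the negative entry lapses

    def record_nxdomain(self, name: str, now: float) -> None:
        self._expiry[name] = now + self.ttl

    def still_negative(self, name: str, now: float) -> bool:
        return self._expiry.get(name, 0.0) > now

cache = NegativeCache(soa_minimum_ttl=300)
cache.record_nxdomain("new.solvethenetwork.com", now=1000.0)
# Record published at t=1100: clients on this resolver still see NXDOMAIN...
assert cache.still_negative("new.solvethenetwork.com", now=1100.0)
# ...until the negative entry lapses at t=1300.
assert not cache.still_negative("new.solvethenetwork.com", now=1301.0)
```

This is why "create the DNS record before deploying the service" is a sequencing rule, not a style preference.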
To force immediate resolution after a record is added or restored, flush the negative cache entry from the resolver:
# Flush a specific name from BIND's cache
rndc flushname api.solvethenetwork.com
# Flush the entire resolver cache (use with caution in production)
rndc flush
# Flush cache on a systemd-resolved host
resolvectl flush-caches
Type 6: Split-Horizon Misconfiguration
Split-horizon DNS (also called split-brain DNS) serves different answers for the same query depending on the source of the request. Internal clients receive RFC 1918 addresses for solvethenetwork.com hosts, while external clients receive public IPs. This is standard practice for enterprises with both internal services and public-facing infrastructure. When a split-horizon configuration breaks, the failure mode depends on which direction the misconfiguration goes.
A standard BIND view configuration on sw-infrarunbook-01:
acl internal_clients {
10.0.0.0/8;
172.16.0.0/12;
192.168.0.0/16;
};
view "internal" {
match-clients { internal_clients; };
zone "solvethenetwork.com" {
type primary;
file "/etc/bind/zones/solvethenetwork.com.internal";
};
};
view "external" {
match-clients { any; };
zone "solvethenetwork.com" {
type primary;
file "/etc/bind/zones/solvethenetwork.com.external";
};
};
If the match-clients ACL is misconfigured—for example, if the internal ACL enumerates individual subnets rather than the broad RFC 1918 blocks shown above, and a new 10.0.5.0/24 range is never added after a network expansion—hosts on that subnet are silently routed to the external view. They receive public IP addresses that may be unreachable from inside the network, or they receive records pointing to a load balancer that uses host-based routing and returns a different TLS certificate than expected. Applications fail with TLS certificate errors or connection timeouts, and engineers spend time investigating the application layer while the DNS misconfiguration goes unnoticed.
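ACL coverage can be sanity-checked offline before a change ships. A hypothetical sketch mirroring BIND's first-match view selection, with an enumerated ACL that was never updated for the new subnet:

```python
import ipaddress

# Hypothetical enumerated internal ACL that predates the 10.0.5.0/24 expansion
INTERNAL_ACL = [ipaddress.ip_network(n) for n in
                ("10.0.1.0/24", "10.0.2.0/24", "192.168.0.0/16")]

def selected_view(client_ip: str) -> str:
    """Mirror BIND's first-match semantics: the internal view if the
    client matches the ACL, otherwise the catch-all external view."""
    ip = ipaddress.ip_address(client_ip)
    return "internal" if any(ip in net for net in INTERNAL_ACL) else "external"

assert selected_view("10.0.2.50") == "internal"
assert selected_view("10.0.5.7") == "external"  # new subnet silently gets public answers
```

Running every known subnet through a check like this in CI catches the silent external-view fallthrough before a single client is misrouted.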
The Cascade: What Actually Breaks Across the Stack
A DNS failure does not just mean web browsing stops working. The following is a realistic impact map for a production environment at solvethenetwork.com during a full recursive resolver outage:
- Authentication and identity: Kerberos KDC discovery uses DNS SRV records (_kerberos._tcp.solvethenetwork.com). LDAP clients resolve directory server hostnames at connection time. When these fail, Active Directory joins, SSH PAM lookups, sudo authentication, and application OAuth flows all collapse simultaneously.
- TLS and PKI: OCSP responders and CRL distribution points are resolved by DNS at certificate validation time. When certificate validation fails because the OCSP responder hostname cannot be resolved, HTTPS connections are rejected at the TLS handshake—the web server is fully operational but unreachable.
- Email delivery: MTA-to-MTA mail delivery requires MX record lookups. Inbound mail queues at sending servers. Outbound mail from sw-infrarunbook-01 cannot deliver to external domains. The mail queue grows silently until delivery timeout windows are reached and bounce messages are generated.
- Kubernetes and service mesh: CoreDNS is the resolver inside Kubernetes clusters. If CoreDNS becomes degraded or if the upstream resolver it depends on fails, pod-to-pod communication using service names breaks. Health checks fail, backends are drained from load balancers, and rolling deployments stall with pods unable to pass readiness probes.
- Monitoring and observability: Prometheus scrapes targets by hostname. If Prometheus cannot resolve sw-infrarunbook-01.solvethenetwork.com, it marks the target as down and stops collecting metrics—exactly when you need metrics most. Alerting rules that depend on those metrics stop firing, creating blind spots during the outage.
- Log shipping: Fluentd, Logstash, and syslog forwarders resolve their destination aggregator by hostname at connection time and periodically during reconnects. When resolution fails, log agents buffer locally until configured disk limits are reached, at which point they either drop logs or cause application I/O pressure by blocking on write operations.
- CI/CD pipelines: Package managers—apt, dnf, pip, npm, cargo—resolve repository mirror hostnames via DNS. Builds fail at the dependency fetch stage, producing error messages that look like repository failures or network issues rather than DNS failures. Engineers investigating build failures often do not check DNS first.
- Backup and replication: Database replication, object storage sync, and backup agents resolve peer addresses by hostname. They fail silently and create data protection gaps that may not surface until the next restore test or disaster recovery drill.
Step-by-Step DNS Failure Diagnosis
When an incident is reported and DNS is a possible cause, follow this diagnostic sequence on sw-infrarunbook-01 or any affected host. Work from the bottom of the stack upward.
Step 1: Confirm DNS is the failure layer, not network or application
# Connect directly by IP, bypassing DNS entirely
curl -o /dev/null -s -w "%{http_code}" http://10.0.2.50/healthz
# If IP works but hostname fails, DNS is the culprit
curl -o /dev/null -s -w "%{http_code}" http://api.solvethenetwork.com/healthz
Step 2: Check resolver reachability on port 53
# Attempt a minimal query with a short timeout
dig @10.0.1.10 . SOA +time=2 +tries=1
# If this times out, the resolver is down or port 53 is blocked
nc -uzv 10.0.1.10 53
Step 3: Verify which resolver the host is actually using
cat /etc/resolv.conf
resolvectl status | grep -A5 "DNS Servers"
Step 4: Query the authoritative server directly, bypassing the resolver cache
# Identify authoritative nameservers
dig solvethenetwork.com NS +short
# Query authoritative directly to isolate resolver vs authoritative failure
dig @ns1.solvethenetwork.com api.solvethenetwork.com A +noall +answer
Step 5: Check DNSSEC validation status
dig @10.0.1.10 api.solvethenetwork.com A +dnssec
# Look for the "ad" flag and presence of RRSIG records
# SERVFAIL without ad flag = DNSSEC validation failure
Step 6: Check SOA serial consistency across all nameservers
for ns in ns1.solvethenetwork.com ns2.solvethenetwork.com; do
echo -n "$ns serial: "
dig @$ns solvethenetwork.com SOA +short | awk '{print $3}'
done
# Mismatched serials indicate zone transfer failure
Step 7: Check BIND service status and logs on sw-infrarunbook-01
systemctl status named
journalctl -u named --since "1 hour ago" --no-pager
tail -100 /var/log/named/named.log
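The seven steps above collapse into a small decision procedure. A sketch of the triage logic—inputs are the boolean outcomes of the diagnostic steps, not live queries:

```python
def classify_dns_failure(ip_connect_ok: bool, hostname_connect_ok: bool,
                         resolver_reachable: bool, authoritative_answers: bool) -> str:
    """Map the outcomes of diagnostic steps 1-4 to a likely failure layer."""
    if not ip_connect_ok:
        return "network or application layer, not DNS"
    if hostname_connect_ok:
        return "not a DNS failure"
    if not resolver_reachable:
        return "recursive resolver down or port 53 blocked"
    if authoritative_answers:
        return "resolver-side: cache, DNSSEC validation, or resolver config"
    return "authoritative nameserver failure"

assert classify_dns_failure(True, False, True, False) == "authoritative nameserver failure"
assert classify_dns_failure(True, False, False, True) == "recursive resolver down or port 53 blocked"
```

Encoding the runbook this way also makes it reviewable: each branch corresponds to a specific recovery procedure in the next section.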
Recovery Procedures by Failure Type
Resolver down: Immediately update /etc/resolv.conf on affected hosts to promote the secondary resolver at 10.0.1.11 to the primary position. For fleet-wide remediation, push the configuration change via your configuration management tooling. Do not wait for the primary to come back before doing this—every second of degraded resolver latency is generating cascading failures across dependent systems. Investigate the primary resolver's failure separately while the fleet operates on the secondary.
Authoritative server failure: Confirm the secondary nameserver is still serving valid data by querying it directly and comparing the returned serial with your expected value. Identify the authoritative records with the shortest TTLs—those are the ones that will fail first as resolver caches expire. Prioritize restoring authoritative service or adjusting delegation before those TTLs expire. If restoration will take longer than the shortest TTL, consider temporarily increasing TTLs on critical records from a surviving nameserver to buy time.
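Finding that restoration deadline is just a min over remaining TTLs. A sketch with hypothetical record names and values:

```python
def restoration_budget(remaining_ttls: dict[str, int]) -> tuple[str, int]:
    """The record whose cached copies expire first sets the deadline for
    restoring authoritative service. Returns (record name, seconds left)."""
    first = min(remaining_ttls, key=remaining_ttls.__getitem__)
    return first, remaining_ttls[first]

ttls = {"api.solvethenetwork.com": 287, "www.solvethenetwork.com": 3400,
        "mx1.solvethenetwork.com": 60}
assert restoration_budget(ttls) == ("mx1.solvethenetwork.com", 60)
```

Feed it the remaining TTLs observed on each production resolver (not the zone-file TTLs) to get the true per-resolver countdown.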
Zone expiry: Restore primary-secondary network connectivity, then trigger a manual zone transfer to refresh the secondary immediately:
rndc retransfer solvethenetwork.com
rndc zonestatus solvethenetwork.com
If the primary is permanently lost and you only have a copy of the zone on the secondary, promote the secondary to primary immediately by changing its zone type configuration and updating your NS glue records at the domain registrar.
DNSSEC validation failure: Re-sign the zone using BIND's inline signing commands. This requires access to the Zone Signing Key (ZSK) on the signing server:
rndc sign solvethenetwork.com
rndc loadkeys solvethenetwork.com
# Verify new RRSIGs are published with future expiry
dig @ns1.solvethenetwork.com solvethenetwork.com RRSIG +dnssec +short
If the signing key itself has been lost or compromised, initiate an emergency key rollover. This requires coordination with your parent zone registrar to update the DS record. Until the new DS record propagates, validating resolvers will continue to return SERVFAIL.
Prevention: Building DNS Resilience
The most important prevention measure is geographic and topological diversity in nameserver placement. Never run both authoritative nameservers on the same subnet, the same physical rack, or the same availability zone. The probability of losing both 10.0.1.10 and a secondary on a completely separate network segment simultaneously is orders of magnitude lower than losing two servers on the same switch.
Additional resilience measures for solvethenetwork.com's DNS infrastructure:
- Monitor SOA serial consistency across all authoritative nameservers every five minutes and alert on divergence
- Alert on RRSIG expiry with a minimum seven-day warning window to allow time for key rotation without emergency pressure
- Set up synthetic DNS monitoring from at least two external vantage points that are independent of your internal infrastructure
- Keep named.conf, zone files, and resolver configurations in version control and deploy all changes through peer-reviewed automation
- Use TSIG keys for all zone transfers to authenticate replication and prevent unauthorized zone data enumeration
- Document and rehearse your DNS recovery runbook quarterly—teams that have never practiced recovery are slow and error-prone during actual incidents
- Set a realistic SOA expire value: long enough to survive extended primary outages, short enough that secondary nameservers do not serve dangerously stale data indefinitely
Frequently Asked Questions
Q: What does SERVFAIL mean in DNS and how is it different from other error codes?
A: SERVFAIL (response code 2) means the server encountered an internal error and could not complete the query—but this tells you nothing about why. It is returned when a resolver cannot reach authoritative servers, when DNSSEC validation fails, when a zone has expired on a secondary, or when the resolver itself is misconfigured. NXDOMAIN (code 3) is fundamentally different: it means the name definitively does not exist according to an authoritative source. REFUSED (code 5) means the server received the query but has a policy-based reason to decline answering—typically an ACL blocking the client's IP. When you see SERVFAIL during an incident, treat it as a signal that something is broken in the resolution chain, not a description of what specifically is broken.
Q: How do I confirm DNS is the cause of an outage rather than a network or application issue?
A: Test connectivity to affected services using their IP addresses directly, bypassing DNS. If connecting by IP succeeds but connecting by hostname fails, DNS is the failure layer. This single test often saves 15–20 minutes of misdirected investigation. Use curl with an explicit IP and a Host header for HTTP services, or use ssh with an IP address to bypass DNS for remote access. If even IP connectivity fails, the issue is network-layer, not DNS.
Q: Why does a DNS failure take down authentication systems like SSH and LDAP?
A: Many authentication protocols use DNS for service discovery and endpoint resolution. Kerberos uses SRV records to locate Key Distribution Centers. LDAP clients resolve directory server hostnames when establishing connections. PAM modules performing reverse DNS lookups for logging or access control will block or fail if DNS is unavailable. SSH itself may perform reverse DNS lookups on connecting client IPs. Some of these behaviors can be disabled in configuration, but in a default enterprise setup, DNS unavailability reliably cascades into authentication failure within seconds.
Q: What is the difference between NXDOMAIN and SERVFAIL?
A: NXDOMAIN is an authoritative answer meaning the queried name genuinely does not exist in the zone. It comes from an authoritative nameserver and is a definitive negative response. SERVFAIL is a failure signal meaning the resolver was unable to obtain or validate an answer—it says nothing about whether the name exists. NXDOMAIN is expected and cached normally. SERVFAIL indicates a broken resolution path and should always be investigated. A common mistake during incidents is treating SERVFAIL as if it means the same thing as NXDOMAIN—it does not, and acting on that assumption delays finding the real cause.
Q: How long does DNS recovery take after an outage is resolved?
A: It depends on the failure type. Resolver-level failures can be mitigated in seconds by redirecting clients to a secondary resolver. Stale or incorrect data in resolver caches clears as TTLs expire—anywhere from seconds to 24 hours depending on record TTLs. Authoritative server failures resolved before TTL expiry are transparent to clients. NS record changes at the registrar level (needed if you change your authoritative nameservers) can take up to 48 hours to propagate globally due to TLD nameserver TTLs. DNSSEC key rollover with DS record updates also follows the parent zone's TTL schedule. The fastest mitigation path is almost always at the resolver layer.
Q: What is DNS TTL and how does it affect the severity of an outage?
A: TTL (Time to Live) is the number of seconds a resolver should cache a DNS record before querying for a fresh copy. Short TTLs (60–300 seconds) allow rapid propagation of DNS changes but increase query volume against authoritative servers and reduce the grace period when those servers fail. Long TTLs (3600–86400 seconds) reduce load and provide a longer availability buffer during authoritative failures, but they mean DNS changes take much longer to propagate. During an active outage, long TTLs are your friend if the data is still correct—resolvers continue serving cached answers. They are your enemy if you need to rapidly update a record to restore service, because clients will keep using the old cached value until the TTL expires.
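The propagation bound in that answer can be stated precisely. A sketch:

```python
def change_fully_visible_at(change_time: float, old_ttl: int) -> float:
    """Latest moment any resolver can still serve the pre-change answer:
    a resolver that cached the old record an instant before the change
    holds it for the full old TTL."""
    return change_time + old_ttl

# A record with TTL 3600 updated at t=1000 may be served stale until t=4600
assert change_fully_visible_at(1000.0, 3600) == 4600.0
```

This is also why planned migrations lower TTLs days in advance: the bound is set by the TTL that was in effect when resolvers last cached the record, not the new one.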
Q: Can DNS fail partially, affecting only some clients or some record types?
A: Yes, and partial failure is one of the most difficult diagnostic scenarios in DNS. Different records have different TTLs and expire at different times on different resolvers. DNSSEC failures affect only clients using validating resolvers. Split-horizon misconfigurations affect only clients on specific subnets. Negative caching failures affect only clients whose resolvers queried during a brief window when a record was absent. Zone transfer lag means secondary nameservers may have stale data while the primary has the correct answer. During any DNS incident, always query multiple resolvers from multiple source IPs to establish whether the failure is universal or scoped to a specific resolver, subnet, or client configuration.
Q: Why does DNSSEC make DNS failures harder to diagnose?
A: DNSSEC validation occurs invisibly inside the resolver. When it fails, the resolver returns SERVFAIL—the same code returned for every other resolution failure. There is no application-visible signal that DNSSEC specifically is the issue. To detect it, you must explicitly compare results between a validating resolver and a non-validating one, or use dig with the +dnssec flag and check for the ad flag and RRSIG records. The problem is compounded by the fact that internal resolvers are often non-validating while public resolvers validate—so internal testing passes while external users are completely blocked. Always test from both an internal resolver and an external validating resolver when investigating unexplained SERVFAIL responses.
Q: What is the DNS resolution order on Linux and how can it affect failure behavior?
A: Linux systems follow the order defined in /etc/nsswitch.conf. The typical setting is hosts: files dns, meaning /etc/hosts is checked first before any DNS query is issued. Entries in /etc/hosts always win over DNS—which can be both a lifesaver during DNS outages (add a static override to restore critical service access) and a source of confusion when stale static entries override correct DNS data. On systemd-based systems, systemd-resolved handles the actual DNS queries and adds an mDNS and LLMNR layer. The resolve entry in nsswitch.conf routes through systemd-resolved's local socket at 127.0.0.53. If systemd-resolved is stopped, even correct DNS server configuration in /etc/resolv.conf may not help if nsswitch.conf routes through it.
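The "files wins over dns" behavior is easy to model. A toy sketch—dictionaries stand in for /etc/hosts and the resolver, not the actual glibc implementation:

```python
def nss_resolve(name: str, hosts_file: dict, dns: dict):
    """Model of `hosts: files dns`: a /etc/hosts entry short-circuits DNS
    entirely, even when DNS holds a different (correct) answer."""
    if name in hosts_file:
        return hosts_file[name]
    return dns.get(name)

hosts = {"api.solvethenetwork.com": "10.0.9.9"}   # stale static override
dns = {"api.solvethenetwork.com": "10.0.2.50"}    # correct current record
assert nss_resolve("api.solvethenetwork.com", hosts, dns) == "10.0.9.9"
assert nss_resolve("www.solvethenetwork.com", hosts, dns) is None  # falls through to DNS, no answer
```

When a host resolves a name "wrongly" while dig returns the correct answer, this short-circuit is the first thing to check: dig queries DNS directly and never consults /etc/hosts.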
Q: How do I flush DNS cache on Linux during an incident?
A: The correct command depends on what is managing DNS resolution. For systemd-resolved, run resolvectl flush-caches. For a local BIND instance acting as a caching resolver, run rndc flush to flush all caches or rndc flushname api.solvethenetwork.com to flush a specific name. For nscd (Name Service Cache Daemon), run nscd -i hosts. For dnsmasq, send SIGHUP: kill -HUP $(pidof dnsmasq). On hosts running nothing local, there is no client-side cache to flush—the cache lives in the resolver at 10.0.1.10 or 10.0.1.11. Note that flushing a resolver cache during an active authoritative outage just causes immediate SERVFAIL—do not flush until the underlying cause is resolved.
Q: What monitoring should be in place to detect DNS failures before users report them?
A: Synthetic DNS monitoring is essential: a monitoring agent external to your infrastructure should query all authoritative nameservers for critical records every 60 seconds and alert if any return SERVFAIL, NXDOMAIN for records that should exist, or incorrect answers. Pair this with SOA serial consistency checks comparing all nameservers every five minutes—divergence indicates zone transfer failure. For DNSSEC, monitor RRSIG expiry dates with alerts at 14 days and 7 days before expiry. Monitor resolver availability by querying known-stable records (like the root SOA) against each resolver every 30 seconds. Finally, track resolver query latency as a metric—a spike in mean latency often precedes a full resolver failure and gives you an early warning window.
Q: What are TSIG keys and why do they matter for DNS security and resilience?
A: TSIG (Transaction SIGnature) keys are shared-secret HMAC-based keys used to authenticate DNS messages, most importantly zone transfers. Without TSIG authentication on zone transfers, any host that can reach your authoritative server on port 53 can request a complete copy of your zone data—every hostname, IP address, internal service name, and mail record in the zone. This is a significant reconnaissance capability for an attacker. TSIG also ensures that zone transfer data has not been tampered with in transit. Configure TSIG on sw-infrarunbook-01 by generating a key with tsig-keygen, adding it to named.conf on both primary and secondary, and setting allow-transfer to require the key. Zone transfers that do not present the correct HMAC are rejected, protecting both confidentiality and integrity of your zone data.
