Symptoms
You make a DNS query and get SERVFAIL back. Nothing else. No NXDOMAIN, no partial answer, just a flat refusal from a resolver that won't hand you the record. Meanwhile, if you query the authoritative nameserver directly, or disable DNSSEC validation with +cd (checking disabled), the record appears immediately. That gap between the validating resolver and the authoritative server is the signature of a DNSSEC validation failure.
Here's what it looks like against the validating resolver at 192.168.1.53:
$ dig solvethenetwork.com A @192.168.1.53
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 28471
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
Now disable validation with +cd:
$ dig +cd solvethenetwork.com A @192.168.1.53
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28472
;; flags: qr rd ra cd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; ANSWER SECTION:
solvethenetwork.com. 300 IN A 192.168.10.45
When +cd makes the query work but removing it breaks it, you have a DNSSEC problem. The resolver is seeing records but refusing to return them because something in the cryptographic chain doesn't check out. You might also see explicit messages in the resolver logs on sw-infrarunbook-01:
Apr 16 09:12:44 sw-infrarunbook-01 named[3421]: validating solvethenetwork.com/A: no valid signature found
Apr 16 09:12:44 sw-infrarunbook-01 named[3421]: RRSIG has expired: solvethenetwork.com/A
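The SERVFAIL-versus-+cd comparison above is easy to automate. Here is a minimal Python sketch, assuming you already have the text output of both dig runs; the function names are hypothetical, and it only reads the header lines shown in the examples above.

```python
import re

def parse_dig_header(output: str) -> tuple[str, set[str]]:
    """Return (status, flags) from the header lines of dig's text output."""
    status = re.search(r"status:\s*([A-Z]+)", output)
    flags = re.search(r"flags:\s*([a-z ]*);", output)
    if status is None or flags is None:
        raise ValueError("no dig header found")
    return status.group(1), set(flags.group(1).split())

def looks_like_dnssec_failure(plain: str, with_cd: str) -> bool:
    """True when the plain query SERVFAILs but the +cd query succeeds."""
    plain_status, _ = parse_dig_header(plain)
    cd_status, cd_flags = parse_dig_header(with_cd)
    return plain_status == "SERVFAIL" and cd_status == "NOERROR" and "cd" in cd_flags

# Header lines copied from the two dig runs shown earlier in this section.
PLAIN = (
    ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 28471\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n"
)
WITH_CD = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28472\n"
    ";; flags: qr rd ra cd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1\n"
)

print(looks_like_dnssec_failure(PLAIN, WITH_CD))  # prints True
```

A True result only tells you validation is the problem, not which of the root causes below is responsible.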
Let's go through each root cause systematically.
Root Cause 1: RRSIG Expired
This is the most common cause I've seen in production environments. An RRSIG record has a validity window defined by two timestamps: Signature Inception and Signature Expiration. Once the expiration timestamp passes, any validating resolver that receives that RRSIG will reject it — even if the underlying cryptographic signature is mathematically correct. A valid signature that's expired is, from DNSSEC's perspective, simply not valid.
This failure mode almost always traces back to broken automation. The cron job on sw-infrarunbook-01 that runs your zone signing script stopped working after a package update. BIND's inline signing is configured but the key directory permissions changed. The signing key itself expired and wasn't replaced. Whatever the trigger, the zone stopped getting re-signed and the RRSIGs aged out quietly while everything looked fine from the authoritative side.
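The validity-window rule described above comes down to two timestamp comparisons. A small Python sketch, assuming timestamps in the YYYYMMDDHHMMSS form that dig prints; the function names are hypothetical, and the sketch ignores the serial-number wraparound handling that a real validator implements.

```python
from datetime import datetime, timezone

def parse_rrsig_time(stamp: str) -> datetime:
    """Parse the YYYYMMDDHHMMSS form used in RRSIG presentation format (UTC)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def rrsig_status(inception: str, expiration: str, now: datetime) -> str:
    """Classify a signature purely by its validity window, as a validator would."""
    start, end = parse_rrsig_time(inception), parse_rrsig_time(expiration)
    if now < start:
        return "not-yet-valid"
    if now > end:
        return "expired"
    return "valid"

# Timestamps taken from the example output in this section; "now" matches the log lines.
now = datetime(2026, 4, 16, 9, 12, 44, tzinfo=timezone.utc)
print(rrsig_status("20260201120000", "20260301120000", now))  # prints expired
```

Note that nothing here touches the signature bytes: the window check happens regardless of whether the cryptography is sound.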
How to Identify It
$ dig +dnssec +multi solvethenetwork.com A @192.168.1.10
;; ANSWER SECTION:
solvethenetwork.com. 300 IN A 192.168.10.45
solvethenetwork.com. 300 IN RRSIG A 13 2 300 (
20260301120000 ; expiration
20260201120000 ; inception
12345 solvethenetwork.com.
abc123XYZ== )
Check the expiration timestamp. In this output, the RRSIG expired on March 1st, 2026. If today is April 16th, every validating resolver on the planet is rejecting it. Use delv for a cleaner read:
$ delv @192.168.1.10 solvethenetwork.com A
;; validating solvethenetwork.com/A
;; RRSIG solvethenetwork.com/A:
;; Algorithm: ECDSAP256SHA256
;; Signature expiration: 2026-03-01 12:00:00 UTC
;; Signature inception: 2026-02-01 12:00:00 UTC
;; validation failed: RRSIG has expired
How to Fix It
If you're running BIND with inline signing, force an immediate re-sign:
[infrarunbook-admin@sw-infrarunbook-01 ~]$ rndc sign solvethenetwork.com
zone solvethenetwork.com/IN (signed): loaded serial 2026041601
If you're using dnssec-signzone manually, re-sign and reload:
$ dnssec-signzone -A -3 $(head -c 16 /dev/urandom | xxd -ps) \
-N INCREMENT -o solvethenetwork.com \
-t /etc/bind/zones/solvethenetwork.com.zone
Verifying the zone using the following algorithms: ECDSAP256SHA256
Zone fully signed:
Algorithm: ECDSAP256SHA256: KSKs: 1 active, 0 stand-by, 0 revoked
ZSKs: 1 active, 0 stand-by, 0 revoked
/etc/bind/zones/solvethenetwork.com.zone.signed
[infrarunbook-admin@sw-infrarunbook-01 ~]$ rndc reload solvethenetwork.com
For long-term stability, set your RRSIG validity window to 14 days and re-sign every 7 days. That gives you a full week to detect and fix automation failures before signatures expire. Monitor RRSIG expiry actively — by the time users are reporting outages, you're already in the damage window.
Root Cause 2: DNSKEY Not Matching DS
DS records live in the parent zone. For solvethenetwork.com, that means the .com TLD zone. A DS record is essentially a hash of your Key Signing Key (KSK), and it's what establishes the link between the parent zone's trust and your zone's DNSKEY RRset. When you roll your KSK and forget to update the DS at your registrar — or when a DS record points to a key that no longer exists — validation breaks at exactly that delegation boundary.
In my experience, this happens most often during rushed KSK rollovers. Someone generates new keys, signs the zone, but the DS submission to the registrar slips through the cracks. Or the opposite: the DS at the registrar was updated but the old KSK was removed from the zone before the DS propagated. Both directions cause the same failure.
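To make "a DS record is a hash of your KSK" concrete, here is a Python sketch of the two values a DS record carries: the key tag (RFC 4034 Appendix B) and the SHA-256 digest over the owner name plus the DNSKEY RDATA. The function names are hypothetical and the key below is a placeholder built from dummy bytes, not a real key; in practice dnssec-dsfromkey does this for you.

```python
import base64
import hashlib
import struct

def dnskey_rdata(flags: int, protocol: int, algorithm: int, key_b64: str) -> bytes:
    """Wire-format DNSKEY RDATA: flags, protocol, algorithm, public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(key_b64)

def key_tag(rdata: bytes) -> int:
    """Key tag per RFC 4034 Appendix B (not valid for the obsolete algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += (byte << 8) if i % 2 == 0 else byte
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF

def owner_wire(name: str) -> bytes:
    """Canonical (lowercase) wire form of an owner name."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"

def ds_digest_sha256(owner: str, rdata: bytes) -> str:
    """DS digest type 2: SHA-256 over owner name wire form + DNSKEY RDATA."""
    return hashlib.sha256(owner_wire(owner) + rdata).hexdigest().upper()

def ds_matches(parent_ds: str, owner: str, rdata: bytes) -> bool:
    """Compare a 'dig +short' DS line (tag alg digest-type digest) to a DNSKEY."""
    tag, alg, digest_type, digest = parent_ds.split(maxsplit=3)
    return (
        int(tag) == key_tag(rdata)
        and int(alg) == rdata[3]
        and digest_type == "2"
        and digest.replace(" ", "").upper() == ds_digest_sha256(owner, rdata)
    )

# Placeholder key material (64 dummy bytes), NOT a real DNSKEY.
placeholder_key = base64.b64encode(bytes(range(64))).decode("ascii")
rdata = dnskey_rdata(257, 3, 13, placeholder_key)
print(key_tag(rdata), ds_digest_sha256("solvethenetwork.com.", rdata))
```

If ds_matches returns False for every DS the parent serves, validators have no path from the parent's trust into your DNSKEY RRset.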
How to Identify It
Pull the DS record from the parent and compare it against what your zone is serving:
$ dig DS solvethenetwork.com @a.gtld-servers.net +short
37551 13 2 8F6A4A1D3E91B7C245FD7AC10E88E5BA4A7D99F01BDCE78C3F892D02A3B1C4E5
$ dig DNSKEY solvethenetwork.com @192.168.1.10 +short
257 3 13 mdsswUyr3DPW132mOi8V9xESWE8jTo0dxCjjnopKl+GqJxpVXckHAeF+KFit/093YHHN9SgHHE6SEBzmApILJ2Q==
Now compute the DS hash of your current DNSKEY and compare:
$ dnssec-dsfromkey -a SHA-256 Ksolvethenetwork.com.+013+37551.key
solvethenetwork.com. IN DS 37551 13 2 AABB1122...
If the digest doesn't match what the parent is serving, you have a mismatch. delv will also tell you directly:
$ delv +vtrace solvethenetwork.com A @192.168.1.10
;; fetch: solvethenetwork.com/DNSKEY
;; validating solvethenetwork.com/DNSKEY: no DS record found matching DNSKEY
;; validating solvethenetwork.com/A: no valid signature found
;; resolution failed: no valid DS
How to Fix It
Submit the correct DS record to your registrar. The values come from the key file itself:
$ dnssec-dsfromkey -a SHA-256 Ksolvethenetwork.com.+013+37551.key
solvethenetwork.com. IN DS 37551 13 2 8F6A4A1D3E91B7C245FD7AC10E88E5BA4A7D99F01BDCE78C3F892D02A3B1C4E5
Give the registrar: key tag (37551), algorithm (13), digest type (2), and the digest. DS propagation can take several hours to a day depending on the TLD operator's refresh cycle. During a KSK rollover, always maintain the old KSK in the zone until the new DS has propagated and old DS TTLs have fully expired across the internet — never remove the old key prematurely.
Root Cause 3: Chain of Trust Broken
DNSSEC operates on a chain that starts at the root zone, whose trust anchor is hard-coded into every validating resolver, and flows downward through each delegation. The root signs .com's DS, .com signs solvethenetwork.com's DS, and solvethenetwork.com signs its own records. Break any link in that chain and validation fails for everything below the break point.
Chain breaks are particularly common during zone migrations. A domain moves to a new DNS provider, the new provider doesn't carry over DNSSEC configuration and serves an unsigned zone, but the DS record from the previous operator is still sitting in the .com zone. The resolver walks down from the root, finds a DS record for solvethenetwork.com, queries the zone for DNSKEY records to match against it, finds nothing, and SERVFAIL. The fix is obvious in retrospect; in production at 2 AM, less so.
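The migration scenario above hinges on one distinction: no DS in the parent is harmless, while a DS with nothing to match is fatal. A rough Python sketch of that decision, assuming you have already collected the key tags from the parent's DS RRset and the child's DNSKEY RRset; the function name and return strings are hypothetical, and a real validator also checks digests and signatures, not just tags.

```python
def classify_delegation(parent_ds_tags: set[int], child_dnskey_tags: set[int]) -> str:
    """Rough state of one delegation, judged by key tags alone."""
    if not parent_ds_tags:
        # No DS in the parent: validators treat the child zone as unsigned.
        return "insecure"
    if not child_dnskey_tags:
        # Stale DS left behind after a move to an unsigned provider.
        return "bogus: DS in parent but zone serves no DNSKEY"
    if parent_ds_tags & child_dnskey_tags:
        return "linked"
    return "bogus: no DNSKEY matches any DS"

# The migration case: old DS (tag 37551) still in .com, new provider serves no keys.
print(classify_delegation({37551}, set()))
```

Only the two "bogus" outcomes produce SERVFAIL; "insecure" resolves normally, just without authentication.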
How to Identify It
Use drill with the -S flag to trace the full chain:
$ drill -S solvethenetwork.com A @192.168.1.1
;; Number of trusted keys: 1
;; Chasing: solvethenetwork.com. A
DNSSEC Trust tree:
solvethenetwork.com. (A)
|---solvethenetwork.com. (DNSKEY keytag: 37551 alg: 13 flags: 257)
|---solvethenetwork.com. (DS keytag: 37551 digest type: 2)
|---com. (DNSKEY keytag: 4534 alg: 8 flags: 257)
|---com. (DS keytag: 4534 digest type: 2)
|---. (DNSKEY keytag: 20326 alg: 8 flags: 257)
;; Chase successful
A broken chain will drop out partway through: you'll see the tree stop where the link is broken and an error instead of "Chase successful." You can also use delv with verbose tracing to pinpoint the exact delegation where validation fails:
$ delv +vtrace +multi solvethenetwork.com A @192.168.1.53
;; fetch: solvethenetwork.com/DNSKEY
;; fetch: solvethenetwork.com/DS
;; validating solvethenetwork.com/DS: starting
;; validating com/DNSKEY: starting
;; error (no valid DS) resolving 'solvethenetwork.com/DS/IN': 192.168.1.53#53
;; resolution failed: no valid DS
"No valid DS" at the solvethenetwork.com/DS level means the break is between .com and your zone. No DS in .com at all means DNSSEC is effectively disabled for the zone from the parent's perspective, but if an old DS exists pointing nowhere, that's a hard validation failure.
How to Fix It
Identify which delegation is broken and fix it at that level. If the zone is unsigned at the new provider and the old DS is stale in the parent, you have two options: sign the zone at the new provider and submit fresh DS records, or remove the stale DS from the parent entirely. Removing DS disables DNSSEC for the zone but stops the SERVFAIL immediately — it's the emergency lever. Re-signing is the correct solution.
To get a zone signed and publishing DNSKEY quickly on sw-infrarunbook-01:
# Generate ZSK and KSK
[infrarunbook-admin@sw-infrarunbook-01 ~]$ dnssec-keygen -a ECDSAP256SHA256 -n ZONE solvethenetwork.com
[infrarunbook-admin@sw-infrarunbook-01 ~]$ dnssec-keygen -a ECDSAP256SHA256 -n ZONE -f KSK solvethenetwork.com
# Add to named.conf zone block
# dnssec-policy "default";
[infrarunbook-admin@sw-infrarunbook-01 ~]$ rndc reconfig
[infrarunbook-admin@sw-infrarunbook-01 ~]$ rndc sign solvethenetwork.com
zone solvethenetwork.com/IN (signed): loaded serial 2026041602
Then publish the new DS records at the registrar and wait for propagation before considering the chain restored.
Root Cause 4: NSEC3 Denial Wrong
NSEC3 is the mechanism DNSSEC uses to prove nonexistence — it lets a resolver cryptographically verify that a queried name genuinely doesn't exist in the zone without exposing the full zone contents (which is what NSEC, the older alternative, would do). When NSEC3 records are malformed, cover the wrong hash ranges, or use parameters inconsistent with the published NSEC3PARAM record, validation fails specifically for NXDOMAIN responses.
The pattern that makes this identifiable: queries for names that do exist validate cleanly, but queries for nonexistent names SERVFAIL. I've seen this appear after zone re-signing where the NSEC3PARAM salt was regenerated with a different value, creating a mismatch between the published parameters and what was actually used to construct the denial chain. It also shows up when NSEC3 coverage has gaps — segments of the hash space that aren't covered by any NSEC3 record, leaving the resolver unable to verify denial for names hashing into that range.
How to Identify It
$ dig +dnssec nonexistent.solvethenetwork.com @192.168.1.53
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55221
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
Compare against a query for a name that exists:
$ dig +dnssec www.solvethenetwork.com @192.168.1.53
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55222
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
Existing names work with the AD flag set; nonexistent names SERVFAIL. That asymmetry points directly at NSEC3. Query the authoritative server directly with +cd to see the NSEC3 records the resolver is receiving:
$ dig +dnssec nonexistent.solvethenetwork.com @192.168.1.10 +cd
;; AUTHORITY SECTION:
solvethenetwork.com. 300 IN SOA ns1.solvethenetwork.com. hostmaster.solvethenetwork.com. 2026041601 3600 900 604800 300
a1b2c3d4e5f6g7h8.solvethenetwork.com. 300 IN NSEC3 1 0 10 AABBCCDD (
f9g8h7i6j5k4l3m2.solvethenetwork.com.
A NS SOA MX AAAA RRSIG DNSKEY NSEC3PARAM )
a1b2c3d4e5f6g7h8.solvethenetwork.com. 300 IN RRSIG NSEC3 13 2 300 ...
$ dig NSEC3PARAM solvethenetwork.com @192.168.1.10 +short
1 0 10 AABBCCDD
Check whether the queried name's hash actually falls within the covered range of an NSEC3 record:
$ nsec3hash AABBCCDD 1 10 nonexistent.solvethenetwork.com
nonexistent.solvethenetwork.com. -> v8p6q3m2n1k4j5h6.solvethenetwork.com.
That hash should fall between the owner name of one NSEC3 record and the next owner name in the chain. If it doesn't — if there's a gap in the NSEC3 chain that should cover this hash — you've found the problem.
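The coverage check above can be scripted. This Python sketch computes the NSEC3 hash the way RFC 5155 defines it (iterated SHA-1 over the wire-format name plus salt, encoded as base32hex) and tests whether a hash falls inside one record's range. The function names are hypothetical, and real hashes are 32 characters long, unlike the shortened illustrative values in the output above.

```python
import base64
import hashlib

# Map the standard base32 alphabet onto the base32hex alphabet used by NSEC3.
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)

def name_wire(name: str) -> bytes:
    """Canonical (lowercase) wire form of a domain name."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"

def nsec3_hash(name: str, salt_hex: str, iterations: int) -> str:
    """NSEC3 hash (algorithm 1, SHA-1) as a lowercase base32hex string."""
    salt = b"" if salt_hex == "-" else bytes.fromhex(salt_hex)
    digest = hashlib.sha1(name_wire(name) + salt).digest()
    for _ in range(iterations):  # "iterations" counts the additional rounds
        digest = hashlib.sha1(digest + salt).digest()
    return base64.b32encode(digest).translate(_B32_TO_B32HEX).decode("ascii").lower()

def covers(owner_hash: str, next_hash: str, target_hash: str) -> bool:
    """True if target_hash lies strictly between an NSEC3 owner and its next hash."""
    o, n, t = owner_hash.lower(), next_hash.lower(), target_hash.lower()
    if o < n:
        return o < t < n
    return t > o or t < n  # last record in the chain wraps around to the first

# Parameters from the NSEC3PARAM record shown above: salt AABBCCDD, 10 iterations.
print(nsec3_hash("nonexistent.solvethenetwork.com", "AABBCCDD", 10))
```

Run covers against every NSEC3 record the authoritative server returns; if none of them covers the target hash, the denial can't be validated.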
How to Fix It
Re-sign the zone with consistent, explicit NSEC3 parameters. If you're using a BIND dnssec-policy, pin the parameters:
dnssec-policy "solvethenetwork-policy" {
nsec3param iterations 10 optout no salt-length 8;
keys {
ksk key-directory lifetime unlimited algorithm ecdsap256sha256;
zsk key-directory lifetime P90D algorithm ecdsap256sha256;
};
};
Force a zone re-sign:
[infrarunbook-admin@sw-infrarunbook-01 ~]$ rndc sign solvethenetwork.com
zone solvethenetwork.com/IN (signed): loaded serial 2026041603
After re-signing, verify the NSEC3 chain is complete using dnssec-verify:
[infrarunbook-admin@sw-infrarunbook-01 ~]$ dnssec-verify -o solvethenetwork.com \
/etc/bind/zones/solvethenetwork.com.zone.signed
Verifying the zone using the following algorithms: ECDSAP256SHA256
Zone fully signed:
Algorithm: ECDSAP256SHA256: KSKs: 1 active, 0 stand-by, 0 revoked
ZSKs: 1 active, 0 stand-by, 0 revoked
Root Cause 5: Resolver Not DNSSEC-Aware
This is the inverse problem. Instead of a validation failure causing SERVFAIL, you have a resolver that isn't validating at all: it's either not setting the DO (DNSSEC OK) bit in queries, stripping DNSSEC records from responses, or accepting answers without verifying signatures. You won't see SERVFAIL here; you'll see NOERROR answers that lack the ad (Authenticated Data) flag, meaning the resolver is handing you records it never actually verified.
This matters because applications and security tools that depend on DNSSEC — DANE/TLSA for certificate validation, SSHFP record verification, or anything relying on the authenticated data flag for trust decisions — will silently fail to get the guarantees they expect. A non-validating resolver is worse than no DNSSEC in some ways: it creates false confidence that the infrastructure is secure.
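It helps to see where the DO and AD bits actually live on the wire. The Python sketch below builds a query with an EDNS0 OPT record carrying the DO bit, and decodes the header flags of a message. It is an illustration only: the function names, the fixed query ID, and the 1232-byte UDP size are assumptions, and nothing is sent over the network.

```python
import struct

# Header flag bits (second 16-bit word of the DNS header).
FLAG_QR, FLAG_RD, FLAG_RA, FLAG_AD, FLAG_CD = 0x8000, 0x0100, 0x0080, 0x0020, 0x0010
EDNS_DO = 0x8000  # DNSSEC OK bit in the OPT record's flags field

def encode_name(name: str) -> bytes:
    """Wire form of a domain name (length-prefixed labels, root terminator)."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"

def build_query(name: str, qtype: int = 1, want_dnssec: bool = True,
                query_id: int = 0x1234) -> bytes:
    """Build a recursive query with one question and one OPT record."""
    header = struct.pack("!HHHHHH", query_id, FLAG_RD, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT pseudo-record: root name, type 41, UDP size, ext-rcode, version, flags, rdlen.
    opt = b"\x00" + struct.pack(
        "!HHBBHH", 41, 1232, 0, 0, EDNS_DO if want_dnssec else 0, 0
    )
    return header + question + opt

def header_flags(message: bytes) -> set[str]:
    """Names of the header flags set in a DNS message."""
    (flags,) = struct.unpack("!H", message[2:4])
    names = {"qr": FLAG_QR, "rd": FLAG_RD, "ra": FLAG_RA, "ad": FLAG_AD, "cd": FLAG_CD}
    return {name for name, bit in names.items() if flags & bit}

query = build_query("solvethenetwork.com")
print(sorted(header_flags(query)))  # prints ['rd']
```

A resolver that never sets DO upstream never receives RRSIGs, so it has nothing to validate and can never legitimately set ad in its answers.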
How to Identify It
The tell is the absence of the ad flag in the response header:
$ dig +dnssec solvethenetwork.com A @192.168.1.53
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7721
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; ANSWER SECTION:
solvethenetwork.com. 300 IN A 192.168.10.45
The flags are qr rd ra. No ad flag. Compare this against a known-validating resolver:
$ dig +dnssec solvethenetwork.com A @1.1.1.1
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
The ad flag is present from the external resolver, absent from your internal one. Also check whether DNSSEC records are being requested and returned at all:
$ dig +dnssec solvethenetwork.com DNSKEY @192.168.1.53
# If no RRSIG records appear alongside the DNSKEY answer, the resolver
# either isn't requesting them (DO bit not set) or is stripping them
You can also use a purpose-built test: the domain sigfail.verteiltesysteme.net has intentionally broken DNSSEC. A validating resolver will SERVFAIL it; a non-validating resolver will return NOERROR.
$ dig sigfail.verteiltesysteme.net @192.168.1.53
# NOERROR = resolver is NOT validating
# SERVFAIL = resolver IS validating (correct behavior)
How to Fix It
Enable DNSSEC validation in your resolver configuration. For BIND on sw-infrarunbook-01:
options {
dnssec-validation auto;
// "auto" uses the built-in managed root trust anchor
// Requires BIND 9.9+ and that managed-keys-directory is writable
managed-keys-directory "/var/named/dynamic";
};
For Unbound:
server:
module-config: "validator iterator"
auto-trust-anchor-file: "/var/lib/unbound/root.key"
After updating configuration:
[infrarunbook-admin@sw-infrarunbook-01 ~]$ systemctl restart named
# Confirm validation is now active
[infrarunbook-admin@sw-infrarunbook-01 ~]$ dig sigfail.verteiltesysteme.net @127.0.0.1
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 44312
That SERVFAIL confirms the resolver is now validating correctly.
Root Cause 6: Clock Skew
RRSIG inception and expiration timestamps are compared against the resolver's wall clock. If the system clock on sw-infrarunbook-01 drifts far enough, signatures that are cryptographically valid will appear expired or not-yet-valid. A few minutes of drift is enough only when a signature sits right at the edge of its validity window; in most cases the clock has to be off by considerably more before validation fails. VM clock drift after a snapshot restore, NTP misconfiguration, and hardware clock failures are the usual culprits.
How to Identify It
[infrarunbook-admin@sw-infrarunbook-01 ~]$ timedatectl
Local time: Thu 2026-04-16 09:15:22 UTC
Universal time: Thu 2026-04-16 09:15:22 UTC
System clock synchronized: no
NTP service: inactive
[infrarunbook-admin@sw-infrarunbook-01 ~]$ chronyc tracking
Reference ID : C0A80101 (192.168.1.1)
Stratum : 3
Ref time (UTC) : Thu Apr 16 09:00:00 2026
System time : 1847.293 seconds fast of NTP time
1847 seconds fast is over 30 minutes of drift. From the resolver's perspective, signatures whose expiration is still 30 minutes away already look expired. A slow clock fails in the other direction: a resolver running 30 minutes behind sees a signature whose inception was 20 minutes ago as "not yet valid", because by its own clock that inception is still 10 minutes in the future.
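The arithmetic is worth spelling out, because the direction of the drift decides which edge of the validity window breaks. A small Python sketch, with a hypothetical function name and made-up example times; positive skew means the resolver's clock is fast, negative means slow.

```python
from datetime import datetime, timedelta, timezone

def status_seen_by_resolver(inception: datetime, expiration: datetime,
                            true_now: datetime, skew_seconds: float) -> str:
    """How a signature looks to a resolver whose clock is off by skew_seconds."""
    resolver_now = true_now + timedelta(seconds=skew_seconds)
    if resolver_now < inception:
        return "not-yet-valid"
    if resolver_now > expiration:
        return "expired"
    return "valid"

true_now = datetime(2026, 4, 16, 9, 15, 22, tzinfo=timezone.utc)

# Fast clock: the signature really has 30 minutes left, but the resolver is 1847 s ahead.
print(status_seen_by_resolver(true_now - timedelta(days=7),
                              true_now + timedelta(minutes=30),
                              true_now, 1847))    # prints expired

# Slow clock: signed 20 minutes ago, but the resolver is 30 minutes behind.
print(status_seen_by_resolver(true_now - timedelta(minutes=20),
                              true_now + timedelta(days=14),
                              true_now, -1800))   # prints not-yet-valid
```

With zero skew both example signatures come back "valid", which is why the authoritative side sees nothing wrong.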
How to Fix It
[infrarunbook-admin@sw-infrarunbook-01 ~]$ systemctl enable --now chronyd
[infrarunbook-admin@sw-infrarunbook-01 ~]$ chronyc makestep
200 OK
[infrarunbook-admin@sw-infrarunbook-01 ~]$ chronyc tracking
Reference ID : C0A80101 (192.168.1.1)
System time : 0.000134 seconds fast of NTP time
makestep forces an immediate clock jump rather than gradually slewing toward the correct time. DNSSEC validation typically resumes within seconds of the clock correction. After recovery, verify NTP stays synchronized by adding clock sync monitoring to your infrastructure checks.
Root Cause 7: Unsupported Signing Algorithm
DNSSEC supports multiple signing algorithms: RSA/SHA-256 (algorithm 8), ECDSA P-256/SHA-256 (algorithm 13), Ed25519 (algorithm 15), and others. Older resolver implementations may not support newer algorithms. If you've migrated your signing from RSA to Ed25519 but your resolver is running a version of BIND that predates Ed25519 support (BIND 9.11 and earlier), it will encounter DNSKEY records with an algorithm it doesn't recognize and treat the signatures as unverifiable.
How to Identify It
$ dig DNSKEY solvethenetwork.com @192.168.1.10 +short
257 3 15 l02Woi0iS8Nn3DihHb+AZLzBFBY4dHalfA7pkP5wkzY=
Apr 16 10:01:12 sw-infrarunbook-01 named[3421]: validating solvethenetwork.com/A: no supported algorithm/digest
Apr 16 10:01:12 sw-infrarunbook-01 named[3421]: algorithm 15 (ED25519) is not supported
[infrarunbook-admin@sw-infrarunbook-01 ~]$ named -V | head -1
BIND 9.11.5-P4 (Extended Support Version)
BIND 9.11 doesn't support algorithm 15. The resolver can't validate signatures made with Ed25519 and fails the entire zone.
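A quick way to spot this mismatch is to compare the algorithm numbers in the zone's DNSKEY RRset against what the resolver can handle. A minimal Python sketch, assuming the input is dig +short DNSKEY output; the function name is hypothetical, and the supported set is something you supply yourself, since the sketch does not query BIND for its capabilities.

```python
# Algorithm numbers named in this section.
ALGORITHM_NAMES = {8: "RSASHA256", 13: "ECDSAP256SHA256", 15: "ED25519", 16: "ED448"}

def unsupported_algorithms(dnskey_short_lines: list[str], supported: set[int]) -> list[int]:
    """Algorithm numbers present in the DNSKEY RRset but missing from 'supported'."""
    found = set()
    for line in dnskey_short_lines:
        fields = line.split()
        if len(fields) >= 4:  # flags, protocol, algorithm, key
            found.add(int(fields[2]))
    return sorted(found - set(supported))

# DNSKEY line from the dig output above; the supported set is an assumed example
# for a resolver that handles only algorithms 8 and 13.
lines = ["257 3 15 l02Woi0iS8Nn3DihHb+AZLzBFBY4dHalfA7pkP5wkzY="]
for alg in unsupported_algorithms(lines, {8, 13}):
    print(alg, ALGORITHM_NAMES.get(alg, "unknown"))  # prints 15 ED25519
```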
How to Fix It
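Upgrade the resolver to a currently maintained BIND release (9.16 or later). Support for algorithms 15 (Ed25519) and 16 (Ed448) arrived in BIND 9.12, provided the underlying crypto library supports them, so any maintained release can validate them. If an upgrade isn't immediately possible, re-sign the zone using a universally supported algorithm: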
[infrarunbook-admin@sw-infrarunbook-01 ~]$ dnssec-keygen -a ECDSAP256SHA256 -n ZONE solvethenetwork.com
[infrarunbook-admin@sw-infrarunbook-01 ~]$ dnssec-keygen -a ECDSAP256SHA256 -n ZONE -f KSK solvethenetwork.com
ECDSAP256SHA256 (algorithm 13) has excellent support across all modern resolvers including BIND 9.9+, Unbound 1.5+, and all major public resolvers. During the algorithm rollover, publish both old and new DNSKEY records and sign with both algorithms simultaneously before removing the old algorithm — this ensures resolvers that only support one algorithm or the other can still validate during the transition period.
Prevention
Most DNSSEC failures are preventable with the right combination of automation, monitoring, and operational discipline. RRSIG expiry — the most frequent cause by a wide margin — disappears entirely when you have working automated re-signing and active monitoring of signature lifetimes.
Set your RRSIG validity window to 14 days and re-sign every 7 days. That gives you a week of runway if automation breaks. Then monitor it:
[infrarunbook-admin@sw-infrarunbook-01 ~]$ dig +dnssec +multi solvethenetwork.com SOA @127.0.0.1 | \
awk '/RRSIG[ \t]+SOA/{getline; print "RRSIG expiry:", $1}'
RRSIG expiry: 20260430120000
Build that into your monitoring system with an alert threshold of 7 days before expiry. By the time RRSIG has expired, it's an outage — you want to catch it as a warning, not an incident.
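The alert logic itself is tiny. Here is a Python sketch of the threshold check, assuming the expiry timestamp has already been extracted in YYYYMMDDHHMMSS form; the function name and the OK/WARNING/CRITICAL labels are hypothetical and should be mapped to whatever your monitoring system expects.

```python
from datetime import datetime, timedelta, timezone

def expiry_alert(expiration: str, now: datetime, warn_days: int = 7) -> str:
    """Map time-until-expiry onto a monitoring state."""
    expires = datetime.strptime(expiration, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    remaining = expires - now
    if remaining <= timedelta(0):
        return "CRITICAL"   # already expired: validators are rejecting the zone
    if remaining < timedelta(days=warn_days):
        return "WARNING"    # re-signing has probably stalled
    return "OK"

now = datetime(2026, 4, 16, 9, 15, 22, tzinfo=timezone.utc)
print(expiry_alert("20260430120000", now))  # prints OK
```

With a 14-day validity window and a 7-day re-sign interval, a healthy zone should never drop below the 7-day mark, so WARNING is a reliable sign that automation has broken.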
Run dnssec-verify in your zone deployment pipeline. Make it a gating step that prevents unsigned or incorrectly signed zones from being loaded into production:
[infrarunbook-admin@sw-infrarunbook-01 ~]$ dnssec-verify -o solvethenetwork.com \
/etc/bind/zones/solvethenetwork.com.zone.signed
Verifying the zone using the following algorithms: ECDSAP256SHA256
Zone fully signed:
Algorithm: ECDSAP256SHA256: KSKs: 1 active, 0 stand-by, 0 revoked
ZSKs: 1 active, 0 stand-by, 0 revoked
For key rollovers, follow the RFC 6781 rollover procedures. Pre-publish rollover for ZSKs: publish the new ZSK alongside the old, wait one TTL for it to propagate, start signing with the new ZSK, wait another TTL, then remove the old ZSK. Double-signature rollover for KSKs: publish the new KSK, wait for propagation, submit the new DS to the registrar, wait for DS propagation and TTL expiry, then remove the old KSK. Never skip steps and never rush the timing.
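The ZSK rollover timing described above can be written down as a schedule so nobody has to do TTL arithmetic under pressure. A Python sketch, with a hypothetical function name and step labels; it computes only the earliest safe times implied by the TTL, and real schedules should add margin for propagation to secondary servers.

```python
from datetime import datetime, timedelta, timezone

def zsk_rollover_schedule(publish_at: datetime, ttl_seconds: int) -> dict[str, datetime]:
    """Earliest time for each ZSK rollover step, one TTL apart."""
    ttl = timedelta(seconds=ttl_seconds)
    return {
        "publish_new_zsk": publish_at,
        "start_signing_with_new_zsk": publish_at + ttl,
        "remove_old_zsk": publish_at + 2 * ttl,
    }

start = datetime(2026, 4, 16, 10, 0, 0, tzinfo=timezone.utc)
# TTL of 300 seconds, matching the records shown in this article.
for step, when in zsk_rollover_schedule(start, 300).items():
    print(f"{when.isoformat()}  {step}")
```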
Keep NTP synchronized on every resolver in your infrastructure. This sounds basic, but I've traced DNSSEC failures back to clock drift on VMs restored from snapshots where NTP was disabled or lost its configuration. Make NTP synchronization a monitored service, not an assumption.
Finally, test DNSSEC validation from multiple vantage points regularly. A signing algorithm unsupported by one resolver might work fine from another. Synthetic monitoring that queries your zone from external resolvers — not just your internal infrastructure — will catch validation failures that only affect specific resolver implementations or ISP deployments, giving you the full picture before users do.
