Symptoms
You pushed a config change to BIND9 (added a zone, adjusted a forwarder, imported a new key), ran systemctl restart named, and now DNS is down. The service exits immediately or refuses to start. Monitoring is alerting. Clients can't resolve anything.
What you'll typically see from systemctl status named:
● named.service - Berkeley Internet Name Domain (DNS)
Loaded: loaded (/lib/systemd/system/named.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2026-04-17 14:23:11 UTC; 4s ago
Process: 18342 ExecStart=/usr/sbin/named -f -u bind (code=exited, status=1/FAILURE)
Main PID: 18342 (code=exited, status=1/FAILURE)
Or you ran rndc reload and got a vague error with no zones updating. In some cases BIND starts but silently skips the broken zone: the service looks healthy, but your domain is unresolvable. Either way, something is wrong and you need to find it fast.
Here are the most common culprits, in the order I'd check them on a live system.
Root Cause 1: Syntax Error in named.conf
Why It Happens
This is the most frequent reason BIND9 refuses to start after a config change.
named.conf and its included files use a strict semicolon-terminated block syntax that does not forgive a single missing ;, an unclosed brace, or a misspelled directive. A missing semicolon at the end of an options block can break the entire configuration. I've seen this happen after a copy-paste from a browser that silently swapped in a Unicode curly-quote instead of a straight one: the file looks fine in your terminal but BIND rejects it at parse time.
How to Identify It
BIND ships with named-checkconf precisely for this. Run it before every reload:
named-checkconf /etc/bind/named.conf
If there's a syntax problem, you'll see output pointing directly at the offending line:
/etc/bind/named.conf:47: missing ';' before '}'
/etc/bind/named.conf.local:12: unknown option 'forwarders'
Also check the system journal for BIND's own startup error output:
journalctl -u named --since "10 minutes ago" | grep -i error
Which might show:
Apr 17 14:23:11 sw-infrarunbook-01 named[18342]: /etc/bind/named.conf.options:23: bad acl name '10.10.0.0'
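The curly-quote problem mentioned earlier is hard to spot by eye, so a byte-level search can help. The function below is a minimal sketch, assuming GNU grep; the name find_non_ascii is made up for this example. It will also flag legitimate UTF-8 in comments, so treat the hits as candidates rather than errors.

```shell
# Print every line that contains a byte outside printable ASCII.
# LC_ALL=C makes grep treat the input as raw bytes, so the bytes of a
# multi-byte character (curly quote, non-breaking space) are not "printable".
find_non_ascii() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$@"
}
```

Run it as find_non_ascii /etc/bind/named.conf* and compare any line numbers it prints with the ones named-checkconf complains about.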
How to Fix It
Go to the reported line number, read the surrounding block carefully, and fix the syntax. Common traps include missing semicolons after closing braces, ACL names containing slashes that need quoting, and options that only apply to a newer BIND version than what's installed on the system. After fixing, validate before restarting:
named-checkconf /etc/bind/named.conf && echo "Config OK"
named-checkconf prints nothing and exits 0 when the config is clean, so seeing Config OK means you're good. Then restart the service:
systemctl restart named
systemctl status named
Root Cause 2: Zone File Syntax Error
Why It Happens
Zone files have their own syntax entirely separate from named.conf, and named-checkconf won't catch errors inside zone file content unless you run it with -z. The SOA record format is notoriously strict: wrong number of fields, a missing trailing dot on a fully-qualified domain name, an invalid TTL value, or a misspelled resource record type will all cause BIND to refuse to load that zone. Left to itself, named logs the failure and starts without the zone, so queries for it fail while everything else resolves. Some distribution service units go further and test-load every zone in a pre-start check (the stock RHEL unit runs named-checkconf -z), and there a single broken zone file keeps the whole service from starting.
The trailing dot issue trips up even experienced engineers. If you write mail.solvethenetwork.com without the trailing dot in a zone file where the origin is solvethenetwork.com, BIND reads it as the relative name and expands it to mail.solvethenetwork.com.solvethenetwork.com, which is almost certainly not what you intended.
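A rough way to hunt for the missing-dot mistake is to look at MX, NS, and CNAME targets that contain a dot but don't end in one. This is a heuristic sketch, not a zone parser: find_undotted_targets is a hypothetical helper, it assumes one record per line, and a name that is deliberately relative will show up as a false positive.

```shell
# Usage: find_undotted_targets ZONEFILE
# Prints file:line: target for MX/NS/CNAME records whose target looks like
# an FQDN (contains a dot) but has no trailing dot.
find_undotted_targets() {
    awk '
        /^[ \t]*;/ { next }            # skip comment-only lines
        {
            sub(/;.*/, "")             # drop trailing comments
            for (i = 1; i < NF; i++) {
                if ($i == "MX" || $i == "NS" || $i == "CNAME") {
                    if ($NF ~ /\./ && $NF !~ /\.$/)
                        print FILENAME ":" NR ": " $NF
                    break
                }
            }
        }
    ' "$1"
}
```

Anything it prints is worth a second look before you run named-checkzone.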
How to Identify It
Use named-checkzone, the right tool for zone file validation:
named-checkzone solvethenetwork.com /etc/bind/zones/db.solvethenetwork.com
A valid zone file returns:
zone solvethenetwork.com/IN: loaded serial 2026041701
OK
A broken one returns something like:
zone solvethenetwork.com/IN: solvethenetwork.com/MX 'mail.solvethenetwork.com' (out of zone) is not within the zone
zone solvethenetwork.com/IN: loading from master file /etc/bind/zones/db.solvethenetwork.com failed: out of zone data
zone solvethenetwork.com/IN: not loaded due to errors.
Also check the journal for zone-specific errors:
journalctl -u named -n 50 | grep -E "error|failed|zone"
How to Fix It
Open the zone file, go to the reported line, and fix the record. Add trailing dots to any FQDN that's meant to be absolute. Verify the SOA record fields are in the correct order: serial, refresh, retry, expire, negative cache TTL. Correct any MX record that's missing its priority value.
After editing, re-validate:
named-checkzone solvethenetwork.com /etc/bind/zones/db.solvethenetwork.com
While you're in there, increment the SOA serial. If you corrected an existing record without bumping the serial, secondary nameservers won't pull the updated zone — they'll assume they already have the current copy.
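If your serials follow the date-based YYYYMMDDNN convention that the examples here appear to use, the bump can be computed rather than typed. The helper below is a sketch under that assumption; next_serial is a name invented for this example, and it does not edit the zone file for you.

```shell
# Usage: next_serial CURRENT [TODAY]
# TODAY defaults to the current date in YYYYMMDD form.
next_serial() {
    current=$1
    today=${2:-$(date +%Y%m%d)}
    first_of_day="${today}01"
    if [ "$current" -lt "$first_of_day" ]; then
        echo "$first_of_day"        # first change today
    else
        echo "$((current + 1))"     # already changed today: bump the counter
    fi
}
```

For example, next_serial 2026041701 20260417 yields 2026041702, while a serial left over from the previous day jumps to 2026041701.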
Root Cause 3: Port 53 Already in Use
Why It Happens
BIND needs to bind to port 53 on both UDP and TCP. If anything else is already listening there, BIND fails at startup with an address-in-use error. On modern Ubuntu and Debian systems this is almost always systemd-resolved holding port 53 on 127.0.0.53. On RHEL and CentOS it's sometimes a stale named process that didn't terminate cleanly, or another DNS daemon (dnsmasq or unbound) running alongside BIND. I've seen this happen after a botched migration where both bind9 and unbound were installed and neither was disabled.
How to Identify It
Check what's currently holding port 53:
ss -tulpn | grep ':53'
Output that shows something other than named on port 53:
udp UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=612,fd=18))
tcp LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=612,fd=19))
BIND's own error message in the journal for this condition looks like:
Apr 17 14:23:11 sw-infrarunbook-01 named[18342]: couldn't add command channel 127.0.0.1#953: address in use
Apr 17 14:23:11 sw-infrarunbook-01 named[18342]: creating IPv4 socket: address in use
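On a busy host the ss output can be long, so it can help to filter it down to listeners on port 53 that are not named. The filter below is a sketch that assumes the column layout of ss -tulpn shown above (local address in the fifth column); port53_offenders is a hypothetical name.

```shell
# Reads `ss -tulpn` output on stdin and prints protocol, local address and
# owning process for every port 53 listener that is not named.
port53_offenders() {
    awk '$5 ~ /:53$/ && $0 !~ /\("named"/ { print $1, $5, $NF }'
}
```

Use it as ss -tulpn | port53_offenders, run as root so the process column is populated. Empty output means nothing else is holding the port.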
How to Fix It
If the offender is systemd-resolved, disable its stub resolver. Edit /etc/systemd/resolved.conf:
[Resolve]
DNSStubListener=no
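Then restart resolved and repoint /etc/resolv.conf away from the 127.0.0.53 stub. The symlink below makes it list the servers resolved itself is configured with, so if this host should query its own BIND instance, also set DNS=127.0.0.1 in resolved.conf: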
systemctl restart systemd-resolved
ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
If the offender is another DNS daemon, stop and disable it before starting BIND:
systemctl stop dnsmasq
systemctl disable dnsmasq
Don't just kill the process with -9 without understanding why it's still running. It could indicate a systemd unit conflict, a missing ExecStop, or a second service definition that re-spawns the daemon on termination.
Root Cause 4: Wrong Permissions on Zone File
Why It Happens
BIND runs as the bind user on Debian and Ubuntu, and as the named user on RHEL and CentOS. If your zone files aren't readable by that user, BIND can't load them. This happens most often when someone copies zone files as root, creates new zone files using a root-owned text editor session, or restores from a backup that didn't preserve file ownership. It also surfaces when zone files are stored on a filesystem mounted with restrictive options, or when a config management tool creates files before the service user is created on a fresh build.
Dynamic zones add another wrinkle: BIND needs write access to the zone directory so it can create and update journal files (.jnl). Read-only zone files work fine for static zones but will cause dynamic update failures even if the zone loads at startup.
How to Identify It
Check ownership and permissions on your zone files:
ls -la /etc/bind/zones/
total 24
drwxr-xr-x 2 root root 4096 Apr 17 14:10 .
drwxr-xr-x 8 root bind 4096 Apr 17 14:08 ..
-rw-r--r-- 1 root root 1243 Apr 17 14:10 db.solvethenetwork.com
-rw-r--r-- 1 root root 876 Apr 16 09:42 db.10.10.0
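Files owned by root with group root. With modes like the world-readable 644 shown here a static zone still loads; the denial appears once the mode is tightened to 640 or 600, or when BIND needs to write to the directory for a dynamic zone. The BIND journal entry for a permissions failure looks like: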
Apr 17 14:23:11 sw-infrarunbook-01 named[18342]: zone solvethenetwork.com/IN: open: /etc/bind/zones/db.solvethenetwork.com: permission denied
For dynamic update journal file failures:
Apr 17 14:23:11 sw-infrarunbook-01 named[18342]: dns_journal_open: journal open failed: permission denied
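With dozens of zones, reading ls output line by line gets tedious. A find expression can list only the entries whose owner is not the service user. This is a small sketch; not_owned_by is a hypothetical helper, and it only checks ownership, not mode bits.

```shell
# Usage: not_owned_by USER DIR
# Prints every file and directory under DIR whose owner is not USER.
not_owned_by() {
    find "$2" ! -user "$1" -print
}
```

On Debian or Ubuntu that would be not_owned_by bind /etc/bind/zones, and on RHEL not_owned_by named /var/named. No output means ownership is consistent.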
How to Fix It
Set ownership correctly. On Debian and Ubuntu:
chown -R bind:bind /etc/bind/zones/
chmod 640 /etc/bind/zones/db.solvethenetwork.com
chmod 750 /etc/bind/zones/
On RHEL and CentOS where BIND runs as named:
chown -R named:named /var/named/
chmod 640 /var/named/solvethenetwork.com.zone
chmod 750 /var/named/
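If you find .jnl files in the zone directory that are also root-owned, either chown them to the service user or delete them; BIND recreates a deleted journal cleanly on next startup. Deleting discards any dynamic updates that had not yet been written back to the zone file, so only do it when that loss is acceptable: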
find /etc/bind/zones/ -name '*.jnl' -user root -delete
Root Cause 5: Missing Include File
Why It Happens
named.conf commonly uses include directives to pull in separate files for options, ACLs, TSIG keys, and zone definitions. It's a clean practice that keeps configuration modular. The problem is that if an included file doesn't exist, BIND fails hard at startup; it doesn't skip the include or warn and continue. This happens after deploying a config that references a new key file that wasn't generated yet, restoring named.conf from one server to another without copying the referenced files, or when a config management tool (Ansible, Puppet, Chef) deploys the main config before all the dependent files are in place.
How to Identify It
named-checkconf catches this one clearly:
named-checkconf /etc/bind/named.conf
/etc/bind/named.conf:5: open: /etc/bind/named.conf.tsig-keys: file not found
/etc/bind/named.conf.local:18: open: /etc/bind/zones/db.solvethenetwork.internal: file not found
The journal will show the same if you try to force a start anyway:
Apr 17 14:23:11 sw-infrarunbook-01 named[18342]: loading configuration: open: /etc/bind/named.conf.tsig-keys: file not found
Apr 17 14:23:11 sw-infrarunbook-01 named[18342]: exiting (due to fatal error)
You can audit all include paths and verify they exist in one shot:
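# Match only lines that start with an include directive, so comments that merely mention the word are skipped
grep -hE '^[[:space:]]*include[[:space:]]' /etc/bind/named.conf* |
    awk '{print $2}' | tr -d '";' |
    while IFS= read -r f; do
        [ -f "$f" ] || echo "MISSING: $f"
    done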
How to Fix It
The fix depends on whether the file should exist or whether the include should be removed. If it's a TSIG key file that got lost, regenerate it:
tsig-keygen -a hmac-sha256 transfer-key > /etc/bind/named.conf.tsig-keys
chown bind:bind /etc/bind/named.conf.tsig-keys
chmod 640 /etc/bind/named.conf.tsig-keys
If the include references a zone file that hasn't been created yet, create a minimal valid zone file before starting BIND:
cat > /etc/bind/zones/db.solvethenetwork.internal << 'EOF'
$TTL 3600
@ IN SOA ns1.solvethenetwork.com. infrarunbook-admin.solvethenetwork.com. (
2026041701 ; Serial
3600 ; Refresh
900 ; Retry
604800 ; Expire
300 ) ; Negative Cache TTL
IN NS ns1.solvethenetwork.com.
EOF
chown bind:bind /etc/bind/zones/db.solvethenetwork.internal
named-checkzone solvethenetwork.internal /etc/bind/zones/db.solvethenetwork.internal
If the include is for a file that's genuinely no longer needed, comment out or remove the include directive from named.conf. Always run named-checkconf after making the change before attempting a restart.
Root Cause 6: AppArmor or SELinux Denying Access
Why It Happens
If your zone files or config files live in a non-standard location, Linux security modules will block BIND from accessing them even when the POSIX file permissions look correct. AppArmor on Ubuntu ships with a BIND profile at /etc/apparmor.d/usr.sbin.named that explicitly whitelists paths. SELinux on RHEL and CentOS enforces the same through file contexts. Moving zone files outside the standard directories without updating the security policy is a subtle gotcha that's easy to overlook because named-checkconf and named-checkzone, run from a root shell, are not confined by named's policy and won't hit the denial. Only the named daemon itself will.
How to Identify It
On Ubuntu with AppArmor, look for denial entries in the audit log or kernel ring buffer:
journalctl -k | grep -i "apparmor.*named"
dmesg | grep -i apparmor | grep named
[ 1234.567] audit: type=1400 apparmor="DENIED" operation="open" profile="/usr/sbin/named" name="/srv/dns/zones/db.solvethenetwork.com" pid=18342 comm="named" requested_mask="r" denied_mask="r"
On RHEL with SELinux:
ausearch -m avc -ts recent | grep named
sealert -a /var/log/audit/audit.log | grep named
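For the AppArmor case, the denial lines are verbose, and what you usually want is just the list of paths that were blocked. The filter below is a sketch that assumes the audit line format shown above (a name="..." field and comm="named"); apparmor_denied_paths is a made-up name.

```shell
# Reads kernel or audit log lines on stdin and prints the unique paths
# that AppArmor denied to the named process.
apparmor_denied_paths() {
    grep 'apparmor="DENIED"' |
        grep 'comm="named"' |
        sed -n 's/.* name="\([^"]*\)".*/\1/p' |
        sort -u
}
```

Feed it with journalctl -k | apparmor_denied_paths. Each path it prints needs either a rule in the profile or a move back to a standard location.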
How to Fix It
The cleanest fix is moving the zone files back into the path the security profile expects, typically /etc/bind/ or /var/named/. If you need to keep a custom path, update the policy. For AppArmor, add the path to the local override file:
echo "/srv/dns/zones/** rw," >> /etc/apparmor.d/local/usr.sbin.named
apparmor_parser -r /etc/apparmor.d/usr.sbin.named
For SELinux, apply the correct file context and relabel:
semanage fcontext -a -t named_zone_t "/srv/dns/zones(/.*)?"
restorecon -Rv /srv/dns/zones/
Root Cause 7: Duplicate Zone Declaration
Why It Happens
If you declare the same zone twice in your configuration, perhaps once in named.conf.local and again in a newly included file, BIND refuses to start. This also surfaces in view-based configurations when the same zone appears in multiple views without proper scoping, or when someone adds a forward zone that conflicts with a built-in hint zone. It's a common mistake when consolidating configs from multiple servers or when a template-based config management tool generates duplicate stanzas.
How to Identify It
The journal makes this one obvious:
Apr 17 14:23:11 sw-infrarunbook-01 named[18342]: zone solvethenetwork.com: already exists previous definition: /etc/bind/named.conf.local:8
Find all declarations for the conflicting zone:
grep -rn 'zone "solvethenetwork.com"' /etc/bind/
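If you don't yet know which zone is duplicated, or you want to check a merged config before BIND ever sees it, you can list every zone name that is declared more than once. This is a text-level sketch (duplicate_zones is a hypothetical helper); it knows nothing about views, so a zone that legitimately appears once per view will be reported too.

```shell
# Usage: duplicate_zones FILE...
# Prints each zone name that appears in more than one zone "..." statement.
duplicate_zones() {
    grep -hoE '^[[:space:]]*zone[[:space:]]+"[^"]+"' "$@" |
        sed 's/^[^"]*"//; s/"$//' |
        sort | uniq -d
}
```

For example, duplicate_zones /etc/bind/named.conf* prints nothing on a clean config and one name per duplicated zone otherwise.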
How to Fix It
Remove or comment out the duplicate, keeping the authoritative definition in the right place. Then validate before restarting:
named-checkconf /etc/bind/named.conf && echo "OK"
Prevention
The single most effective habit you can build is running named-checkconf and named-checkzone as mandatory steps before every reload or restart. Wrap them in a shell alias, a Makefile target, or a wrapper script your team uses instead of calling systemctl directly. Better yet, add them to your CI/CD pipeline so config changes are validated before they ever reach the server.
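A wrapper along those lines can be very small. The sketch below validates first and only then reloads. The function name safe_reload and the three variables are inventions for this example, and the default config path is the Debian one used throughout this article; adjust it for RHEL.

```shell
# Command names are overridable so the function can be dry-run or tested.
CHECKCONF=${CHECKCONF:-named-checkconf}
RELOAD=${RELOAD:-rndc reload}
NAMED_CONF=${NAMED_CONF:-/etc/bind/named.conf}

safe_reload() {
    # -z also test-loads every primary zone, so zone file errors are caught too
    if ! $CHECKCONF -z "$NAMED_CONF"; then
        echo "validation failed; not reloading" >&2
        return 1
    fi
    $RELOAD
}
```

Because the check runs before anything touches the running daemon, a bad config leaves the current instance serving exactly what it served before.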
Store zone files in version control. A git repository for your DNS zones gives you a natural audit trail, makes rollbacks trivial, and lets you enforce validation in a pre-commit hook. A hook that runs named-checkzone against every modified zone file will catch problems before they reach the server. In my experience, this single practice eliminates the majority of zone-related outages.
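Such a hook might look roughly like this. It is a sketch under several assumptions that are not from any particular setup: zone files live under zones/ in the repository, they are named db.<zone> as in the examples here (so reverse zones such as db.10.10.0 would need their own mapping), and the helper names are made up. It validates the working-tree copy of each staged file, not the staged blob.

```shell
#!/bin/sh
# Derive the zone name from a file named db.<zone>.
zone_from_path() {
    basename "$1" | sed 's/^db\.//'
}

# Run named-checkzone on every added, copied or modified zone file in the index.
check_staged_zones() {
    status=0
    for f in $(git diff --cached --name-only --diff-filter=ACM -- 'zones/db.*'); do
        named-checkzone "$(zone_from_path "$f")" "$f" || status=1
    done
    return "$status"
}
```

Saved as .git/hooks/pre-commit and made executable, the script would end with check_staged_zones || exit 1 so that a failing zone blocks the commit.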
Use rndc reload instead of systemctl restart named for zone-only changes. A reload applies zone changes without dropping existing connections and without killing the whole service if one zone has an error: BIND logs the problem and skips the broken zone rather than dying completely. A full systemctl restart is more unforgiving and isn't necessary unless you've changed global options or added new listen addresses.
Automate permission management through your config management tool. If you're using Ansible, ensure every task that writes a zone file also sets owner, group, and mode explicitly. Don't rely on operators to remember. A task like file: path=/etc/bind/zones state=directory owner=bind group=bind mode=u=rwX,g=rX,o-rwx recurse=yes costs ten seconds to write and prevents an entire class of outage. The capital X matters: a plain mode=0640 applied recursively would also strip the execute bit from the directory itself and lock BIND out of it.
Document your include file dependencies. Keep a comment block at the top of named.conf that lists every file it depends on, or maintain a README in /etc/bind/ enumerating all referenced external files. When someone deploys to a new server or restores a backup to a fresh host, they'll know exactly what needs to be in place before the service starts.
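Finally, configure a systemd override that runs named-checkconf as a pre-start check. This makes the service refuse to start if the configuration is invalid, instead of failing with a cryptic exit code. Confirm the binary's location with command -v named-checkconf first, since the path differs between releases: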
# /etc/systemd/system/named.service.d/precheck.conf
[Service]
ExecStartPre=/usr/sbin/named-checkconf /etc/bind/named.conf
systemctl daemon-reload
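With that in place, a misconfigured restart fails fast with a clear, actionable error instead of a generic service exit. Keep in mind that systemctl restart stops the running instance before the pre-start check runs, so a failed check still leaves DNS down until you fix the problem. To keep the old instance serving, run named-checkconf by hand before you restart, or apply the change with rndc reload. That's the behavior you want on a production nameserver: fail loudly in pre-flight, not silently in flight.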
