Symptoms
You run a ping and get back Network is unreachable. Or curl exits immediately with connect: Network is unreachable. An application that normally reaches its upstream API starts timing out. SSH drops mid-session and won't reconnect. A monitoring agent stops checking in and nobody notices for twenty minutes.
These are all the same kernel message underneath: it has no route for that packet. Not a firewall drop. Not a refused connection from the remote end. The packet never left the machine because the routing subsystem had nowhere to send it. That distinction is actually useful — it tells you the problem is local, not remote. You don't need to call the network team yet. You need a terminal and a methodical approach.
In my experience, most of these incidents follow repeating patterns: a DHCP client fails to renew during a kernel upgrade reboot, someone tightens a firewall OUTPUT policy too aggressively and forgets about new outbound connections, or a DNS server goes dark and the whole team convinces themselves the network is broken. Let's work through each cause with real commands and outputs so you can move fast when it counts.
Root Cause 1: Missing Default Route
This is the single most common cause I run into. When a Linux host has no default route, it can reach hosts on its directly connected subnets just fine — but any packet destined for anything outside those subnets gets dropped immediately. The kernel checks its routing table, finds no matching entry, and returns Network is unreachable before the packet ever hits the wire.
It happens more than you'd expect. A DHCP lease expires and the client fails to renew cleanly. A network interface gets restarted and the route doesn't come back because it was added manually at some point rather than through persistent config. A VM gets live-migrated and the new hypervisor segment has a different gateway. Someone runs `ip route flush table main` during a debugging session without fully thinking it through. These all land in the same place.
To identify it, start here:
$ ip route show
192.168.10.0/24 dev eth0 proto kernel scope link src 192.168.10.45
If that's your entire routing table — no line starting with default — you've found your problem. You can confirm with a targeted lookup:
$ ip route get 10.0.0.1
RTNETLINK answers: Network is unreachable
Compare that to what a healthy system returns:
$ ip route get 10.0.0.1
10.0.0.1 via 192.168.10.1 dev eth0 src 192.168.10.45 uid 1000
cache
To restore connectivity immediately:
$ sudo ip route add default via 192.168.10.1 dev eth0
That works right now but won't survive a reboot. For a permanent fix on systems using NetworkManager:
$ sudo nmcli connection modify "Wired connection 1" ipv4.gateway 192.168.10.1
$ sudo nmcli connection up "Wired connection 1"
On systems using `/etc/network/interfaces`, confirm the stanza includes a `gateway` directive:
iface eth0 inet static
address 192.168.10.45
netmask 255.255.255.0
gateway 192.168.10.1
If you're relying on DHCP, check whether the client is actually running and whether the last renewal succeeded.
`journalctl -u NetworkManager --since -10m` or `journalctl -u dhclient --since -10m` will show you exactly what happened during the last attempt. A failed renewal that nobody caught is a remarkably frequent root cause on servers that have been running for years without a maintenance window.
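The whole check above reduces to one test that is easy to script. A minimal sketch, operating on captured `ip route show` text so it runs anywhere; in real use you would pass `"$(ip route show)"`. The sample tables are illustrative, not from a real host:

```shell
# Return "ok" if a routing-table dump contains a default route, "missing" otherwise.
has_default_route() {
    # Any line starting with "default" means off-subnet traffic has a next hop.
    if printf '%s\n' "$1" | grep -q '^default'; then
        echo ok
    else
        echo missing
    fi
}

broken='192.168.10.0/24 dev eth0 proto kernel scope link src 192.168.10.45'
healthy='default via 192.168.10.1 dev eth0
192.168.10.0/24 dev eth0 proto kernel scope link src 192.168.10.45'

has_default_route "$broken"    # prints: missing
has_default_route "$healthy"   # prints: ok
```

The same function works as a one-minute cron check: alert whenever it prints `missing`.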
Root Cause 2: Interface Down
An interface that's administratively down or has lost carrier won't transmit anything. Routes in the table still appear perfectly healthy — they just go nowhere. The result is the same connectivity failure from the application's perspective, though the failure mode can manifest slightly differently depending on whether the OS still thinks it has a route.
This shows up after a driver crash, after someone accidentally runs `ip link set eth0 down` during a configuration change, after a cable gets disconnected, when a switch port gets administratively disabled on the infrastructure side, or when a VM's virtual NIC detaches during a migration that didn't complete cleanly. I've also seen it happen when a NIC browns out under sustained load — the driver reports it as link down and doesn't always recover without intervention.
Check interface state:
$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN
link/ether 52:54:00:ab:cd:ef brd ff:ff:ff:ff:ff:ff
The key indicators are state DOWN and the absence of LOWER_UP in the flag set. LOWER_UP means the physical layer has carrier. No LOWER_UP means no carrier — check the physical connection or the upstream switch port before anything else.
To bring the interface back up:
$ sudo ip link set eth0 up
If it was purely an administrative down, that's enough and the interface will come straight back. If there's a carrier issue, the interface will come up but immediately show NO-CARRIER. At that point dig into the physical layer — `ethtool eth0` shows link speed and negotiation status, and `dmesg | grep eth0` will surface any driver-level errors. On virtual machines, check the hypervisor side: a detached vNIC looks exactly like a pulled cable from the guest's perspective.
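The flag-reading logic above can be captured in a small classifier. A sketch that operates on a single line of `ip link show` output; the sample lines are illustrative:

```shell
# Classify an interface from its `ip link show` line.
link_state() {
    case "$1" in
        *NO-CARRIER*)   echo no-carrier ;;   # UP administratively, but no physical link
        *"state DOWN"*) echo admin-down ;;   # administratively down
        *LOWER_UP*)     echo up ;;           # carrier present, interface healthy
        *)              echo unknown ;;
    esac
}

link_state '2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN'                # prints: admin-down
link_state '2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN'  # prints: no-carrier
link_state '2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP'      # prints: up
```

Note the ordering: NO-CARRIER interfaces also report `state DOWN`, so that case must be tested first.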
Root Cause 3: Wrong IP Assigned
A host with the wrong IP address can appear operational — it responds to pings on its local segment — but it can't reach anything outside it. If the assigned IP puts the host on the wrong subnet entirely, it won't be able to reach the default gateway at layer 3. The host will ARP for the gateway and get no response because they're logically on different segments, even if physically on the same switch.
This happens when DHCP hands out a stale or incorrect lease, when someone copies a static IP configuration from another server and forgets to update the address, or after a network renumbering project where not every system was touched. In containerized and virtualized environments, I've seen IPAM pools get exhausted and workloads fall back to 169.254.x.x link-local addresses — which look like assigned IPs but can't route anywhere.
Check the assigned address against your expected configuration:
$ ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
link/ether 52:54:00:ab:cd:ef brd ff:ff:ff:ff:ff:ff
inet 10.0.99.45/24 brd 10.0.99.255 scope global eth0
If the host should be on 192.168.10.0/24 but shows 10.0.99.45, that explains everything. Confirm by trying to reach the gateway:
$ ping -c 3 192.168.10.1
connect: Network is unreachable
The routing table has no path to 192.168.10.1 because the host thinks it's on 10.0.99.0/24. To fix it manually:
$ sudo ip addr del 10.0.99.45/24 dev eth0
$ sudo ip addr add 192.168.10.45/24 dev eth0
$ sudo ip route add default via 192.168.10.1 dev eth0
If DHCP is involved, it's cleaner to force a fresh lease rather than patching the live state by hand:
$ sudo dhclient -r eth0 && sudo dhclient eth0
Either way, ensure the fix is reflected in the persistent configuration — not just the live runtime state. A reboot will undo any manual `ip` commands you apply without also updating the underlying config files or NetworkManager connection profiles.
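The wrong-subnet check itself can be scripted. A sketch in pure shell arithmetic, using the example addresses from above:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    oldIFS=$IFS; IFS=.
    set -- $1            # split on dots into $1..$4
    IFS=$oldIFS
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_subnet ADDR NETWORK PREFIXLEN -> "yes" or "no"
in_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    if [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]; then
        echo yes
    else
        echo no
    fi
}

in_subnet 10.0.99.45    192.168.10.0 24   # prints: no  (wrong subnet entirely)
in_subnet 192.168.10.45 192.168.10.0 24   # prints: yes
```

A `no` against the subnet your inventory says the host should be on points you straight at root cause 3.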
Root Cause 4: Firewall Blocking Outbound Traffic
A misconfigured firewall produces symptoms that look exactly like a routing failure to the application layer. The packets are being dropped by iptables or nftables rules — stopped before they leave — so from the application's perspective, the network simply doesn't exist. This catches people out regularly, and I've seen senior engineers spend twenty minutes checking routing tables when the answer was a single OUTPUT chain policy line.
It happens when someone sets the OUTPUT chain policy to DROP without carefully enumerating the traffic they need to allow, when a security hardening script applies aggressive egress filtering, or when FORWARD rules are wrong on a host acting as a router or container host and transit traffic gets silently discarded.
List your iptables rules with packet and byte counts so you can see what's actively matching:
$ sudo iptables -L -n -v --line-numbers
Chain INPUT (policy ACCEPT 142 packets, 18432 bytes)
...
Chain OUTPUT (policy DROP 0 packets, 0 bytes)
num pkts bytes target prot opt in out source destination
1 0 0 ACCEPT all -- * lo 0.0.0.0/0 0.0.0.0/0
2 15 1260 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state ESTABLISHED
The OUTPUT policy is DROP, there's no rule allowing NEW outbound connections, and established-only acceptance means nothing initiated from this host gets out. To watch it in real time, run `watch -n1 sudo iptables -L -n -v` in one terminal while attempting a connection in another — the packet counter in the OUTPUT chain's policy line will increment with each attempt.
To restore full outbound access quickly:
$ sudo iptables -P OUTPUT ACCEPT
For a surgical fix that preserves the DROP default but allows new outbound connections:
$ sudo iptables -I OUTPUT 1 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
If your system uses nftables, inspect the full ruleset first:
$ sudo nft list ruleset
table inet filter {
    chain output {
        type filter hook output priority 0; policy drop;
        ct state established,related accept
    }
}
Insert a rule to permit new outbound connections:
$ sudo nft insert rule inet filter output ct state new accept
Make sure any fix you apply in-memory also gets written back to `/etc/iptables/rules.v4` or your nftables configuration file. A temporary iptables fix that disappears on reboot is worse than no fix — it creates an intermittent problem that's much harder to diagnose the second time.
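The lockout signature is mechanical enough to detect from a saved ruleset dump. A sketch that scans captured `iptables -L -n -v` text for an OUTPUT policy of DROP with no rule accepting NEW connections; the sample mirrors the output shown earlier:

```shell
# Print "locked-out" if OUTPUT policy is DROP and no rule ACCEPTs NEW traffic.
output_lockout() {
    printf '%s\n' "$1" | awk '
        /^Chain /                   { in_out = ($2 == "OUTPUT")
                                      if (in_out) drop = /policy DROP/ }
        in_out && /ACCEPT/ && /NEW/ { allowed = 1 }
        END { print ((drop && !allowed) ? "locked-out" : "ok") }'
}

locked='Chain INPUT (policy ACCEPT 142 packets, 18432 bytes)
Chain OUTPUT (policy DROP 0 packets, 0 bytes)
    1     0     0 ACCEPT     all  --  *      lo     0.0.0.0/0    0.0.0.0/0
    2    15  1260 ACCEPT     tcp  --  *      *      0.0.0.0/0    0.0.0.0/0    state ESTABLISHED'

fixed="$locked
    3     0     0 ACCEPT     all  --  *      *      0.0.0.0/0    0.0.0.0/0    state NEW,ESTABLISHED,RELATED"

output_lockout "$locked"   # prints: locked-out
output_lockout "$fixed"    # prints: ok
```

In real use, feed it `"$(sudo iptables -L -n -v)"`.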
Root Cause 5: DNS Failure Masking the Real Network State
This one is subtle and trips up experienced engineers more often than it should. The actual network is fine. Routing works. Firewall rules are clean. The interface is up. But every connection attempt fails because the application can't resolve the hostname, and the error it reports looks indistinguishable from a network failure. The team spends forty minutes debugging a network that isn't broken.
It happens when `/etc/resolv.conf` points to a DNS server that's unreachable or has gone down, when `systemd-resolved` is in a degraded state after a service crash, or when the DNS server is technically reachable but not responding to queries — perhaps because its own firewall blocks UDP/53 from this host. The application tries to resolve `api.solvethenetwork.com`, waits for a timeout, and reports a generic connection failure.
The diagnostic that cuts through immediately is simple: try connecting by IP address instead of hostname.
$ curl -v https://api.solvethenetwork.com/health
curl: (6) Could not resolve host: api.solvethenetwork.com
$ curl -v --insecure https://192.168.10.100/health
HTTP/1.1 200 OK
If the IP succeeds and the hostname fails, you're dealing with DNS — not the network. Check your resolver:
$ cat /etc/resolv.conf
nameserver 192.168.10.53
$ dig @192.168.10.53 api.solvethenetwork.com
;; connection timed out; no servers could be reached
The nameserver is unreachable or not responding. If your system uses `systemd-resolved`, check its current state and configured upstreams — on newer systems the command is `resolvectl status`; `systemd-resolve --status` is the older equivalent:
$ systemd-resolve --status
Global
DNS Servers: 192.168.10.53
DNS Domain: solvethenetwork.com
DNSSEC: no
Link 2 (eth0)
Current Scopes: DNS
DNS Servers: 192.168.10.53
Fallback: none
No fallback DNS configured, and the primary is down — that's why resolution is failing. Fix it by updating the resolved configuration:
$ sudo nano /etc/systemd/resolved.conf
[Resolve]
DNS=192.168.1.1
FallbackDNS=10.0.0.1
Domains=solvethenetwork.com
$ sudo systemctl restart systemd-resolved
On systems where `systemd-resolved` isn't managing DNS, update the file directly — but be aware that NetworkManager may overwrite it on the next connection event:
$ sudo bash -c 'echo "nameserver 192.168.1.1" > /etc/resolv.conf'
Always test DNS resolution as its own isolated step. `ping 192.168.10.1` followed by `ping sw-infrarunbook-01.solvethenetwork.com` takes five seconds and immediately tells you whether you're chasing a routing problem or a name resolution problem. Building this habit into your troubleshooting reflex saves a lot of misdirected effort.
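That two-step triage can be encoded as a function. A sketch: the two arguments are the exit statuses you would get from, say, `ping -c1 <gateway-ip>` and `getent hosts <name>`; they are passed in directly here so the logic runs anywhere:

```shell
# triage IP_PING_STATUS NAME_LOOKUP_STATUS  (0 = the probe succeeded)
triage() {
    if [ "$1" -ne 0 ]; then
        echo "routing/link problem: raw IP unreachable"
    elif [ "$2" -ne 0 ]; then
        echo "DNS problem: network fine, resolution failing"
    else
        echo "both healthy"
    fi
}

triage 0 1   # prints: DNS problem: network fine, resolution failing
triage 1 1   # prints: routing/link problem: raw IP unreachable
```

Wiring it up is one line: `ping -c1 -W2 192.168.10.1 >/dev/null 2>&1; ip=$?; getent hosts api.solvethenetwork.com >/dev/null; triage $ip $?`.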
Root Cause 6: Stale or Corrupt ARP Cache
The ARP cache maps IP addresses to MAC addresses on your local subnet. When it holds stale entries — particularly after a device gets replaced, an IP moves to a different host, or an IP conflict causes ARP poisoning — packets get sent to the wrong MAC address and disappear. The routing table looks fine, the interface is up, the IP assignment is correct, but traffic still doesn't flow to local hosts or the gateway.
Inspect the neighbor table to see the current ARP state:
$ ip neigh show
192.168.10.1 dev eth0 FAILED
192.168.10.50 dev eth0 lladdr 52:54:00:11:22:33 REACHABLE
192.168.10.60 dev eth0 INCOMPLETE
The gateway at 192.168.10.1 is showing FAILED — the host sent ARP requests and got no valid reply. An INCOMPLETE entry means resolution is in progress or has never succeeded. Either state means traffic to those IPs is going nowhere useful.
Clear the stale entry and let ARP re-resolve on the next packet:
$ sudo ip neigh del 192.168.10.1 dev eth0
$ ping -c 1 192.168.10.1
To flush all stale entries from an interface at once:
$ sudo ip neigh flush dev eth0
If the gateway consistently returns to FAILED even after flushing, the problem is upstream rather than local — the gateway itself may be down, the switch port may be blocking, or an IP conflict is generating ARP chaos on the segment. At that point, a packet capture with `tcpdump -i eth0 arp` will show you whether ARP replies are arriving at all and whether multiple MAC addresses are claiming the same IP.
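Pulling the unusable entries out of a neighbor dump is a one-liner worth keeping in a runbook. A sketch against captured `ip neigh show` text; the sample mirrors the states discussed above:

```shell
# Print the IPs of neighbor entries that can't carry traffic.
bad_neighbors() {
    printf '%s\n' "$1" | awk '/FAILED|INCOMPLETE/ { print $1 }'
}

neigh='192.168.10.1 dev eth0 FAILED
192.168.10.50 dev eth0 lladdr 52:54:00:11:22:33 REACHABLE
192.168.10.60 dev eth0 INCOMPLETE'

bad_neighbors "$neigh"   # prints 192.168.10.1 and 192.168.10.60, one per line
```

In real use: `bad_neighbors "$(ip neigh show)"`. A gateway IP in that output is your smoking gun.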
Root Cause 7: MTU Mismatch
MTU mismatches are infuriating because they produce inconsistent failures that can look like intermittent network problems rather than a configuration error. Small packets get through fine. Large packets — TCP data transfers, sizable HTTP responses, any bulk transfer — get fragmented or silently dropped somewhere in the path, and connections stall or fail in ways that make no sense given the routing table looks healthy.
This happens most commonly when VPN tunnels are involved (the tunnel adds encapsulation overhead that reduces effective MTU below the standard 1500), when jumbo frames are configured on a NIC but the upstream switch doesn't support them, or in environments running PPPoE which drops the MTU to 1492.
Test it by pinging with a specific payload size and the don't-fragment bit set:
$ ping -c 3 -M do -s 1472 192.168.10.1
PING 192.168.10.1 (192.168.10.1) 1472(1500) bytes of data.
ping: local error: message too long, mtu=1450
That tells you the effective path MTU is 1450 bytes. The `-s 1472` payload plus 28 bytes of ICMP and IP headers totals 1500 bytes — the standard Ethernet MTU. If that fails, reduce the payload size until packets get through. Check the configured MTU on the interface:
$ ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT
The interface is configured for 1500 but the path only supports 1450, which means something between here and the destination is dropping oversized packets. Set the interface MTU to match the actual path capacity:
$ sudo ip link set eth0 mtu 1450
For VPN interfaces, configure the appropriate MTU in the client configuration rather than patching it at runtime. WireGuard typically needs `MTU = 1420` in the [Interface] section. OpenVPN users should set `tun-mtu 1400` in the client config. Make these changes persistent — a runtime MTU fix that gets lost on reconnect creates a repeating incident.
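Rather than reducing the payload size by hand, you can binary-search it. A sketch where `probe` stands in for `ping -c1 -M do -s $size <target>`; it is simulated against an assumed path MTU of 1450 so the script runs anywhere:

```shell
PATH_MTU=1450   # assumed path MTU for the simulation

# Succeeds when payload plus 28 bytes of ICMP/IP header fits the path.
# In real use, replace the body with: ping -c1 -W1 -M do -s "$1" <target> >/dev/null 2>&1
probe() {
    [ $(( $1 + 28 )) -le "$PATH_MTU" ]
}

# Bisect for the largest payload that still gets through.
find_max_payload() {
    lo=0; hi=1472
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if probe "$mid"; then lo=$mid; else hi=$(( mid - 1 )); fi
    done
    echo "$lo"
}

find_max_payload   # prints: 1422  (1422 + 28 = 1450, the path MTU)
```

Eleven probes instead of dozens, and the result plus 28 is the MTU to set on the interface.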
Prevention
Most of these problems are preventable with monitoring, disciplined configuration management, and a few habits baked into your team's operating procedures.
Monitor your default route. A check every minute — `ip route show default` alerting when it returns empty — catches DHCP renewal failures before users notice. On systems with static routing, use configuration management (Ansible, Salt, Puppet) to enforce the routing table state rather than trusting that manual setup has held since the last engineer touched the box.
Test firewall changes with a built-in revert. Before applying any iptables or nftables modifications, set a timed revert so you get access back if something goes wrong:
$ backup=/tmp/iptables-backup-$(date +%s)
$ sudo iptables-save > "$backup"
$ echo "iptables-restore < $backup" | sudo at now + 5 minutes
If your changes work, cancel the job with `atrm`. If they lock you out, you recover automatically. This habit has saved me from self-imposed outages more times than I'd like to admit.
Train your team to always test with an IP address before concluding the network is down. A quick `ping 192.168.10.1` versus `ping sw-infrarunbook-01.solvethenetwork.com` takes five seconds and immediately tells you whether you're dealing with routing or name resolution. Build this into your runbooks explicitly — if the IP works and the hostname doesn't, stop debugging the network.
Capture a baseline on every new system build: `ip addr show`, `ip route show table all`, `ip neigh show`, `iptables -L -n -v`, and `cat /etc/resolv.conf`. Store it in your CMDB or alongside the system's infrastructure-as-code definition. When something breaks months later, having that baseline makes comparison trivial and cuts diagnosis time from an hour to five minutes.
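A rough sketch of that baseline capture follows; the output filename is an assumption, and missing or permission-restricted commands are tolerated so the script degrades gracefully on minimal systems:

```shell
# Capture a network-state baseline into a timestamped file.
out="baseline-$(uname -n)-$(date +%Y%m%d).txt"
: > "$out"
for cmd in 'ip addr show' 'ip route show table all' 'ip neigh show' \
           'iptables -L -n -v' 'cat /etc/resolv.conf'; do
    printf '### %s\n' "$cmd" >> "$out"
    $cmd >> "$out" 2>&1 || true    # record errors too; keep going if a tool is absent
done
echo "baseline written to $out"
```

The `###` markers make later comparison a simple `diff` between the baseline and a freshly captured copy.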
Finally, make DNS redundant. A single nameserver entry in `/etc/resolv.conf` is a single point of failure that will eventually manifest as a network outage. Always configure a fallback — either a second internal resolver or a well-known internal recursive resolver that won't disappear when your primary DNS VM gets rebooted for patching.
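Checking for that single point of failure is easy to automate. A sketch that counts nameserver entries in resolv.conf text; the sample content is illustrative, and in real use you would pass `"$(cat /etc/resolv.conf)"`:

```shell
# Count nameserver lines in resolv.conf-style text.
nameserver_count() {
    printf '%s\n' "$1" | grep -c '^nameserver '
}

resolv='nameserver 192.168.10.53'
if [ "$(nameserver_count "$resolv")" -lt 2 ]; then
    echo "warning: single DNS server is a single point of failure"
fi
```

Dropped into a compliance check or CI lint for machine images, this catches the problem before it ever pages anyone.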
