Symptoms
When a Kubernetes Service becomes unreachable, engineers typically encounter one or more of the following indicators across application logs, kubectl output, and in-cluster connectivity tests:
- Running `curl http://web-frontend.production.svc.cluster.local` from inside a Pod returns `curl: (6) Could not resolve host` or `curl: (7) Failed to connect to port 8080 after 0 ms`
- `kubectl exec` into a debug Pod and attempting to reach the Service ClusterIP yields `Connection timed out` or `No route to host`
- Application logs show repeated entries such as `dial tcp 10.96.45.12:8080: i/o timeout` or `connect: connection refused`
- `kubectl get endpoints <service-name>` returns an empty addresses list or shows `<none>`
- Ingress controllers return HTTP 502 or 503 errors for requests targeting the backend Service
- Inter-service communication inside the cluster fails intermittently — some requests succeed while others time out — suggesting partial routing failure on specific nodes
- A freshly deployed application that passed staging smoke tests cannot be reached in production despite identical manifests
Each of these symptoms points to a different layer in Kubernetes networking. The sections below dissect the most common root causes, how to identify them with precision, and how to resolve them permanently.
Root Cause 1: Label Selector Mismatch
Why It Happens
A Kubernetes Service does not hold a static list of Pod IP addresses. Instead, it uses a label selector to dynamically discover backing Pods and build an Endpoints object. The endpoints controller watches for Pods whose labels match the selector and writes their IPs and ports into the Endpoints resource. When the labels on your Pods do not exactly match the selector defined in the Service spec — including key name, value, and case — the Endpoints object remains empty and the Service forwards no traffic anywhere. This is one of the most common misconfigurations, particularly after renaming labels during a refactor or copying manifests between projects without updating selectors.
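The matching rule the endpoints controller applies is a plain subset test: every key/value pair in the Service selector must appear, exactly and case-sensitively, in the Pod's labels. A minimal Python sketch of that rule (the label values are illustrative, taken from the mismatch example in this section):

```python
def matches_selector(selector: dict, pod_labels: dict) -> bool:
    """A Pod backs a Service only if every selector key/value pair
    appears exactly (case-sensitive) in the Pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())

# The Service wants tier=frontend; the Pod carries tier=ui.
selector = {"app": "web-frontend", "tier": "frontend"}
pod_labels = {"app": "web-frontend", "tier": "ui"}

print(matches_selector(selector, pod_labels))  # False -> empty Endpoints
print(matches_selector(selector, {"app": "web-frontend", "tier": "frontend"}))  # True
```

Note the asymmetry: extra labels on the Pod are harmless, but a single missing or differing selector key excludes the Pod entirely.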
How to Identify It
Start by inspecting the Endpoints object for the affected Service:
```
kubectl get endpoints web-frontend -n production
NAME           ENDPOINTS   AGE
web-frontend   <none>      14m
```

A `<none>` value confirms no Pods matched the selector. Now compare the Service selector against the labels on running Pods:
```
kubectl get svc web-frontend -n production -o jsonpath='{.spec.selector}'
{"app":"web-frontend","tier":"frontend"}

kubectl get pods -n production --show-labels
NAME                           READY   STATUS    LABELS
web-frontend-7d9f4b8c6-xk2rp   1/1     Running   app=web-frontend,tier=ui
```

The Service expects `tier=frontend` but the Pod carries `tier=ui`. The mismatch leaves the Endpoints list empty and the Service completely dark.
How to Fix It
Option A — patch the Deployment template labels so all future Pods carry the correct label and trigger a rollout:
```
kubectl patch deployment web-frontend -n production \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/template/metadata/labels/tier","value":"frontend"}]'

kubectl rollout status deployment web-frontend -n production
deployment "web-frontend" successfully rolled out
```

Option B — for an immediate hotfix on an existing Pod, patch the label in place:

```
kubectl label pod web-frontend-7d9f4b8c6-xk2rp tier=frontend --overwrite -n production
```

Verify that the Endpoints are now populated:

```
kubectl get endpoints web-frontend -n production
NAME           ENDPOINTS          AGE
web-frontend   10.244.1.15:8080   16m
```

Root Cause 2: kube-proxy Not Running
Why It Happens
`kube-proxy` is the component responsible for maintaining the network rules — either `iptables` chains or `IPVS` virtual servers — that implement Service virtual IPs on every node. It runs as a DaemonSet, meaning one instance per node. If a node's kube-proxy Pod crashes and fails to restart (due to resource exhaustion, a broken container image, a missing kernel module, or a taint mismatch preventing scheduling), that node loses the ability to route Service traffic. Requests originating from Pods on the affected node, or routed to Pods running on it, will silently time out.
How to Identify It
Check the kube-proxy DaemonSet and the status of each Pod across nodes:
```
kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
NAME               READY   STATUS             RESTARTS   AGE   NODE
kube-proxy-7xmrq   1/1     Running            0          2d    sw-infrarunbook-01
kube-proxy-9plbt   0/1     CrashLoopBackOff   14         45m   node-worker-02
```

The Pod on `node-worker-02` is in `CrashLoopBackOff`. Inspect its logs for the underlying error:

```
kubectl logs kube-proxy-9plbt -n kube-system --previous
E0405 11:23:17.445001       1 proxier.go:1689] Failed to execute iptables-restore: exit status 2
E0405 11:23:17.445123       1 run.go:74] "command failed" err="exit status 2"
F0405 11:23:17.445201       1 server.go:490] "Error running ProxyServer" err="failed to run Proxier: ..."
```

How to Fix It
Delete the failing Pod to trigger DaemonSet recreation on the node:
```
kubectl delete pod kube-proxy-9plbt -n kube-system
```

If it continues to crash, describe the Pod to surface node-level events:

```
kubectl describe pod kube-proxy-9plbt -n kube-system
Events:
  Warning  BackOff  40s  kubelet  Back-off restarting failed container
  Warning  Failed   45m  kubelet  Error: failed to create containerd task: OCI runtime exec failed
  Warning  Failed   45m  kubelet  Error response from daemon: No such image: registry.k8s.io/kube-proxy:v1.29.0
```

In this case the image cannot be pulled. Verify connectivity to the registry from the node, or pre-pull the correct image. Once the underlying problem is resolved, confirm the Pod is running and that the iptables rules have been rewritten on the node:

```
iptables-save | grep -c KUBE
287
```

Root Cause 3: iptables Rules Corrupted or Flushed
Why It Happens
Even when kube-proxy is running and healthy, its iptables rules can be silently wiped. This occurs when a security tool, a firewall management daemon, or an administrator runs `iptables -F` on the node. Kernel upgrades that reset netfilter state, Docker daemon restarts on older setups, or competing iptables managers such as `firewalld` or `ufw` can purge or conflict with the `KUBE-*` chains that kube-proxy writes. The result is that Service ClusterIPs become black holes — TCP packets reach the node's network interface but are never DNAT-translated to a real Pod IP, so they are dropped silently.
How to Identify It
SSH to the affected node and check whether the KUBE chains exist:
```
ssh infrarunbook-admin@sw-infrarunbook-01
iptables -L KUBE-SERVICES -n --line-numbers 2>&1 | head -5
iptables: No chain/target/match by that name.
```

On a healthy node the output should list Service DNAT entries:

```
Chain KUBE-SERVICES (2 references)
num  target                      prot  opt  source     destination
1    KUBE-SVC-XGLOHA7QRQ3V22RZ   tcp   --   0.0.0.0/0  10.96.0.1
2    KUBE-SVC-NPX46M4PTMTKRN6Y   tcp   --   0.0.0.0/0  10.96.0.10
```

Also check whether firewalld is active on the node, which is incompatible with kube-proxy:

```
systemctl is-active firewalld
active
```

How to Fix It
Disable and stop firewalld on all Kubernetes nodes — it must not coexist with kube-proxy:
```
systemctl stop firewalld
systemctl disable firewalld
```

Then force kube-proxy to rewrite all KUBE-* chains by restarting the DaemonSet or deleting the Pod on the affected node:

```
kubectl rollout restart daemonset kube-proxy -n kube-system
kubectl rollout status daemonset kube-proxy -n kube-system
daemon set "kube-proxy" successfully rolled out
```

Alternatively, target only the affected node:

```
kubectl delete pod -n kube-system -l k8s-app=kube-proxy \
  --field-selector spec.nodeName=sw-infrarunbook-01
```

Confirm the chains are restored:

```
iptables-save | grep "KUBE-SERVICES" | wc -l
14
```

Root Cause 4: CoreDNS Failure
Why It Happens
Kubernetes Services are routinely accessed by DNS name — for example `api.production.svc.cluster.local` — rather than by ClusterIP. CoreDNS is the in-cluster authoritative DNS resolver that translates these short names and fully qualified names to ClusterIPs. If CoreDNS Pods are down, in `CrashLoopBackOff`, or if the CoreDNS ConfigMap (the Corefile) has been accidentally modified with a syntax error or an incorrect upstream, DNS resolution fails cluster-wide. Applications that rely on DNS-based service discovery will report `could not resolve host` even though the underlying Service, Endpoints, and kube-proxy rules are perfectly healthy.
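The names involved follow a fixed scheme, which is worth keeping in mind when reading resolver errors. A short Python sketch of how a Service's canonical name is formed and why bare short names like `api` still resolve (via the search path kubelet writes into each Pod's `/etc/resolv.conf`):

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Canonical DNS name for a ClusterIP Service, as served by CoreDNS:
    <service>.<namespace>.svc.<cluster-domain>"""
    return f"{service}.{namespace}.svc.{cluster_domain}"

print(service_fqdn("api", "production"))  # api.production.svc.cluster.local

# A Pod in "production" gets a search path like this, so the short name
# "api" is expanded and tried against CoreDNS in order:
search = ["production.svc.cluster.local", "svc.cluster.local", "cluster.local"]
print([f"api.{domain}" for domain in search])
```

This also explains why cross-namespace calls must use at least `api.<namespace>`: the first search-path expansion only covers the caller's own namespace.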
How to Identify It
Check CoreDNS Pod status in the kube-system namespace:
```
kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS             RESTARTS   AGE
coredns-5d78c9869d-4xm7g   0/1     CrashLoopBackOff   8          22m
coredns-5d78c9869d-9wqkf   1/1     Running            0          2d
```

Run a DNS resolution test from a temporary debug Pod:

```
kubectl run dns-test --image=busybox:1.36 --restart=Never -it --rm -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
```

Inspect the logs of the failing CoreDNS Pod:

```
kubectl logs coredns-5d78c9869d-4xm7g -n kube-system
[ERROR] plugin/errors: 2 SERVFAIL for kubernetes.default.svc.cluster.local. A
[FATAL] Failed to initialize server: open /etc/coredns/Corefile: no such file or directory
```

Also inspect the CoreDNS ConfigMap for corruption:

```
kubectl get configmap coredns -n kube-system -o yaml
```

How to Fix It
If the Corefile ConfigMap has been damaged, restore it to a valid minimal configuration:
```
kubectl edit configmap coredns -n kube-system
# Ensure the data.Corefile key contains:
.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
```

Then restart the CoreDNS Deployment so it picks up the fixed ConfigMap:

```
kubectl rollout restart deployment coredns -n kube-system
kubectl rollout status deployment coredns -n kube-system
deployment "coredns" successfully rolled out
```

Verify DNS resolution is restored:

```
kubectl run dns-test --image=busybox:1.36 --restart=Never -it --rm -- nslookup kubernetes.default
Server:     10.96.0.10
Address 1:  10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:       kubernetes.default
Address 1:  10.96.0.1 kubernetes.default.svc.cluster.local
```

Root Cause 5: NetworkPolicy Blocking Traffic
Why It Happens
Kubernetes `NetworkPolicy` resources implement firewall rules at the Pod level using the cluster's CNI plugin (Calico, Cilium, Weave, and others). A critical behavior to understand: once any NetworkPolicy selects a Pod as its target, all traffic not explicitly permitted by that policy — or by any other policy targeting that Pod — is denied. Engineers commonly apply a default-deny policy to a namespace for security hardening, then forget to create matching allow rules for their application's traffic. The result is that inter-service calls that worked before the policy was applied are dropped at the CNI layer, often with no log entry at the application level — only connection timeouts.
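The "selected means isolated" rule is the part that surprises people, so here is a deliberately simplified model of the ingress decision in Python. It only models pod-selector rules — real policies also match namespaces, IP blocks, and ports — and all names are illustrative:

```python
def ingress_allowed(pod_labels: dict, peer_labels: dict, policies: list) -> bool:
    """Simplified NetworkPolicy ingress decision: a Pod selected by no
    policy accepts everything; once any policy selects it, traffic must
    be whitelisted by at least one rule of an applicable policy."""
    def selects(selector: dict, labels: dict) -> bool:
        return all(labels.get(k) == v for k, v in selector.items())

    applicable = [p for p in policies if selects(p["podSelector"], pod_labels)]
    if not applicable:
        return True  # unselected Pods are non-isolated
    return any(
        selects(rule, peer_labels)
        for p in applicable
        for rule in p.get("allowFrom", [])
    )

deny_all = {"podSelector": {}, "allowFrom": []}  # empty selector: selects every Pod
allow_fe = {"podSelector": {"app": "api"}, "allowFrom": [{"app": "web-frontend"}]}

api, frontend = {"app": "api"}, {"app": "web-frontend"}
print(ingress_allowed(api, frontend, [deny_all]))            # False: deny-all wins
print(ingress_allowed(api, frontend, [deny_all, allow_fe]))  # True: any allow suffices
```

Note that policies are additive-allow only: there is no "deny rule" to remove, you can only add a policy whose rules permit the traffic.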
How to Identify It
List all NetworkPolicies in the affected namespace:
```
kubectl get networkpolicy -n production
NAME                   POD-SELECTOR   AGE
default-deny-all       <none>         3d
allow-ingress-to-api   app=api        3d
```

Describe the deny policy to confirm it targets all Pods and blocks all traffic:

```
kubectl describe networkpolicy default-deny-all -n production
Name:         default-deny-all
Namespace:    production
Pod Selector: <none> (Selects all Pods in namespace)
Policy Types: Ingress, Egress
Allowing ingress traffic:
  <none> (Selected pods are isolated for ingress connectivity)
Allowing egress traffic:
  <none> (Selected pods are isolated for egress connectivity)
```

Confirm the connection is being dropped with a direct connectivity test:

```
kubectl exec -it debug-pod -n production -- curl -v --max-time 5 http://10.96.45.12:8080
*   Trying 10.96.45.12:8080...
* connect to 10.96.45.12 port 8080 failed: Connection timed out
curl: (28) Connection timed out after 5001 milliseconds
```

With Calico installed you can also inspect policy enforcement decisions:

```
calicoctl get networkpolicy -n production -o yaml | grep -A10 selector
```

How to Fix It
Create an explicit ingress allow policy that permits the required source Pod to reach the destination Pod on the correct port:
```
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: web-frontend
      ports:
        - protocol: TCP
          port: 8080
EOF
```

If a default-deny egress policy is also present, add a matching egress allow from the source Pod:

```
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-egress-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-frontend
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 8080
  policyTypes:
    - Egress
EOF
```

Retest connectivity and confirm you now receive an HTTP response:

```
kubectl exec -it debug-pod -n production -- curl -s -o /dev/null -w "%{http_code}" http://10.96.45.12:8080
200
```

Root Cause 6: Service Port or TargetPort Misconfiguration
Why It Happens
A Service exposes a `port` — the port clients use when reaching the Service — and a `targetPort` — the port on the Pod where the application actually listens. If `targetPort` does not match the container's listening port, kube-proxy still forwards traffic (it happily creates DNAT rules for whatever targetPort you specify), but the packets arrive at a port on the Pod where nothing is listening, so the Pod's kernel resets the connection and clients see an immediate connection refused. This frequently occurs when a container image is updated with a different default port and the Service manifest is not updated to match.
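The failure mode can be reduced to a tiny model: the `port` → `targetPort` translation is unconditional, and nothing ever validates `targetPort` against what the container binds. A Python sketch (port numbers taken from this section's example; the function is illustrative, not a real API):

```python
def forward(service: dict, container_listen_port: int, client_port: int) -> str:
    """Sketch of the kube-proxy forwarding step: traffic to the Service
    `port` is DNAT-translated to `targetPort` on a backend Pod,
    unconditionally — the container's real listening port is never checked."""
    if client_port != service["port"]:
        return "no rule: packet not addressed to the Service port"
    if service["targetPort"] != container_listen_port:
        return "connection refused"  # Pod kernel resets: nothing listens there
    return "connected"

stale = {"port": 8080, "targetPort": 8080}  # manifest never updated...
print(forward(stale, container_listen_port=9090, client_port=8080))  # connection refused

fixed = {"port": 8080, "targetPort": 9090}  # ...while the image now listens on 9090
print(forward(fixed, container_listen_port=9090, client_port=8080))  # connected
```

Note that clients keep using port 8080 in both cases — only the Pod-side translation changes, which is why the fix below touches the Service, not the callers.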
How to Identify It
Compare the Service targetPort against the actual container port:
```
kubectl describe svc api -n production
Port:        http  8080/TCP
TargetPort:  8080/TCP
Endpoints:   10.244.1.15:8080,10.244.2.9:8080

kubectl describe pod api-5f7d9c84b-j4rlx -n production | grep -A3 Ports
Port:       9090/TCP
Host Port:  0/TCP
```

The Service targets port 8080 but the container listens on 9090. All forwarded connections hit a closed port and are immediately refused.
How to Fix It
```
kubectl patch svc api -n production \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/ports/0/targetPort","value":9090}]'

kubectl get endpoints api -n production
NAME   ENDPOINTS          AGE
api    10.244.1.15:9090   4m
```

Root Cause 7: Pod Readiness Probe Failure
Why It Happens
Kubernetes automatically removes a Pod from the Service's Endpoints list when its readiness probe fails. This is a safety feature designed to prevent traffic from reaching Pods that are not yet ready to serve requests. However, an incorrectly configured readiness probe — wrong HTTP path, wrong port, or a timeout too short for the application's startup time — causes healthy Pods to be continuously excluded from Endpoints. The Service exists, the Pods are running, but no traffic is ever forwarded.
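The endpoints controller's behavior here is a straightforward filter: only Pods that are both Running and passing their readiness probe are published. A minimal Python sketch (Pod IPs and field names in the dicts are illustrative):

```python
def build_endpoints(pods: list) -> list:
    """The endpoints controller publishes only Pods whose readiness probe
    currently passes; Running-but-unready Pods are silently excluded."""
    return [p["ip"] for p in pods if p["phase"] == "Running" and p["ready"]]

pods = [
    {"ip": "10.244.1.15", "phase": "Running", "ready": False},  # probe failing
    {"ip": "10.244.2.9",  "phase": "Running", "ready": True},
]
print(build_endpoints(pods))  # ['10.244.2.9'] -- the unready Pod gets no traffic
```

This is why a misconfigured probe produces the confusing state described above: every Pod is Running, yet the Endpoints list can shrink all the way to empty.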
How to Identify It
```
kubectl get pods -n production
NAME                  READY   STATUS    RESTARTS   AGE
api-5f7d9c84b-j4rlx   0/1     Running   0          8m

kubectl describe pod api-5f7d9c84b-j4rlx -n production
Readiness: http-get http://:8080/healthz delay=5s timeout=1s period=10s
Events:
  Warning  Unhealthy  30s  kubelet  Readiness probe failed: Get http://10.244.1.15:8080/healthz: dial tcp 10.244.1.15:8080: connect: connection refused
```

The `0/1` READY status indicates the Pod has been removed from Endpoints and is invisible to the Service.
How to Fix It
Update the readiness probe in the Deployment to use the correct path and port:
```
kubectl patch deployment api -n production \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/httpGet/path","value":"/ready"}]'

kubectl rollout status deployment api -n production
deployment "api" successfully rolled out

kubectl get endpoints api -n production
NAME   ENDPOINTS          AGE
api    10.244.1.15:8080   92s
```

Prevention
Avoiding Service reachability failures requires discipline across the full lifecycle of Kubernetes deployments. The following practices eliminate the most common failure modes before they reach production:
- Validate manifests before applying. Run `kubectl diff -f manifest.yaml` to preview changes and `kubectl apply --dry-run=server -f manifest.yaml` to catch structural misconfigurations against the live API server before committing them.
- Enforce label conventions with admission control. Use OPA/Gatekeeper or Kyverno to reject Deployments whose Pod template labels do not include the required selector keys. This catches label mismatches at admission time, long before a Service endpoint list goes empty.
- Monitor kube-proxy health continuously. Alert on any discrepancy between the DaemonSet's desired replica count and its ready count. A single node running without kube-proxy creates a hard-to-diagnose partial routing failure where some requests succeed and others time out depending on which node the client Pod is scheduled on.
- Protect CoreDNS from resource pressure. Assign CoreDNS to a high-priority PriorityClass so it is not evicted under node memory pressure. Set resource requests and limits carefully, and monitor its Prometheus metrics (exposed on port 9153) for request latency spikes and error rate increases.
- Test NetworkPolicies in a staging namespace first. Use tools such as `netassert` or Cilium's built-in connectivity test suite to validate that allow and deny rules behave as expected before applying them to production. Always add a matching egress allow rule whenever you add an ingress allow rule between two namespaces.
- Prohibit firewalld and ufw on Kubernetes nodes. These tools conflict with kube-proxy's iptables management. Disable and mask them during node provisioning via your configuration management tooling (Ansible, Chef, or cloud-init), and prevent re-installation through package management policies.
- Design readiness probes carefully. A readiness probe should check application-level readiness — for example, that a database connection pool is established and the application is serving traffic. Use generous `initialDelaySeconds` and `failureThreshold` values to avoid Pods being dropped from Endpoints during normal slow starts.
- Include connectivity smoke tests in CI/CD pipelines. After every deployment, run a `kubectl exec` test from a Pod in the same namespace to verify the Service is reachable via its DNS name before marking the deployment successful. Gate production promotion on this check.
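As an illustration of the admission-control practice above, a minimal Kyverno ClusterPolicy sketch that rejects Deployments whose Pod template omits an `app` label. The policy name and the choice of `app` as the required key are assumptions — adapt them to whatever selector convention your Services use:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  # Hypothetical policy name; pick one matching your conventions.
  name: require-service-selector-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-app-label
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "Pod template must carry the app label that Service selectors rely on."
        pattern:
          spec:
            template:
              metadata:
                labels:
                  app: "?*"   # any non-empty value
```

With this in place, a Deployment whose template drops or renames the `app` label is rejected at `kubectl apply` time instead of surfacing later as an empty Endpoints list.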
