InfraRunBook

    Kubernetes Service Not Reachable

    Kubernetes
    Published: Apr 5, 2026
    Updated: Apr 5, 2026

    A step-by-step troubleshooting guide covering every major reason a Kubernetes Service becomes unreachable, including label selector mismatches, kube-proxy failures, corrupted iptables rules, CoreDNS outages, and NetworkPolicy blocks — with real CLI commands and remediation steps.


    Symptoms

    When a Kubernetes Service becomes unreachable, engineers typically encounter one or more of the following indicators across application logs, kubectl output, and in-cluster connectivity tests:

    • Running `curl http://web-frontend.production.svc.cluster.local` from inside a Pod returns `curl: (6) Could not resolve host` or `curl: (7) Failed to connect to port 8080 after 0 ms`
    • `kubectl exec` into a debug Pod and attempting to reach the Service ClusterIP yields `Connection timed out` or `No route to host`
    • Application logs show repeated entries such as `dial tcp 10.96.45.12:8080: i/o timeout` or `connect: connection refused`
    • `kubectl get endpoints <service-name>` returns an empty addresses list or shows `<none>`
    • Ingress controllers return HTTP 502 or 503 errors for requests targeting the backend Service
    • Inter-service communication inside the cluster fails intermittently — some requests succeed while others time out — suggesting partial routing failure on specific nodes
    • A freshly deployed application that passed staging smoke tests cannot be reached in production despite identical manifests

    Each of these symptoms points to a different layer in Kubernetes networking. The sections below dissect the most common root causes, how to identify them with precision, and how to resolve them permanently.


    Root Cause 1: Label Selector Mismatch

    Why It Happens

    A Kubernetes Service does not hold a static list of Pod IP addresses. Instead, it uses a label selector to dynamically discover backing Pods and build an Endpoints object. The endpoints controller watches for Pods whose labels match the selector and writes their IPs and ports into the Endpoints resource. When the labels on your Pods do not exactly match the selector defined in the Service spec — including key name, value, and case — the Endpoints object remains empty and the Service forwards no traffic anywhere. This is one of the most common misconfigurations, particularly after renaming labels during a refactor or copying manifests between projects without updating selectors.
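The matching rule can be sketched offline. The snippet below is a minimal illustration using hypothetical sample values (the `kubectl` commands in the comments show where the real data would come from); a Pod backs a Service only when every selector pair appears verbatim in its labels:

```shell
# Hypothetical sample values; on a live cluster populate them with:
#   kubectl get svc web-frontend -n production -o jsonpath='{.spec.selector}'
#   kubectl get pod <pod> -n production -o jsonpath='{.metadata.labels}'
selector="app=web-frontend,tier=frontend"
pod_labels="app=web-frontend,tier=ui,pod-template-hash=7d9f4b8c6"

# Every key=value pair in the selector must appear in the Pod's labels;
# comm -23 prints the selector pairs the Pod is missing.
missing=$(comm -23 <(tr ',' '\n' <<<"$selector" | sort) \
                   <(tr ',' '\n' <<<"$pod_labels" | sort))
echo "${missing:-selector fully matched}"
```

Run against the sample data above, this prints `tier=frontend` — exactly the pair whose absence leaves the Endpoints object empty.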

    How to Identify It

    Start by inspecting the Endpoints object for the affected Service:

    kubectl get endpoints web-frontend -n production
    NAME           ENDPOINTS   AGE
    web-frontend   <none>      14m

    A `<none>` value confirms that no Pods matched the selector. Now compare the Service selector against the labels on running Pods:

    kubectl get svc web-frontend -n production -o jsonpath='{.spec.selector}'
    {"app":"web-frontend","tier":"frontend"}
    
    kubectl get pods -n production --show-labels
    NAME                            READY   STATUS    LABELS
    web-frontend-7d9f4b8c6-xk2rp   1/1     Running   app=web-frontend,tier=ui

    The Service expects `tier=frontend` but the Pod carries `tier=ui`. The mismatch leaves the Endpoints list empty and the Service completely dark.

    How to Fix It

    Option A — patch the Deployment template labels so all future Pods carry the correct label and trigger a rollout:

    kubectl patch deployment web-frontend -n production \
      --type='json' \
      -p='[{"op":"replace","path":"/spec/template/metadata/labels/tier","value":"frontend"}]'
    
    kubectl rollout status deployment web-frontend -n production
    deployment "web-frontend" successfully rolled out

    Option B — for an immediate hotfix on an existing Pod, patch the label in place:

    kubectl label pod web-frontend-7d9f4b8c6-xk2rp tier=frontend --overwrite -n production

    Verify that Endpoints are now populated:

    kubectl get endpoints web-frontend -n production
    NAME           ENDPOINTS             AGE
    web-frontend   10.244.1.15:8080      16m

    Root Cause 2: kube-proxy Not Running

    Why It Happens

    `kube-proxy` is the component responsible for maintaining the network rules — either `iptables` chains or `IPVS` virtual servers — that implement Service virtual IPs on every node. It runs as a DaemonSet, meaning one instance per node. If a node's kube-proxy Pod crashes and fails to restart (due to resource exhaustion, a broken container image, a missing kernel module, or a node taint the DaemonSet does not tolerate), that node loses the ability to route Service traffic. Requests originating from Pods on the affected node, or routed to Pods running on it, will silently time out.
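One cheap way to spot this condition is to compare the DaemonSet's desired and ready counts. The sketch below uses hypothetical sample numbers; on a live cluster they would come from the `kubectl get ds` query shown in the comments:

```shell
# Hypothetical sample counts; on a live cluster fetch them with:
#   kubectl get ds kube-proxy -n kube-system \
#     -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}'
desired=5
ready=4

# Any gap means at least one node is routing Service traffic without kube-proxy.
if [ "$ready" -lt "$desired" ]; then
  echo "ALERT: $((desired - ready)) node(s) running without a ready kube-proxy"
else
  echo "kube-proxy healthy on all $desired nodes"
fi
```

With the sample counts this prints an alert for one uncovered node — the same desired-vs-ready check the Prevention section recommends wiring into monitoring.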

    How to Identify It

    Check the kube-proxy DaemonSet and the status of each Pod across nodes:

    kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
    NAME               READY   STATUS             RESTARTS   AGE   NODE
    kube-proxy-7xmrq   1/1     Running            0          2d    sw-infrarunbook-01
    kube-proxy-9plbt   0/1     CrashLoopBackOff   14         45m   node-worker-02

    The Pod on `node-worker-02` is in `CrashLoopBackOff`. Inspect its logs for the underlying error:

    kubectl logs kube-proxy-9plbt -n kube-system --previous
    E0405 11:23:17.445001       1 proxier.go:1689] Failed to execute iptables-restore: exit status 2
    E0405 11:23:17.445123       1 run.go:74] "command failed" err="exit status 2"
    F0405 11:23:17.445201       1 server.go:490] "Error running ProxyServer" err="failed to run Proxier: ..."

    How to Fix It

    Delete the failing Pod to trigger DaemonSet recreation on the node:

    kubectl delete pod kube-proxy-9plbt -n kube-system

    If it continues to crash, describe the Pod to surface node-level events:

    kubectl describe pod kube-proxy-9plbt -n kube-system
    Events:
      Warning  BackOff   40s   kubelet   Back-off restarting failed container
      Warning  Failed    45m   kubelet   Error: failed to create containerd task: OCI runtime exec failed
      Warning  Failed    45m   kubelet   Error response from daemon: No such image: registry.k8s.io/kube-proxy:v1.29.0

    In this case the image cannot be pulled. Verify connectivity to the registry from the node, or pre-pull the correct image. Once the underlying problem is resolved, confirm the Pod is running and that iptables rules have been re-written:

    iptables-save | grep -c KUBE
    287

    Root Cause 3: iptables Rules Corrupted or Flushed

    Why It Happens

    Even when kube-proxy is running and healthy, its iptables rules can be silently wiped. This occurs when a security tool, a firewall management daemon, or an administrator runs `iptables -F` on the node. Kernel upgrades that reset netfilter state, Docker daemon restarts on older setups, or competing iptables managers such as `firewalld` or `ufw` can purge or conflict with the `KUBE-*` chains that kube-proxy writes. The result is that Service ClusterIPs become black holes — TCP packets reach the node's network interface but are never DNAT-translated to a real Pod IP, so they are dropped silently.
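The presence of the core `KUBE-*` chains is easy to check mechanically. The sketch below runs against a hypothetical, heavily truncated `iptables-save` dump embedded as a string; on a real node `dump=$(iptables-save -t nat)` would supply the input:

```shell
# Hypothetical truncated dump; on a real node use: dump=$(iptables-save -t nat)
dump='*nat
:KUBE-SERVICES - [0:0]
:KUBE-POSTROUTING - [0:0]
-A PREROUTING -j KUBE-SERVICES
COMMIT'

# iptables-save declares each chain as ":NAME - [pkts:bytes]"; if any of the
# core kube-proxy chains is missing, the node is black-holing Service traffic.
status=ok
for chain in KUBE-SERVICES KUBE-POSTROUTING KUBE-NODEPORTS; do
  grep -q "^:$chain " <<<"$dump" || { echo "missing chain: $chain"; status=broken; }
done
echo "node status: $status"
```

The sample dump lacks `KUBE-NODEPORTS`, so the check reports the node as broken; a loop like this makes a reasonable node-level health probe.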

    How to Identify It

    SSH to the affected node and check whether the KUBE chains exist:

    ssh infrarunbook-admin@sw-infrarunbook-01
    iptables -t nat -L KUBE-SERVICES -n --line-numbers 2>&1 | head -5
    iptables: No chain/target/match by that name.

    On a healthy node the output should list Service DNAT entries:

    Chain KUBE-SERVICES (2 references)
    num  target     prot opt source               destination
    1    KUBE-SVC-XGLOHA7QRQ3V22RZ  tcp  --  0.0.0.0/0  10.96.0.1
    2    KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  0.0.0.0/0  10.96.0.10

    Also check whether firewalld is active on the node; its rule reloads conflict with kube-proxy's iptables management:

    systemctl is-active firewalld
    active

    How to Fix It

    Disable and stop firewalld on all Kubernetes nodes — it must not coexist with kube-proxy:

    systemctl stop firewalld
    systemctl disable firewalld

    Then force kube-proxy to rewrite all KUBE-* chains by rolling it out or deleting the Pod on the affected node:

    kubectl rollout restart daemonset kube-proxy -n kube-system
    kubectl rollout status daemonset kube-proxy -n kube-system
    daemon set "kube-proxy" successfully rolled out

    Alternatively, target only the affected node:

    kubectl delete pod -n kube-system -l k8s-app=kube-proxy \
      --field-selector spec.nodeName=sw-infrarunbook-01

    Confirm the chains are restored:

    iptables-save | grep "KUBE-SERVICES" | wc -l
    14

    Root Cause 4: CoreDNS Failure

    Why It Happens

    Kubernetes Services are routinely accessed by DNS name — for example `api.production.svc.cluster.local` — rather than by ClusterIP. CoreDNS is the in-cluster authoritative DNS resolver that translates these short names and fully-qualified names to ClusterIPs. If CoreDNS Pods are down, in `CrashLoopBackOff`, or if the CoreDNS ConfigMap (the Corefile) has been accidentally modified with a syntax error or an incorrect upstream, DNS resolution fails cluster-wide. Applications that rely on DNS-based service discovery will report `could not resolve host` even though the underlying Service, Endpoints, and kube-proxy rules are perfectly healthy.
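It helps to remember how a Pod's resolver turns a short name into a fully-qualified one. The sketch below simulates the search-list expansion using a hypothetical `resolv.conf` search line for the `production` namespace:

```shell
# search_list mirrors a typical Pod /etc/resolv.conf "search" line for the
# production namespace (hypothetical values).
search_list="production.svc.cluster.local svc.cluster.local cluster.local"
name="web-frontend"

# Names with fewer dots than the ndots option (default 5 in Pods) are tried
# with each search suffix appended, in order; the first match wins.
for suffix in $search_list; do
  echo "candidate: ${name}.${suffix}"
done
first_candidate="${name}.${search_list%% *}"
echo "first tried: $first_candidate"
```

This is why a bare `web-frontend` resolves inside `production` but not from another namespace, and why every one of these lookups dies when CoreDNS is down.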

    How to Identify It

    Check CoreDNS Pod status in the kube-system namespace:

    kubectl get pods -n kube-system -l k8s-app=kube-dns
    NAME                       READY   STATUS             RESTARTS   AGE
    coredns-5d78c9869d-4xm7g   0/1     CrashLoopBackOff   8          22m
    coredns-5d78c9869d-9wqkf   1/1     Running            0          2d

    Run a DNS resolution test from a temporary debug Pod:

    kubectl run dns-test --image=busybox:1.36 --restart=Never -it --rm -- nslookup kubernetes.default
    ;; connection timed out; no servers could be reached

    Inspect the logs of the failing CoreDNS Pod:

    kubectl logs coredns-5d78c9869d-4xm7g -n kube-system
    [ERROR] plugin/errors: 2 SERVFAIL for kubernetes.default.svc.cluster.local. A
    [FATAL] Failed to initialize server: open /etc/coredns/Corefile: no such file or directory

    Also inspect the CoreDNS ConfigMap for corruption:

    kubectl get configmap coredns -n kube-system -o yaml

    How to Fix It

    If the Corefile ConfigMap has been damaged, restore it to a valid minimal configuration:

    kubectl edit configmap coredns -n kube-system
    # Ensure the data.Corefile key contains:
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }

    Then restart the CoreDNS Deployment so it picks up the fixed ConfigMap:

    kubectl rollout restart deployment coredns -n kube-system
    kubectl rollout status deployment coredns -n kube-system
    deployment "coredns" successfully rolled out

    Verify DNS resolution is restored:

    kubectl run dns-test --image=busybox:1.36 --restart=Never -it --rm -- nslookup kubernetes.default
    Server:    10.96.0.10
    Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
    Name:      kubernetes.default
    Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

    Root Cause 5: NetworkPolicy Blocking Traffic

    Why It Happens

    Kubernetes `NetworkPolicy` resources implement firewall rules at the Pod level using the cluster's CNI plugin (Calico, Cilium, Weave, and others). A critical behavior to understand: once any NetworkPolicy selects a Pod as its target, all traffic in the directions covered by that policy's `policyTypes` that is not explicitly permitted — by that policy or any other policy targeting the Pod — is denied. Engineers commonly apply a default-deny policy to a namespace for security hardening, then forget to create matching allow rules for their application's traffic. The result is that inter-service calls that worked before the policy was applied are dropped at the CNI layer, often with no log entry at the application level — only connection timeouts.
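The isolation rule condenses into a small decision sketch. All values below are hypothetical samples, and real enforcement happens in the CNI data plane, but the logic mirrors the semantics described above:

```shell
# Hypothetical sample data mirroring NetworkPolicy ingress semantics.
policies_selecting_pod="default-deny-all allow-ingress-to-api"
allowed_peers="app=web-frontend"   # union of every 'from' clause in those policies
peer="app=batch-worker"            # label of the connecting client Pod

if [ -z "$policies_selecting_pod" ]; then
  # No policy selects the Pod: it is not isolated, everything is allowed.
  verdict="allow (Pod not isolated)"
else
  # Pod is isolated: only explicitly allowed peers get through.
  case " $allowed_peers " in
    *" $peer "*) verdict="allow" ;;
    *)           verdict="deny (dropped silently at the CNI layer)" ;;
  esac
fi
echo "$verdict"
```

The sample peer is not in any allow rule, so the verdict is a silent deny — which at the application level looks exactly like the connection timeouts described above.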

    How to Identify It

    List all NetworkPolicies in the affected namespace:

    kubectl get networkpolicy -n production
    NAME                      POD-SELECTOR   AGE
    default-deny-all          <none>         3d
    allow-ingress-to-api      app=api        3d

    Describe the deny policy to confirm it targets all Pods and blocks all traffic:

    kubectl describe networkpolicy default-deny-all -n production
    Name:         default-deny-all
    Namespace:    production
    Pod Selector: <none> (Selects all Pods in namespace)
    Policy Types: Ingress, Egress
    Allowing ingress traffic:
      <none> (Selected pods are isolated for ingress connectivity)
    Allowing egress traffic:
      <none> (Selected pods are isolated for egress connectivity)

    Confirm the connection is being dropped with a direct connectivity test:

    kubectl exec -it debug-pod -n production -- curl -v --max-time 5 http://10.96.45.12:8080
    * Trying 10.96.45.12:8080...
    * connect to 10.96.45.12 port 8080 failed: Connection timed out
    curl: (28) Connection timed out after 5001 milliseconds

    With Calico installed you can also inspect policy enforcement decisions:

    calicoctl get networkpolicy -n production -o yaml | grep -A10 selector

    How to Fix It

    Create an explicit ingress allow policy that permits the required source Pod to reach the destination Pod on the correct port:

    kubectl apply -f - <<'EOF'
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-frontend-to-api
      namespace: production
    spec:
      podSelector:
        matchLabels:
          app: api
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: web-frontend
        ports:
        - protocol: TCP
          port: 8080
    EOF

    If a default-deny-egress policy is also present, add a matching egress allow from the source Pod (note that default-deny egress also blocks DNS lookups, so the source Pod additionally needs an egress rule permitting UDP and TCP port 53 to kube-dns):

    kubectl apply -f - <<'EOF'
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-frontend-egress-to-api
      namespace: production
    spec:
      podSelector:
        matchLabels:
          app: web-frontend
      egress:
      - to:
        - podSelector:
            matchLabels:
              app: api
        ports:
        - protocol: TCP
          port: 8080
      policyTypes:
      - Egress
    EOF

    Retest connectivity and confirm you now receive an HTTP response:

    kubectl exec -it debug-pod -n production -- curl -s -o /dev/null -w "%{http_code}" http://10.96.45.12:8080
    200

    Root Cause 6: Service Port or TargetPort Misconfiguration

    Why It Happens

    A Service exposes a `port` — the port clients use when reaching the Service — and a `targetPort` — the port on the Pod where the application actually listens. If `targetPort` does not match the container's listening port, the packet still reaches the Pod: kube-proxy happily creates DNAT rules for whatever targetPort you specify, so the TCP SYN is translated and delivered, but nothing is listening on that port and the Pod's kernel answers with a reset. The client sees an immediate connection refused. This frequently occurs when a container image is updated with a different default port and the Service manifest is not updated to match.
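Because the mismatch is just two numbers disagreeing, the comparison is worth automating. The sketch below uses hypothetical sample values; the commented `kubectl` queries show where they would come from on a live cluster (a named `targetPort` would need resolving to a number first):

```shell
# Hypothetical sample values; on a live cluster fetch them with:
#   kubectl get svc api -n production -o jsonpath='{.spec.ports[0].targetPort}'
#   kubectl get pod <pod> -n production \
#     -o jsonpath='{.spec.containers[0].ports[0].containerPort}'
target_port=8080
container_port=9090

msg="ports aligned on :$target_port"
if [ "$target_port" != "$container_port" ]; then
  msg="MISMATCH: Service forwards to :$target_port but container listens on :$container_port"
fi
echo "$msg"
```

A check like this slots neatly into a pre-deploy validation step, failing fast on exactly the drift described above.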

    How to Identify It

    Compare the Service targetPort against the actual container port:

    kubectl describe svc api -n production
    Port:              http  8080/TCP
    TargetPort:        8080/TCP
    Endpoints:         10.244.1.15:8080,10.244.2.9:8080
    
    kubectl describe pod api-5f7d9c84b-j4rlx -n production | grep -A3 Ports
        Port:          9090/TCP
        Host Port:     0/TCP

    The Service targets port 8080 but the container listens on 9090. All forwarded connections hit a closed port and are immediately refused.

    How to Fix It

    kubectl patch svc api -n production \
      --type='json' \
      -p='[{"op":"replace","path":"/spec/ports/0/targetPort","value":9090}]'
    
    kubectl get endpoints api -n production
    NAME   ENDPOINTS               AGE
    api    10.244.1.15:9090        4m

    Root Cause 7: Pod Readiness Probe Failure

    Why It Happens

    Kubernetes automatically removes a Pod from the Service's Endpoints list when its readiness probe fails. This is a safety feature designed to prevent traffic from reaching Pods that are not yet ready to serve requests. However, an incorrectly configured readiness probe — wrong HTTP path, wrong port, or a timeout too short for the application's startup time — causes healthy Pods to be continuously excluded from Endpoints. The Service exists, the Pods are running, but no traffic is ever forwarded.

    How to Identify It

    kubectl get pods -n production
    NAME                   READY   STATUS    RESTARTS   AGE
    api-5f7d9c84b-j4rlx   0/1     Running   0          8m
    
    kubectl describe pod api-5f7d9c84b-j4rlx -n production
    Readiness:  http-get http://:8080/healthz delay=5s timeout=1s period=10s
    Events:
      Warning  Unhealthy  30s   kubelet  Readiness probe failed: Get http://10.244.1.15:8080/healthz: dial tcp 10.244.1.15:8080: connect: connection refused

    The `0/1 READY` status indicates the Pod has been removed from Endpoints and is invisible to the Service.

    How to Fix It

    Update the readiness probe in the Deployment to use the correct path and port:

    kubectl patch deployment api -n production \
      --type='json' \
      -p='[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/httpGet/path","value":"/ready"}]'
    
    kubectl rollout status deployment api -n production
    deployment "api" successfully rolled out
    
    kubectl get endpoints api -n production
    NAME   ENDPOINTS             AGE
    api    10.244.1.15:8080      92s
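For reference, a defensively configured readiness probe might look like the following sketch; the path, port, and timing values are illustrative assumptions, not prescriptive numbers:

```yaml
readinessProbe:
  httpGet:
    path: /ready            # must be a route the application actually serves
    port: 8080              # must match the container's listening port
  initialDelaySeconds: 10   # cover the application's normal startup time
  periodSeconds: 10
  timeoutSeconds: 3         # longer than worst-case handler latency
  failureThreshold: 3       # tolerate transient slowness before Endpoints removal
```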

    Prevention

    Avoiding Service reachability failures requires discipline across the full lifecycle of Kubernetes deployments. The following practices eliminate the most common failure modes before they reach production:

    • Validate manifests before applying. Run `kubectl diff -f manifest.yaml` to preview changes and `kubectl apply --dry-run=server -f manifest.yaml` to catch structural misconfigurations against the live API server before committing them.
    • Enforce label conventions with admission control. Use OPA/Gatekeeper or Kyverno to reject Deployments whose Pod template labels do not include the required selector keys. This catches label mismatches at admission time, long before a Service endpoint list goes empty.
    • Monitor kube-proxy health continuously. Alert on any discrepancy between the DaemonSet's desired replica count and its ready count. A single node running without kube-proxy creates a hard-to-diagnose partial routing failure where some requests succeed and others time out depending on which node the client Pod is scheduled on.
    • Protect CoreDNS from resource pressure. Assign CoreDNS to a high-priority PriorityClass so it is not evicted under node memory pressure. Set resource requests and limits carefully, and monitor its Prometheus metrics (exposed on port 9153) for request latency spikes and error rate increases.
    • Test NetworkPolicies in a staging namespace first. Use tools such as `netassert` or Cilium's built-in connectivity test suite to validate that allow and deny rules behave as expected before applying them to production. Always add a matching egress allow rule whenever you add an ingress allow rule between two namespaces.
    • Prohibit firewalld and ufw on Kubernetes nodes. These tools conflict with kube-proxy's iptables management. Disable and mask them during node provisioning via your configuration management tooling (Ansible, Chef, or cloud-init), and prevent re-installation through package management policies.
    • Design readiness probes carefully. A readiness probe should check application-level readiness — for example, that a database connection pool is established and the application is serving traffic. Use generous `initialDelaySeconds` and `failureThreshold` values to avoid pulling Pods out of Endpoints during normal slow starts.
    • Include connectivity smoke tests in CI/CD pipelines. After every deployment, run a `kubectl exec` test from a Pod in the same namespace to verify the Service is reachable via its DNS name before marking the deployment successful. Gate production promotion on this check.
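The smoke-test gate can be a small retry wrapper around any probe command. This sketch is exercised with a stub probe that succeeds on its third attempt; in a real pipeline the probe would be a `kubectl exec ... curl` against the Service DNS name (all names here are hypothetical):

```shell
# Retry a probe command until it succeeds or attempts run out; in CI the
# probe might be (hypothetical names):
#   kubectl exec deploy/ci-probe -n production -- \
#     curl -sf http://web-frontend:8080/healthz
smoke_test() {   # usage: smoke_test <max_attempts> <probe-command...>
  local max=$1; shift
  local i
  for i in $(seq 1 "$max"); do
    "$@" && { echo "smoke test passed on attempt $i"; return 0; }
    sleep 0      # in CI: sleep 5 between attempts
  done
  echo "smoke test FAILED after $max attempts"
  return 1
}

# Stub probe standing in for the real connectivity check: fails twice,
# then succeeds, simulating a Service that needs a moment to come up.
attempt=0
flaky_probe() { attempt=$((attempt + 1)); [ "$attempt" -ge 3 ]; }

smoke_test 5 flaky_probe
```

Gating promotion on the wrapper's exit code turns "the Service answers by DNS name" into an enforced deployment invariant rather than a manual check.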

    Frequently Asked Questions

    How do I quickly check whether a Service has any healthy endpoints?

    Run `kubectl get endpoints <service-name> -n <namespace>`. If the ENDPOINTS column shows `<none>`, no Pods are currently matching the Service selector or passing their readiness probe. If it shows IP:port pairs, the Service has backing Pods. You can also run `kubectl describe endpoints <service-name>` for more detail including NotReadyAddresses, which shows Pods that exist but are failing their readiness probe.

    What does `<none>` mean in the output of kubectl get endpoints?

    `<none>` means the Endpoints object exists (it is created automatically with the Service) but contains zero ready addresses. This happens when no running Pods match the Service's label selector, when all matching Pods are failing their readiness probe, or when matching Pods exist but are in a Terminating state. Start by comparing `kubectl get svc -o jsonpath='{.spec.selector}'` against `kubectl get pods --show-labels` to rule out a label mismatch.

    How do I test Service connectivity from inside the cluster without deploying a dedicated test application?

    Run a temporary busybox or curl Pod: `kubectl run connectivity-test --image=curlimages/curl:8.6.0 --restart=Never -it --rm -n <namespace> -- curl -v http://<service-name>:<port>`. This Pod is created in the same namespace as your Service and is deleted automatically when the session ends. For DNS testing, use `kubectl run dns-test --image=busybox:1.36 --restart=Never -it --rm -- nslookup <service-name>.<namespace>.svc.cluster.local`.

    Can I bypass CoreDNS and use the ClusterIP directly to isolate a DNS problem?

    Yes. Run `kubectl get svc <service-name> -n <namespace> -o jsonpath='{.spec.clusterIP}'` to retrieve the ClusterIP, then `curl http://<clusterIP>:<port>` from inside a Pod. If the ClusterIP works but the DNS name does not, the problem is isolated to CoreDNS. If both fail, the problem is at the iptables/kube-proxy layer or in NetworkPolicy rules.

    How do I distinguish between a NetworkPolicy timeout and a kube-proxy routing failure?

    NetworkPolicy drops produce silent connection timeouts — the TCP SYN is never answered. A kube-proxy or iptables failure also produces timeouts when the ClusterIP is unreachable. The key difference: if you can reach the Pod IP directly (bypassing the Service) and the connection times out, NetworkPolicy is likely the cause. If direct Pod-to-Pod connectivity works but Service ClusterIP fails, suspect kube-proxy or iptables. Check `kubectl get networkpolicy -n <namespace>` first, then `iptables-save | grep KUBE` on the node.

    What is the difference between kube-proxy iptables mode and IPVS mode, and does it affect troubleshooting?

    In iptables mode, kube-proxy writes DNAT rules into the KUBE-SERVICES chain. In IPVS mode, it programs the Linux kernel's IP Virtual Server. The failure modes differ: in iptables mode, look for missing KUBE-* chains with `iptables -L KUBE-SERVICES -n`. In IPVS mode, inspect virtual servers with `ipvsadm -Ln` — missing entries for your Service ClusterIP indicate kube-proxy has not synced. Check `kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode` to identify which mode your cluster uses.

    How do I find which CNI plugin is running and whether it supports NetworkPolicy?

    Run `kubectl get pods -n kube-system -o wide` and look for Pods named after the CNI (calico-node, cilium, weave-net, flannel, etc.). Flannel does not support NetworkPolicy natively — if you apply NetworkPolicy objects in a Flannel-only cluster, they are silently ignored. Calico, Cilium, and Weave all enforce NetworkPolicy. If your cluster uses Flannel and you need NetworkPolicy, you must add a separate policy enforcement layer such as Calico in policy-only mode.

    Why does my Service work from some namespaces but not others?

    Cross-namespace Service access requires at least the namespace-qualified name (`<service>.<namespace>`, or the full `<service>.<namespace>.svc.cluster.local`) rather than the bare short name. Bare short names only resolve within the same namespace. Also check for NetworkPolicies that restrict ingress by `namespaceSelector` — a policy allowing traffic only from Pods in the `production` namespace will block requests from Pods in `staging` even if the podSelector otherwise matches. Inspect policies on the destination namespace with `kubectl get networkpolicy -n <destination-namespace> -o yaml`.

    How do I restore kube-proxy iptables rules immediately without restarting the node?

    Delete the kube-proxy Pod on the affected node; the DaemonSet controller will recreate it and kube-proxy will perform a full iptables sync on startup: `kubectl delete pod -n kube-system -l k8s-app=kube-proxy --field-selector spec.nodeName=<node-name>`. The new Pod rewrites all KUBE-* chains within seconds. You can confirm with `iptables-save | grep -c KUBE` — a healthy node typically shows 200 or more KUBE-related rules depending on cluster size.

    What tool can I use to visualize NetworkPolicy rules and test them before applying?

    Cilium's Network Policy Editor (available in the Cilium documentation) provides a visual interface for building and testing policies. The open-source `netassert` tool lets you write YAML-based connectivity assertions and test them against a live cluster. For Calico clusters, `calicoctl` can dump and display the full policy evaluation order. Kyverno includes a policy testing framework that can simulate NetworkPolicy scenarios without deploying to a live cluster.
