InfraRunBook

    Kubernetes HPA Not Scaling

    Kubernetes
    Published: Apr 16, 2026
    Updated: Apr 16, 2026

    Your Kubernetes HPA isn't scaling pods even under load? This runbook covers the most common root causes — from missing metrics server to misconfigured targets — with real CLI commands and fixes.


    Symptoms

    You've deployed your HPA, the pods are running, load is climbing — and nothing happens. The pod count stays exactly where it started. No new pods. No errors surfacing in the deployment. Just silence.

    When you run 'kubectl get hpa', you might see something like this:

    NAME         REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
    my-api-hpa   Deployment/my-api   <unknown>/50%   2         10        2          14m

    That '<unknown>' is your first clue. Or maybe the target is showing a real percentage but replicas never change, even when the metric climbs well past the threshold. Or it scaled up fine but has been stuck at maximum replicas for hours despite load dropping off. Either way, something is broken in the scaling pipeline and it's failing quietly.

    The HPA controller is deceptively simple on the surface — watch a metric, compare to a target, adjust replicas. In practice, it depends on a chain of components all working correctly in sequence. This runbook walks through the most common failure points, how to confirm them, and how to fix them properly.
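    That chain ultimately feeds one formula. Here is a quick sketch of the documented replica calculation, as illustrative shell arithmetic rather than the controller's actual code:

```shell
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
current_replicas=2
current_cpu=90    # average utilization across pods, as a percent of the request
target_cpu=50     # the HPA's target utilization
# Integer ceiling division: add (divisor - 1) before dividing
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # 4 (the HPA would scale from 2 to 4 replicas)
```

    Keeping that ceiling function in mind explains several behaviors below, including why a too-low target balloons the replica count.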


    Root Cause 1: Metrics Server Not Running

    In my experience, this is the single most common cause of HPA silence — especially in freshly provisioned clusters or self-managed Kubernetes environments. The HPA relies on the Metrics API to retrieve CPU and memory utilization data from pods and nodes. No metrics server means no data, and without data the HPA sits idle.

    Why It Happens

    Managed Kubernetes distributions like EKS, GKE, and AKS ship with the metrics server pre-installed. But kubeadm clusters, k3s, Talos, or any hand-rolled setup won't have it by default. It's easy to forget when you're focused on getting workloads running and nobody has tried to use autoscaling yet.

    How to Identify It

    Check whether the metrics server pod is present and healthy:

    kubectl get pods -n kube-system | grep metrics-server

    No output means it's not deployed. You can also try the API directly:

    kubectl top nodes

    If the Metrics API isn't available you'll get:

    error: Metrics API not available

    Run 'kubectl describe hpa my-api-hpa' and look for events like:

    Warning  FailedGetResourceMetric  2m  horizontal-pod-autoscaler
      unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API:
      the server could not find the requested resource

    How to Fix It

    Deploy the metrics server from the official release manifest:

    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

    On clusters where the kubelet doesn't serve verified TLS — common in dev and bare-metal environments — the metrics server will fail to connect to kubelets and stay in a crash loop. You'll need to add the '--kubelet-insecure-tls' flag to the container args:

    kubectl edit deployment metrics-server -n kube-system

    Under 'spec.template.spec.containers[0].args', add:

    - --kubelet-insecure-tls
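    If you prefer to avoid an interactive edit (for example, in bootstrap automation), the same flag can be appended with a JSON patch. This sketch assumes metrics-server is the first container in the pod template:

```shell
# Append --kubelet-insecure-tls to the first container's args list
kubectl patch deployment metrics-server -n kube-system --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
```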

    After a minute or two, 'kubectl top nodes' should return real utilization data and the HPA will begin reporting metrics instead of '<unknown>'.


    Root Cause 2: Resource Requests Not Set

    This one catches even experienced engineers off guard. The HPA calculates CPU utilization as a percentage of the pod's requested CPU — not the node's total capacity, not the container's limit. If your pods don't define resource requests, the HPA has nothing to divide against and reports '<unknown>' for the target utilization.

    Why It Happens

    Developers routinely omit resource requests during early development. The pods run fine without them, the application works, nobody notices — until someone sets up autoscaling and wonders why it never triggers. Quick-start guides and Helm chart defaults are frequently the culprit here too.

    How to Identify It

    Describe the HPA and look for the 'Conditions' block:

    kubectl describe hpa my-api-hpa
    Conditions:
      Type            Status  Reason            Message
      ----            ------  ------            -------
      ScalingActive   False   FailedGetScale    the HPA was unable to compute the replica count:
                                                failed to get cpu utilization: missing request for cpu

    You can verify missing requests directly on the pods:

    kubectl get pods -l app=my-api -o jsonpath='{.items[*].spec.containers[*].resources}'

    An output of '{} {} {}' confirms there are no resource definitions at all.
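    To sweep a whole namespace, the same check can be scripted with jq. A sketch, fed sample pod JSON so it runs standalone; against a live cluster you'd pipe in 'kubectl get pods -o json' instead:

```shell
# Print pods that have at least one container without a CPU request
missing=$(jq -r '.items[]
    | select(any(.spec.containers[]; .resources.requests.cpu == null))
    | .metadata.name' <<'EOF'
{"items": [
  {"metadata": {"name": "my-api-abc"}, "spec": {"containers": [{"resources": {}}]}},
  {"metadata": {"name": "my-api-def"}, "spec": {"containers": [{"resources": {"requests": {"cpu": "250m"}}}]}}
]}
EOF
)
echo "$missing"   # my-api-abc
```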

    How to Fix It

    Don't guess at request values. First, deploy the metrics server and run the application under realistic load, then observe with 'kubectl top pods'. Use that baseline to set requests conservatively — enough to represent normal usage, not peak burst. A typical pattern:

    resources:
      requests:
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"

    After the deployment rolls out with requests set, the HPA will begin computing utilization within one scrape interval — typically 15 to 30 seconds.

    To prevent this at the cluster level, enforce resource requests as an admission policy using Kyverno or OPA/Gatekeeper. A simple policy that rejects pods without CPU requests eliminates this entire class of HPA failures.
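    A minimal Kyverno policy along those lines might look like this sketch; the policy name and message are illustrative, and you'd likely scope it to the namespaces where autoscaling matters:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cpu-requests
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-cpu-request
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "All containers must set a CPU request so the HPA can compute utilization."
      pattern:
        spec:
          containers:
          - resources:
              requests:
                cpu: "?*"
```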


    Root Cause 3: Custom Metrics Not Available

    When you move beyond CPU and memory — scaling on HTTP request rate, queue depth, latency percentiles, or any application-level signal — you're working with the 'custom.metrics.k8s.io' API. This requires a custom metrics adapter such as Prometheus Adapter or KEDA. If the adapter isn't running, isn't configured correctly, or the metric name doesn't match exactly, the HPA will fail silently or show '<unknown>'.

    Why It Happens

    Custom metrics involve a longer chain: your application exposes a metric, Prometheus scrapes it, the Prometheus Adapter reads it, translates it via a PromQL query, and serves it through the Kubernetes API. The metric name in your HPA spec must exactly match what the adapter exposes — including case sensitivity and the full resource path. A single character difference means the HPA gets nothing back.

    How to Identify It

    Check whether the custom metrics API endpoint is reachable at all:

    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

    A 404 or connection error means the adapter isn't running. If the API responds, check whether your specific metric exists:

    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .

    Then check HPA events:

    kubectl describe hpa my-api-hpa -n production
    Warning  FailedGetPodsMetric  3m  horizontal-pod-autoscaler
      failed to get pods metric value: unable to get metric http_requests_per_second:
      no metrics returned from custom metrics API

    How to Fix It

    Start with the Prometheus Adapter ConfigMap and verify the rule covering your metric:

    kubectl get configmap adapter-config -n monitoring -o yaml

    The 'rules' block needs a 'seriesQuery' matching the Prometheus metric, a 'name' section that maps to the name used in your HPA, and a 'metricsQuery' that produces the right value. After modifying the ConfigMap, restart the adapter to pick up the change:

    kubectl rollout restart deployment prometheus-adapter -n monitoring
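    For orientation, a rule exposing the 'http_requests_per_second' metric from the earlier example might look roughly like this. The underlying counter name ('http_requests_total') and the 2-minute rate window are assumptions about your setup:

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```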

    Always validate the metric appears in the custom metrics API before testing the HPA. Working backwards from the API endpoint is far faster than iterating on the HPA spec itself.


    Root Cause 4: Target Utilization Set Wrong

    Sometimes the HPA is functioning correctly — it's just configured in a way that will never trigger. Either the target is too high to be realistic given the application's actual load profile, or it's so low that the HPA immediately scales to maximum replicas and stays there. Both situations look like broken behavior but are actually misconfiguration.

    Why It Happens

    I've seen engineers set a target of 80 on an application that never pushes past 20% CPU under normal peak traffic. The HPA will genuinely never fire — and that's the right behavior given the config. On the flip side, I've seen targets set at 10 by engineers who misread the docs and thought the value was absolute millicores rather than a percentage of the request. With 10% utilization as the scale-out trigger, every moderately busy pod causes an explosion of replicas.

    How to Identify It

    Run describe and read the Metrics section carefully:

    kubectl describe hpa my-api-hpa
    Metrics:                                              ( current / target )
      resource cpu on pods  (as a percentage of request):  43% (43m) / 80%

    If current usage is 43% and target is 80%, the HPA is working exactly as configured — there's just no scaling reason. Cross-reference against real pod utilization:

    kubectl top pods -l app=my-api --sort-by=cpu

    If even your busiest pods are consistently below the threshold, the target is too high for your workload's actual resource profile.

    How to Fix It

    Profile the application properly before setting a target. Run a realistic load test, watch 'kubectl top pods', and understand what CPU looks like at 50%, 75%, and 100% of expected traffic. A safe general starting point is 60% for CPU — high enough to avoid over-provisioning, low enough to scale out before pods become saturated.

    Patch the HPA without recreating it:

    kubectl patch hpa my-api-hpa --patch \
      '{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":60}}}]}}'

    After the patch, watch the HPA status update within the next scrape cycle:

    kubectl get hpa my-api-hpa --watch
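    If the HPA lives in a GitOps repo, make the same change declaratively instead. The equivalent 'autoscaling/v2' spec, using the names from the examples above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```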

    Root Cause 5: Cooldown Period (Stabilization Window)

    Load spikes. The HPA fires and scales up. Load drops. You wait. And wait. Pods don't scale back down. This is the stabilization window doing exactly what it was designed to do — and it's frequently mistaken for a broken HPA.

    Why It Happens

    The Kubernetes HPA v2 controller has a built-in stabilization window to prevent flapping. By default, scale-down is stabilized over 300 seconds (five full minutes). The HPA will not reduce replicas until the recommended replica count has been consistently lower than the current count for the entire stabilization window. Scale-up has a default stabilization window of zero seconds — it acts immediately, but scale-down is deliberately cautious.
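    In effect, the controller keeps the recommendations from the last five minutes and applies the highest of them, so a single recent spike pins the replica count. A toy sketch of that selection:

```shell
# Replica recommendations from the last 5 minutes, newest last
recommendations="5 4 3 2 2"
# Scale-down applies the highest recommendation still inside the window
effective=$(echo "$recommendations" | tr ' ' '\n' | sort -nr | head -n1)
echo "$effective"   # 5 (no scale-down until the 5 ages out of the window)
```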

    Beyond the stabilization window itself, there are also 'scaleUp' and 'scaleDown' policy constraints that limit how many pods can be added or removed in a given time period. These stack on top of the stabilization window and can make scale-down look even slower than expected.

    How to Identify It

    kubectl describe hpa my-api-hpa
    Conditions:
      Type            Status  Reason                Message
      ----            ------  ------                -------
      AbleToScale     True    ScaleDownStabilized   recent recommendations were higher than current one,
                                                    applying the highest recent recommendation

    That 'ScaleDownStabilized' condition tells you the HPA is aware of the lower recommendation but is holding position. Check whether there's an explicit behavior spec in the HPA:

    kubectl get hpa my-api-hpa -o yaml | grep -A 20 behavior

    How to Fix It

    If the default five-minute scale-down window is too slow for your use case — batch processing, event-driven workloads, or cost-sensitive environments — tune it with the 'behavior' stanza:

    spec:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 60
          policies:
          - type: Percent
            value: 50
            periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15

    This configuration reduces scale-down stabilization to 60 seconds and permits removing up to 50% of pods per minute. Don't go too aggressive with scale-down on traffic-facing workloads — if your traffic is spiky and pod startup is slow, you'll scale down during a lull and then scramble to recover when the next burst hits. Monitor HPA events closely after any behavior change:

    kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler --sort-by=.lastTimestamp

    Root Cause 6: RBAC Preventing Metric Collection

    Less common but worth knowing: the HPA controller itself may lack permissions to query the Metrics API. This can surface after cluster upgrades, namespace migrations, or in hardened environments with custom RBAC policies that inadvertently restrict the system controllers.

    Why It Happens

    The HPA controller runs as 'system:controller:horizontal-pod-autoscaler'. It needs read access to 'metrics.k8s.io' for built-in resource metrics and to 'custom.metrics.k8s.io' for custom metrics. If a cluster upgrade overwrites or removes the default ClusterRole bindings, or if a security hardening runbook accidentally restricts API group access, the controller starts getting 403s it can't act on.

    How to Identify It

    kubectl describe hpa my-api-hpa
    Warning  FailedGetResourceMetric  horizontal-pod-autoscaler
      failed to get cpu utilization: unable to fetch metrics from resource metrics API:
      the server is currently unable to handle the request

    Check the ClusterRoleBinding for the HPA controller:

    kubectl get clusterrolebinding system:controller:horizontal-pod-autoscaler -o yaml

    Also check the associated ClusterRole to verify the 'metrics.k8s.io' API group is still present:

    kubectl get clusterrole system:controller:horizontal-pod-autoscaler -o yaml

    How to Fix It

    In most cases, running 'kubeadm upgrade apply' for the target version will restore the default RBAC. If you've manually modified cluster roles, ensure the 'metrics.k8s.io' and 'custom.metrics.k8s.io' API groups are present with 'get' and 'list' verbs. Avoid editing these system ClusterRoles directly; grant any additional access through a separate ClusterRole and binding so that upgrades don't clobber your changes.
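    One upgrade-safe pattern is a supplemental ClusterRole bound to the controller's service account rather than an edit to the system role. A sketch — the role name is illustrative, and the service account name assumes a kubeadm-style control plane where each controller runs under its own account in kube-system:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hpa-metrics-reader
rules:
- apiGroups: ["metrics.k8s.io", "custom.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: hpa-metrics-reader
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
```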


    Root Cause 7: HPA Referencing a Deprecated API Version

    Cluster upgrades can silently break HPA manifests that reference deprecated API versions. When Kubernetes removes 'autoscaling/v2beta1' or 'autoscaling/v2beta2' from the API server, existing HPA objects keep running from storage, but any request using the removed versions is rejected — so re-applying the old manifest fails, and the error may not surface until the next deploy.

    How to Identify It

    kubectl get hpa my-api-hpa -o yaml | grep apiVersion

    Compare against what the cluster currently serves:

    kubectl api-versions | grep autoscaling

    If your manifests still use 'autoscaling/v2beta2' but only 'autoscaling/v2' is being served, the object stored in the cluster continues to work, but every 'kubectl apply' or GitOps sync of the old manifest will be rejected.

    How to Fix It

    Export the current HPA, update the API version and metrics spec format to 'autoscaling/v2', then reapply:

    kubectl get hpa my-api-hpa -o yaml > hpa-backup.yaml
    # Edit hpa-backup.yaml: set apiVersion to autoscaling/v2
    # Remove status block and resourceVersion before re-applying
    kubectl apply -f hpa-updated.yaml

    The v2 spec uses a structured 'metrics' array with typed entries and a 'behavior' block — a cleaner format than the beta versions and the only one you should be writing new HPAs against.
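    The most common mechanical change when migrating is the shape of the metric target. Roughly, for a CPU utilization target:

```yaml
# autoscaling/v2beta1 (removed):
metrics:
- type: Resource
  resource:
    name: cpu
    targetAverageUtilization: 60

# autoscaling/v2:
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60
```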


    Prevention

    Preventing HPA failures is about building validation into your delivery process, not discovering breakage during an incident.

    The most impactful change you can make is enforcing resource requests at the admission layer. A Kyverno policy that rejects pods without CPU requests costs five minutes to write and eliminates an entire category of HPA failures permanently. Do it. Don't rely on documentation or team convention — those erode under deadline pressure.

    Treat the metrics server like CoreDNS: it's infrastructure, not an optional add-on. Include it in your cluster bootstrap automation. Add a health check for it in your cluster readiness tests — if 'kubectl top nodes' fails, the cluster isn't ready for production workloads that depend on autoscaling.

    Add a post-deploy HPA validation step to your CI/CD pipeline. After applying manifests, wait 30 seconds and run 'kubectl describe hpa' for every HPA in the namespace. If any target shows '<unknown>' or any condition shows 'ScalingActive: False', fail the deploy. A shell check is enough:

    kubectl get hpa -n production -o jsonpath='{.items[*].status.conditions[?(@.type=="ScalingActive")].status}' \
      | tr ' ' '\n' | grep -qv '^True$' && echo "HPA check failed" || echo "All HPAs healthy"

    For custom metrics, validate the full pipeline independently before wiring up the HPA. Hit '/apis/custom.metrics.k8s.io/v1beta1' directly, confirm your metric is present, and verify the value looks sane. If the metric isn't there, the HPA won't find it either — and it won't tell you why in any helpful way.

    Finally, document your stabilization window and behavior choices alongside the HPA manifests in your GitOps repo. It's easy to forget six months later why scale-down was tuned to 45 seconds rather than the default 300. Your on-call colleague at 2am will appreciate the context, and you'll avoid someone "fixing" the short window back to default because it looks like a misconfiguration.

    The HPA is a powerful and reliable primitive when the full chain — metrics server, resource requests, metrics API, correct targets — is intact. Most failures I've debugged came down to one link in that chain being broken or absent. Learn to validate each link independently, and HPA debugging becomes a five-minute exercise instead of a multi-hour incident.

    Frequently Asked Questions

    Why does my HPA show <unknown> for the target metric?

    The most common reasons are that the metrics server is not running in the cluster, or that the pods targeted by the HPA do not have CPU resource requests defined. Run 'kubectl describe hpa <name>' and check the Conditions block for a FailedGetResourceMetric or FailedGetScale event to confirm the exact cause.

    How long does it take for the HPA to scale down after load drops?

    By default, the HPA applies a 300-second (5-minute) stabilization window to scale-down decisions. This means pods won't be removed until the lower replica count recommendation has been stable for the full 5 minutes. You can reduce this by configuring a custom 'behavior.scaleDown.stabilizationWindowSeconds' value in the HPA spec.

    Can the HPA scale based on custom application metrics like request rate?

    Yes, but it requires deploying a custom metrics adapter such as Prometheus Adapter or KEDA, which serves metrics through the 'custom.metrics.k8s.io' API. The metric name in the HPA spec must exactly match what the adapter exposes. Validate the metric is available by querying the API directly with 'kubectl get --raw' before configuring the HPA.

    What is the difference between CPU requests and CPU limits for HPA purposes?

    The HPA uses CPU requests — not limits — as the denominator when calculating utilization percentage. If a pod requests 200m CPU and a target of 80% is set, the HPA scales out when average usage exceeds 160m. Limits are irrelevant to the HPA calculation; only requests determine the scaling threshold.

    Will upgrading Kubernetes break existing HPAs?

    It can. Kubernetes periodically removes deprecated API versions such as autoscaling/v2beta1 and autoscaling/v2beta2. HPAs stored in etcd using those versions may stop being reconciled after an upgrade. Always migrate HPA manifests to autoscaling/v2 and validate with 'kubectl api-versions | grep autoscaling' after a cluster upgrade.
