InfraRunBook

    Kubernetes HPA Not Scaling

    Kubernetes
    Published: Apr 16, 2026
    Updated: Apr 16, 2026

    Your Kubernetes HPA isn't scaling pods even under load? This runbook covers the most common root causes — from missing metrics server to misconfigured targets — with real CLI commands and fixes.


    Symptoms

    You've deployed your HPA, the pods are running, load is climbing — and nothing happens. The pod count stays exactly where it started. No new pods. No errors surfacing in the deployment. Just silence.

    When you run 'kubectl get hpa', you might see something like this:

    NAME         REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
    my-api-hpa   Deployment/my-api   <unknown>/50%   2         10        2          14m

    That '<unknown>' is your first clue. Or maybe the target is showing a real percentage but replicas never change, even when the metric climbs well past the threshold. Or it scaled up fine but has been stuck at maximum replicas for hours despite load dropping off. Either way, something is broken in the scaling pipeline and it's failing quietly.

    The HPA controller is deceptively simple on the surface — watch a metric, compare to a target, adjust replicas. In practice, it depends on a chain of components all working correctly in sequence. This runbook walks through the most common failure points, how to confirm them, and how to fix them properly.
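    That chain ultimately feeds one formula. Here is a quick sketch of the documented replica calculation, as illustrative shell arithmetic rather than the controller's actual code:

```shell
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
current_replicas=2
current_cpu=90    # average utilization across pods, as a percent of the request
target_cpu=50     # the HPA's target utilization
# Integer ceiling division: add (divisor - 1) before dividing
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # 4 (the HPA would scale from 2 to 4 replicas)
```

    Keeping that ceiling function in mind explains several behaviors below, including why a too-low target balloons the replica count.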


    Root Cause 1: Metrics Server Not Running

    In my experience, this is the single most common cause of HPA silence — especially in freshly provisioned clusters or self-managed Kubernetes environments. The HPA relies on the Metrics API to retrieve CPU and memory utilization data from pods and nodes. No metrics server means no data, and without data the HPA sits idle.

    Why It Happens

    Managed Kubernetes distributions like EKS, GKE, and AKS ship with the metrics server pre-installed. But kubeadm clusters, k3s, Talos, or any hand-rolled setup won't have it by default. It's easy to forget when you're focused on getting workloads running and nobody has tried to use autoscaling yet.

    How to Identify It

    Check whether the metrics server pod is present and healthy:

    kubectl get pods -n kube-system | grep metrics-server

    No output means it's not deployed. You can also try the API directly:

    kubectl top nodes

    If the Metrics API isn't available you'll get:

    error: Metrics API not available

    Run 'kubectl describe hpa my-api-hpa' and look for events like:

    Warning  FailedGetResourceMetric  2m  horizontal-pod-autoscaler
      unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API:
      the server could not find the requested resource

    How to Fix It

    Deploy the metrics server from the official release manifest:

    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

    On clusters where the kubelet doesn't serve verified TLS — common in dev and bare-metal environments — the metrics server will fail to connect to kubelets and stay in a crash loop. You'll need to add the '--kubelet-insecure-tls' flag to the container args:

    kubectl edit deployment metrics-server -n kube-system

    Under 'spec.template.spec.containers[0].args', add:

    - --kubelet-insecure-tls
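    If you prefer to avoid an interactive edit (for example, in bootstrap automation), the same flag can be appended with a JSON patch. This sketch assumes metrics-server is the first container in the pod template:

```shell
# Append --kubelet-insecure-tls to the first container's args list
kubectl patch deployment metrics-server -n kube-system --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
```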

    After a minute or two, 'kubectl top nodes' should return real utilization data and the HPA will begin reporting metrics instead of '<unknown>'.


    Root Cause 2: Resource Requests Not Set

    This one catches even experienced engineers off guard. The HPA calculates CPU utilization as a percentage of the pod's requested CPU — not the node's total capacity, not the container's limit. If your pods don't define resource requests, the HPA has nothing to divide against and reports '<unknown>' for the target utilization.

    Why It Happens

    Developers routinely omit resource requests during early development. The pods run fine without them, the application works, nobody notices — until someone sets up autoscaling and wonders why it never triggers. Quick-start guides and Helm chart defaults are frequently the culprit here too.

    How to Identify It

    Describe the HPA and look for the 'Conditions' block:

    kubectl describe hpa my-api-hpa
    Conditions:
      Type            Status  Reason            Message
      ----            ------  ------            -------
      ScalingActive   False   FailedGetScale    the HPA was unable to compute the replica count:
                                                failed to get cpu utilization: missing request for cpu

    You can verify missing requests directly on the pods:

    kubectl get pods -l app=my-api -o jsonpath='{.items[*].spec.containers[*].resources}'

    An output of '{} {} {}' confirms there are no resource definitions at all.
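    To sweep a whole namespace, the same check can be scripted with jq. A sketch, fed sample pod JSON so it runs standalone; against a live cluster you'd pipe in 'kubectl get pods -o json' instead:

```shell
# Print pods that have at least one container without a CPU request
missing=$(jq -r '.items[]
    | select(any(.spec.containers[]; .resources.requests.cpu == null))
    | .metadata.name' <<'EOF'
{"items": [
  {"metadata": {"name": "my-api-abc"}, "spec": {"containers": [{"resources": {}}]}},
  {"metadata": {"name": "my-api-def"}, "spec": {"containers": [{"resources": {"requests": {"cpu": "250m"}}}]}}
]}
EOF
)
echo "$missing"   # my-api-abc
```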

    How to Fix It

    Don't guess at request values. First, deploy the metrics server and run the application under realistic load, then observe with 'kubectl top pods'. Use that baseline to set requests conservatively — enough to represent normal usage, not peak burst. A typical pattern:

    resources:
      requests:
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"

    After the deployment rolls out with requests set, the HPA will begin computing utilization within one scrape interval — typically 15 to 30 seconds.

    To prevent this at the cluster level, enforce resource requests as an admission policy using Kyverno or OPA/Gatekeeper. A simple policy that rejects pods without CPU requests eliminates this entire class of HPA failures.
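    A minimal Kyverno policy along those lines might look like this sketch; the policy name and message are illustrative, and you'd likely scope it to the namespaces where autoscaling matters:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cpu-requests
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-cpu-request
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "All containers must set a CPU request so the HPA can compute utilization."
      pattern:
        spec:
          containers:
          - resources:
              requests:
                cpu: "?*"
```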


    Root Cause 3: Custom Metrics Not Available

    When you move beyond CPU and memory — scaling on HTTP request rate, queue depth, latency percentiles, or any application-level signal — you're working with the 'custom.metrics.k8s.io' API. This requires a custom metrics adapter such as Prometheus Adapter or KEDA. If the adapter isn't running, isn't configured correctly, or the metric name doesn't match exactly, the HPA will fail silently or show '<unknown>'.

    Why It Happens

    Custom metrics involve a longer chain: your application exposes a metric, Prometheus scrapes it, the Prometheus Adapter reads it, translates it via a PromQL query, and serves it through the Kubernetes API. The metric name in your HPA spec must exactly match what the adapter exposes — including case sensitivity and the full resource path. A single character difference means the HPA gets nothing back.

    How to Identify It

    Check whether the custom metrics API endpoint is reachable at all:

    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

    A 404 or connection error means the adapter isn't running. If the API responds, check whether your specific metric exists:

    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .

    Then check HPA events:

    kubectl describe hpa my-api-hpa -n production
    Warning  FailedGetPodsMetric  3m  horizontal-pod-autoscaler
      failed to get pods metric value: unable to get metric http_requests_per_second:
      no metrics returned from custom metrics API

    How to Fix It

    Start with the Prometheus Adapter ConfigMap and verify the rule covering your metric:

    kubectl get configmap adapter-config -n monitoring -o yaml

    The 'rules' block needs a 'seriesQuery' matching the Prometheus metric, a 'name' section that maps to the name used in your HPA, and a 'metricsQuery' that produces the right value. After modifying the ConfigMap, restart the adapter to pick up the change:

    kubectl rollout restart deployment prometheus-adapter -n monitoring
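    For orientation, a rule exposing the 'http_requests_per_second' metric from the earlier example might look roughly like this. The underlying counter name ('http_requests_total') and the 2-minute rate window are assumptions about your setup:

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```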

    Always validate the metric appears in the custom metrics API before testing the HPA. Working backwards from the API endpoint is far faster than iterating on the HPA spec itself.


    Root Cause 4: Target Utilization Set Wrong

    Sometimes the HPA is functioning correctly — it's just configured in a way that will never trigger. Either the target is too high to be realistic given the application's actual load profile, or it's so low that the HPA immediately scales to maximum replicas and stays there. Both situations look like broken behavior but are actually misconfiguration.

    Why It Happens

    I've seen engineers set a target of 80 on an application that never pushes past 20% CPU under normal peak traffic. The HPA will genuinely never fire — and that's the right behavior given the config. On the flip side, I've seen targets set at 10 by engineers who misread the docs and thought the value was absolute millicores rather than a percentage of the request. With 10% utilization as the scale-out trigger, every moderately busy pod causes an explosion of replicas.

    How to Identify It

    Run describe and read the Metrics section carefully:

    kubectl describe hpa my-api-hpa
    Metrics:                                              ( current / target )
      resource cpu on pods  (as a percentage of request):  43% (43m) / 80%

    If current usage is 43% and target is 80%, the HPA is working exactly as configured — there's just no scaling reason. Cross-reference against real pod utilization:

    kubectl top pods -l app=my-api --sort-by=cpu

    If even your busiest pods are consistently below the threshold, the target is too high for your workload's actual resource profile.

    How to Fix It

    Profile the application properly before setting a target. Run a realistic load test, watch 'kubectl top pods', and understand what CPU looks like at 50%, 75%, and 100% of expected traffic. A safe general starting point is 60% for CPU — high enough to avoid over-provisioning, low enough to scale out before pods become saturated.

    Patch the HPA without recreating it:

    kubectl patch hpa my-api-hpa --patch \
      '{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":60}}}]}}'

    After the patch, watch the HPA status update within the next scrape cycle:

    kubectl get hpa my-api-hpa --watch
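    If the HPA lives in a GitOps repo, make the same change declaratively instead. The equivalent 'autoscaling/v2' spec, using the names from the examples above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```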

    Root Cause 5: Cooldown Period (Stabilization Window)

    Load spikes. The HPA fires and scales up. Load drops. You wait. And wait. Pods don't scale back down. This is the stabilization window doing exactly what it was designed to do — and it's frequently mistaken for a broken HPA.

    Why It Happens

    The Kubernetes HPA v2 controller has a built-in stabilization window to prevent flapping. By default, scale-down is stabilized over 300 seconds (five full minutes). The HPA will not reduce replicas until the recommended replica count has been consistently lower than the current count for the entire stabilization window. Scale-up has a default stabilization window of zero seconds — it acts immediately, but scale-down is deliberately cautious.
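    In effect, the controller keeps the recommendations from the last five minutes and applies the highest of them, so a single recent spike pins the replica count. A toy sketch of that selection:

```shell
# Replica recommendations from the last 5 minutes, newest last
recommendations="5 4 3 2 2"
# Scale-down applies the highest recommendation still inside the window
effective=$(echo "$recommendations" | tr ' ' '\n' | sort -nr | head -n1)
echo "$effective"   # 5 (no scale-down until the 5 ages out of the window)
```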

    Beyond the stabilization window itself, there are also 'scaleUp' and 'scaleDown' policy constraints that limit how many pods can be added or removed in a given time period. These stack on top of the stabilization window and can make scale-down look even slower than expected.

    How to Identify It

    kubectl describe hpa my-api-hpa
    Conditions:
      Type            Status  Reason                Message
      ----            ------  ------                -------
      AbleToScale     True    ScaleDownStabilized   recent recommendations were higher than current one,
                                                    applying the highest recent recommendation

    That 'ScaleDownStabilized' condition tells you the HPA is aware of the lower recommendation but is holding position. Check whether there's an explicit behavior spec in the HPA:

    kubectl get hpa my-api-hpa -o yaml | grep -A 20 behavior

    How to Fix It

    If the default five-minute scale-down window is too slow for your use case — batch processing, event-driven workloads, or cost-sensitive environments — tune it with the 'behavior' stanza:

    spec:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 60
          policies:
          - type: Percent
            value: 50
            periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15

    This configuration reduces scale-down stabilization to 60 seconds and permits removing up to 50% of pods per minute. Don't go too aggressive with scale-down on traffic-facing workloads — if your traffic is spiky and pod startup is slow, you'll scale down during a lull and then scramble to recover when the next burst hits. Monitor HPA events closely after any behavior change:

    kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler --sort-by=.lastTimestamp

    Root Cause 6: RBAC Preventing Metric Collection

    Less common but worth knowing: the HPA controller itself may lack permissions to query the Metrics API. This can surface after cluster upgrades, namespace migrations, or in hardened environments with custom RBAC policies that inadvertently restrict the system controllers.

    Why It Happens

    The HPA controller runs as 'system:controller:horizontal-pod-autoscaler'. It needs read access to 'metrics.k8s.io' for built-in resource metrics and to 'custom.metrics.k8s.io' for custom metrics. If a cluster upgrade overwrites or removes the default ClusterRole bindings, or if a security hardening runbook accidentally restricts API group access, the controller starts getting 403s it can't act on.

    How to Identify It

    kubectl describe hpa my-api-hpa
    Warning  FailedGetResourceMetric  horizontal-pod-autoscaler
      failed to get cpu utilization: unable to fetch metrics from resource metrics API:
      the server is currently unable to handle the request

    Check the ClusterRoleBinding for the HPA controller:

    kubectl get clusterrolebinding system:controller:horizontal-pod-autoscaler -o yaml

    Also check the associated ClusterRole to verify the 'metrics.k8s.io' API group is still present:

    kubectl get clusterrole system:controller:horizontal-pod-autoscaler -o yaml

    How to Fix It

    In most cases, running 'kubeadm upgrade apply' for the target version will restore the default RBAC. If you've manually modified cluster roles, ensure the 'metrics.k8s.io' and 'custom.metrics.k8s.io' API groups are present with 'get' and 'list' verbs. Avoid editing these system ClusterRoles directly; grant any additional access through a separate ClusterRole and binding so that upgrades don't clobber your changes.
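    One upgrade-safe pattern is a supplemental ClusterRole bound to the controller's service account rather than an edit to the system role. A sketch — the role name is illustrative, and the service account name assumes a kubeadm-style control plane where each controller runs under its own account in kube-system:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hpa-metrics-reader
rules:
- apiGroups: ["metrics.k8s.io", "custom.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: hpa-metrics-reader
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
```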


    Root Cause 7: HPA Referencing a Deprecated API Version

    Cluster upgrades can silently break HPA manifests that reference deprecated API versions. When Kubernetes removes 'autoscaling/v2beta1' or 'autoscaling/v2beta2' from the API server, existing HPA objects keep running from storage, but any request using the removed versions is rejected — so re-applying the old manifest fails, and the error may not surface until the next deploy.

    How to Identify It

    kubectl get hpa my-api-hpa -o yaml | grep apiVersion

    Compare against what the cluster currently serves:

    kubectl api-versions | grep autoscaling

    If your manifests still use 'autoscaling/v2beta2' but only 'autoscaling/v2' is being served, the object stored in the cluster continues to work, but every 'kubectl apply' or GitOps sync of the old manifest will be rejected.

    How to Fix It

    Export the current HPA, update the API version and metrics spec format to 'autoscaling/v2', then reapply:

    kubectl get hpa my-api-hpa -o yaml > hpa-backup.yaml
    # Edit hpa-backup.yaml: set apiVersion to autoscaling/v2
    # Remove status block and resourceVersion before re-applying
    kubectl apply -f hpa-updated.yaml

    The v2 spec uses a structured 'metrics' array with typed entries and a 'behavior' block — a cleaner format than the beta versions and the only one you should be writing new HPAs against.
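    The most common mechanical change when migrating is the shape of the metric target. Roughly, for a CPU utilization target:

```yaml
# autoscaling/v2beta1 (removed):
metrics:
- type: Resource
  resource:
    name: cpu
    targetAverageUtilization: 60

# autoscaling/v2:
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60
```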


    Prevention

    Preventing HPA failures is about building validation into your delivery process, not discovering breakage during an incident.

    The most impactful change you can make is enforcing resource requests at the admission layer. A Kyverno policy that rejects pods without CPU requests costs five minutes to write and eliminates an entire category of HPA failures permanently. Do it. Don't rely on documentation or team convention — those erode under deadline pressure.

    Treat the metrics server like CoreDNS: it's infrastructure, not an optional add-on. Include it in your cluster bootstrap automation. Add a health check for it in your cluster readiness tests — if 'kubectl top nodes' fails, the cluster isn't ready for production workloads that depend on autoscaling.

    Add a post-deploy HPA validation step to your CI/CD pipeline. After applying manifests, wait 30 seconds and run 'kubectl describe hpa' for every HPA in the namespace. If any target shows '<unknown>' or any condition shows 'ScalingActive: False', fail the deploy. A shell check is enough:

    kubectl get hpa -n production -o jsonpath='{.items[*].status.conditions[?(@.type=="ScalingActive")].status}' \
      | tr ' ' '\n' | grep -qv '^True$' && echo "HPA check failed" || echo "All HPAs healthy"

    For custom metrics, validate the full pipeline independently before wiring up the HPA. Hit '/apis/custom.metrics.k8s.io/v1beta1' directly, confirm your metric is present, and verify the value looks sane. If the metric isn't there, the HPA won't find it either — and it won't tell you why in any helpful way.

    Finally, document your stabilization window and behavior choices alongside the HPA manifests in your GitOps repo. It's easy to forget six months later why scale-down was tuned to 45 seconds rather than the default 300. Your on-call colleague at 2am will appreciate the context, and you'll avoid someone "fixing" the short window back to default because it looks like a misconfiguration.

    The HPA is a powerful and reliable primitive when the full chain — metrics server, resource requests, metrics API, correct targets — is intact. Most failures I've debugged came down to one link in that chain being broken or absent. Learn to validate each link independently, and HPA debugging becomes a five-minute exercise instead of a multi-hour incident.

    Frequently Asked Questions

    Why does my HPA show <unknown> for the target metric?

    The most common reasons are that the metrics server is not running in the cluster, or that the pods targeted by the HPA do not have CPU resource requests defined. Run 'kubectl describe hpa <name>' and check the Conditions block for a FailedGetResourceMetric or FailedGetScale event to confirm the exact cause.

    How long does it take for the HPA to scale down after load drops?

    By default, the HPA applies a 300-second (5-minute) stabilization window to scale-down decisions. This means pods won't be removed until the lower replica count recommendation has been stable for the full 5 minutes. You can reduce this by configuring a custom 'behavior.scaleDown.stabilizationWindowSeconds' value in the HPA spec.

    Can the HPA scale based on custom application metrics like request rate?

    Yes, but it requires deploying a custom metrics adapter such as Prometheus Adapter or KEDA, which serves metrics through the 'custom.metrics.k8s.io' API. The metric name in the HPA spec must exactly match what the adapter exposes. Validate the metric is available by querying the API directly with 'kubectl get --raw' before configuring the HPA.

    What is the difference between CPU requests and CPU limits for HPA purposes?

    The HPA uses CPU requests — not limits — as the denominator when calculating utilization percentage. If a pod requests 200m CPU and a target of 80% is set, the HPA scales out when average usage exceeds 160m. Limits are irrelevant to the HPA calculation; only requests determine the scaling threshold.

    Will upgrading Kubernetes break existing HPAs?

    It can. Kubernetes periodically removes deprecated API versions such as autoscaling/v2beta1 and autoscaling/v2beta2. HPAs stored in etcd using those versions may stop being reconciled after an upgrade. Always migrate HPA manifests to autoscaling/v2 and validate with 'kubectl api-versions | grep autoscaling' after a cluster upgrade.
