Symptoms
You've deployed your HPA, the pods are running, load is climbing — and nothing happens. The pod count stays exactly where it started. No new pods. No errors surfacing in the deployment. Just silence.
When you run kubectl get hpa, you might see something like this:
NAME         REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
my-api-hpa   Deployment/my-api   <unknown>/50%   2         10        2          14m
That <unknown> is your first clue. Or maybe the target is showing a real percentage but replicas never change, even when the metric climbs well past the threshold. Or it scaled up fine but has been stuck at maximum replicas for hours despite load dropping off. Either way, something is broken in the scaling pipeline and it's failing quietly.
The HPA controller is deceptively simple on the surface — watch a metric, compare to a target, adjust replicas. In practice, it depends on a chain of components all working correctly in sequence. This runbook walks through the most common failure points, how to confirm them, and how to fix them properly.
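The controller's core arithmetic is worth keeping in your head while debugging: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue). A throwaway shell sketch of that formula, with the caveat that the real controller also skips scaling when the ratio is within a tolerance (10% by default):

```shell
# ceil(currentReplicas * currentMetric / targetMetric) -- the HPA core formula.
# Note: the real controller additionally ignores ratios within ~10% of 1.0.
desired_replicas() {
  awk -v r="$1" -v c="$2" -v t="$3" \
    'BEGIN { d = r * c / t; if (d > int(d)) d = int(d) + 1; print d }'
}
desired_replicas 4 90 60   # 4 pods at 90% CPU against a 60% target -> prints 6
desired_replicas 2 43 80   # below target -> stays at 2
```

Running those two examples against the symptoms above tells you instantly whether the HPA *should* be scaling at all given its current numbers.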
Root Cause 1: Metrics Server Not Running
In my experience, this is the single most common cause of HPA silence — especially in freshly provisioned clusters or self-managed Kubernetes environments. The HPA relies on the Metrics API to retrieve CPU and memory utilization data from pods and nodes. No metrics server means no data, and without data the HPA sits idle.
Why It Happens
Some managed Kubernetes distributions, like GKE and AKS, ship with the metrics server pre-installed; EKS notably does not, and neither do kubeadm clusters, Talos, or any hand-rolled setup (k3s is an exception and bundles it by default). It's easy to forget when you're focused on getting workloads running and nobody has tried to use autoscaling yet.
How to Identify It
Check whether the metrics server pod is present and healthy:
kubectl get pods -n kube-system | grep metrics-server
No output means it's not deployed. You can also try the API directly:
kubectl top nodes
If the Metrics API isn't available you'll get:
error: Metrics API not available
Run kubectl describe hpa my-api-hpa and look for events like:
Warning FailedGetResourceMetric 2m horizontal-pod-autoscaler
unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API:
the server could not find the requested resource
How to Fix It
Deploy the metrics server from the official release manifest:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
On clusters where the kubelet doesn't serve verified TLS — common in dev and bare-metal environments — the metrics server will fail to connect to kubelets and stay in a crash loop. You'll need to add the --kubelet-insecure-tls flag to the container args:
kubectl edit deployment metrics-server -n kube-system
Under spec.containers[0].args, add:
- --kubelet-insecure-tls
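After the edit, the container's args should look roughly like this. The surrounding flags come from the stock components.yaml and vary by metrics-server version; only the last line is the addition:

```yaml
args:
- --cert-dir=/tmp
- --secure-port=10250
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --metric-resolution=15s
- --kubelet-insecure-tls
```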
After a minute or two, kubectl top nodes should return real utilization data and the HPA will begin reporting metrics instead of <unknown>.
Root Cause 2: Resource Requests Not Set
This one catches even experienced engineers off guard. The HPA calculates CPU utilization as a percentage of the pod's requested CPU — not the node's total capacity, not the container's limit. If your pods don't define resource requests, the HPA has nothing to divide against and reports <unknown> for the target utilization.
Why It Happens
Developers routinely omit resource requests during early development. The pods run fine without them, the application works, nobody notices — until someone sets up autoscaling and wonders why it never triggers. Quick-start guides and Helm chart defaults are frequently the culprit here too.
How to Identify It
Describe the HPA and look for the Conditions block:
kubectl describe hpa my-api-hpa
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count:
                                                  failed to get cpu utilization: missing request for cpu
You can verify missing requests directly on the pods:
kubectl get pods -l app=my-api -o jsonpath='{.items[*].spec.containers[*].resources}'
An output of {} {} {} confirms there are no resource definitions at all.
How to Fix It
Don't guess at request values. First, deploy the metrics server and run the application under realistic load, then observe with kubectl top pods. Use that baseline to set requests conservatively — enough to represent normal usage, not peak burst. A typical pattern:
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
After the deployment rolls out with requests set, the HPA will begin computing utilization within one scrape interval — typically 15 to 30 seconds.
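The baseline-to-request arithmetic is simple enough to script. A hypothetical sizing helper; the 25% headroom factor is an assumption of this sketch, not a rule, so tune it to your workload:

```shell
# Hypothetical helper: take a steady-state CPU figure observed via
# `kubectl top pods` (in millicores) and add 25% headroom to get a request.
size_cpu_request() {
  awk -v observed="$1" 'BEGIN { printf "%dm\n", observed * 1.25 }'
}
size_cpu_request 200   # observed 200m steady state -> prints 250m
```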
To prevent this at the cluster level, enforce resource requests as an admission policy using Kyverno or OPA/Gatekeeper. A simple policy that rejects pods without CPU requests eliminates this entire class of HPA failures.
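A minimal sketch of such a Kyverno policy, assuming Kyverno is installed in the cluster. The policy and rule names are illustrative; adjust the match scope and failure action for your environment:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cpu-requests   # illustrative name
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-cpu-request
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "All containers must set a CPU request."
      pattern:
        spec:
          containers:
          - resources:
              requests:
                cpu: "?*"   # any non-empty value passes
```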
Root Cause 3: Custom Metrics Not Available
When you move beyond CPU and memory — scaling on HTTP request rate, queue depth, latency percentiles, or any application-level signal — you're working with the custom.metrics.k8s.io API. This requires a custom metrics adapter such as Prometheus Adapter or KEDA. If the adapter isn't running, isn't configured correctly, or if the metric name doesn't match exactly, the HPA will fail silently or show <unknown>.
Why It Happens
Custom metrics involve a longer chain: your application exposes a metric, Prometheus scrapes it, the Prometheus Adapter reads it, translates it via a PromQL query, and serves it through the Kubernetes API. The metric name in your HPA spec must exactly match what the adapter exposes — including case sensitivity and the full resource path. A single character difference means the HPA gets nothing back.
How to Identify It
Check whether the custom metrics API endpoint is reachable at all:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
A 404 or connection error means the adapter isn't running. If the API responds, check whether your specific metric exists:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .
Then check HPA events:
kubectl describe hpa my-api-hpa -n production
Warning FailedGetPodsMetric 3m horizontal-pod-autoscaler
failed to get pods metric value: unable to get metric http_requests_per_second:
no metrics returned from custom metrics API
How to Fix It
Start with the Prometheus Adapter ConfigMap and verify the rule covering your metric:
kubectl get configmap adapter-config -n monitoring -o yaml
The rules block needs a seriesQuery matching the Prometheus metric, a name section that maps to the name used in your HPA, and a metricsQuery that produces the right value. After modifying the ConfigMap, restart the adapter to pick up the change:
kubectl rollout restart deployment prometheus-adapter -n monitoring
Always validate the metric appears in the custom metrics API before testing the HPA. Working backwards from the API endpoint is far faster than iterating on the HPA spec itself.
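As a reference point, a Prometheus Adapter rule serving a hypothetical http_requests_per_second metric derived from a counter named http_requests_total might look like this. The metric and label names are assumptions of this sketch; match them to what your application actually exports:

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^http_requests_total$"
    as: "http_requests_per_second"   # this is the name your HPA spec must use
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The <<.Series>>, <<.LabelMatchers>>, and <<.GroupBy>> placeholders are the adapter's own templating syntax, filled in at query time.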
Root Cause 4: Target Utilization Set Wrong
Sometimes the HPA is functioning correctly — it's just configured in a way that will never trigger. Either the target is too high to be realistic given the application's actual load profile, or it's so low that the HPA immediately scales to maximum replicas and stays there. Both situations look like broken behavior but are actually misconfiguration.
Why It Happens
I've seen engineers set a target of 80 on an application that never pushes past 20% CPU under normal peak traffic. The HPA will genuinely never fire — and that's the right behavior given the config. On the flip side, I've seen targets set at 10 by engineers who misread the docs and thought the value was absolute millicores rather than a percentage of the request. At 10% utilization as the scale-out trigger, every moderately busy pod causes an explosion of replicas.
How to Identify It
Run describe and read the Metrics section carefully:
kubectl describe hpa my-api-hpa
Metrics:                                              ( current / target )
  resource cpu on pods (as a percentage of request):  43% (43m) / 80%
If current usage is 43% and target is 80%, the HPA is working exactly as configured — there's just no scaling reason. Cross-reference against real pod utilization:
kubectl top pods -l app=my-api --sort-by=cpu
If even your busiest pods are consistently below the threshold, the target is too high for your workload's actual resource profile.
How to Fix It
Profile the application properly before setting a target. Run a realistic load test, watch kubectl top pods, and understand what CPU looks like at 50%, 75%, and 100% of expected traffic. A safe general starting point is 60% for CPU — high enough to avoid over-provisioning, low enough to scale out before pods become saturated.
Patch the HPA without recreating it:
kubectl patch hpa my-api-hpa --patch \
'{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":60}}}]}}'
After the patch, watch the HPA status update within the next scrape cycle:
kubectl get hpa my-api-hpa --watch
Root Cause 5: Cooldown Period (Stabilization Window)
Load spikes. The HPA fires and scales up. Load drops. You wait. And wait. Pods don't scale back down. This is the stabilization window doing exactly what it was designed to do — and it's frequently mistaken for a broken HPA.
Why It Happens
The Kubernetes HPA v2 controller has a built-in stabilization window to prevent flapping. By default, scale-down is stabilized over 300 seconds (five full minutes). The HPA will not reduce replicas until the recommended replica count has been consistently lower than the current count for the entire stabilization window. Scale-up has a default stabilization window of zero seconds — it acts immediately, but scale-down is deliberately cautious.
Beyond the stabilization window itself, there are also scaleUp and scaleDown policy constraints that limit how many pods can be added or removed in a given time period. These stack on top of the stabilization window and can make scale-down look even slower than expected.
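The scale-down rule in miniature: the controller keeps every replica recommendation made inside the window and applies the highest one, so the count only drops once the high recommendations age out. A toy illustration with hypothetical samples:

```shell
# Hypothetical replica recommendations sampled over the last five minutes.
# Scale-down stabilization applies the HIGHEST of these, not the latest.
recs="6 5 4 3 3"
highest=0
for r in $recs; do
  if [ "$r" -gt "$highest" ]; then highest=$r; fi
done
echo "latest recommendation: 3, applied: $highest"
```

Only when the 6 falls outside the window does the applied count start stepping down.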
How to Identify It
kubectl describe hpa my-api-hpa
Conditions:
  Type         Status  Reason               Message
  ----         ------  ------               -------
  AbleToScale  True    ScaleDownStabilized  recent recommendations were higher than current one,
                                            applying the highest recent recommendation
That ScalingDownStabilized condition — reported as ScaleDownStabilized — tells you the HPA is aware of the lower recommendation but is holding position. Check whether there's an explicit behavior spec in the HPA:
kubectl get hpa my-api-hpa -o yaml | grep -A 20 behavior
How to Fix It
If the default five-minute scale-down window is too slow for your use case — batch processing, event-driven workloads, or cost-sensitive environments — tune it with the behavior stanza:
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
This configuration reduces scale-down stabilization to 60 seconds and permits removing up to 50% of pods per minute. Don't go too aggressive with scale-down on traffic-facing workloads — if your traffic is spiky and pod startup is slow, you'll scale down during a lull and then scramble to recover when the next burst hits. Monitor HPA events closely after any behavior change:
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler --sort-by=.lastTimestamp
Root Cause 6: RBAC Preventing Metric Collection
Less common but worth knowing: the HPA controller itself may lack permissions to query the Metrics API. This can surface after cluster upgrades, namespace migrations, or in hardened environments with custom RBAC policies that inadvertently restrict the system controllers.
Why It Happens
The HPA controller runs as system:controller:horizontal-pod-autoscaler. It needs read access to metrics.k8s.io for built-in resource metrics and to custom.metrics.k8s.io for custom metrics. If a cluster upgrade overwrites or removes the default ClusterRole bindings, or if a security hardening runbook accidentally restricts API group access, the controller starts getting 403s it can't act on.
How to Identify It
kubectl describe hpa my-api-hpa
Warning FailedGetResourceMetric horizontal-pod-autoscaler
failed to get cpu utilization: unable to fetch metrics from resource metrics API:
the server is currently unable to handle the request
Check the ClusterRoleBinding for the HPA controller:
kubectl get clusterrolebinding system:controller:horizontal-pod-autoscaler -o yaml
Also check the associated ClusterRole to verify the metrics.k8s.io API group is still present:
kubectl get clusterrole system:controller:horizontal-pod-autoscaler -o yaml
How to Fix It
In most cases, running kubeadm upgrade apply for the target version will restore the default RBAC. If you've manually modified cluster roles, ensure the metrics.k8s.io and custom.metrics.k8s.io API groups are present with get and list verbs. Avoid editing these system ClusterRoles directly; use aggregated RBAC labels instead so that upgrades don't clobber your changes.
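If custom-metrics access is what's missing, the conventional fix is an additional ClusterRole bound to the controller's service account rather than an edit to the system role. A sketch with illustrative names; the subject matches how the controller authenticates on kubeadm-style clusters:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-reader   # illustrative name
rules:
- apiGroups: ["custom.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-custom-metrics-reader   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-reader
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
```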
Root Cause 7: HPA Referencing a Deprecated API Version
Cluster upgrades can silently break HPAs that reference deprecated API versions. When Kubernetes removes autoscaling/v2beta1 or autoscaling/v2beta2 from the API server, any HPA manifest using those versions stops applying cleanly — but the breakage may not immediately surface as an obvious error.
How to Identify It
kubectl get hpa my-api-hpa -o yaml | grep apiVersion
Compare against what the cluster currently serves:
kubectl api-versions | grep autoscaling
If your manifest uses autoscaling/v2beta2 but only autoscaling/v2 is served, the object already stored in etcd is converted and keeps working, but kubectl apply and GitOps tooling will fail with errors like no matches for kind "HorizontalPodAutoscaler" in version "autoscaling/v2beta2" — so the HPA silently stops receiving updates.
How to Fix It
Export the current HPA, update the API version and metrics spec format to autoscaling/v2, then reapply:
kubectl get hpa my-api-hpa -o yaml > hpa-backup.yaml
# Edit hpa-backup.yaml: set apiVersion to autoscaling/v2
# Remove status block and resourceVersion before re-applying
kubectl apply -f hpa-updated.yaml
The v2 spec uses a structured metrics array with typed entries and a behavior block — a cleaner format than the beta versions and the only one you should be writing new HPAs against.
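For reference, a minimal HPA written against autoscaling/v2, with values mirroring the examples in this runbook:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```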
Prevention
Preventing HPA failures is about building validation into your delivery process, not discovering breakage during an incident.
The most impactful change you can make is enforcing resource requests at the admission layer. A Kyverno policy that rejects pods without CPU requests costs five minutes to write and eliminates an entire category of HPA failures permanently. Do it. Don't rely on documentation or team convention — those erode under deadline pressure.
Treat the metrics server like CoreDNS: it's infrastructure, not an optional add-on. Include it in your cluster bootstrap automation. Add a health check for it in your cluster readiness tests — if kubectl top nodes fails, the cluster isn't ready for production workloads that depend on autoscaling.
Add a post-deploy HPA validation step to your CI/CD pipeline. After applying manifests, wait 30 seconds and run kubectl describe hpa for every HPA in the namespace. If any target shows <unknown> or any condition shows ScalingActive: False, fail the deploy. A shell check is enough:
kubectl get hpa -n production -o jsonpath='{.items[*].status.conditions[?(@.type=="ScalingActive")].status}' \
  | tr ' ' '\n' | grep -qvx "True" && { echo "HPA check failed"; exit 1; } || echo "All HPAs healthy"
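The same gate, broken out into a function that's easy to test offline. The hard-coded status strings are sample stand-ins for the jsonpath output from the live cluster:

```shell
# Fail the deploy when any HPA reports a ScalingActive status other than "True".
# $1 stands in for the space-separated jsonpath output shown above.
check_hpas() {
  for s in $1; do
    if [ "$s" != "True" ]; then
      echo "HPA check failed"
      return 1
    fi
  done
  echo "All HPAs healthy"
}
check_hpas "True False" || true   # prints: HPA check failed (non-zero exit fails CI)
check_hpas "True True"            # prints: All HPAs healthy
```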
For custom metrics, validate the full pipeline independently before wiring up the HPA. Hit /apis/custom.metrics.k8s.io/v1beta1 directly, confirm your metric is present, and verify the value looks sane. If the metric isn't there, the HPA won't find it either — and it won't tell you why in any helpful way.
Finally, document your stabilization window and behavior choices alongside the HPA manifests in your GitOps repo. It's easy to forget six months later why scale-down was tuned to 45 seconds rather than the default 300. Your on-call colleague at 2am will appreciate the context, and you'll avoid someone "fixing" the short window back to default because it looks like a misconfiguration.
The HPA is a powerful and reliable primitive when the full chain — metrics server, resource requests, metrics API, correct targets — is intact. Most failures I've debugged came down to one link in that chain being broken or absent. Learn to validate each link independently, and HPA debugging becomes a five-minute exercise instead of a multi-hour incident.
