InfraRunBook

    Kubernetes Helm Deployment Failing

    CI/CD
    Published: Apr 16, 2026
    Updated: Apr 16, 2026

    A practical troubleshooting guide for Kubernetes Helm deployment failures covering values mismatches, chart version conflicts, missing CRDs, RBAC errors, and rollout timeouts with real CLI commands and fixes.


    Symptoms

    You run helm upgrade --install and it hangs. Or it fails immediately with a cryptic error. Or — the worst variant — it reports success but your pods never come up. Helm deployment failures come in a few distinct flavors, and the one you're staring at right now usually points to one of a handful of well-known root causes.

    Here's what the failure surface looks like in practice:

    • Error: INSTALLATION FAILED: timed out waiting for the condition
    • Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
    • Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "Certificate" in version "cert-manager.io/v1"
    • Pods stuck in Pending, CrashLoopBackOff, or ImagePullBackOff
    • Error: INSTALLATION FAILED: Forbidden: User "system:serviceaccount:ci:helm-deployer" cannot create resource...
    • The release shows as failed in helm list, leaving a lock that blocks all subsequent runs

    These symptoms span several root causes. Let's walk through each one — why it happens, how to identify it, and how to fix it.


    Root Cause 1: Values File Is Wrong

    This is the most common cause I see in teams that are new to Helm or have recently refactored their chart structure. A values file that doesn't match what the chart expects will either cause a rendering error at install time or silently produce incorrect manifests that deploy broken workloads.

    Why does it happen? Charts evolve. When someone bumps a chart version or restructures the values schema, the old values file doesn't always get updated in lockstep. Maybe a key got renamed from image.tag to image.version, or a nested block changed its structure entirely. Helm's templating engine will often just render an empty string or a zero-value rather than throwing an error, so the manifest looks valid but produces pods that fail at runtime. That silent failure mode is what makes this cause so insidious.
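    To make the renamed-key scenario concrete, here's a minimal hypothetical example (chart name and keys are illustrative, not from any real chart):

```yaml
# templates/deployment.yaml (hypothetical chart) still reads the old key:
#   image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
# ...but the values file was migrated to a new schema:
image:
  repository: registry.example.com/myapp
  version: v2.1.0   # renamed from 'tag' -- the template now renders "myapp:" with an empty tag
```

    Nothing errors at render time; the broken image reference only surfaces when the pod fails to pull.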

    How to Identify It

    Start with a dry run and inspect the rendered output:

    helm upgrade --install myapp ./charts/myapp \
      --values ./values/prod.yaml \
      --dry-run --debug 2>&1 | head -150

    If values are being ignored or misread, you'll often see default placeholders in the rendered YAML — things like an empty image repository, a replica count of zero, or a service port that maps to nothing. You can also check what Helm actually applied to the last release:

    helm get values myapp --namespace production

    Compare that output against what you intended to pass in. Then lint the chart with your values file explicitly specified:

    helm lint ./charts/myapp --values ./values/prod.yaml

    A real lint failure looks like this:

    ==> Linting ./charts/myapp
    [ERROR] templates/deployment.yaml: image: Invalid value: "": image repository is required
    1 chart(s) linted, 1 chart(s) failed

    How to Fix It

    Run helm show values ./charts/myapp to dump the chart's default values and compare them line by line with your overrides file. Look for keys in your file that don't appear anywhere in the defaults — those are likely stale or misspelled. If the chart ships a values.schema.json, helm lint will automatically run JSON schema validation and flag type mismatches and required fields.

    Once you've corrected the values file, always do a dry run before applying. Don't skip the --debug flag — it prints the full rendered manifests and makes it obvious when a template produces unexpected output.


    Root Cause 2: Chart Version Conflict

    Helm tracks releases by storing versioned secrets in the target namespace. When you try to install or upgrade, it compares what you're requesting against what's currently deployed. Version conflicts surface in a few different ways — an incompatible API version between the chart and your cluster Kubernetes version, a dependency chart pinned to a version that no longer exists in the upstream repo, or a stale lock left behind by a previous failed upgrade.

    In my experience, the stale lock is by far the most frustrating variant. A failed upgrade leaves the release in pending-upgrade state, and Helm refuses to do anything with that release until the state is cleared.

    How to Identify It

    helm list --all-namespaces --all
    NAME     NAMESPACE   REVISION  UPDATED                    STATUS           CHART        APP VERSION
    myapp    production  3         2026-04-15 14:22:05 UTC    pending-upgrade  myapp-1.4.2  2.1.0

    The pending-upgrade status is the tell. For dependency version issues, inspect your Chart.lock file:

    cat charts/myapp/Chart.lock
    dependencies:
    - name: postgresql
      repository: https://charts.bitnami.com/bitnami
      version: 12.5.6
    digest: sha256:3a7f1c2d...
    generated: "2025-09-10T08:14:22.331Z"

    If that pinned version is no longer in the repo index, helm dependency update will fail:

    Error: no chart version found for postgresql-12.5.6

    How to Fix It

    For the stale lock, roll back to the last known good revision:

    helm rollback myapp 2 --namespace production

    If rollback also fails because the release state is truly corrupted, you can forcibly delete the release secret and reinstall. Helm stores release state in secrets named sh.helm.release.v1.<release-name>.v<revision>:

    kubectl get secrets -n production | grep helm.release
    kubectl delete secret sh.helm.release.v1.myapp.v3 -n production

    For dependency version conflicts, update your Chart.yaml to reference a version that exists in the current upstream index, then regenerate the lock file:

    helm repo update
    helm dependency update ./charts/myapp

    Root Cause 3: CRD Not Installed

    Custom Resource Definitions have to exist in the cluster before Helm can create resources that reference them. If you're deploying something like a Prometheus stack, a cert-manager Issuer, or an Istio VirtualService, and the underlying operator or CRD set hasn't been installed yet, Helm will fail immediately with a "no matches for kind" error.

    This catches teams off guard because the chart looks perfectly fine in a dry run against a cluster that already has the CRDs. Then you deploy to a fresh cluster — a new environment, a DR site, a CI ephemeral cluster — and it blows up on the very first resource. The chart didn't change. The cluster is the difference.

    How to Identify It

    The error is usually unambiguous:

    Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest:
    [unable to recognize "": no matches for kind "Certificate" in version "cert-manager.io/v1",
     unable to recognize "": no matches for kind "Issuer" in version "cert-manager.io/v1"]

    Verify which CRDs are currently installed:

    kubectl get crds | grep cert-manager

    If that returns nothing, the CRDs aren't there. You can also audit available API resources:

    kubectl api-resources | grep cert-manager.io
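    If you deploy to many environments, a CI preflight check can fail fast before Helm ever runs. A sketch of the idea (the required-CRD names are illustrative; in a real pipeline, installed comes from the cluster rather than a hardcoded string):

```shell
# CRDs this chart needs -- maintained by hand alongside the chart (illustrative names)
required="certificates.cert-manager.io issuers.cert-manager.io clusterissuers.cert-manager.io"

# In CI you would populate this from the live cluster:
#   installed=$(kubectl get crds -o name | sed 's|^.*/||')
installed="certificates.cert-manager.io
issuers.cert-manager.io"

# Collect any required CRD not present in the installed list
missing=""
for crd in $required; do
  echo "$installed" | grep -qx "$crd" || missing="$missing $crd"
done
echo "missing:$missing"   # prints: missing: clusterissuers.cert-manager.io
```

    Exit nonzero when missing is non-empty and the pipeline stops before Helm leaves a failed release behind.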

    How to Fix It

    Install the CRDs before running the chart that depends on them. Most operators ship their CRDs either as a standalone manifest or via a Helm chart values flag. For cert-manager:

    helm upgrade --install cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --set installCRDs=true

    Note the --set installCRDs=true. Many operator charts gate CRD installation behind a values flag that defaults to false. Forgetting it is one of the most common sources of this failure, and I've seen it bite even experienced engineers who are moving fast.

    For charts that bundle CRDs in their own crds/ directory, Helm installs them before other resources automatically — but only on first install, not on upgrades. If you're upgrading a chart and its CRD schema changed, you need to apply the updated CRD manually first:

    kubectl apply -f ./charts/myapp/crds/

    Make this step idempotent by using kubectl apply rather than kubectl create, and include it as an explicit step in your bootstrap and upgrade runbooks.


    Root Cause 4: RBAC Preventing the Deploy

    Helm runs with the permissions of whatever service account or kubeconfig credentials you're using. In a CI/CD pipeline this is typically a dedicated service account, and if that service account doesn't have the right Role or ClusterRole bindings, the deploy will fail with a Forbidden error partway through.

    What makes RBAC failures particularly annoying is that they often don't surface until Helm tries to create a specific resource type. The chart might create 15 resources successfully and then fail on the 16th because the service account can't create, say, a ClusterRoleBinding or a PersistentVolumeClaim. At that point you've got a partial deployment in the cluster — some resources created, some not — and a failed release state.

    How to Identify It

    The error message is usually explicit about which permission is missing:

    Error: INSTALLATION FAILED: failed to create resource: clusterrolebindings.rbac.authorization.k8s.io
    is forbidden: User "system:serviceaccount:ci:helm-deployer" cannot create resource
    "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope

    You can verify a specific permission directly:

    kubectl auth can-i create clusterrolebindings \
      --as=system:serviceaccount:ci:helm-deployer
    no

    For a full audit of what the service account can and can't do across a namespace:

    kubectl auth can-i --list \
      --as=system:serviceaccount:ci:helm-deployer \
      -n production

    How to Fix It

    You have two approaches. The first is granting the deploying service account a broad ClusterRole that covers the resource types your charts create. Here's a working example:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: helm-deployer
    rules:
    - apiGroups: ["", "apps", "batch", "networking.k8s.io", "rbac.authorization.k8s.io",
                  "policy", "autoscaling", "storage.k8s.io"]
      resources: ["*"]
      verbs: ["*"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: helm-deployer
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: helm-deployer
    subjects:
    - kind: ServiceAccount
      name: helm-deployer
      namespace: ci

    The second approach — and the one I prefer for production — is to scope permissions down to namespace-level Role/RoleBinding pairs for each deployment namespace, with ClusterRole bindings only for the cluster-scoped resources the chart actually needs. It takes more upfront work but avoids running your CI pipeline with effectively unrestricted cluster access. Minimum viable permissions are worth the effort to figure out.
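    A sketch of the scoped variant for one deployment namespace (the names mirror the broad example above; trim the resource list to what your charts actually create — note that Helm itself needs secrets access in the target namespace, since that's where it stores release state):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: helm-deployer
  namespace: production
rules:
- apiGroups: ["", "apps", "batch", "networking.k8s.io"]
  resources: ["deployments", "services", "configmaps", "secrets",
              "serviceaccounts", "ingresses", "jobs", "cronjobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: helm-deployer
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: helm-deployer
subjects:
- kind: ServiceAccount
  name: helm-deployer
  namespace: ci
```

    Add a narrowly scoped ClusterRole only if a chart genuinely creates cluster-scoped resources.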

    Apply the RBAC manifests first, then retry the Helm deployment.


    Root Cause 5: Rollout Timeout

    When run with --wait (or --atomic, which implies it), Helm waits for deployments to become ready before marking an install or upgrade as successful. If your pods don't reach a ready state within the timeout window — which defaults to five minutes — Helm reports the install as failed, even though the resources were technically created in the cluster.

    This is one of those failures that feels like a Helm problem but is actually a Kubernetes scheduling or application problem. The chart deployed fine. The pods just never came up. You need to diagnose what's happening at the pod level, not at the Helm level.

    How to Identify It

    Error: INSTALLATION FAILED: timed out waiting for the condition

    After this error, check the actual pod state immediately:

    kubectl get pods -n production -l app.kubernetes.io/name=myapp
    NAME                      READY   STATUS             RESTARTS   AGE
    myapp-7d8f9b4c6-x2kpj    0/1     ImagePullBackOff   0          6m
    myapp-7d8f9b4c6-m9qlr    0/1     Pending            0          6m

    Then describe the pod to see the events:

    kubectl describe pod myapp-7d8f9b4c6-m9qlr -n production
    Events:
      Warning  FailedScheduling  5m   default-scheduler  0/3 nodes are available:
                                       3 Insufficient cpu. preemption: 0/3 nodes are eligible
                                       for preemption

    For a CrashLoopBackOff, pull the logs from the failing container:

    kubectl logs myapp-7d8f9b4c6-x2kpj -n production --previous

    A misconfigured liveness or readiness probe is another frequent culprit. The pod starts fine but the probe path returns a non-200, and Kubernetes marks it unready indefinitely. Check probe configuration in the describe output under the Liveness and Readiness sections.

    How to Fix It

    Fix the underlying pod issue first. Once the root cause is identified — whether it's a resource constraint, an image pull credential problem, a liveness probe misconfiguration, or a missing ConfigMap — address that directly.

    For image pull failures specifically, ensure the pull secret is created in the right namespace and referenced in your values:

    kubectl create secret docker-registry regcred \
      --docker-server=registry.solvethenetwork.com \
      --docker-username=infrarunbook-admin \
      --docker-password='<token>' \
      --namespace production

    Then in your values file:

    imagePullSecrets:
      - name: regcred

    If the application legitimately takes longer than five minutes to start — for example, an init container running a database migration — extend the Helm timeout to match reality:

    helm upgrade --install myapp ./charts/myapp \
      --namespace production \
      --values ./values/prod.yaml \
      --timeout 12m

    Don't just crank up the timeout without understanding why the pods are slow to start. Bumping the number is sometimes the right call, but it shouldn't be your first move. Know what you're waiting for before deciding how long to wait for it.


    Root Cause 6: Image Pull Errors

    Image pull failures mean Kubernetes can't fetch the container image specified in the deployment. This might be a wrong tag, a missing pull secret, a registry authentication failure, or a network policy blocking egress from the cluster to the registry. It's closely related to rollout timeouts but the diagnosis path is different enough to call out separately.

    How to Identify It

    kubectl get events -n production --sort-by='.lastTimestamp' | grep -i pull
    LAST SEEN   TYPE      REASON      OBJECT                          MESSAGE
    3m          Warning   Failed      Pod/myapp-7d8f9b4c6-x2kpj      Failed to pull image
                                      "registry.solvethenetwork.com/myapp:v2.1.0":
                                      unauthorized: access denied
    3m          Warning   Failed      Pod/myapp-7d8f9b4c6-x2kpj      Error: ErrImagePull

    How to Fix It

    Verify the image tag exists in the registry before deploying. Confirm the pull secret is present in the correct namespace and either referenced in the pod spec via imagePullSecrets or attached to the default service account in that namespace. If your cluster is airgapped or behind a firewall, check that egress network policies allow traffic from the pod's namespace to the registry host on port 443.


    Root Cause 7: Resource Quota Exceeded

    Namespaces in production clusters often have ResourceQuota objects enforcing limits on CPU, memory, and object counts. When a Helm chart tries to create resources that would push the namespace over quota, Kubernetes rejects the request and Helm reports a failure. This one is easy to diagnose once you know to look for it.

    How to Identify It

    Error: INSTALLATION FAILED: failed to create resource: pods "myapp-7d8f9b4c6" is forbidden:
    exceeded quota: production-quota, requested: requests.cpu=500m,
    used: requests.cpu=3750m, limited: requests.cpu=4000m

    Check current quota usage in the namespace:

    kubectl describe resourcequota -n production
    Name:             production-quota
    Namespace:        production
    Resource          Used    Hard
    --------          ----    ----
    pods              19      20
    requests.cpu      3750m   4000m
    requests.memory   14Gi    16Gi
    limits.cpu        7500m   8000m

    How to Fix It

    Either reduce the resource requests in your values file to fit within the available headroom, scale down or remove other workloads in the namespace to free up capacity, or work with your cluster admin to raise the quota. Don't remove resource requests entirely to bypass the error — that creates noisy neighbor problems on shared clusters and removes the guardrails that quotas are there to enforce.


    Prevention

    Most Helm deployment failures are preventable if you build a few practices into your pipeline before anything hits the cluster.

    Run helm lint and helm template in CI against every chart change. This catches rendering errors, schema violations, and template logic bugs before they ever touch a real cluster. Follow that with a --dry-run against the actual target cluster — not just locally — because a local dry run won't catch API version mismatches or RBAC gaps:

    helm upgrade --install myapp ./charts/myapp \
      --namespace production \
      --values ./values/prod.yaml \
      --dry-run --debug

    Pin chart dependencies. Floating version ranges in Chart.yaml are fine for development, but commit a Chart.lock file for production and treat it like a lockfile for any other package manager. This prevents upstream chart changes from silently breaking your deploys on a Tuesday morning.
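    In Chart.yaml terms (the dependency mirrors the earlier postgresql example), that means an exact version rather than a range:

```yaml
# charts/myapp/Chart.yaml
dependencies:
- name: postgresql
  repository: https://charts.bitnami.com/bitnami
  version: 12.5.6        # exact pin -- avoid ranges like ">=12.0.0" in production
# then regenerate and commit the lock file:
#   helm dependency update ./charts/myapp
```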

    Maintain an explicit CRD bootstrap step in your cluster provisioning runbook and in your CI pipeline for ephemeral environments. Document exactly which CRDs each chart depends on, and always apply them with kubectl apply so the step is idempotent regardless of whether the CRD already exists.

    Audit RBAC permissions proactively. Run kubectl auth can-i --list against your CI service account in each deployment namespace as a preflight check in the pipeline. Catching a missing permission before the deploy starts is far less painful than diagnosing a partial deployment after the fact.

    Set realistic timeouts for your workloads. If an application takes eight minutes to initialize, document that and set your Helm timeout to twelve minutes with some headroom. Pair this with well-configured readiness probes so Kubernetes accurately reflects whether a pod is actually ready to serve traffic — not just that the process started.

    For production pipelines, use --atomic to get automatic rollback on failure:

    helm upgrade --install myapp ./charts/myapp \
      --namespace production \
      --values ./values/prod.yaml \
      --atomic \
      --timeout 10m \
      --cleanup-on-fail

    --atomic rolls the release back to the previous good revision automatically if the deployment fails, leaving the cluster in a known state. --cleanup-on-fail removes any resources created during a failed install so you don't end up with orphaned objects. Together they prevent the stale lock and partial deployment problems that make debugging so frustrating. Use them in every automated pipeline deployment and you'll eliminate an entire class of cluster state issues.

    Frequently Asked Questions

    Why does my Helm upgrade get stuck in pending-upgrade status?

    A previous upgrade attempt failed partway through and left the release in a pending-upgrade lock state. Fix it by running helm rollback to the last good revision. If rollback also fails, you can delete the specific Helm release secret with kubectl delete secret sh.helm.release.v1.<release-name>.v<revision> -n <namespace> and reinstall.

    How do I find out which permissions my Helm CI service account is missing?

    Run kubectl auth can-i --list --as=system:serviceaccount:<namespace>:<serviceaccount-name> -n <deployment-namespace>. This lists all resource/verb combinations and whether the account has access. Match the output against the resource types your chart creates to identify gaps.

    Why does helm install succeed on my dev cluster but fail on a fresh cluster with a CRD error?

    Your dev cluster already has the required CRDs installed from a prior operator deployment. Fresh clusters start with no CRDs. You need to explicitly install the CRD set — either via the operator's Helm chart with installCRDs=true, or by applying the CRD manifests with kubectl apply before running your application chart.

    How do I extend the Helm deployment timeout for slow-starting applications?

    Pass --timeout to the helm upgrade --install command with a duration like 10m or 15m. The default is 5m. Set it based on how long your slowest init container or application startup realistically takes, with some additional headroom. Always diagnose why startup is slow before treating the timeout as the fix.

    What does 'helm lint' actually check and should I run it in CI?

    helm lint checks chart structure, YAML syntax, required field presence, and — if the chart ships a values.schema.json — validates your values file against the schema. Yes, run it in CI on every chart change. It catches template rendering errors and schema violations before any cluster interaction happens, which is always the right time to catch them.
