Symptoms
Your CI pipeline kicks off, Docker starts building, and then it dies. Sometimes it dies fast with a cryptic error about a missing image. Sometimes it sits there for five minutes before timing out on a network pull. Sometimes it worked perfectly yesterday and now it doesn't, and nothing in the diff looks relevant. Whatever the failure mode, the build step is red and your team's deployment is blocked.
Common symptoms that bring engineers to this page include:
- The pipeline exits with a non-zero code during the `docker build` step
- Error messages referencing `manifest unknown`, `not found`, or `unauthorized`
- Builds that succeed locally but fail in CI every single time
- Build times that have ballooned from 90 seconds to 12 minutes without explanation
- The CI runner running out of disk mid-build
- Secrets that are clearly set as CI environment variables but are completely invisible inside the build context
These failures cluster around a handful of root causes. Let's walk through each one with the exact error output you'll see and the concrete steps to fix it.
Root Cause 1: Base Image Not Found
Why It Happens
Your `FROM` line references an image that doesn't exist at the registry. The tag was deleted after a cleanup job, the image name has a typo, the image is private and the runner isn't authenticated, or you're pointing at an internal registry that the CI runner can't reach over the network. In my experience, the sneakiest version of this is when someone builds a custom base image on their workstation, pushes it to an internal registry, updates the `FROM` line, and then commits without documenting how to rebuild that base. Every other engineer on the team has the image cached locally and never notices — until CI picks up a fresh runner that has never seen it.
How to Identify It
The error is usually unambiguous. You'll see something like this in your pipeline log:
```
Step 1/12 : FROM node:18-alpine-custom
ERROR: failed to solve: node:18-alpine-custom: failed to resolve source metadata
for docker.io/library/node:18-alpine-custom: docker.io/library/node:18-alpine-custom: not found
```
For a private internal registry the message shifts slightly:
```
Step 1/12 : FROM registry.solvethenetwork.com/internal/base-node:3.1
ERROR: failed to solve: registry.solvethenetwork.com/internal/base-node:3.1:
failed to authorize: failed to fetch oauth token: unexpected status: 401 Unauthorized
```
Reproduce it locally by running a direct pull:
```
docker pull registry.solvethenetwork.com/internal/base-node:3.1
```
If that fails on your workstation too, the image genuinely doesn't exist at that tag. If it succeeds locally but fails in CI, you have a registry auth problem — skip ahead to Root Cause 5. Run this to check if a local cached copy has been silently powering your local builds all along:
```
docker image ls | grep base-node
```
How to Fix It
First, confirm which tags are actually available in the registry. For an internal registry behind token auth:
```
curl -u infrarunbook-admin:$REGISTRY_TOKEN \
  https://registry.solvethenetwork.com/v2/internal/base-node/tags/list
```
If the tag doesn't exist, either rebuild and push it or update your `FROM` to a tag that does. For public base images, stop using mutable tags like `latest` or `18-alpine` — those can silently change under you. Prefer digest pinning:
```
FROM node:18-alpine@sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890
```
Get the current digest for any image with:
```
docker pull node:18-alpine && docker inspect node:18-alpine --format='{{index .RepoDigests 0}}'
```
Root Cause 2: Build Context Too Large
Why It Happens
When you run `docker build .`, Docker streams the entire build context — everything in the current directory — to the daemon before executing a single `FROM` instruction. Without a `.dockerignore` file, that context includes `node_modules` (often 300MB+), the `.git` directory (which can be gigabytes on long-lived repos), build artifacts, test fixtures, log files, and whatever else has accumulated. In CI this creates two compounding problems: the transfer itself is slow and eats into job time limits, and some runner configurations cap the context size or have memory constraints that cause the daemon to OOM during the send.
How to Identify It
Look for this line at the very top of your build output, before any numbered steps:
```
Sending build context to Docker daemon 1.247GB
```
Anything over 100MB deserves scrutiny. A typical application's build context should be under 50MB. Find the biggest contributors before touching anything:
```
du -sh * .[^.]* 2>/dev/null | sort -rh | head -20
```
You can also watch the context transfer at the top of the build output without waiting for a full build to finish — piping to `head` cuts the build off right after the context is sent:
```
docker build --no-cache --progress=plain . 2>&1 | head -5
```
How to Fix It
Create a `.dockerignore` file in the root of your build context. The syntax is identical to `.gitignore`. A solid baseline that covers most Node.js and Python projects:
```
.git
.gitignore
.dockerignore
node_modules
dist
build
*.log
*.md
.env
.env.*
coverage
.nyc_output
__pycache__
*.pyc
.pytest_cache
.vscode
.idea
tests/fixtures/large-dataset
```
After adding this, the difference in context size is usually dramatic:
```
# Before .dockerignore
Sending build context to Docker daemon 1.247GB

# After .dockerignore
Sending build context to Docker daemon 4.821MB
```
If your repository structure puts the Dockerfile somewhere other than the project root, or you genuinely cannot place a `.dockerignore` where Docker expects it, pass the Dockerfile path and context directory separately:
```
docker build -f ./docker/Dockerfile ./src
```
Root Cause 3: Secret Not Available During Build
Why It Happens
Build-time secrets are genuinely tricky, and this is one of the most common sources of confusion I see from engineers who are new to CI/CD. You've set the CI variable — an NPM token, a private pip index password, a GitHub PAT for installing private packages — and it's absolutely there in the runner environment. But your `RUN npm install` step fails with a 401. The reason is that Docker build runs in an isolated environment. Environment variables from the CI runner are not automatically forwarded into the build. You have to explicitly declare and pass them, and if you get the method wrong, you either don't get the secret at all or you accidentally bake it into an image layer where it can be extracted later.
How to Identify It
The failure surfaces as a package manager authentication error during a `RUN` step:
```
Step 7/14 : RUN npm ci
npm ERR! code E401
npm ERR! 401 Unauthorized - GET https://npm.solvethenetwork.com/@internal%2fcore - unauthenticated

# Or for pip:
Step 8/14 : RUN pip install -r requirements.txt --index-url https://pypi.solvethenetwork.com/simple/
ERROR: 401 Client Error: Unauthorized for url: https://pypi.solvethenetwork.com/simple/requests/
```
Confirm the variable exists on the runner but isn't reaching the build by checking for it in a CI debug step just before your `docker build` command.
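A minimal sketch of that debug step, assuming a hypothetical variable name `NPM_TOKEN` (substitute whichever variable your build actually needs). It confirms the secret is present without ever printing its value:

```shell
# check_secret VAR_NAME: fail loudly if a CI variable is missing,
# without echoing the secret itself.
check_secret() {
  name="$1"
  eval "val=\"\${$name:-}\""
  if [ -n "$val" ]; then
    echo "$name is set (length: ${#val})"
  else
    echo "$name is NOT set on the runner" >&2
    return 1
  fi
}

# NPM_TOKEN is a hypothetical example variable.
NPM_TOKEN="example-value"
check_secret NPM_TOKEN   # prints: NPM_TOKEN is set (length: 13)
```

If this step reports the variable as set but the build still fails with a 401, the secret is simply not being forwarded into the build — see the FAQ below for the BuildKit `--secret` pattern.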
Frequently Asked Questions
Why does my Docker build work locally but fail in CI?
The most common reasons are missing registry credentials on the CI runner, build secrets that exist in the CI environment but aren't forwarded into the Docker build context, or a local image cache that your workstation has but the ephemeral runner does not. Start by running the build with `DOCKER_BUILDKIT=1` and `--progress=plain` to see which layers are being rebuilt, and confirm the runner can authenticate to your registry with an explicit `docker login` step.
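As a concrete starting point, those checks can be scripted as a single CI debug step. This is a sketch, not a drop-in: the registry host and credentials reuse this article's earlier examples, and the image tag is arbitrary.

```shell
# Force BuildKit with verbose output so every layer and cache decision is logged.
export DOCKER_BUILDKIT=1

# Prove the runner can authenticate before blaming the Dockerfile.
echo "$REGISTRY_TOKEN" | docker login registry.solvethenetwork.com \
  --username infrarunbook-admin --password-stdin

# --no-cache rules out stale-cache differences between laptop and runner.
docker build --progress=plain --no-cache -t myapp:ci-debug .
```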
How do I pass environment variable secrets into a Docker build?
Use BuildKit's `--secret` flag with a `RUN --mount=type=secret` instruction in your Dockerfile. This mounts the secret only during that specific build step without persisting it in any image layer. Set `DOCKER_BUILDKIT=1` in your CI environment and pass the secret with `docker build --secret id=my_secret,env=MY_ENV_VAR -t myimage .`. Then inside the Dockerfile use `RUN --mount=type=secret,id=my_secret VALUE=$(cat /run/secrets/my_secret) command`.
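Put together, the Dockerfile side of the pattern looks something like this — a sketch assuming an npm project, with `npm_token` as an arbitrary secret id:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# The secret is mounted only for this RUN step; it never lands in a layer.
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm ci
COPY . .
```

Build it with `DOCKER_BUILDKIT=1 docker build --secret id=npm_token,env=NPM_TOKEN -t myapp .`. For npm specifically, the token still has to be wired into `.npmrc` (npm expands environment variables there, e.g. `//npm.solvethenetwork.com/:_authToken=${NPM_TOKEN}`) for the exported value to take effect.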
What is the fastest way to reduce Docker build context size?
Create a `.dockerignore` file in your build context root and exclude `node_modules`, `.git`, `dist`, build artifacts, logs, and test fixtures. A proper `.dockerignore` can reduce a gigabyte-scale context down to a few megabytes, dramatically improving both build speed and CI reliability. Run `du -sh * .[^.]* | sort -rh` to identify the biggest directories before writing the file.
How do I preserve Docker layer cache across CI jobs?
Use registry-based cache with BuildKit's inline cache. Pass `--build-arg BUILDKIT_INLINE_CACHE=1` and `--cache-from` pointing at a previously pushed cache image. After a successful build, push the image with a stable tag (like `:cache`) in addition to the commit SHA tag. On the next run, `--cache-from` will pull that image and use its embedded cache metadata to skip unchanged layers even on a fresh runner.
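A sketch of that flow, assuming a hypothetical image path and a `$COMMIT_SHA` variable supplied by your CI system:

```shell
export DOCKER_BUILDKIT=1
IMAGE=registry.solvethenetwork.com/myapp   # hypothetical image path

# Build, seeding layer metadata from the last pushed :cache image.
docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from "$IMAGE:cache" \
  -t "$IMAGE:$COMMIT_SHA" -t "$IMAGE:cache" .

# Push both tags so the next fresh runner can reuse the cache.
docker push "$IMAGE:$COMMIT_SHA"
docker push "$IMAGE:cache"
```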
How do I stop Docker Hub rate limiting from breaking my CI builds?
Authenticate to Docker Hub in your CI pipeline even when pulling public images — authenticated pulls have a much higher rate limit than anonymous pulls. For high-volume pipelines, mirror frequently used base images to your internal registry and reference them in your Dockerfiles instead of pulling directly from Docker Hub on every build.
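In practice that is one extra CI step before any image pull; `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` are hypothetical variable names for an access token stored in your CI secrets:

```shell
# Authenticated pulls get a much higher Docker Hub rate limit than anonymous ones.
echo "$DOCKERHUB_TOKEN" | docker login --username "$DOCKERHUB_USER" --password-stdin
```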