Symptoms
Your CI pipeline kicks off, Docker starts building, and then it dies. Sometimes it dies fast with a cryptic error about a missing image. Sometimes it sits there for five minutes before timing out on a network pull. Sometimes it worked perfectly yesterday and now it doesn't, and nothing in the diff looks relevant. Whatever the failure mode, the build step is red and your team's deployment is blocked.
Common symptoms that bring engineers to this page include:
- The pipeline exits with a non-zero code during the `docker build` step
- Error messages referencing `manifest unknown`, `not found`, or `unauthorized`
- Builds that succeed locally but fail in CI every single time
- Build times that have ballooned from 90 seconds to 12 minutes without explanation
- The CI runner running out of disk mid-build
- Secrets that are clearly set as CI environment variables but are completely invisible inside the build context
These failures cluster around a handful of root causes. Let's walk through each one with the exact error output you'll see and the concrete steps to fix it.
Root Cause 1: Base Image Not Found
Why It Happens
Your `FROM` line references an image that doesn't exist at the registry. The tag was deleted after a cleanup job, the image name has a typo, the image is private and the runner isn't authenticated, or you're pointing at an internal registry that the CI runner can't reach over the network. In my experience, the sneakiest version of this is when someone builds a custom base image on their workstation, pushes it to an internal registry, updates the `FROM` line, and then commits without documenting how to rebuild that base. Every other engineer on the team has the image cached locally and never notices — until CI picks up a fresh runner that has never seen it.
How to Identify It
The error is usually unambiguous. You'll see something like this in your pipeline log:
```
Step 1/12 : FROM node:18-alpine-custom
ERROR: failed to solve: node:18-alpine-custom: failed to resolve source metadata
for docker.io/library/node:18-alpine-custom: docker.io/library/node:18-alpine-custom: not found
```
For a private internal registry the message shifts slightly:
```
Step 1/12 : FROM registry.solvethenetwork.com/internal/base-node:3.1
ERROR: failed to solve: registry.solvethenetwork.com/internal/base-node:3.1:
failed to authorize: failed to fetch oauth token: unexpected status: 401 Unauthorized
```
Reproduce it locally by running a direct pull:
```
docker pull registry.solvethenetwork.com/internal/base-node:3.1
```
If that fails on your workstation too, the image genuinely doesn't exist at that tag. If it succeeds locally but fails in CI, you have a registry auth problem — skip ahead to Root Cause 5. Run this to check if a local cached copy has been silently powering your local builds all along:
```
docker image ls | grep base-node
```
How to Fix It
First, confirm which tags are actually available in the registry. For an internal registry behind token auth:
```
curl -u infrarunbook-admin:$REGISTRY_TOKEN \
  https://registry.solvethenetwork.com/v2/internal/base-node/tags/list
```
If the tag doesn't exist, either rebuild and push it or update your `FROM` to a tag that does. For public base images, stop using mutable tags like `latest` or `18-alpine` — those can silently change under you. Prefer digest pinning:
```
FROM node:18-alpine@sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890
```
Get the current digest for any image with:
```
docker pull node:18-alpine && docker inspect node:18-alpine --format='{{index .RepoDigests 0}}'
```
Root Cause 2: Build Context Too Large
Why It Happens
When you run `docker build .`, Docker streams the entire build context — everything in the current directory — to the daemon before executing a single `FROM` instruction. Without a `.dockerignore` file, that context includes `node_modules` (often 300MB+), the `.git` directory (which can be gigabytes on long-lived repos), build artifacts, test fixtures, log files, and whatever else has accumulated. In CI this creates two compounding problems: the transfer itself is slow and eats into job time limits, and some runner configurations cap the context size or have memory constraints that cause the daemon to OOM during the send.
How to Identify It
Look for this line at the very top of your build output, before any numbered steps:
```
Sending build context to Docker daemon 1.247GB
```
Anything over 100MB deserves scrutiny. A typical application's build context should be under 50MB. Find the biggest contributors before touching anything:
```
du -sh * .[^.]* 2>/dev/null | sort -rh | head -20
```
You can also watch the context transfer at the top of the build output without waiting for a full build to finish — piping to `head` cuts the build off right after the context is sent:
```
docker build --no-cache --progress=plain . 2>&1 | head -5
```
How to Fix It
Create a `.dockerignore` file in the root of your build context. The syntax is identical to `.gitignore`. A solid baseline that covers most Node.js and Python projects:
```
.git
.gitignore
.dockerignore
node_modules
dist
build
*.log
*.md
.env
.env.*
coverage
.nyc_output
__pycache__
*.pyc
.pytest_cache
.vscode
.idea
tests/fixtures/large-dataset
```
After adding this, the difference in context size is usually dramatic:
```
# Before .dockerignore
Sending build context to Docker daemon 1.247GB

# After .dockerignore
Sending build context to Docker daemon 4.821MB
```
If your repository structure puts the Dockerfile somewhere other than the project root, or you genuinely cannot place a `.dockerignore` where Docker expects it, pass the Dockerfile path and context directory separately:
```
docker build -f ./docker/Dockerfile ./src
```
Root Cause 3: Secret Not Available During Build
Why It Happens
Build-time secrets are genuinely tricky, and this is one of the most common sources of confusion I see from engineers who are new to CI/CD. You've set the CI variable — an NPM token, a private pip index password, a GitHub PAT for installing private packages — and it's absolutely there in the runner environment. But your `RUN npm install` step fails with a 401. The reason is that Docker build runs in an isolated environment. Environment variables from the CI runner are not automatically forwarded into the build. You have to explicitly declare and pass them, and if you get the method wrong, you either don't get the secret at all or you accidentally bake it into an image layer where it can be extracted later.
How to Identify It
The failure surfaces as a package manager authentication error during a `RUN` step:
```
Step 7/14 : RUN npm ci
npm ERR! code E401
npm ERR! 401 Unauthorized - GET https://npm.solvethenetwork.com/@internal%2fcore - unauthenticated

# Or for pip:
Step 8/14 : RUN pip install -r requirements.txt --index-url https://pypi.solvethenetwork.com/simple/
ERROR: 401 Client Error: Unauthorized for url: https://pypi.solvethenetwork.com/simple/requests/
```
Confirm the variable exists on the runner but isn't reaching the build by checking for it in a CI debug step just before your `docker build` command.
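A minimal sketch of that debug step, assuming a hypothetical variable name `NPM_TOKEN` (substitute whichever variable your build actually needs). It confirms the secret is present without ever printing its value:

```shell
# check_secret VAR_NAME: fail loudly if a CI variable is missing,
# without echoing the secret itself.
check_secret() {
  name="$1"
  eval "val=\"\${$name:-}\""
  if [ -n "$val" ]; then
    echo "$name is set (length: ${#val})"
  else
    echo "$name is NOT set on the runner" >&2
    return 1
  fi
}

# NPM_TOKEN is a hypothetical example variable.
NPM_TOKEN="example-value"
check_secret NPM_TOKEN   # prints: NPM_TOKEN is set (length: 13)
```

If this step reports the variable as set but the build still fails with a 401, the secret is simply not being forwarded into the build — see the FAQ below for the BuildKit `--secret` pattern.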
Frequently Asked Questions
Why does my Docker build work locally but fail in CI?
The most common reasons are missing registry credentials on the CI runner, build secrets that exist in the CI environment but aren't forwarded into the Docker build context, or a local image cache that your workstation has but the ephemeral runner does not. Start by running the build with `DOCKER_BUILDKIT=1` and `--progress=plain` to see which layers are being rebuilt, and confirm the runner can authenticate to your registry with an explicit `docker login` step.
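As a concrete starting point, those checks can be scripted as a single CI debug step. This is a sketch, not a drop-in: the registry host and credentials reuse this article's earlier examples, and the image tag is arbitrary.

```shell
# Force BuildKit with verbose output so every layer and cache decision is logged.
export DOCKER_BUILDKIT=1

# Prove the runner can authenticate before blaming the Dockerfile.
echo "$REGISTRY_TOKEN" | docker login registry.solvethenetwork.com \
  --username infrarunbook-admin --password-stdin

# --no-cache rules out stale-cache differences between laptop and runner.
docker build --progress=plain --no-cache -t myapp:ci-debug .
```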
How do I pass environment variable secrets into a Docker build?
Use BuildKit's `--secret` flag with a `RUN --mount=type=secret` instruction in your Dockerfile. This mounts the secret only during that specific build step without persisting it in any image layer. Set `DOCKER_BUILDKIT=1` in your CI environment and pass the secret with `docker build --secret id=my_secret,env=MY_ENV_VAR -t myimage .`. Then inside the Dockerfile use `RUN --mount=type=secret,id=my_secret VALUE=$(cat /run/secrets/my_secret) command`.
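Put together, the Dockerfile side of the pattern looks something like this — a sketch assuming an npm project, with `npm_token` as an arbitrary secret id:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# The secret is mounted only for this RUN step; it never lands in a layer.
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm ci
COPY . .
```

Build it with `DOCKER_BUILDKIT=1 docker build --secret id=npm_token,env=NPM_TOKEN -t myapp .`. For npm specifically, the token still has to be wired into `.npmrc` (npm expands environment variables there, e.g. `//npm.solvethenetwork.com/:_authToken=${NPM_TOKEN}`) for the exported value to take effect.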
What is the fastest way to reduce Docker build context size?
Create a `.dockerignore` file in your build context root and exclude `node_modules`, `.git`, `dist`, build artifacts, logs, and test fixtures. A proper `.dockerignore` can reduce a gigabyte-scale context down to a few megabytes, dramatically improving both build speed and CI reliability. Run `du -sh * .[^.]* | sort -rh` to identify the biggest directories before writing the file.
How do I preserve Docker layer cache across CI jobs?
Use registry-based cache with BuildKit's inline cache. Pass `--build-arg BUILDKIT_INLINE_CACHE=1` and `--cache-from` pointing at a previously pushed cache image. After a successful build, push the image with a stable tag (like `:cache`) in addition to the commit SHA tag. On the next run, `--cache-from` will pull that image and use its embedded cache metadata to skip unchanged layers even on a fresh runner.
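A sketch of that flow, assuming a hypothetical image path and a `$COMMIT_SHA` variable supplied by your CI system:

```shell
export DOCKER_BUILDKIT=1
IMAGE=registry.solvethenetwork.com/myapp   # hypothetical image path

# Build, seeding layer metadata from the last pushed :cache image.
docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from "$IMAGE:cache" \
  -t "$IMAGE:$COMMIT_SHA" -t "$IMAGE:cache" .

# Push both tags so the next fresh runner can reuse the cache.
docker push "$IMAGE:$COMMIT_SHA"
docker push "$IMAGE:cache"
```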
How do I stop Docker Hub rate limiting from breaking my CI builds?
Authenticate to Docker Hub in your CI pipeline even when pulling public images — authenticated pulls have a much higher rate limit than anonymous pulls. For high-volume pipelines, mirror frequently used base images to your internal registry and reference them in your Dockerfiles instead of pulling directly from Docker Hub on every build.
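In practice that is one extra CI step before any image pull; `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` are hypothetical variable names for an access token stored in your CI secrets:

```shell
# Authenticated pulls get a much higher Docker Hub rate limit than anonymous ones.
echo "$DOCKERHUB_TOKEN" | docker login --username "$DOCKERHUB_USER" --password-stdin
```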