Symptoms
You run `systemctl start docker` and it hangs, then fails. Or maybe Docker was running fine until a kernel update, a reboot, or someone edited `/etc/docker/daemon.json`. Either way, the daemon is down and nothing Docker-related works.
Here's what you typically see on the surface:

- `systemctl status docker` shows `Active: failed` or spins in `activating (start)` indefinitely
- `/var/run/docker.sock` either doesn't exist or sits there dead with no process behind it
- `docker ps` returns: `Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?`
- Dependent services — docker-compose stacks, container monitoring agents, CI runners — are all dead alongside it
The journal logs are your first stop. Don't spend time guessing. Pull the logs and read the actual error before you touch anything else:
```
journalctl -u docker.service --no-pager -n 80
```
Almost every case I've worked through was diagnosed within the first ten lines of that output. The categories below map directly to the errors you'll actually see there. Read the error, match the pattern, fix the cause.
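If you triage these often, the read-the-error-first step can be scripted. The sketch below is my own shorthand, not an official tool: `classify_docker_error` is an invented name, and the patterns are illustrative signatures for the causes covered in this runbook; real error strings vary across Docker versions.

```shell
# Sketch of a journal triage helper. Function name and patterns are
# invented shorthand for the causes covered below.
classify_docker_error() {
  local logs="$1"
  case "$logs" in
    *"d_type support"*|*"driver not supported"*)             echo "storage/overlay (Cause 1 or 4)" ;;
    *"No chain/target/match"*|*"Failed to Setup IP tables"*) echo "iptables conflict (Cause 2)" ;;
    *"cgroup"*)                                              echo "cgroup configuration (Cause 3)" ;;
    *"bolt"*|*"invalid database"*)                           echo "corrupted state (Cause 5)" ;;
    *"bind: address already in use"*)                        echo "stale socket (Cause 6)" ;;
    *"no space left on device"*)                             echo "disk exhaustion (Cause 7)" ;;
    *)                                                       echo "unknown: read the full journal" ;;
  esac
}

# Feed it the tail of the journal when available
if command -v journalctl >/dev/null 2>&1; then
  classify_docker_error "$(journalctl -u docker.service --no-pager -n 80 2>/dev/null)"
fi
```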
Cause 1: Overlay Filesystem Error
Why It Happens
Docker's default storage driver is `overlay2`, which depends on the `overlay` kernel module. If that module isn't loaded — or if the underlying filesystem on `/var/lib/docker` doesn't support overlayfs — the daemon won't start. This comes up more often than you'd think. I've seen it happen after in-place OS upgrades where the kernel changed but `/etc/modules` wasn't preserved, and after migrations to a new storage backend where someone formatted `/var/lib/docker` on XFS without enabling `ftype=1`. Both cases look like a storage driver failure on the surface, but the root cause is one layer lower.
How to Identify It
In your journal output you'll see something like this:
```
Apr 20 09:14:22 sw-infrarunbook-01 dockerd[3812]: time="2026-04-20T09:14:22.441Z" level=error msg="failed to start daemon" error="error initializing graphdriver: driver not supported"
Apr 20 09:14:22 sw-infrarunbook-01 dockerd[3812]: time="2026-04-20T09:14:22.441Z" level=error msg="[graphdriver] prior storage driver overlay2 failed: driver not supported"
```
Or, if the XFS d_type support is missing:
```
Apr 20 09:14:22 sw-infrarunbook-01 dockerd[3812]: time="2026-04-20T09:14:22.510Z" level=error msg="failed to start daemon" error="error initializing graphdriver: overlay2: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior. Reformat the filesystem with ftype=1 to enable d_type support."
```
Check whether the kernel module is loaded:
```
lsmod | grep overlay
```
No output means the module isn't loaded. If you're on XFS, also verify d_type support:
```
xfs_info /var/lib/docker | grep ftype
```
You want `ftype=1`. If it shows `ftype=0`, that's your problem, and it can't be fixed without reformatting.
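If you bake this check into host validation scripts, it helps to classify the `xfs_info` output rather than eyeball it. A minimal sketch, assuming the `ftype=` flag appears in the output as shown above; `dtype_status` is a hypothetical helper name:

```shell
# Hypothetical helper: classify xfs_info output by its ftype flag.
dtype_status() {
  case "$1" in
    *ftype=1*) echo "ok" ;;                  # d_type supported, overlay2 is safe
    *ftype=0*) echo "reformat-required" ;;   # no in-place fix; needs mkfs.xfs -n ftype=1
    *)         echo "not-applicable" ;;      # not XFS, or flag not reported
  esac
}

# Example run against the Docker data directory, when it's on XFS
if command -v xfs_info >/dev/null 2>&1; then
  dtype_status "$(xfs_info /var/lib/docker 2>/dev/null)"
fi
```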
How to Fix It
If the module is just not loaded, load it immediately and make it persistent across reboots:
```
# Load it now
modprobe overlay
# Persist across reboots
echo "overlay" >> /etc/modules
# Then restart Docker
systemctl start docker
```
If the XFS filesystem was formatted without `ftype=1`, you have to reformat it — there's no in-place fix. Back up anything you need from `/var/lib/docker` (pulled images will need to be re-pulled anyway), unmount the volume, reformat with `mkfs.xfs -n ftype=1 /dev/sdX`, remount, and start Docker fresh. If you're on ext4, this specific issue won't apply — ext4 supports d_type natively.
Cause 2: iptables Conflict
Why It Happens
Docker manages its own iptables rules to handle container networking — NAT for outbound traffic, forwarding between containers, port mapping. When the host is also running firewalld, nftables, or another network management tool that owns the same chains, conflicts arise. The most disruptive scenario is modern Linux distributions — RHEL 9, Debian 11+, Ubuntu 22.04+ — where `iptables` now points to `iptables-nft` by default, but Docker is still writing legacy iptables rules via the old backend. The two sets of rules don't coexist cleanly, and the daemon fails during network controller initialization.
How to Identify It
The journal output looks like this:
```
Apr 20 09:31:05 sw-infrarunbook-01 dockerd[4102]: time="2026-04-20T09:31:05.821Z" level=warning msg="could not change the host's network settings: could not create ip table rule in docker-forward"
Apr 20 09:31:05 sw-infrarunbook-01 dockerd[4102]: time="2026-04-20T09:31:05.901Z" level=error msg="failed to start daemon" error="network controller initialization failed: error creating default \"bridge\" network: Failed to Setup IP tables: Unable to enable SKIP DNAT rule: (iptables failed: iptables --wait -t nat -I DOCKER -i docker0 -j RETURN: iptables: No chain/target/match by that name."
```
Check which iptables backend is active and whether firewalld is in the picture:
```
update-alternatives --display iptables
systemctl is-active firewalld
```
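To make the backend check scriptable, you can parse the `update-alternatives` output instead of reading it by hand. A sketch, assuming the Debian-family output format that includes a "link currently points to" line; `iptables_backend` is a name I'm inventing here:

```shell
# Hypothetical helper: decide which iptables backend is active from
# `update-alternatives --display iptables` output.
iptables_backend() {
  # isolate the "link currently points to" line before matching, because
  # the full display output lists both backends as alternatives
  local current
  current="$(printf '%s\n' "$1" | grep 'link currently points to' || true)"
  case "$current" in
    *iptables-nft*)    echo "nft" ;;      # likely to clash with Docker's legacy rules
    *iptables-legacy*) echo "legacy" ;;   # the backend Docker handles reliably
    *)                 echo "unknown" ;;  # no alternatives entry on this distro
  esac
}

if command -v update-alternatives >/dev/null 2>&1; then
  echo "active backend: $(iptables_backend "$(update-alternatives --display iptables 2>/dev/null)")"
fi
```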
How to Fix It
The cleanest fix on systems where `iptables-nft` is the default is to switch to `iptables-legacy`, which Docker handles reliably:

```
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
systemctl restart docker
```
If you need to keep `iptables-nft` and firewalld is the conflict, configure Docker's bridge interface in a trusted firewalld zone:

```
firewall-cmd --permanent --zone=trusted --add-interface=docker0
firewall-cmd --reload
systemctl restart docker
```
There's also the option of setting `"iptables": false` in `/etc/docker/daemon.json` to tell Docker to skip managing iptables entirely, but only do that if you're prepared to write and maintain all the NAT and forwarding rules yourself. That's rarely worth it outside of very specialized environments.
Cause 3: Cgroup v2 Issue
Why It Happens
Linux kernels since 5.2 support cgroup v2 (the unified hierarchy), and many distributions have made it the default. Docker and containerd work fine with cgroup v2, but only when both are configured to use the `systemd` cgroup driver. The classic failure scenario: Docker was installed on a cgroup v1 system, the kernel was upgraded, the system booted into cgroup v2 mode, and now Docker's internal configuration refers to a cgroup structure that no longer exists as expected. The daemon tries to initialize cgroup paths that simply aren't there.
I've also hit this when running Docker inside a VM or LXC container where the host's cgroup configuration doesn't match what the guest expects — particularly in nested virtualization setups.
How to Identify It
First, check which cgroup version is active:
```
stat -fc %T /sys/fs/cgroup/
```
If it returns `cgroup2fs`, you're on v2. If it returns `tmpfs`, you're on v1. Then look at the journal:

```
Apr 20 10:02:11 sw-infrarunbook-01 dockerd[4451]: time="2026-04-20T10:02:11.200Z" level=error msg="failed to start daemon" error="Devices cgroup isn't mounted"
Apr 20 10:02:11 sw-infrarunbook-01 containerd[4389]: time="2026-04-20T10:02:11.198Z" level=error msg="failed to handle event" error="failed to get OOM score for pid 4451: failed to read /proc/4451/oom_score_adj: no such process"
```
Check what cgroup driver is currently configured in both Docker and containerd:
```
cat /etc/docker/daemon.json
grep -A5 'runc' /etc/containerd/config.toml | grep -i cgroup
```
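The version probe folds neatly into a script you can run across a fleet. A minimal sketch: `cgroup_mode` is a hypothetical wrapper around the `stat -fc %T` probe shown above.

```shell
# Hypothetical wrapper: map the filesystem type of /sys/fs/cgroup to a
# cgroup version label, following the stat probe above.
cgroup_mode() {
  case "$1" in
    cgroup2fs) echo "v2" ;;       # unified hierarchy; use the systemd cgroup driver
    tmpfs)     echo "v1" ;;       # legacy hierarchy
    *)         echo "unknown" ;;  # unexpected mount, or not a Linux host
  esac
}

cgroup_mode "$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null)"
```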
How to Fix It
On a cgroup v2 system with systemd as init, set the cgroup driver to `systemd` in `/etc/docker/daemon.json`:

```
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
```
In `/etc/containerd/config.toml`, enable the systemd cgroup driver for the runc runtime:

```
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```
Always restart containerd before Docker — Docker's runtime depends on containerd being configured correctly first:
```
systemctl restart containerd
systemctl restart docker
```
Rolling back to cgroup v1 by adding `systemd.unified_cgroup_hierarchy=0` to your GRUB kernel parameters is technically possible, but it's a band-aid. Fix the driver configuration instead. Cgroup v2 is where the ecosystem is going, and fighting it costs more effort over time than embracing it now.
Cause 4: Storage Driver Misconfigured
Why It Happens
Docker supports multiple storage drivers: `overlay2`, `devicemapper`, `btrfs`, `zfs`, and `vfs`. The daemon reads its storage driver from `/etc/docker/daemon.json`, and if that config specifies a driver that isn't available — because the required kernel module is absent, a binary dependency isn't installed, or the filesystem doesn't support it — the daemon fails to initialize the graph driver and exits.
In my experience, this comes up most often when someone copies a `daemon.json` from one server to another without checking that the target has the same capabilities. It also surfaces when the storage driver config is technically valid but doesn't match the existing data under `/var/lib/docker`. If you change the storage driver after Docker has been running, existing image layers become inaccessible and the daemon may refuse to start or come up in a broken state.
How to Identify It
The journal error is usually explicit about what failed:
```
Apr 20 10:45:33 sw-infrarunbook-01 dockerd[5012]: time="2026-04-20T10:45:33.812Z" level=error msg="failed to start daemon" error="error initializing graphdriver: prior storage driver devicemapper failed: driver not supported"
Apr 20 10:45:33 sw-infrarunbook-01 dockerd[5012]: time="2026-04-20T10:45:33.901Z" level=error msg="failed to start daemon" error="error initializing graphdriver: unknown graphdriver: btrfs"
```
Check what's configured versus what's actually on disk:
```
cat /etc/docker/daemon.json
ls /var/lib/docker/
```
The subdirectory inside `/var/lib/docker` named after the driver — `overlay2/`, `devicemapper/`, `btrfs/` — tells you what driver wrote the existing state. If that doesn't match what's in `daemon.json`, you have a mismatch.
How to Fix It
If the configured driver is wrong and you want to use `overlay2`, update `/etc/docker/daemon.json`:

```
{
  "storage-driver": "overlay2"
}
```
Always validate the JSON syntax before restarting — a malformed `daemon.json` is one of the most common self-inflicted Docker failures and gives a completely opaque error:

```
python3 -m json.tool /etc/docker/daemon.json
```
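You can go one step further and refuse to restart the daemon at all unless the config parses. This is a sketch of that guard; `validate_daemon_json` and `safe_docker_restart` are names invented for this runbook, not Docker commands:

```shell
# Sketch: validate daemon.json before touching the service.
validate_daemon_json() {
  local cfg="${1:-/etc/docker/daemon.json}"
  # exits nonzero if the file is missing or not valid JSON
  python3 -m json.tool "$cfg" >/dev/null 2>&1
}

safe_docker_restart() {
  local cfg="${1:-/etc/docker/daemon.json}"
  if validate_daemon_json "$cfg"; then
    systemctl restart docker
  else
    echo "refusing to restart: $cfg is missing or not valid JSON" >&2
    return 1
  fi
}
```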
If you're switching drivers and want to preserve your existing images, export them first with `docker save`, clear `/var/lib/docker`, update the config, start Docker, and re-import with `docker load`. There's no lossless in-place driver migration — the layer formats are incompatible across drivers.
Cause 5: Corrupted Docker State
Why It Happens
Docker maintains its own internal state database under `/var/lib/docker`. This includes layer metadata, container state, network configuration, and volume references — most of it stored in boltDB files. If the daemon is killed mid-write — power loss, hard reboot, an OOM killer that takes out `dockerd` at the wrong moment — you end up with corrupted boltDB files, half-written image layers, or a broken containerd content store. I've also seen this happen when a host ran out of inodes rather than disk space, causing Docker to write partial or garbage data into its metadata files before hitting the inode ceiling.
How to Identify It
The journal will have errors referencing database operations failing or metadata that can't be parsed:
```
Apr 20 11:12:44 sw-infrarunbook-01 dockerd[5890]: time="2026-04-20T11:12:44.111Z" level=error msg="failed to start daemon" error="error loading config file: unexpected end of JSON input"
Apr 20 11:12:44 sw-infrarunbook-01 dockerd[5890]: time="2026-04-20T11:12:44.330Z" level=error msg="containerd: deleting container" error="bolt: invalid argument"
Apr 20 11:12:44 sw-infrarunbook-01 dockerd[5890]: time="2026-04-20T11:12:44.401Z" level=error msg="failed to start daemon" error="failed to create new content store: bolt DB /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: invalid database"
```
The key phrases: `bolt: invalid argument`, `invalid database`, `unexpected end of JSON input` when reading Docker's own files, and `failed to load container` during startup. Also check inode exhaustion — it's easy to miss:

```
df -ih /var/lib/docker
```
If inode usage is at 100%, Docker can't create new metadata entries and will silently corrupt what it can't finish writing.
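Because `df -ih` output is easy to misread in a hurry, a small parser helps in monitoring scripts. A sketch assuming the GNU coreutils `df` column layout; `inode_pct` is a hypothetical helper:

```shell
# Hypothetical helper: pull the IUse% value (5th field of the 2nd line)
# out of `df -i PATH` output. Assumes the GNU coreutils column layout.
inode_pct() {
  printf '%s\n' "$1" | awk 'NR==2 { gsub(/%/, "", $5); print $5 }'
}

# Example run; fall back to the root filesystem if the path is absent
out="$(df -i /var/lib/docker 2>/dev/null || df -i /)"
pct="$(inode_pct "$out")"
if [ "${pct:-0}" -ge 90 ] 2>/dev/null; then
  echo "WARNING: inode usage at ${pct}%, Docker metadata writes will start failing"
fi
```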
How to Fix It
The reliable fix is to stop both services, back up anything irreplaceable, and clear the state directory:
```
systemctl stop docker
systemctl stop containerd
# Back up daemon config
cp /etc/docker/daemon.json /root/daemon.json.bak
# Remove corrupted state
rm -rf /var/lib/docker
rm -rf /var/lib/containerd
systemctl start containerd
systemctl start docker
```
Be clear with yourself about what this wipes: all pulled images, all stopped containers, and any named volumes stored at the default location. Images can be re-pulled. If you have stateful named volumes that weren't backed up externally, they're gone. In production environments running workloads from an orchestrator, this is usually fine — the orchestrator will reschedule containers and pull images fresh. But audit your volumes before you delete anything.
A more surgical approach involves deleting only the corrupted boltDB files while leaving image layer directories intact, but pinpointing exactly which files are corrupted is time-consuming and error-prone. In most cases, starting clean is faster and more reliable than trying to salvage a partially broken state tree.
Cause 6: Stale or Broken Docker Socket
Why It Happens
Docker listens on a Unix socket at `/var/run/docker.sock`. The daemon creates this socket at startup, and it should be owned by `root:docker` with mode `0660`. Sometimes this breaks: a security hardening script changed the socket permissions, someone manually modified it during debugging, or an incomplete previous startup left behind a stale socket file that the new process can't overwrite because it's owned differently. The daemon sees the address as already in use and refuses to bind.
How to Identify It
```
ls -la /var/run/docker.sock
```
Normal output looks like this:
```
srw-rw---- 1 root docker 0 Apr 20 09:00 /var/run/docker.sock
```
If the ownership or permissions are wrong, or if a socket file exists while the daemon is stopped, you'll see journal errors like:
```
Apr 20 11:44:01 sw-infrarunbook-01 dockerd[6201]: time="2026-04-20T11:44:01.301Z" level=error msg="failed to start daemon" error="can't create unix socket /var/run/docker.sock: listen unix /var/run/docker.sock: bind: address already in use"
```
How to Fix It
If a stale socket file is blocking the bind, remove it and start fresh — Docker will recreate it cleanly:
```
rm -f /var/run/docker.sock
systemctl start docker
```
If permissions are wrong after Docker is running, correct them directly:
```
chown root:docker /var/run/docker.sock
chmod 660 /var/run/docker.sock
```
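The expected-state check can be automated so a hardening script that breaks the socket gets caught early. A sketch; `socket_ok` is a made-up helper that string-matches an `ls -la` line against the healthy output shown above:

```shell
# Hypothetical check: does an `ls -la` line for docker.sock show the
# expected type and mode (srw-rw----) plus owner/group (root docker)?
socket_ok() {
  case "$1" in
    "srw-rw----"*" root docker "*) echo "ok" ;;
    *)                             echo "bad" ;;
  esac
}

# Example run against the live socket, when one exists
if [ -S /var/run/docker.sock ]; then
  socket_ok "$(ls -la /var/run/docker.sock)"
fi
```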
Cause 7: Disk Space Exhaustion
Why It Happens
This one feels obvious, but I've watched experienced engineers overlook it for far too long. Docker writes a lot of data to `/var/lib/docker` — image layers, container writable layers, build cache, and container log files. When the partition hosting that directory fills up, Docker can't write its state files or update metadata, and the daemon either refuses to start or crashes shortly after. The same problem hits when you've exhausted inodes without running out of raw disk space, which is especially common on hosts running many small containers that each create dozens of small files.
How to Identify It
```
# Check disk space
df -h /var/lib/docker
# Check inode usage
df -ih /var/lib/docker
# If Docker is partially functional
docker system df
```
Journal output for a full disk often surfaces as generic write errors rather than a clean "disk full" message:
```
Apr 20 12:01:33 sw-infrarunbook-01 dockerd[6601]: time="2026-04-20T12:01:33.402Z" level=error msg="Handler for POST /v1.44/containers/create returned error: write /var/lib/docker/overlay2/abc123/merged/tmp: no space left on device"
```
How to Fix It
If Docker is partially up, prune unused data first:
```
docker system prune -af --volumes
```
If the daemon won't start at all, identify what's consuming space and free it manually before attempting a restart:
```
du -sh /var/lib/docker/overlay2/* | sort -rh | head -20
```
Once you have enough headroom to start the daemon, run `docker system prune` to clean up properly. Long-term, mount `/var/lib/docker` on a dedicated volume, and cap container log sizes in `daemon.json` so you don't hit this again.
Prevention
Most Docker daemon failures are entirely preventable. Here's what I put in place on every host running Docker in production.
**Pin kernel modules at boot.** Add `overlay` to `/etc/modules` so it always loads regardless of what the initrd does during a kernel upgrade. It takes five seconds and eliminates an entire class of startup failures.
**Validate daemon.json before applying changes.** Run `python3 -m json.tool /etc/docker/daemon.json` after every edit. JSON syntax errors are silent until Docker reads the file at startup, and a missing comma will take the daemon down just as effectively as a kernel bug. Make validation part of your change procedure, not an afterthought.
**Put /var/lib/docker on a dedicated volume.** This protects the root filesystem from being filled by Docker's data and makes capacity expansion trivial. In any cloud environment, attach a separate block device and mount it at `/var/lib/docker` before Docker is ever installed or started on the host.
**Configure log rotation in daemon.json.** Without it, container logs grow unbounded. This is one of the most common causes of disk exhaustion on long-running hosts. Always include this in your baseline configuration:
```
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}
```
**Schedule regular pruning.** Unused images, stopped containers, and dangling build cache accumulate faster than most teams realize. A weekly cron job handles it without any manual effort:
```
0 3 * * 0 root docker system prune -f --filter "until=168h" >> /var/log/docker-prune.log 2>&1
```
**Alert on the service state.** At minimum, alert on `systemctl is-active docker` returning anything other than `active`. If you're running Prometheus with node_exporter, the systemd collector exposes `node_systemd_unit_state{name="docker.service",state="active"}` as a metric you can alert on directly — no custom scripting needed.
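As an illustration, an alerting rule against that metric might look like the following. This is a sketch, assuming the node_exporter systemd collector is enabled and scraped; the group and alert names are mine, and you should adjust the `for` window and severity label to your own conventions:

```yaml
groups:
  - name: docker-daemon
    rules:
      - alert: DockerDaemonDown
        # fires when systemd reports docker.service as anything but active
        expr: node_systemd_unit_state{name="docker.service",state="active"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Docker daemon is not active on {{ $labels.instance }}"
```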
**Validate Docker after kernel upgrades in staging.** A kernel upgrade that flips the system from cgroup v1 to v2 will break Docker if the cgroup driver isn't set to `systemd`. Catching this on a staging host before rolling it to production costs minutes. Catching it in production at 2am costs much more. Make "does Docker start cleanly after kernel upgrade" a standard validation step in your patching runbook.
Docker daemon failures aren't mysterious. They're almost always caused by one of the issues above, and the journal logs will tell you which one within the first few lines. The real discipline is reading the error before you start changing things. Build that habit and you'll cut your mean time to resolution dramatically.
