InfraRunBook

    Docker Daemon Not Starting

    Docker
    Published: Apr 20, 2026
    Updated: Apr 20, 2026

    A practical troubleshooting guide for engineers dealing with a Docker daemon that refuses to start, covering overlay filesystem errors, iptables conflicts, cgroup v2 mismatches, storage driver issues, and corrupted state.


    Symptoms

    You run systemctl start docker and it hangs, then fails. Or maybe Docker was running fine until a kernel update, a reboot, or someone edited /etc/docker/daemon.json. Either way, the daemon is down and nothing Docker-related works.

    Here's what you typically see on the surface:

    • systemctl status docker shows Active: failed or spins in activating (start) indefinitely
    • /var/run/docker.sock either doesn't exist or sits there dead with no process behind it
    • docker ps returns: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
    • Dependent services — docker-compose stacks, container monitoring agents, CI runners — are all dead alongside it

    The journal logs are your first stop. Don't spend time guessing. Pull the logs and read the actual error before you touch anything else:

    journalctl -u docker.service --no-pager -n 80

    Almost every case I've worked through was diagnosed within the first ten lines of that output. The categories below map directly to the errors you'll actually see there. Read the error, match the pattern, fix the cause.
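    If you triage this often, the pattern-matching can be scripted. The sketch below is illustrative, not exhaustive — the grep patterns and cause labels are my own shorthand for the sections that follow, and it runs against a saved log excerpt rather than the live journal:

```shell
# Illustrative triage: match a saved journal excerpt against the error
# patterns covered below. Patterns and labels are shorthand, not exhaustive.
triage_docker_logs() {
  logs="$1"
  if grep -qE "driver not supported|graphdriver" "$logs"; then
    echo "check: storage driver / overlay module (Causes 1 and 4)"
  elif grep -qE "iptables|No chain/target/match" "$logs"; then
    echo "check: iptables backend conflict (Cause 2)"
  elif grep -qE "cgroup" "$logs"; then
    echo "check: cgroup driver mismatch (Cause 3)"
  elif grep -qE "bolt|invalid database" "$logs"; then
    echo "check: corrupted state under /var/lib/docker (Cause 5)"
  elif grep -qE "address already in use" "$logs"; then
    echo "check: stale docker.sock (Cause 6)"
  elif grep -qE "no space left on device" "$logs"; then
    echo "check: disk or inode exhaustion (Cause 7)"
  else
    echo "no known pattern matched: read the full journal output"
  fi
}

# Demo against a captured excerpt instead of the live journal
tmp=$(mktemp)
echo 'error="error initializing graphdriver: driver not supported"' > "$tmp"
triage_docker_logs "$tmp"   # → check: storage driver / overlay module (Causes 1 and 4)
rm -f "$tmp"
```

    On a real host you would capture the excerpt first with journalctl -u docker.service --no-pager -n 80 > /tmp/docker.log and pass that file in.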


    Cause 1: Overlay Filesystem Error

    Why It Happens

    Docker's default storage driver is overlay2, which depends on the overlay kernel module. If that module isn't loaded — or if the underlying filesystem on /var/lib/docker doesn't support overlayfs — the daemon won't start. This comes up more often than you'd think. I've seen it happen after in-place OS upgrades where the kernel changed but /etc/modules wasn't preserved, and after migrations to a new storage backend where someone formatted /var/lib/docker on XFS without enabling ftype=1. Both cases look like a storage driver failure on the surface, but the root cause is one layer lower.

    How to Identify It

    In your journal output you'll see something like this:

    Apr 20 09:14:22 sw-infrarunbook-01 dockerd[3812]: time="2026-04-20T09:14:22.441Z" level=error msg="failed to start daemon" error="error initializing graphdriver: driver not supported"
    Apr 20 09:14:22 sw-infrarunbook-01 dockerd[3812]: time="2026-04-20T09:14:22.441Z" level=error msg="[graphdriver] prior storage driver overlay2 failed: driver not supported"

    Or, if the XFS d_type support is missing:

    Apr 20 09:14:22 sw-infrarunbook-01 dockerd[3812]: time="2026-04-20T09:14:22.510Z" level=error msg="failed to start daemon" error="error initializing graphdriver: overlay2: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior. Reformat the filesystem with ftype=1 to enable d_type support."

    Check whether the kernel module is loaded:

    lsmod | grep overlay

    No output means the module isn't loaded. If you're on XFS, also verify d_type support:

    xfs_info /var/lib/docker | grep ftype

    You want ftype=1. If it shows ftype=0, that's your problem, and it can't be fixed without reformatting.

    How to Fix It

    If the module is just not loaded, load it immediately and make it persistent across reboots:

    # Load it now
    modprobe overlay
    
    # Persist across reboots
    echo "overlay" >> /etc/modules
    
    # Then restart Docker
    systemctl start docker

    If the XFS filesystem was formatted without ftype=1, you have to reformat it — there's no in-place fix. Back up anything you need from /var/lib/docker (pulled images will need to be re-pulled anyway), unmount the volume, reformat with mkfs.xfs -n ftype=1 /dev/sdX, remount, and start Docker fresh. If you're on ext4, this specific issue won't apply — ext4 supports d_type natively.
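    Both checks can be folded into a small preflight sketch. check_overlay_module reads /proc/filesystems, which lists overlay once the module is loaded or built in; the xfs_info line passed to check_ftype at the bottom is simulated sample data, not output from a real mount:

```shell
# Preflight sketch for Cause 1. check_overlay_module reads /proc/filesystems;
# check_ftype parses an xfs_info output line. The line passed in at the
# bottom is simulated sample data, not output from a real mount.
check_overlay_module() {
  if grep -qw overlay /proc/filesystems; then
    echo "overlay: available"
  else
    echo "overlay: missing (run modprobe overlay)"
  fi
}

check_ftype() {
  case "$1" in
    *ftype=1*) echo "ftype=1: ok" ;;
    *ftype=0*) echo "ftype=0: reformat required" ;;
    *)         echo "ftype: not found in input" ;;
  esac
}

check_ftype "naming   =version 2              bsize=4096   ftype=0"   # → ftype=0: reformat required
```

    On a live XFS host you'd wire it up as check_ftype "$(xfs_info /var/lib/docker | grep ftype)".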


    Cause 2: iptables Conflict

    Why It Happens

    Docker manages its own iptables rules to handle container networking — NAT for outbound traffic, forwarding between containers, port mapping. When the host is also running firewalld, nftables, or another network management tool that owns the same chains, conflicts arise. The most disruptive scenario is modern Linux distributions — RHEL 9, Debian 11+, Ubuntu 22.04+ — where iptables now points to iptables-nft by default, but Docker is still writing legacy iptables rules via the old backend. The two sets of rules don't coexist cleanly, and the daemon fails during network controller initialization.

    How to Identify It

    The journal output looks like this:

    Apr 20 09:31:05 sw-infrarunbook-01 dockerd[4102]: time="2026-04-20T09:31:05.821Z" level=warning msg="could not change the host's network settings: could not create ip table rule in docker-forward"
    Apr 20 09:31:05 sw-infrarunbook-01 dockerd[4102]: time="2026-04-20T09:31:05.901Z" level=error msg="failed to start daemon" error="network controller initialization failed: error creating default \"bridge\" network: Failed to Setup IP tables: Unable to enable SKIP DNAT rule: (iptables failed: iptables --wait -t nat -I DOCKER -i docker0 -j RETURN: iptables: No chain/target/match by that name."

    Check which iptables backend is active and whether firewalld is in the picture:

    update-alternatives --display iptables
    systemctl is-active firewalld
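    The backend can also be read straight out of iptables --version, which tags itself (nf_tables) or (legacy) from iptables 1.8 onward. The sketch below classifies sample version strings — simulated here so it runs without touching the host:

```shell
# Sketch: classify the iptables backend from an `iptables --version` string.
# From iptables 1.8 onward the binary tags itself (nf_tables) or (legacy);
# the strings below are simulated samples, not live command output.
backend_of() {
  case "$1" in
    *"(nf_tables)"*) echo "nf_tables" ;;
    *"(legacy)"*)    echo "legacy" ;;
    *)               echo "unknown (pre-1.8 iptables prints no backend tag)" ;;
  esac
}

backend_of "iptables v1.8.7 (nf_tables)"   # → nf_tables
backend_of "iptables v1.8.4 (legacy)"      # → legacy
```

    On a live host: backend_of "$(iptables --version)".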

    How to Fix It

    The cleanest fix on systems where iptables-nft is the default is to switch to iptables-legacy, which Docker handles reliably:

    update-alternatives --set iptables /usr/sbin/iptables-legacy
    update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
    systemctl restart docker

    If you need to keep iptables-nft and firewalld is the conflict, configure Docker's bridge interface in a trusted firewalld zone:

    firewall-cmd --permanent --zone=trusted --add-interface=docker0
    firewall-cmd --reload
    systemctl restart docker

    There's also the option of setting "iptables": false in /etc/docker/daemon.json to tell Docker to skip managing iptables entirely, but only do that if you're prepared to write and maintain all the NAT and forwarding rules yourself. That's rarely worth it outside of very specialized environments.


    Cause 3: Cgroup v2 Issue

    Why It Happens

    cgroup v2 (the unified hierarchy) has been stable in the kernel since 4.5, and many distributions have since made it the default. Docker and containerd work fine with cgroup v2, but on a systemd host both should be configured to use the systemd cgroup driver. The classic failure scenario: Docker was installed on a cgroup v1 system, the kernel was upgraded, the system booted into cgroup v2 mode, and now Docker's internal configuration refers to a cgroup structure that no longer exists as expected. The daemon tries to initialize cgroup paths that simply aren't there.

    I've also hit this when running Docker inside a VM or LXC container where the host's cgroup configuration doesn't match what the guest expects — particularly in nested virtualization setups.

    How to Identify It

    First, check which cgroup version is active:

    stat -fc %T /sys/fs/cgroup/

    If it returns cgroup2fs, you're on v2. If it returns tmpfs, you're on v1. Then look at the journal:

    Apr 20 10:02:11 sw-infrarunbook-01 dockerd[4451]: time="2026-04-20T10:02:11.200Z" level=error msg="failed to start daemon" error="Devices cgroup isn't mounted"
    Apr 20 10:02:11 sw-infrarunbook-01 containerd[4389]: time="2026-04-20T10:02:11.198Z" level=error msg="failed to handle event" error="failed to get OOM score for pid 4451: failed to read /proc/4451/oom_score_adj: no such process"

    Check what cgroup driver is currently configured in both Docker and containerd:

    cat /etc/docker/daemon.json
    grep -A5 'runc' /etc/containerd/config.toml | grep -i cgroup

    How to Fix It

    On a cgroup v2 system with systemd as init, set the cgroup driver to systemd in /etc/docker/daemon.json:

    {
      "exec-opts": ["native.cgroupdriver=systemd"]
    }

    In /etc/containerd/config.toml, enable the systemd cgroup driver for the runc runtime:

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      SystemdCgroup = true

    Always restart containerd before Docker — Docker's runtime depends on containerd being configured correctly first:

    systemctl restart containerd
    systemctl restart docker
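    A quick consistency check before restarting can catch the mismatch up front. This sketch takes the filesystem type reported by stat -fc %T /sys/fs/cgroup/ and the daemon.json contents as arguments — both simulated below — and warns when a cgroup v2 host lacks the systemd driver setting:

```shell
# Consistency sketch: warn when a cgroup v2 host's daemon.json lacks the
# systemd cgroup driver. Arguments are the filesystem type reported by
# `stat -fc %T /sys/fs/cgroup/` and the daemon.json text; both are
# simulated sample inputs here.
cgroup_warning() {
  fstype="$1"
  config="$2"
  if [ "$fstype" = "cgroup2fs" ] && ! printf '%s' "$config" | grep -q "native.cgroupdriver=systemd"; then
    echo "WARN: cgroup v2 host without systemd cgroup driver in daemon.json"
  else
    echo "ok"
  fi
}

cgroup_warning "cgroup2fs" '{ "storage-driver": "overlay2" }'   # → WARN: cgroup v2 host without systemd cgroup driver in daemon.json
```

    On a live host the call becomes cgroup_warning "$(stat -fc %T /sys/fs/cgroup/)" "$(cat /etc/docker/daemon.json)".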

    Rolling back to cgroup v1 by adding systemd.unified_cgroup_hierarchy=0 to your GRUB kernel parameters is technically possible, but it's a band-aid. Fix the driver configuration instead. Cgroup v2 is where the ecosystem is going, and fighting it costs more effort over time than embracing it now.


    Cause 4: Storage Driver Misconfigured

    Why It Happens

    Docker supports multiple storage drivers: overlay2, devicemapper, btrfs, zfs, and vfs. The daemon reads its storage driver from /etc/docker/daemon.json, and if that config specifies a driver that isn't available — because the required kernel module is absent, a binary dependency isn't installed, or the filesystem doesn't support it — the daemon fails to initialize the graph driver and exits.

    In my experience, this comes up most often when someone copies a daemon.json from one server to another without checking that the target has the same capabilities. It also surfaces when the storage driver config is technically valid but doesn't match the existing data under /var/lib/docker. If you change the storage driver after Docker has been running, existing image layers become inaccessible and the daemon may refuse to start or come up in a broken state.

    How to Identify It

    The journal error is usually explicit about what failed:

    Apr 20 10:45:33 sw-infrarunbook-01 dockerd[5012]: time="2026-04-20T10:45:33.812Z" level=error msg="failed to start daemon" error="error initializing graphdriver: prior storage driver devicemapper failed: driver not supported"
    Apr 20 10:45:33 sw-infrarunbook-01 dockerd[5012]: time="2026-04-20T10:45:33.901Z" level=error msg="failed to start daemon" error="error initializing graphdriver: unknown graphdriver: btrfs"

    Check what's configured versus what's actually on disk:

    cat /etc/docker/daemon.json
    ls /var/lib/docker/

    The subdirectory inside /var/lib/docker named after the driver — overlay2/, devicemapper/, btrfs/ — tells you what driver wrote the existing state. If that doesn't match what's in daemon.json, you have a mismatch.

    How to Fix It

    If the configured driver is wrong and you want to use overlay2, update /etc/docker/daemon.json:

    {
      "storage-driver": "overlay2"
    }

    Always validate the JSON syntax before restarting — a malformed daemon.json is one of the most common self-inflicted Docker failures and gives a completely opaque error:

    python3 -m json.tool /etc/docker/daemon.json
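    That validation step is worth wrapping so a restart can never happen against a broken config. A minimal sketch, demonstrated here against a temporary file rather than the real /etc/docker/daemon.json:

```shell
# Guard sketch: never restart Docker against a config that fails JSON
# validation. Demonstrated on a temporary file, not the real daemon.json.
safe_config_check() {
  if python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "valid"
  else
    echo "invalid: fix $1 before restarting docker"
  fi
}

tmp=$(mktemp)
printf '{ "storage-driver": "overlay2", }\n' > "$tmp"   # trailing comma: invalid JSON
safe_config_check "$tmp"
rm -f "$tmp"
```

    Wire it into your change procedure as: [ "$(safe_config_check /etc/docker/daemon.json)" = "valid" ] && systemctl restart docker.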

    If you're switching drivers and want to preserve your existing images, export them first with docker save, clear /var/lib/docker, update the config, start Docker, and re-import with docker load. There's no lossless in-place driver migration — the layer formats are incompatible across drivers.


    Cause 5: Corrupted Docker State

    Why It Happens

    Docker maintains its own internal state database under /var/lib/docker. This includes layer metadata, container state, network configuration, and volume references — most of it stored in boltDB files. If the daemon is killed mid-write — power loss, hard reboot, an OOM killer that takes out dockerd at the wrong moment — you end up with corrupted boltDB files, half-written image layers, or a broken containerd content store. I've also seen this happen when a host ran out of inodes rather than disk space, causing Docker to write partial or garbage data into its metadata files before hitting the inode ceiling.

    How to Identify It

    The journal will have errors referencing database operations failing or metadata that can't be parsed:

    Apr 20 11:12:44 sw-infrarunbook-01 dockerd[5890]: time="2026-04-20T11:12:44.111Z" level=error msg="failed to start daemon" error="error loading config file: unexpected end of JSON input"
    Apr 20 11:12:44 sw-infrarunbook-01 dockerd[5890]: time="2026-04-20T11:12:44.330Z" level=error msg="containerd: deleting container" error="bolt: invalid argument"
    Apr 20 11:12:44 sw-infrarunbook-01 dockerd[5890]: time="2026-04-20T11:12:44.401Z" level=error msg="failed to start daemon" error="failed to create new content store: bolt DB /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: invalid database"

    The key phrases: bolt: invalid argument, invalid database, unexpected end of JSON input when reading Docker's own files, and failed to load container during startup. Also check inode exhaustion — it's easy to miss:

    df -ih /var/lib/docker

    If inode usage is at 100%, Docker can't create new metadata entries and will silently corrupt what it can't finish writing.
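    Parsing that output into an alertable number is straightforward. The df sample below is simulated so the sketch runs anywhere, and the 95% threshold is an arbitrary illustrative choice:

```shell
# Sketch: extract the inode-usage percentage from `df -i` style output and
# warn near the ceiling. The sample text and the 95% threshold are both
# illustrative choices.
inode_pct() {
  # second line, fifth column (IUse%), with the % sign stripped
  printf '%s\n' "$1" | awk 'NR==2 { gsub(/%/, "", $5); print $5 }'
}

sample='Filesystem      Inodes   IUsed IFree IUse% Mounted on
/dev/sda2      3276800 3276800     0  100% /var/lib/docker'

pct=$(inode_pct "$sample")
if [ "$pct" -ge 95 ]; then
  echo "WARN: inode usage ${pct}% on /var/lib/docker"
fi
```

    On a live host, feed it the real output: inode_pct "$(df -i /var/lib/docker)".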

    How to Fix It

    The reliable fix is to stop both services, back up anything irreplaceable, and clear the state directory:

    systemctl stop docker
    systemctl stop containerd
    
    # Back up daemon config
    cp /etc/docker/daemon.json /root/daemon.json.bak
    
    # Remove corrupted state
    rm -rf /var/lib/docker
    rm -rf /var/lib/containerd
    
    systemctl start containerd
    systemctl start docker

    Be clear with yourself about what this wipes: all pulled images, all stopped containers, and any named volumes stored at the default location. Images can be re-pulled. If you have stateful named volumes that weren't backed up externally, they're gone. In production environments running workloads from an orchestrator, this is usually fine — the orchestrator will reschedule containers and pull images fresh. But audit your volumes before you delete anything.

    A more surgical approach involves deleting only the corrupted boltDB files while leaving image layer directories intact, but pinpointing exactly which files are corrupted is time-consuming and error-prone. In most cases, starting clean is faster and more reliable than trying to salvage a partially broken state tree.
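    The stop / back up / wipe ordering is worth rehearsing before doing it on a real host. This sketch exercises exactly that sequence against a throwaway directory standing in for /var/lib/docker, with the systemctl steps omitted:

```shell
# Rehearsal of the stop / back up / wipe ordering against a throwaway tree.
# state_dir stands in for /var/lib/docker; the systemctl stop/start steps
# from the real procedure are intentionally omitted here.
state_dir=$(mktemp -d)
backup_dir=$(mktemp -d)
mkdir -p "$state_dir/volumes/app-data/_data"
echo "irreplaceable" > "$state_dir/volumes/app-data/_data/db.sqlite"

# Back up named volumes BEFORE destroying anything
tar -C "$state_dir" -czf "$backup_dir/volumes.tar.gz" volumes

# Only once the backup exists is it safe to clear the state directory
rm -rf "$state_dir"
ls "$backup_dir"   # → volumes.tar.gz
```

    The point of the ordering is that the destructive step is unreachable until the volume backup has been written.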


    Cause 6: Stale or Broken Docker Socket

    Why It Happens

    Docker listens on a Unix socket at /var/run/docker.sock. The daemon creates this socket at startup, and it should be owned by root:docker with mode 0660. Sometimes this breaks: a security hardening script changed the socket permissions, someone manually modified it during debugging, or an incomplete previous startup left behind a stale socket file that the new process can't overwrite because it's owned differently. The daemon sees the address as already in use and refuses to bind.

    How to Identify It

    ls -la /var/run/docker.sock

    Normal output looks like this:

    srw-rw---- 1 root docker 0 Apr 20 09:00 /var/run/docker.sock

    If the ownership or permissions are wrong, or if a socket file exists while the daemon is stopped, you'll see journal errors like:

    Apr 20 11:44:01 sw-infrarunbook-01 dockerd[6201]: time="2026-04-20T11:44:01.301Z" level=error msg="failed to start daemon" error="can't create unix socket /var/run/docker.sock: listen unix /var/run/docker.sock: bind: address already in use"

    How to Fix It

    If a stale socket file is blocking the bind, remove it and start fresh — Docker will recreate it cleanly:

    rm -f /var/run/docker.sock
    systemctl start docker
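    To avoid ever deleting a socket that a live daemon is still using, guard the removal on whether a dockerd process exists. A sketch, demonstrated against a throwaway socket rather than the real /var/run/docker.sock:

```shell
# Guard sketch: only remove the socket when no dockerd process exists, so a
# live daemon's socket is never deleted. Demonstrated on a throwaway socket,
# not the real /var/run/docker.sock.
remove_stale_socket() {
  sock="$1"
  if [ -S "$sock" ] && ! pgrep -x dockerd > /dev/null 2>&1; then
    rm -f "$sock"
    echo "removed stale socket $sock"
  else
    echo "left $sock alone"
  fi
}

# Create a throwaway unix socket to stand in for a stale docker.sock
demo_sock=$(mktemp -u)
python3 -c 'import socket, sys; socket.socket(socket.AF_UNIX).bind(sys.argv[1])' "$demo_sock"
remove_stale_socket "$demo_sock"
```

    The -S test also means an ordinary file sitting at the socket path is left for you to inspect rather than silently removed.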

    If permissions are wrong after Docker is running, correct them directly:

    chown root:docker /var/run/docker.sock
    chmod 660 /var/run/docker.sock

    Cause 7: Disk Space Exhaustion

    Why It Happens

    This one feels obvious, but I've watched experienced engineers overlook it for far too long. Docker writes a lot of data to /var/lib/docker — image layers, container writable layers, build cache, and container log files. When the partition hosting that directory fills up, Docker can't write its state files or update metadata, and the daemon either refuses to start or crashes shortly after. The same problem hits when you've exhausted inodes without running out of raw disk space, which is especially common on hosts running many small containers that each create dozens of small files.

    How to Identify It

    # Check disk space
    df -h /var/lib/docker
    
    # Check inode usage
    df -ih /var/lib/docker
    
    # If Docker is partially functional
    docker system df

    Journal output for a full disk often surfaces as generic write errors rather than a clean "disk full" message:

    Apr 20 12:01:33 sw-infrarunbook-01 dockerd[6601]: time="2026-04-20T12:01:33.402Z" level=error msg="Handler for POST /v1.44/containers/create returned error: write /var/lib/docker/overlay2/abc123/merged/tmp: no space left on device"

    How to Fix It

    If Docker is partially up, prune unused data first:

    docker system prune -af --volumes

    If the daemon won't start at all, identify what's consuming space and free it manually before attempting a restart:

    du -sh /var/lib/docker/overlay2/* | sort -rh | head -20

    Once you have enough headroom to start the daemon, run docker system prune to clean up properly. Long-term, mount /var/lib/docker on a dedicated volume, and cap container log sizes in daemon.json so you don't hit this again.
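    The du | sort ranking can be sanity-checked on a throwaway tree before pointing it at the real overlay2 directory — the layer names and sizes below are fabricated for the demo:

```shell
# Sanity-check the du | sort ranking on a throwaway tree before running it
# against /var/lib/docker/overlay2. Layer names and sizes are fabricated.
root=$(mktemp -d)
mkdir -p "$root/layer-big" "$root/layer-small"
dd if=/dev/zero of="$root/layer-big/blob" bs=1024 count=2048 2>/dev/null   # ~2 MB
dd if=/dev/zero of="$root/layer-small/blob" bs=1024 count=16 2>/dev/null   # ~16 KB

# Largest directories first, same shape as the real overlay2 scan
du -sk "$root"/* | sort -rn | head -20
```

    Using -sk (kilobyte counts) rather than -sh keeps the sort purely numeric; -sh with sort -rh gives the same ordering with human-readable sizes.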


    Prevention

    Most Docker daemon failures are entirely preventable. Here's what I put in place on every host running Docker in production.

    Pin kernel modules at boot. Add overlay to /etc/modules so it always loads regardless of what the initrd does during a kernel upgrade. It takes five seconds and eliminates an entire class of startup failures.

    Validate daemon.json before applying changes. Run python3 -m json.tool /etc/docker/daemon.json after every edit. JSON syntax errors are silent until Docker reads the file at startup, and a missing comma will take the daemon down just as effectively as a kernel bug. Make validation part of your change procedure, not an afterthought.

    Put /var/lib/docker on a dedicated volume. This protects the root filesystem from being filled by Docker's data and makes capacity expansion trivial. In any cloud environment, attach a separate block device and mount it at /var/lib/docker before Docker is ever installed or started on the host.

    Configure log rotation in daemon.json. Without it, container logs grow unbounded. This is one of the most common causes of disk exhaustion on long-running hosts. Always include this in your baseline configuration:

    {
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "50m",
        "max-file": "5"
      }
    }

    Schedule regular pruning. Unused images, stopped containers, and dangling build cache accumulate faster than most teams realize. A weekly cron job handles it without any manual effort:

    0 3 * * 0 root docker system prune -f --filter "until=168h" >> /var/log/docker-prune.log 2>&1

    Alert on the service state. At minimum, alert on systemctl is-active docker returning anything other than active. If you're running Prometheus with node_exporter, the systemd collector exposes node_systemd_unit_state{name="docker.service",state="active"} as a metric you can alert on directly — no custom scripting needed.

    Validate Docker after kernel upgrades in staging. A kernel upgrade that flips the system from cgroup v1 to v2 will break Docker if the cgroup driver isn't set to systemd. Catching this on a staging host before rolling it to production costs minutes. Catching it in production at 2am costs much more. Make "does Docker start cleanly after kernel upgrade" a standard validation step in your patching runbook.

    Docker daemon failures aren't mysterious. They're almost always caused by one of the issues above, and the journal logs will tell you which one within the first few lines. The real discipline is reading the error before you start changing things. Build that habit and you'll cut your mean time to resolution dramatically.

    Frequently Asked Questions

    Why does Docker fail to start after a kernel upgrade?

    Kernel upgrades are a common cause of Docker failures because they can change the cgroup version (v1 to v2), unload kernel modules like overlay that weren't pinned in /etc/modules, or shift the default iptables backend. After any kernel upgrade, check journalctl -u docker.service for the specific error, verify the overlay module is loaded with lsmod | grep overlay, and confirm your cgroup driver matches the running cgroup version with stat -fc %T /sys/fs/cgroup/.

    How do I fix 'error initializing graphdriver: driver not supported'?

    This error means the storage driver specified in /etc/docker/daemon.json isn't available on this host. Check that the overlay kernel module is loaded with lsmod | grep overlay and load it with modprobe overlay if not. If you're on XFS, verify d_type support with xfs_info /var/lib/docker | grep ftype — you need ftype=1. Also confirm that the driver configured in daemon.json matches the data already on disk under /var/lib/docker.

    What should I do if /var/lib/docker is corrupted?

    Stop both Docker and containerd with systemctl stop docker && systemctl stop containerd. Back up any named volumes with data you can't recreate. Then remove the corrupted state directories: rm -rf /var/lib/docker && rm -rf /var/lib/containerd. Start containerd first, then Docker. Images will need to be re-pulled and containers will need to be recreated, but the daemon will start cleanly from a fresh state.

    Why does Docker fail with 'iptables: No chain/target/match by that name'?

    This is an iptables backend conflict. Modern distributions use iptables-nft by default, but Docker writes legacy iptables rules. Switch the system to iptables-legacy with: update-alternatives --set iptables /usr/sbin/iptables-legacy && update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy, then restart Docker. If firewalld is also running, add the docker0 interface to a trusted zone with firewall-cmd --permanent --zone=trusted --add-interface=docker0.

    How do I check if disk space is causing Docker to fail?

    Run df -h /var/lib/docker to check raw disk usage and df -ih /var/lib/docker to check inode usage — both can cause failures. If Docker is partially running, docker system df shows exactly how much space images, containers, and volumes are consuming. If the disk is full and Docker won't start, free space manually by removing large overlay directories, then start Docker and run docker system prune to clean up properly.
