Symptoms
You log in to sw-infrarunbook-01 and df -h shows /var/lib/docker sitting at 94% utilization. Maybe a docker pull just failed mid-layer with write /var/lib/docker/overlay2/...: no space left on device. Containers are refusing to start. A build pipeline that was green yesterday is now dying. Your monitoring alert fired at 3 AM and you need to recover fast.
Docker disk usage is one of those problems that creeps up slowly and then hits you all at once. The daemon doesn't aggressively reclaim space on its own — it's designed to cache for speed and leave cleanup to the operator. If you haven't built cleanup into your workflow, the disk will fill. Every time.
Common things you'll see when this happens:
- docker pull fails with write /var/lib/docker/overlay2/...: no space left on device
- Container fails to start with Error response from daemon: mkdir ...: no space left on device
- Builds die mid-layer during a RUN step with no obvious error in the Dockerfile
- df -h shows / or a dedicated Docker partition at 90% or higher
- du -sh /var/lib/docker/* shows several gigabytes spread across overlay2, containers, and volumes
Before chasing individual causes, start with Docker's own accounting command. It gives you a breakdown by category and shows you exactly where to focus:
infrarunbook-admin@sw-infrarunbook-01:~$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 47 12 18.4GB 14.1GB (76%)
Containers 23 6 1.2GB 980MB (81%)
Local Volumes 31 8 9.7GB 6.3GB (64%)
Build Cache 186 0 3.2GB 3.2GB

That output tells a story. Over 32 GB is sitting on disk right now, and the vast majority of it is reclaimable. Let's go through each root cause systematically.
Root Cause 1: Unused Images Not Cleaned Up
This is the most common offender on any long-running Docker host. Every time you pull a new version of an image, run a build, or have a CI pipeline push updated tags, the old image layers stay on disk. Docker doesn't delete them automatically. The old layers are kept because Docker doesn't know whether another image or container might still reference them — but in practice, on a busy host, most of those layers are completely orphaned.
There are two categories to care about. Dangling images are untagged intermediate images — the kind produced when you rebuild an image and the old one loses its tag. Unreferenced images are fully tagged images that no running or stopped container is actually using. In my experience, a host that's been running a daily build pipeline for a few months can accumulate 30–50 image versions. That's easily 15–25 GB of layer data sitting idle.
How to Identify
infrarunbook-admin@sw-infrarunbook-01:~$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.solvethenetwork.com/app/api v1.42 a3b4c5d6e7f8 2 hours ago 1.1GB
registry.solvethenetwork.com/app/api v1.41 9f8e7d6c5b4a 2 days ago 1.1GB
registry.solvethenetwork.com/app/api v1.40 1a2b3c4d5e6f 5 days ago 1.0GB
<none> <none> deadbeef1234 6 days ago 980MB
nginx 1.25 abcdef123456 1 week ago 192MB
nginx 1.24 fedcba654321 3 weeks ago 190MB
infrarunbook-admin@sw-infrarunbook-01:~$ docker images -f dangling=true
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> deadbeef1234 6 days ago 980MB

How to Fix
To remove only dangling images:
infrarunbook-admin@sw-infrarunbook-01:~$ docker image prune
WARNING! This will remove all dangling images.
Are you sure you want to continue? [y/N] y
Deleted Images:
deleted: sha256:deadbeef1234...
Total reclaimed space: 980MB

The more useful option — removing all images not referenced by any container, running or stopped:
infrarunbook-admin@sw-infrarunbook-01:~$ docker image prune -a
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
Deleted Images:
untagged: registry.solvethenetwork.com/app/api:v1.40
deleted: sha256:1a2b3c4d5e6f...
untagged: nginx:1.24
deleted: sha256:fedcba654321...
Total reclaimed space: 14.1GB

If you need to be selective — for example, keeping images from the last 24 hours — you can filter by age: docker image prune -a --filter until=24h.
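Age-based filters can still delete a tag you might want to roll back to. If you'd rather keep the newest N tags of a specific repository, docker images lists newest first, so a small helper can compute the removal list. A sketch; tags_to_remove is a made-up name, and you should eyeball its output before piping anything to docker rmi:

```shell
# Given image refs on stdin, newest first (the order `docker images` prints),
# pass through everything beyond the first N -- i.e. the tags safe to remove.
tags_to_remove() {
    keep="${1:-3}"                 # number of newest tags to keep
    tail -n +"$((keep + 1))"       # skip the first N lines, print the rest
}

# Usage (verify the list before removing anything):
#   docker images registry.solvethenetwork.com/app/api \
#       --format '{{.Repository}}:{{.Tag}}' | tags_to_remove 2 | xargs -r docker rmi
```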
Root Cause 2: Orphaned Volumes
Docker volumes are intentionally persistent. When a container is removed, its volume is not — that's by design, so you don't lose your database data just because a container restarted. But this means every docker rm without the -v flag leaves a volume behind. Over time, especially on hosts running docker-compose stacks that get torn down and rebuilt regularly, orphaned volumes accumulate quietly.
I've seen this happen constantly with teams that run short-lived compose stacks for testing or staging. They do docker-compose up -d, test something, then docker-compose down — not realizing that down does not remove named volumes by default. Do that a hundred times over a few months and you have a hundred orphaned volumes, some of which might be holding gigabytes of database files that will never be read again.
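To see how much disk the orphans are actually holding, you can join the dangling-volume list with du. A sketch; volume_sizes is a made-up helper, and it assumes the local driver's default data root:

```shell
# Print the on-disk size of each volume name read from stdin.
# Run as root: /var/lib/docker is not world-readable.
volume_sizes() {
    root="${1:-/var/lib/docker/volumes}"   # local-driver data root (default)
    while read -r vol; do
        if [ -d "$root/$vol" ]; then
            du -sh "$root/$vol"
        fi
    done
}

# Usage:
#   docker volume ls -qf dangling=true | volume_sizes
```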
How to Identify
infrarunbook-admin@sw-infrarunbook-01:~$ docker volume ls
DRIVER VOLUME NAME
local postgres_data_20240301
local postgres_data_20240315
local postgres_data_20240401
local redis_cache_old
local app_uploads_backup
local 7f3a1b2c4d5e6f7a8b9c0d1e2f3a4b5c
infrarunbook-admin@sw-infrarunbook-01:~$ docker volume ls -f dangling=true
DRIVER VOLUME NAME
local postgres_data_20240301
local postgres_data_20240315
local redis_cache_old
local 7f3a1b2c4d5e6f7a8b9c0d1e2f3a4b5c
infrarunbook-admin@sw-infrarunbook-01:~$ du -sh /var/lib/docker/volumes/*
4.1G /var/lib/docker/volumes/postgres_data_20240301
4.0G /var/lib/docker/volumes/postgres_data_20240315
210M /var/lib/docker/volumes/redis_cache_old
12K /var/lib/docker/volumes/7f3a1b2c4d5e6f7a8b9c0d1e2f3a4b5c

How to Fix
Before pruning volumes, verify the dangling ones are genuinely unused. Check what containers were using them and whether that data has been backed up or migrated. Named volumes that look like postgres_data_20240301 should be treated carefully — old doesn't mean unneeded. Once you're certain:
infrarunbook-admin@sw-infrarunbook-01:~$ docker volume prune
WARNING! This will remove anonymous local volumes not used by at least one container.
Are you sure you want to continue? [y/N] y
Deleted Volumes:
7f3a1b2c4d5e6f7a8b9c0d1e2f3a4b5c
Total reclaimed space: 12.29kB

Note that since Docker Engine 23.0, docker volume prune removes only anonymous volumes by default. Dangling named volumes like redis_cache_old require docker volume prune -a, or an explicit docker volume rm once you've confirmed they're safe to drop.
To remove specific named volumes you've verified as safe:
infrarunbook-admin@sw-infrarunbook-01:~$ docker volume rm postgres_data_20240301 postgres_data_20240315
postgres_data_20240301
postgres_data_20240315

Going forward, use docker-compose down -v whenever you want volumes cleaned up along with the stack. Build that habit into your teardown scripts from the start.
Root Cause 3: Build Cache Not Pruned
The Docker build cache is where intermediate image layers live during and after a build. Docker keeps these around so subsequent builds can reuse layers that haven't changed — this is what makes rebuilds fast when only your application code changes. The cache isn't free, though. On an active build host it can quietly grow to 20–40 GB, and it's easy to overlook because it doesn't appear in docker images at all.
BuildKit, the default builder since Docker Engine 23.0, maintains its own cache separate from the classic layer cache. If you're running BuildKit builds — likely on any modern Docker version — you need to account for both when investigating disk usage.
How to Identify
infrarunbook-admin@sw-infrarunbook-01:~$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 47 12 18.4GB 14.1GB (76%)
Containers 23 6 1.2GB 980MB (81%)
Local Volumes 31 8 9.7GB 6.3GB (64%)
Build Cache 186 0 3.2GB 3.2GB
infrarunbook-admin@sw-infrarunbook-01:~$ docker buildx du
ID RECLAIMABLE SIZE LAST ACCESSED
s3b4hk9f5qlxf2a1c7d8e9 true 1.2GB 2 hours ago
l1m2n3o4p5q6r7s8t9u0v1 true 890MB 6 hours ago
w2x3y4z5a6b7c8d9e0f1g2 true 780MB 1 day ago
...
Total: 3.2GB

How to Fix
To prune only dangling build cache — layers with no references to current images:
infrarunbook-admin@sw-infrarunbook-01:~$ docker builder prune
WARNING! This will remove all dangling build cache.
Are you sure you want to continue? [y/N] y
Deleted build cache objects:
s3b4hk9f5qlxf2a1c7d8e9
l1m2n3o4p5q6r7s8t9u0v1
Total reclaimed space: 2.09GB

To wipe the entire build cache — including layers that could theoretically speed up future builds:
infrarunbook-admin@sw-infrarunbook-01:~$ docker builder prune --all
WARNING! This will remove all build cache.
Are you sure you want to continue? [y/N] y
Total reclaimed space: 3.2GB

The tradeoff is that your next build will be slower — every layer builds from scratch. On a CI host that rebuilds from scratch on every run anyway, this is no loss at all. On a developer workstation where you rebuild frequently, be more selective with --filter until=24h to preserve recent cache entries.
Root Cause 4: Container Logs Not Rotated
By default, Docker's json-file log driver writes container stdout and stderr to /var/lib/docker/containers/<container-id>/<container-id>-json.log with no size limit and no rotation. A container that logs aggressively — a web server recording every HTTP request, a service with a runaway debug logger, or an application caught in an error loop — will write indefinitely until the disk is full.
I've seen this take down production hosts. A single Java service that had its log level accidentally set to DEBUG wrote 40 GB in under 18 hours. The container appeared perfectly healthy from a process standpoint. The disk did not.
How to Identify
infrarunbook-admin@sw-infrarunbook-01:~$ du -sh /var/lib/docker/containers/*/*-json.log | sort -rh | head -10
38G /var/lib/docker/containers/a1b2c3d4e5f6.../a1b2c3d4e5f6...-json.log
2.1G /var/lib/docker/containers/7f8e9d0c1b2a.../7f8e9d0c1b2a...-json.log
450M /var/lib/docker/containers/3c4d5e6f7a8b.../3c4d5e6f7a8b...-json.log
infrarunbook-admin@sw-infrarunbook-01:~$ docker ps --format "{{.ID}} {{.Names}}"
a1b2c3d4e5f6 api-service
7f8e9d0c1b2a nginx-proxy
3c4d5e6f7a8b worker

You can also look up the log path for a specific container directly:
infrarunbook-admin@sw-infrarunbook-01:~$ docker inspect --format='{{.LogPath}}' api-service
/var/lib/docker/containers/a1b2c3d4e5f6.../a1b2c3d4e5f6...-json.log
infrarunbook-admin@sw-infrarunbook-01:~$ ls -lh /var/lib/docker/containers/a1b2c3d4e5f6.../
total 38G
-rw-r----- 1 root root 38G Apr 18 03:22 a1b2c3d4e5f6...-json.log

How to Fix
For immediate disk recovery, truncate the log file without restarting the container. Don't delete it — the container process holds the file descriptor open and the space won't be freed until the process releases the handle:
infrarunbook-admin@sw-infrarunbook-01:~$ truncate -s 0 /var/lib/docker/containers/a1b2c3d4e5f6.../a1b2c3d4e5f6...-json.log

Then fix the root cause. Configure log rotation globally in /etc/docker/daemon.json:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}

Restart the Docker daemon to apply. Note this restarts all containers, so plan accordingly:
infrarunbook-admin@sw-infrarunbook-01:~$ systemctl restart docker

You can also set log options per-container in your compose file, which takes precedence over the daemon default and is useful when specific services need tighter or looser limits:
services:
  api-service:
    image: registry.solvethenetwork.com/app/api:v1.42
    logging:
      driver: json-file
      options:
        max-size: "100m"
        max-file: "5"

Existing containers don't inherit daemon.json changes retroactively. You need to recreate them for new log settings to apply — with Compose, docker compose up -d --force-recreate does the job.
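If several containers are logging heavily at once, truncating files one at a time gets tedious. A sweep along these lines handles it in one pass. A sketch; truncate_large_logs is a made-up helper, and the path and threshold in the usage line are assumptions to adjust:

```shell
# Truncate every *-json.log under a directory that exceeds a size threshold.
# Run as root against /var/lib/docker/containers.
truncate_large_logs() {
    dir="$1"
    max_kb="${2:-1048576}"   # threshold in KiB; default 1 GiB
    # find -size +Nk matches files strictly larger than N KiB
    find "$dir" -name '*-json.log' -size +"${max_kb}"k -print |
    while read -r log; do
        echo "truncating $log"
        truncate -s 0 "$log"
    done
}

# Usage:
#   truncate_large_logs /var/lib/docker/containers 1048576
```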
Root Cause 5: Overlay Filesystem Fragmentation and Inode Exhaustion
The overlay2 storage driver maintains a directory per image layer and per container writable layer under /var/lib/docker/overlay2/. On a host that has created and destroyed many containers over time, two distinct and often misdiagnosed problems emerge: filesystem fragmentation, where allocated blocks are scattered inefficiently, and inode exhaustion, where the filesystem runs out of inodes even though block space appears available.
The inode problem is the one that catches people off guard. You run docker pull and it fails. You check df -h and see 40% free space. The host looks fine. Then you check df -i:
How to Identify
infrarunbook-admin@sw-infrarunbook-01:~$ df -h /var/lib/docker
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 200G 120G 80G 60% /
infrarunbook-admin@sw-infrarunbook-01:~$ df -i /var/lib/docker
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 12582912 12580344 2568 100% /
infrarunbook-admin@sw-infrarunbook-01:~$ ls /var/lib/docker/overlay2 | wc -l
18453

There it is. 100% inode usage, over 18,000 overlay2 directories, and plenty of block space. Docker can't create any new directories — every new container or layer creation fails. The host is effectively out of disk from Docker's perspective even though df -h looks fine.
For the deleted-but-open-file problem — where disk usage appears high but you can't account for it with du:
infrarunbook-admin@sw-infrarunbook-01:~$ lsof | grep deleted | grep docker
dockerd 1234 root 45u REG 8,1 4294967296 1234567 /var/lib/docker/containers/a1b2c3.../a1b2c3...-json.log (deleted)
infrarunbook-admin@sw-infrarunbook-01:~$ df -h /var/lib/docker
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 200G 198G 2.0G 99% /

The log file was deleted from the filesystem, but the container process still has it open. The kernel won't free those blocks until the file descriptor is closed — meaning until that container stops or restarts. This is a classic discrepancy between df and du.
How to Fix
For inode exhaustion, the primary fix is removing unused Docker objects to free directory entries. A full system prune is often the fastest path to recovery:
infrarunbook-admin@sw-infrarunbook-01:~$ docker system prune -a --volumes
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all anonymous volumes not used by at least one container
- all images without at least one container associated to them
- all build cache
Are you sure you want to continue? [y/N] y
Total reclaimed space: 31.4GB
infrarunbook-admin@sw-infrarunbook-01:~$ df -i /var/lib/docker
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 12582912 421083 12161829 4% /

For the deleted-open-file scenario, restart the offending container to release its file descriptors:
infrarunbook-admin@sw-infrarunbook-01:~$ docker restart api-service

If you're persistently hitting inode limits, consider moving /var/lib/docker to a dedicated filesystem formatted with a higher inode density. With mkfs.ext4, the -i flag controls bytes-per-inode — a smaller value like -i 4096 gives you more inodes for the same block count, at the cost of slightly less usable space.
Root Cause 6: Exited Containers Accumulating
Every stopped container retains its writable layer on disk until it's explicitly removed. This writable layer exists even if the container wrote nothing at all during its lifetime — it's allocated at container creation and holds any filesystem changes the container made. On a host running many short-lived jobs, cron containers, one-off migrations, or test runners, these writable layers pile up fast and silently.
How to Identify
infrarunbook-admin@sw-infrarunbook-01:~$ docker ps -a --filter status=exited
CONTAINER ID IMAGE COMMAND CREATED STATUS
b1c2d3e4f5a6 registry.solvethenetwork.com/app/migrate:v12 "./migrate up" 2 days ago Exited (0) 2 days ago
c2d3e4f5a6b7 registry.solvethenetwork.com/app/migrate:v11 "./migrate up" 4 days ago Exited (0) 4 days ago
...
infrarunbook-admin@sw-infrarunbook-01:~$ docker ps -a --filter status=exited | wc -l
89

How to Fix
infrarunbook-admin@sw-infrarunbook-01:~$ docker container prune
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y
Deleted Containers:
b1c2d3e4f5a6...
c2d3e4f5a6b7...
...
Total reclaimed space: 1.2GB

For any container you run as a one-off job, pass --rm to docker run so the container is removed automatically when it exits. This single habit eliminates the accumulation entirely:
infrarunbook-admin@sw-infrarunbook-01:~$ docker run --rm registry.solvethenetwork.com/app/migrate:v12 ./migrate up

Prevention
Set daemon-level log rotation before you run a single container in production. Put this in /etc/docker/daemon.json on every Docker host during provisioning. A 100 MB limit with five rotated files gives you 500 MB of log retention per container — more than enough for debugging, not enough to kill a disk:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}

Schedule a nightly prune job. A systemd timer or cron entry that runs docker system prune -f at 3 AM keeps things from accumulating. The -f skips the confirmation prompt for automated execution. If your workload allows removing unused images too, run the more aggressive variant:
# /etc/cron.d/docker-cleanup
0 3 * * * root docker system prune -af --volumes >> /var/log/docker-prune.log 2>&1

Always use --rm for one-off containers. Any container running a job and exiting — migrations, backups, test runners, data imports — should be launched with docker run --rm. Make it a team standard. It costs nothing and eliminates an entire category of disk accumulation with zero operational overhead.
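If your hosts are systemd-managed and you'd rather avoid cron, the same nightly prune can be expressed as a timer pair. A sketch; the unit names and paths are assumptions:

```ini
# /etc/systemd/system/docker-prune.service
[Unit]
Description=Prune unused Docker objects

[Service]
Type=oneshot
ExecStart=/usr/bin/docker system prune -af --volumes

# /etc/systemd/system/docker-prune.timer
[Unit]
Description=Nightly Docker prune

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with systemctl daemon-reload and systemctl enable --now docker-prune.timer. Persistent=true runs a missed prune at next boot if the host was down at 3 AM.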
Build cleanup into your CI/CD pipeline. After pushing a new image and deploying it, have your pipeline explicitly prune old images on the build host. Don't rely on manual cleanup. A post-deploy docker image prune -af step takes a few seconds and keeps the build host lean indefinitely.
Use docker-compose down -v for ephemeral stacks. When tearing down compose stacks that you don't intend to reuse — test environments, staging stacks spun up for a review — always include -v. Build it into your teardown scripts from day one and it's never a problem.
Put Docker on its own partition. If you're provisioning new hosts, move /var/lib/docker to a dedicated LVM volume or block device. When Docker fills its disk, the host OS, SSH daemon, and system logs are all unaffected. Recovery becomes a Docker problem, not a full host recovery problem. You can also resize that volume independently without touching the root filesystem — a much lower-stakes operation at 3 AM.
Monitor with alert thresholds. Set alerts at 70% and 85% disk utilization on the Docker host's filesystem. 70% is your early warning — plenty of time to schedule a prune during business hours. 85% is your page-me-now threshold. Don't wait for 100%, because by then you're already in recovery mode.
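Until those thresholds are wired into your monitoring system, they can be checked with a few lines of shell, covering both the block-usage and inode-usage views from Root Cause 5. A sketch; check_fs is a made-up name, it relies on GNU df's --output flag, and the alert hook in the usage line is hypothetical:

```shell
# Check both block usage (df -h's view) and inode usage (df -i's view)
# of a mount point against a single threshold percentage.
check_fs() {
    mount_point="$1"
    threshold="$2"
    for col in pcent ipcent; do
        # --output prints a header line plus a value like " 42%"; keep digits only
        used=$(df --output="$col" "$mount_point" | tail -n 1 | tr -dc '0-9')
        if [ "${used:-0}" -ge "$threshold" ]; then
            echo "ALERT: $mount_point $col at ${used}% (threshold ${threshold}%)"
            return 1
        fi
    done
    echo "OK: $mount_point under ${threshold}% for blocks and inodes"
}

# Usage:
#   check_fs /var/lib/docker 85 || page-the-oncall   # hypothetical alert hook
```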
Growing Docker disk usage is a solved problem. It just requires intentional habits and a bit of automation. The daemon won't clean up after itself, so the operator has to. Build the prune job, set the log rotation, use --rm, and this stops being an incident.
