
    Docker Volumes and Persistent Storage Explained

    Docker
    Published: Apr 8, 2026
    Updated: Apr 8, 2026

    A practical runbook covering Docker named volumes, bind mounts, and tmpfs — how each works under the hood, when to use them, and real-world Compose examples for production-grade persistence.

    What Docker Volumes Actually Are

    If you've spent more than a few hours with Docker, you've already run into the fundamental problem: containers are ephemeral. The moment a container dies, everything written inside its writable layer goes with it. That's by design — it's what makes containers reproducible and clean. But most real applications need to persist data somewhere. That's where volumes come in.

    Docker gives you three ways to mount storage into a container: named volumes, bind mounts, and tmpfs mounts. Each solves a different problem, and choosing the wrong one will bite you eventually. Named volumes are managed entirely by Docker and stored under /var/lib/docker/volumes/ on the host. Bind mounts map a specific host path directly into the container. tmpfs mounts exist only in memory and never touch disk — useful for secrets or scratch space you explicitly don't want persisted.

    Most people use "volume" loosely to mean any of these, but in Docker's internal model, a volume specifically refers to the named, Docker-managed kind. That distinction matters when you're reading the docs, scripting automation, or debugging a production incident at 2am.

    How Docker Volumes Work Under the Hood

    When you create a named volume — either explicitly with docker volume create or implicitly by declaring one in a Compose file — Docker creates a directory under /var/lib/docker/volumes/<volume-name>/_data. The Docker daemon owns this directory. Your container gets it mounted at whatever path you specify, and from the container's perspective it's just a filesystem path. The container has no idea it's talking to a managed volume.

    docker volume create pg-data
    docker run -d \
      --name postgres \
      -v pg-data:/var/lib/postgresql/data \
      -e POSTGRES_PASSWORD=changeme \
      postgres:16

    The _data subdirectory is the actual mount point. Docker wraps it in that structure so it can store volume metadata alongside the data. If you peek at /var/lib/docker/volumes/pg-data/_data on the host as root, you'll see the raw PostgreSQL data directory sitting right there — completely accessible from the host, even when no container is using it.

    This is one of the most important things to understand: volumes outlive containers. Stopping or removing a container never deletes a named volume. Passing -v to docker rm removes only the container's anonymous volumes; named volumes stick around until you run docker volume rm yourself. In my experience, a fair number of "mystery disk space" tickets on Docker hosts come down to orphaned volumes accumulating over weeks of development or CI runs.
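
    A quick audit goes a long way here. This is a minimal sketch using standard Docker CLI flags; the volume names on your host will obviously differ:

```shell
# List volumes not referenced by any container ("dangling")
docker volume ls -qf dangling=true

# Break down disk usage per volume, alongside images and containers
docker system df -v
```

    Running the first command periodically and alerting past a threshold is a cheap way to catch the accumulation before it becomes a ticket.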

    Bind mounts work differently. There's no Docker metadata layer — the host path is mounted directly into the container using the kernel's bind mount mechanism. Docker is essentially a thin wrapper calling mount --bind. This gives you direct access to host filesystem paths, which is powerful but comes with tradeoffs around portability and permissions that named volumes don't have.

    docker run -d \
      --name nginx \
      -v /srv/solvethenetwork/www:/usr/share/nginx/html:ro \
      -p 80:80 \
      nginx:stable

    The :ro flag makes the mount read-only inside the container. Always use this for config files or static content that a container shouldn't be modifying — it's a cheap safety net that has saved me from more than one accidental overwrite.

    Volume Drivers

    By default, Docker uses the local volume driver, which stores data on the host's local filesystem. But volume drivers are pluggable. You can use drivers that back volumes with NFS, AWS EBS, GlusterFS, Ceph, or any other storage backend you can reach from the host. This is what makes Docker volumes useful beyond single-node development setups.

    docker volume create \
      --driver local \
      --opt type=nfs \
      --opt o=addr=192.168.10.50,rw \
      --opt device=:/exports/app-data \
      nfs-app-data

    Here we're creating a volume backed by an NFS share at 192.168.10.50. Containers that mount nfs-app-data will read and write directly to the NFS export. The container doesn't need to know anything about NFS — it just sees a directory. This pattern is common in environments running standalone Docker hosts where you need shared storage accessible from multiple machines without the full complexity of Kubernetes persistent volumes.
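
    The same NFS-backed volume can be declared directly in a Compose file instead of created by hand. A sketch assuming the same export and address as above:

```yaml
volumes:
  nfs-app-data:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.10.50,rw"
      device: ":/exports/app-data"
```

    Declaring it in Compose keeps the storage definition versioned alongside the services that depend on it.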

    Why Persistent Storage Actually Matters in Practice

    The pitch for containers often emphasizes immutability and statelessness. Great for stateless services. But the moment you run a database, a message broker, a build cache, or anything that accumulates state over time, you need a real persistence strategy baked in from day one.

    I've seen teams early in their Docker adoption store everything in the container's writable layer. Things work fine during development. Then someone does their first docker-compose down and loses all their local test data. Then they rebuild the image and lose it again. The mental model of "the container is the environment" hasn't fully clicked yet — specifically, that the container's filesystem is not the application's long-term storage.

    Beyond development convenience, persistent storage is a production correctness requirement. A PostgreSQL instance that loses its data directory when its container is recreated isn't a database — it's an expensive way to generate empty tables. Redis in AOF or RDB persistence mode needs its data files to survive restarts. Elasticsearch indices represent hours of indexing work that shouldn't vanish because someone removed and recreated a container during a routine upgrade.

    Volume performance also matters more than people expect. The container's writable layer uses a union filesystem — overlay2 by default on modern Linux kernels. This introduces overhead for write-heavy workloads because every write goes through copy-on-write mechanics. Databases particularly suffer when their data files go through overlay2 rather than a direct volume mount. Using a named volume or bind mount bypasses the union filesystem entirely and gives the container direct access to the underlying storage, which is measurably better I/O for anything doing serious sequential or random writes.
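
    An easy first check when diagnosing slow container I/O is confirming which storage driver the daemon is actually using. A one-liner sketch, assuming a standard Docker CLI:

```shell
# Print the storage driver backing container writable layers
docker info --format '{{ .Driver }}'
```

    On modern Linux installs this typically prints overlay2; anything else (vfs in particular) deserves a closer look, since vfs has no copy-on-write at all and does a full copy per layer.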

    Real-World Examples

    PostgreSQL on a Single Docker Host

    This is probably the most common pattern I see for small internal tools and staging environments running on a single VM. On a host like sw-infrarunbook-01, you want the database to survive container restarts and image upgrades without losing data.

    version: "3.9"
    
    services:
      db:
        image: postgres:16
        restart: unless-stopped
        environment:
          POSTGRES_USER: infrarunbook-admin
          POSTGRES_PASSWORD: ${DB_PASSWORD}
          POSTGRES_DB: appdb
        volumes:
          - pg-data:/var/lib/postgresql/data
        networks:
          - backend
    
    volumes:
      pg-data:
        driver: local
    
    networks:
      backend:
        driver: bridge

    The named volume pg-data is declared at the bottom of the Compose file. Docker creates it on first run if it doesn't already exist. Bring the stack down with docker-compose down and back up — data is intact. Destroy and recreate the container — data is intact. Upgrade the Postgres image from 15 to 16 with the appropriate migration steps — data is intact. The volume persists independently of any container's lifecycle, which is exactly the contract you want.
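
    One related caveat: the -v/--volumes flag on docker-compose down changes this contract completely, so it's worth being deliberate about which form you run:

```shell
# Removes containers and networks; named volumes survive
docker-compose down

# Also removes named volumes declared in the Compose file, pg-data included
docker-compose down -v
```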

    Nginx with Bind-Mounted Static Content

    For serving static content managed outside Docker — deployed by a CI pipeline or rsync'd from a build server — a bind mount makes more sense than a named volume because Docker shouldn't own that content's lifecycle.

    version: "3.9"
    
    services:
      web:
        image: nginx:stable-alpine
        restart: unless-stopped
        ports:
          - "192.168.10.100:80:80"
          - "192.168.10.100:443:443"
        volumes:
          - /srv/solvethenetwork/www:/usr/share/nginx/html:ro
          - /srv/solvethenetwork/nginx-conf:/etc/nginx/conf.d:ro
          - /srv/solvethenetwork/certs:/etc/nginx/certs:ro
          - nginx-logs:/var/log/nginx
    
    volumes:
      nginx-logs:

    Notice the deliberate mix: bind mounts for the content, config, and certs (managed by external processes and pipelines), and a named volume for logs (which Docker should manage and which nothing external needs to write to). The ports are also bound to a specific RFC 1918 address rather than all interfaces — a small but meaningful security practice when the host has multiple network interfaces exposed to different network segments.

    Inspecting and Backing Up Volumes

    One thing that trips people up: how do you back up a named volume? You can't just copy the _data directory while a database is actively writing to it and expect a consistent backup. The idiomatic Docker approach is a temporary container that mounts the volume and streams a tarball out:

    # Create a backup
    docker run --rm \
      -v pg-data:/data:ro \
      -v /backup:/backup \
      busybox \
      tar czf /backup/pg-data-$(date +%Y%m%d).tar.gz -C /data .
    
    # Restore from backup
    docker run --rm \
      -v pg-data:/data \
      -v /backup:/backup \
      busybox \
      tar xzf /backup/pg-data-20260401.tar.gz -C /data

    This works because you're using a throwaway container purely as a tool to access the volume. The busybox image is tiny and has everything you need for basic archiving. For databases specifically, you'll want to use the database's own dump tool (pg_dump, mysqldump) rather than raw filesystem copies to guarantee transactional consistency — but for migrating volume contents between hosts or doing a quick snapshot, the busybox pattern is clean and portable.
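
    For the PostgreSQL case specifically, the dump-based route looks something like this. A sketch that assumes the container name, user, and database from the earlier examples; adjust to your deployment:

```shell
# Logical backup: pg_dump through the running container, compressed on the host
docker exec postgres pg_dump -U infrarunbook-admin appdb \
  | gzip > appdb-$(date +%Y%m%d).sql.gz

# Restore: feed the dump back through psql inside the container
gunzip -c appdb-20260401.sql.gz \
  | docker exec -i postgres psql -U infrarunbook-admin appdb
```

    Unlike the tarball approach, this produces a transactionally consistent snapshot even while the database is accepting writes.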

    When something goes wrong and you need to know where a volume actually lives on the host, docker volume inspect is your first call:

    docker volume inspect pg-data
    [
        {
            "CreatedAt": "2026-04-01T14:32:10Z",
            "Driver": "local",
            "Labels": {},
            "Mountpoint": "/var/lib/docker/volumes/pg-data/_data",
            "Name": "pg-data",
            "Options": {},
            "Scope": "local"
        }
    ]

    The Mountpoint field tells you exactly where the data lives. From there you can inspect permissions, verify data is being written, or run an emergency filesystem-level backup if the container itself is broken.
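
    When a script only needs that one field, the --format flag avoids parsing the JSON output by hand:

```shell
# Print just the host path for the volume
docker volume inspect --format '{{ .Mountpoint }}' pg-data
```

    Handy in backup scripts or monitoring checks where you want the mountpoint without pulling in jq.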

    Common Misconceptions

    Named Volumes Are Safer Than Bind Mounts

    This one's more nuanced than it sounds. Named volumes are more portable — you don't rely on a specific host path existing — and they're cleaner to reference in Compose files. But they're not inherently more durable or protected. Both live on the host filesystem. Both can be accidentally deleted. Both will be gone if the host's storage fails without off-host backups in place. A named volume is not a backup. It's not replicated. It's just a directory Docker knows about.

    Removing a Container Removes Its Volumes

    This misconception causes two opposite kinds of pain. Developers who expect cleanup to be automatic end up with gigabytes of orphaned volumes on their machines after a few months of active work. Run docker volume ls on a machine that's been used for development for a while — the list is almost always longer than anyone expects.

    On the flip side, operations engineers sometimes assume removing a container wipes its data, which would be catastrophic for a production database. By default, docker rm leaves all volumes intact; even docker rm -v removes only the container's anonymous volumes, never named ones. Named volumes have to be deleted separately with docker volume rm. That's the correct default behavior, but it isn't obvious until someone is surprised by it in one direction or the other.

    To clean up dangling volumes — volumes not attached to any container — use:

    docker volume prune

    Don't run this blindly on a server hosting multiple services. A "dangling" volume might belong to a stopped container you intend to restart. Note also that on Docker Engine 23.0 and later, docker volume prune removes only anonymous volumes by default; you need to pass --all to include unused named volumes. On a dedicated development machine it's generally safe. On a shared host, check first.
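
    If you label volumes when creating them, prune can be scoped instead of run wholesale. A sketch with an illustrative label name:

```shell
# Create a volume tagged as disposable
docker volume create --label purpose=ci-cache scratch-cache

# Prune only dangling volumes carrying that label
docker volume prune --filter "label=purpose=ci-cache"
```

    On a shared host, this turns "hope nothing important was dangling" into an explicit policy.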

    Volumes Solve the Multi-Host Problem

    Local volumes only exist on the host where they were created. If you're running multiple Docker hosts and expecting containers on different nodes to share a volume, local volumes won't work. You need either a volume driver backed by network storage (NFS, Ceph, a cloud block storage API), or you need to move to an orchestrator like Kubernetes where persistent volume claims abstract this properly.

    I've seen this surface specifically with Docker Swarm deployments. Someone defines a service with a named volume, scales it across three nodes, then wonders why the application only behaves consistently when the container lands on one particular node. The volume exists independently on each node — it's not replicated or shared. This is a fundamental architectural constraint, not a Docker bug, and understanding it early saves a lot of debugging time later.

    tmpfs Mounts Persist Across Restarts

    tmpfs mounts are in memory only. They don't survive container restarts, and they don't survive host reboots. They're appropriate for sensitive runtime data you explicitly want cleared — session tokens, decrypted secrets, scratch space for in-flight processing. Using tmpfs for application state you care about is a data loss incident waiting to happen, and it will happen at the worst possible time.
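
    In practice that's either a --tmpfs flag on docker run or a tmpfs key on the service in a Compose file. A sketch with illustrative paths and sizes:

```shell
# Mount an in-memory filesystem at /run/scratch, capped at 64 MB
docker run --rm --tmpfs /run/scratch:rw,size=64m,mode=1777 alpine \
  df -h /run/scratch
```

    The size cap matters: tmpfs consumes host RAM, so an unbounded mount plus a misbehaving process can pressure the whole host.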


    Putting It Together

    The mental model that makes all of this click: think of Docker volumes as external disks you plug into containers. The disk exists independently of any container. Containers attach to it, detach from it, get replaced while the disk retains all its data. That disk can be local storage, NFS, a cloud block device, or whatever your volume driver supports — the container doesn't care.

    Pick named volumes when you want Docker to manage the storage location and lifecycle — databases, application state, generated artifacts. Pick bind mounts when you have existing host paths that need to be accessible inside a container — config files, static assets, logs that external tooling needs to consume. Pick tmpfs when you need fast ephemeral storage that must never touch disk.

    Get your volume strategy right early. Retrofitting persistent storage into an existing deployment is doable but tedious, and doing it while services are actively running in production adds unnecessary risk to what should be a non-event. Define your volumes explicitly in Compose files, document which ones are critical for backup, and periodically audit what's accumulating on your Docker hosts. Disk space surprises at 3am are avoidable with a few minutes of planning upfront.

    Frequently Asked Questions

    What is the difference between a Docker named volume and a bind mount?

    A named volume is created and managed by Docker, stored under /var/lib/docker/volumes/ on the host. A bind mount maps a specific host directory directly into the container using the kernel's bind mount mechanism. Named volumes are more portable and suitable for database data or application state. Bind mounts are better when an external process (like a CI pipeline or config management tool) owns the content and needs to manage it independently of Docker.

    Does removing a Docker container delete its volumes?

    No. By default, docker rm leaves all associated volumes intact, and even docker rm -v removes only the container's anonymous volumes; named volumes must be deleted manually with docker volume rm. This protects against accidental data loss but can lead to orphaned volumes accumulating on a host over time. Use docker volume prune to clean up volumes not attached to any container.

    How do I back up a Docker named volume?

    The recommended approach is to spin up a temporary container that mounts both the volume and a backup directory on the host, then run tar inside it to create an archive. For example: docker run --rm -v pg-data:/data:ro -v /backup:/backup busybox tar czf /backup/pg-data-backup.tar.gz -C /data . For databases, prefer using the database's native dump tool (pg_dump, mysqldump) to ensure a consistent backup rather than copying raw data files.

    Can Docker volumes be shared across multiple hosts?

    Not with the default local volume driver. Local volumes exist only on the host where they were created. To share storage across multiple Docker hosts, you need to use a volume driver that backs storage with a network filesystem such as NFS, Ceph, or a cloud block storage provider. Alternatively, use an orchestrator like Kubernetes, which has first-class support for shared persistent volumes via PersistentVolumeClaims.

    Why should databases use volumes instead of the container's writable layer?

    The container's writable layer uses a union filesystem (overlay2) which adds copy-on-write overhead to every write operation. For write-heavy workloads like databases, this overhead is measurable and can significantly degrade I/O performance. Mounting a volume — named or bind — bypasses the union filesystem and gives the container direct access to underlying storage, resulting in better throughput and lower latency for database operations.
