
    Docker Volumes and Persistent Storage Explained

    Docker
    Published: Apr 8, 2026
    Updated: Apr 8, 2026

    A practical runbook covering Docker named volumes, bind mounts, and tmpfs — how each works under the hood, when to use them, and real-world Compose examples for production-grade persistence.

    What Docker Volumes Actually Are

    If you've spent more than a few hours with Docker, you've already run into the fundamental problem: containers are ephemeral. The moment a container dies, everything written inside its writable layer goes with it. That's by design — it's what makes containers reproducible and clean. But most real applications need to persist data somewhere. That's where volumes come in.

    Docker gives you three ways to mount storage into a container: named volumes, bind mounts, and tmpfs mounts. Each solves a different problem, and choosing the wrong one will bite you eventually. Named volumes are managed entirely by Docker and stored under /var/lib/docker/volumes/ on the host. Bind mounts map a specific host path directly into the container. tmpfs mounts exist only in memory and never touch disk — useful for secrets or scratch space you explicitly don't want persisted.

    Most people use "volume" loosely to mean any of these, but in Docker's internal model, a volume specifically refers to the named, Docker-managed kind. That distinction matters when you're reading the docs, scripting automation, or debugging a production incident at 2am.

    How Docker Volumes Work Under the Hood

    When you create a named volume — either explicitly with docker volume create or implicitly by declaring one in a Compose file — Docker creates a directory under /var/lib/docker/volumes/<volume-name>/_data. The Docker daemon owns this directory. Your container gets it mounted at whatever path you specify, and from the container's perspective it's just a filesystem path. The container has no idea it's talking to a managed volume.

    docker volume create pg-data
    docker run -d \
      --name postgres \
      -v pg-data:/var/lib/postgresql/data \
      -e POSTGRES_PASSWORD=changeme \
      postgres:16

    The _data subdirectory is the actual mount point. Docker wraps it in that structure so it can store volume metadata alongside the data. If you peek at /var/lib/docker/volumes/pg-data/_data on the host as root, you'll see the raw PostgreSQL data directory sitting right there — completely accessible from the host, even when no container is using it.

    This is one of the most important things to understand: volumes outlive containers. Stopping or removing a container never deletes a named volume. Passing -v to docker rm removes only the container's anonymous volumes; named volumes stick around until you run docker volume rm yourself. In my experience, a fair number of "mystery disk space" tickets on Docker hosts come down to orphaned volumes accumulating over weeks of development or CI runs.
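
    A quick audit goes a long way here. This is a minimal sketch using standard Docker CLI flags; the volume names on your host will obviously differ:

```shell
# List volumes not referenced by any container ("dangling")
docker volume ls -qf dangling=true

# Break down disk usage per volume, alongside images and containers
docker system df -v
```

    Running the first command periodically and alerting past a threshold is a cheap way to catch the accumulation before it becomes a ticket.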

    Bind mounts work differently. There's no Docker metadata layer — the host path is mounted directly into the container using the kernel's bind mount mechanism. Docker is essentially a thin wrapper calling mount --bind. This gives you direct access to host filesystem paths, which is powerful but comes with tradeoffs around portability and permissions that named volumes don't have.

    docker run -d \
      --name nginx \
      -v /srv/solvethenetwork/www:/usr/share/nginx/html:ro \
      -p 80:80 \
      nginx:stable

    The :ro flag makes the mount read-only inside the container. Always use this for config files or static content that a container shouldn't be modifying — it's a cheap safety net that has saved me from more than one accidental overwrite.

    Volume Drivers

    By default, Docker uses the local volume driver, which stores data on the host's local filesystem. But volume drivers are pluggable. You can use drivers that back volumes with NFS, AWS EBS, GlusterFS, Ceph, or any other storage backend you can reach from the host. This is what makes Docker volumes useful beyond single-node development setups.

    docker volume create \
      --driver local \
      --opt type=nfs \
      --opt o=addr=192.168.10.50,rw \
      --opt device=:/exports/app-data \
      nfs-app-data

    Here we're creating a volume backed by an NFS share at 192.168.10.50. Containers that mount nfs-app-data will read and write directly to the NFS export. The container doesn't need to know anything about NFS — it just sees a directory. This pattern is common in environments running standalone Docker hosts where you need shared storage accessible from multiple machines without the full complexity of Kubernetes persistent volumes.
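
    The same NFS-backed volume can be declared directly in a Compose file instead of created by hand. A sketch assuming the same export and address as above:

```yaml
volumes:
  nfs-app-data:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.10.50,rw"
      device: ":/exports/app-data"
```

    Declaring it in Compose keeps the storage definition versioned alongside the services that depend on it.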

    Why Persistent Storage Actually Matters in Practice

    The pitch for containers often emphasizes immutability and statelessness. Great for stateless services. But the moment you run a database, a message broker, a build cache, or anything that accumulates state over time, you need a real persistence strategy baked in from day one.

    I've seen teams early in their Docker adoption store everything in the container's writable layer. Things work fine during development. Then someone does their first docker-compose down and loses all their local test data. Then they rebuild the image and lose it again. The mental model of "the container is the environment" hasn't fully clicked yet — specifically, that the container's filesystem is not the application's long-term storage.

    Beyond development convenience, persistent storage is a production correctness requirement. A PostgreSQL instance that loses its data directory when its container is recreated isn't a database — it's an expensive way to generate empty tables. Redis in AOF or RDB persistence mode needs its data files to survive restarts. Elasticsearch indices represent hours of indexing work that shouldn't vanish because someone removed and recreated a container during a routine upgrade.

    Volume performance also matters more than people expect. The container's writable layer uses a union filesystem — overlay2 by default on modern Linux kernels. This introduces overhead for write-heavy workloads because every write goes through copy-on-write mechanics. Databases particularly suffer when their data files go through overlay2 rather than a direct volume mount. Using a named volume or bind mount bypasses the union filesystem entirely and gives the container direct access to the underlying storage, which is measurably better I/O for anything doing serious sequential or random writes.
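
    An easy first check when diagnosing slow container I/O is confirming which storage driver the daemon is actually using. A one-liner sketch, assuming a standard Docker CLI:

```shell
# Print the storage driver backing container writable layers
docker info --format '{{ .Driver }}'
```

    On modern Linux installs this typically prints overlay2; anything else (vfs in particular) deserves a closer look, since vfs has no copy-on-write at all and does a full copy per layer.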

    Real-World Examples

    PostgreSQL on a Single Docker Host

    This is probably the most common pattern I see for small internal tools and staging environments running on a single VM. On a host like sw-infrarunbook-01, you want the database to survive container restarts and image upgrades without losing data.

    version: "3.9"
    
    services:
      db:
        image: postgres:16
        restart: unless-stopped
        environment:
          POSTGRES_USER: infrarunbook-admin
          POSTGRES_PASSWORD: ${DB_PASSWORD}
          POSTGRES_DB: appdb
        volumes:
          - pg-data:/var/lib/postgresql/data
        networks:
          - backend
    
    volumes:
      pg-data:
        driver: local
    
    networks:
      backend:
        driver: bridge

    The named volume pg-data is declared at the bottom of the Compose file. Docker creates it on first run if it doesn't already exist. Bring the stack down with docker-compose down and back up — data is intact. Destroy and recreate the container — data is intact. Upgrade the Postgres image from 15 to 16 with the appropriate migration steps — data is intact. The volume persists independently of any container's lifecycle, which is exactly the contract you want.
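
    One related caveat: the -v/--volumes flag on docker-compose down changes this contract completely, so it's worth being deliberate about which form you run:

```shell
# Removes containers and networks; named volumes survive
docker-compose down

# Also removes named volumes declared in the Compose file, pg-data included
docker-compose down -v
```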

    Nginx with Bind-Mounted Static Content

    For serving static content managed outside Docker — deployed by a CI pipeline or rsync'd from a build server — a bind mount makes more sense than a named volume because Docker shouldn't own that content's lifecycle.

    version: "3.9"
    
    services:
      web:
        image: nginx:stable-alpine
        restart: unless-stopped
        ports:
          - "192.168.10.100:80:80"
          - "192.168.10.100:443:443"
        volumes:
          - /srv/solvethenetwork/www:/usr/share/nginx/html:ro
          - /srv/solvethenetwork/nginx-conf:/etc/nginx/conf.d:ro
          - /srv/solvethenetwork/certs:/etc/nginx/certs:ro
          - nginx-logs:/var/log/nginx
    
    volumes:
      nginx-logs:

    Notice the deliberate mix: bind mounts for the content, config, and certs (managed by external processes and pipelines), and a named volume for logs (which Docker should manage and which nothing external needs to write to). The ports are also bound to a specific RFC 1918 address rather than all interfaces — a small but meaningful security practice when the host has multiple network interfaces exposed to different network segments.

    Inspecting and Backing Up Volumes

    One thing that trips people up: how do you back up a named volume? You can't just copy the _data directory while a database is actively writing to it and expect a consistent backup. The idiomatic Docker approach is a temporary container that mounts the volume and streams a tarball out:

    # Create a backup
    docker run --rm \
      -v pg-data:/data:ro \
      -v /backup:/backup \
      busybox \
      tar czf /backup/pg-data-$(date +%Y%m%d).tar.gz -C /data .
    
    # Restore from backup
    docker run --rm \
      -v pg-data:/data \
      -v /backup:/backup \
      busybox \
      tar xzf /backup/pg-data-20260401.tar.gz -C /data

    This works because you're using a throwaway container purely as a tool to access the volume. The busybox image is tiny and has everything you need for basic archiving. For databases specifically, you'll want to use the database's own dump tool (pg_dump, mysqldump) rather than raw filesystem copies to guarantee transactional consistency — but for migrating volume contents between hosts or doing a quick snapshot, the busybox pattern is clean and portable.
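
    For the PostgreSQL case specifically, the dump-based route looks something like this. A sketch that assumes the container name, user, and database from the earlier examples; adjust to your deployment:

```shell
# Logical backup: pg_dump through the running container, compressed on the host
docker exec postgres pg_dump -U infrarunbook-admin appdb \
  | gzip > appdb-$(date +%Y%m%d).sql.gz

# Restore: feed the dump back through psql inside the container
gunzip -c appdb-20260401.sql.gz \
  | docker exec -i postgres psql -U infrarunbook-admin appdb
```

    Unlike the tarball approach, this produces a transactionally consistent snapshot even while the database is accepting writes.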

    When something goes wrong and you need to know where a volume actually lives on the host, docker volume inspect is your first call:

    docker volume inspect pg-data
    [
        {
            "CreatedAt": "2026-04-01T14:32:10Z",
            "Driver": "local",
            "Labels": {},
            "Mountpoint": "/var/lib/docker/volumes/pg-data/_data",
            "Name": "pg-data",
            "Options": {},
            "Scope": "local"
        }
    ]

    The Mountpoint field tells you exactly where the data lives. From there you can inspect permissions, verify data is being written, or run an emergency filesystem-level backup if the container itself is broken.
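
    When a script only needs that one field, the --format flag avoids parsing the JSON output by hand:

```shell
# Print just the host path for the volume
docker volume inspect --format '{{ .Mountpoint }}' pg-data
```

    Handy in backup scripts or monitoring checks where you want the mountpoint without pulling in jq.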

    Common Misconceptions

    Named Volumes Are Safer Than Bind Mounts

    This one's more nuanced than it sounds. Named volumes are more portable — you don't rely on a specific host path existing — and they're cleaner to reference in Compose files. But they're not inherently more durable or protected. Both live on the host filesystem. Both can be accidentally deleted. Both will be gone if the host's storage fails without off-host backups in place. A named volume is not a backup. It's not replicated. It's just a directory Docker knows about.

    Removing a Container Removes Its Volumes

    This misconception causes two opposite kinds of pain. Developers who expect cleanup to be automatic end up with gigabytes of orphaned volumes on their machines after a few months of active work. Run docker volume ls on a machine that's been used for development for a while — the list is almost always longer than anyone expects.

    On the flip side, operations engineers sometimes assume removing a container wipes its data, which would be catastrophic for a production database. By default, docker rm leaves all volumes intact; even docker rm -v removes only the container's anonymous volumes, never named ones. Named volumes have to be deleted separately with docker volume rm. That's the correct default behavior, but it isn't obvious until someone is surprised by it in one direction or the other.

    To clean up dangling volumes — volumes not attached to any container — use:

    docker volume prune

    Don't run this blindly on a server hosting multiple services. A "dangling" volume might belong to a stopped container you intend to restart. Note also that on Docker Engine 23.0 and later, docker volume prune removes only anonymous volumes by default; you need to pass --all to include unused named volumes. On a dedicated development machine it's generally safe. On a shared host, check first.
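
    If you label volumes when creating them, prune can be scoped instead of run wholesale. A sketch with an illustrative label name:

```shell
# Create a volume tagged as disposable
docker volume create --label purpose=ci-cache scratch-cache

# Prune only dangling volumes carrying that label
docker volume prune --filter "label=purpose=ci-cache"
```

    On a shared host, this turns "hope nothing important was dangling" into an explicit policy.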

    Volumes Solve the Multi-Host Problem

    Local volumes only exist on the host where they were created. If you're running multiple Docker hosts and expecting containers on different nodes to share a volume, local volumes won't work. You need either a volume driver backed by network storage (NFS, Ceph, a cloud block storage API), or you need to move to an orchestrator like Kubernetes where persistent volume claims abstract this properly.

    I've seen this surface specifically with Docker Swarm deployments. Someone defines a service with a named volume, scales it across three nodes, then wonders why the application only behaves consistently when the container lands on one particular node. The volume exists independently on each node — it's not replicated or shared. This is a fundamental architectural constraint, not a Docker bug, and understanding it early saves a lot of debugging time later.

    tmpfs Mounts Persist Across Restarts

    tmpfs mounts are in memory only. They don't survive container restarts, and they don't survive host reboots. They're appropriate for sensitive runtime data you explicitly want cleared — session tokens, decrypted secrets, scratch space for in-flight processing. Using tmpfs for application state you care about is a data loss incident waiting to happen, and it will happen at the worst possible time.
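
    In practice that's either a --tmpfs flag on docker run or a tmpfs key on the service in a Compose file. A sketch with illustrative paths and sizes:

```shell
# Mount an in-memory filesystem at /run/scratch, capped at 64 MB
docker run --rm --tmpfs /run/scratch:rw,size=64m,mode=1777 alpine \
  df -h /run/scratch
```

    The size cap matters: tmpfs consumes host RAM, so an unbounded mount plus a misbehaving process can pressure the whole host.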


    Putting It Together

    The mental model that makes all of this click: think of Docker volumes as external disks you plug into containers. The disk exists independently of any container. Containers attach to it, detach from it, get replaced while the disk retains all its data. That disk can be local storage, NFS, a cloud block device, or whatever your volume driver supports — the container doesn't care.

    Pick named volumes when you want Docker to manage the storage location and lifecycle — databases, application state, generated artifacts. Pick bind mounts when you have existing host paths that need to be accessible inside a container — config files, static assets, logs that external tooling needs to consume. Pick tmpfs when you need fast ephemeral storage that must never touch disk.

    Get your volume strategy right early. Retrofitting persistent storage into an existing deployment is doable but tedious, and doing it while services are actively running in production adds unnecessary risk to what should be a non-event. Define your volumes explicitly in Compose files, document which ones are critical for backup, and periodically audit what's accumulating on your Docker hosts. Disk space surprises at 3am are avoidable with a few minutes of planning upfront.

    Frequently Asked Questions

    What is the difference between a Docker named volume and a bind mount?

    A named volume is created and managed by Docker, stored under /var/lib/docker/volumes/ on the host. A bind mount maps a specific host directory directly into the container using the kernel's bind mount mechanism. Named volumes are more portable and suitable for database data or application state. Bind mounts are better when an external process (like a CI pipeline or config management tool) owns the content and needs to manage it independently of Docker.

    Does removing a Docker container delete its volumes?

    No. By default, docker rm leaves all associated volumes intact, and even docker rm -v removes only the container's anonymous volumes; named volumes must be deleted manually with docker volume rm. This protects against accidental data loss but can lead to orphaned volumes accumulating on a host over time. Use docker volume prune to clean up volumes not attached to any container.

    How do I back up a Docker named volume?

    The recommended approach is to spin up a temporary container that mounts both the volume and a backup directory on the host, then run tar inside it to create an archive. For example: docker run --rm -v pg-data:/data:ro -v /backup:/backup busybox tar czf /backup/pg-data-backup.tar.gz -C /data . For databases, prefer using the database's native dump tool (pg_dump, mysqldump) to ensure a consistent backup rather than copying raw data files.

    Can Docker volumes be shared across multiple hosts?

    Not with the default local volume driver. Local volumes exist only on the host where they were created. To share storage across multiple Docker hosts, you need to use a volume driver that backs storage with a network filesystem such as NFS, Ceph, or a cloud block storage provider. Alternatively, use an orchestrator like Kubernetes, which has first-class support for shared persistent volumes via PersistentVolumeClaims.

    Why should databases use volumes instead of the container's writable layer?

    The container's writable layer uses a union filesystem (overlay2) which adds copy-on-write overhead to every write operation. For write-heavy workloads like databases, this overhead is measurable and can significantly degrade I/O performance. Mounting a volume — named or bind — bypasses the union filesystem and gives the container direct access to underlying storage, resulting in better throughput and lower latency for database operations.
