What Docker Compose Is
Docker Compose is a tool for defining and running multi-container Docker applications. Instead of manually starting each container with a long docker run command and managing the network links between them yourself, you describe the entire application stack in a single YAML file — typically named docker-compose.yml — and bring it all up with one command.
In my experience, the moment teams stop managing containers by hand and commit to a Compose file, their deployment reliability jumps significantly. It's not magic. It's just that Compose forces you to be explicit about your dependencies, networks, and volumes in one place. That explicitness is what makes a stack reproducible.
At its core, Compose is a declarative layer on top of the Docker Engine API. It reads your YAML definition, figures out what needs to be created — networks, volumes, containers — and calls the appropriate Docker APIs in the right order. The tool ships with Docker Desktop and is also available as a standalone plugin (docker compose) since Compose V2 replaced the original Python-based docker-compose binary. If you're still using the hyphenated version, migrate. The V2 plugin is faster, actively maintained, and is what the rest of the ecosystem assumes.
How Docker Compose Works
The Compose File Structure
The docker-compose.yml file is the single source of truth for your stack. It has three top-level keys that matter most: services, networks, and volumes. Services define the containers. Networks wire them together. Volumes give them persistent storage.
Here's a realistic example — a web application stack running on sw-infrarunbook-01 with a Python Flask frontend, a PostgreSQL database, and an Nginx reverse proxy:
version: "3.9"

services:
  nginx:
    image: nginx:1.25-alpine
    container_name: nginx-proxy
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/certs:/etc/nginx/certs:ro
    depends_on:
      - web
    networks:
      - frontend

  web:
    build:
      context: ./app
      dockerfile: Dockerfile
    container_name: flask-web
    environment:
      - FLASK_ENV=production
      - DATABASE_URL=postgresql://infrarunbook-admin:securepass@db:5432/appdb
    depends_on:
      db:
        condition: service_healthy
    networks:
      - frontend
      - backend
    restart: unless-stopped

  db:
    image: postgres:15-alpine
    container_name: postgres-db
    environment:
      POSTGRES_USER: infrarunbook-admin
      POSTGRES_PASSWORD: securepass
      POSTGRES_DB: appdb
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U infrarunbook-admin -d appdb"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - backend

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true

volumes:
  pgdata:
    driver: local
Notice a few deliberate choices here. The backend network is marked internal: true, which means containers on that network have no outbound internet access. The database is only reachable from inside the stack — it can't be reached from outside, and it can't initiate outbound connections. That's a solid default security posture for a database tier and costs you nothing to implement upfront.
Service Discovery and DNS
One of the most useful things Compose gives you out of the box is automatic DNS resolution between services. Within the same Compose project, a service named db is reachable by other services on that network simply as db. You don't have to hard-code IP addresses, look up container IDs, or fiddle with /etc/hosts files.
Docker implements this through an embedded DNS server listening at 127.0.0.11 inside every container. When the Flask app connects to db:5432, Docker's internal DNS intercepts the lookup for db and returns the current IP of the db container. If a container is restarted and gets a new internal IP, service discovery keeps working transparently — you never need to update a connection string.
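Application code doesn't need anything special to take advantage of this — any standard resolver call goes through the embedded DNS. A minimal sketch (resolve_service is my own helper name, not a Compose API); inside a container on the same network, passing "db" would return the database container's current address:

```python
import socket

def resolve_service(name: str) -> str:
    """Resolve a Compose service name to its current container IP.

    Inside a container, Docker's embedded DNS at 127.0.0.11 answers
    this query; outside a container the service name won't resolve.
    """
    # getaddrinfo follows the container's normal resolver path, which
    # is exactly what database drivers and HTTP clients do internally.
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return infos[0][4][0]

# From the Flask container, resolve_service("db") would yield the
# db container's address on the backend network.
```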
The Dependency and Health Check Model
The depends_on key controls startup order. But there's an important distinction that trips people up constantly: depends_on only waits for a container to start, not for the service inside it to be ready. That's why the example above uses condition: service_healthy combined with a healthcheck block on the database. Without this, Flask will try to connect to PostgreSQL before Postgres has finished its initialization — and it'll fail on startup.
I've seen this exact issue cause flapping deployments in environments where the database takes more than a couple of seconds to be ready. The symptoms are confusing: the app container exits with a connection error, Docker restarts it, sometimes it races past the initialization window and works, sometimes it doesn't. The fix is always the same — define a real healthcheck on the dependency and use condition: service_healthy.
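For defense in depth, the application can also guard its own startup. A sketch of a simple TCP wait loop (wait_for_port is my own helper, not part of any library) — weaker than a real readiness probe like pg_isready, but a reasonable belt-and-braces check in an entrypoint:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP port accepts connections, or give up.

    A bare port check is weaker than an application-level healthcheck,
    but it catches the common case of a dependency still booting.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # brief backoff before the next attempt
    return False
```

In the Flask entrypoint, you might call wait_for_port("db", 5432) before creating the connection pool, so a transient race produces a short delay instead of a crash loop.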
Volumes and State Persistence
Volumes are how you keep data alive across container restarts. In the example above, pgdata is a named volume managed by Docker. Even if you run docker compose down, the volume persists on disk. Only docker compose down -v removes it. Bind mounts, by contrast, map a host directory directly into the container — useful for config files and TLS certificates where you want the host to own the files, but you manage the directory lifecycle yourself. Both have their place; the distinction is ownership.
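Compose's long-form mount syntax makes that ownership distinction visible in the file itself. A sketch contrasting the two styles (paths here are illustrative):

```yaml
services:
  app:
    volumes:
      # Named volume: Docker owns the lifecycle and the on-disk location.
      - type: volume
        source: pgdata
        target: /var/lib/postgresql/data
      # Bind mount: the host owns the directory; you manage it yourself.
      - type: bind
        source: ./nginx/certs
        target: /etc/nginx/certs
        read_only: true
```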
Why This Architecture Matters
The real value of multi-container architecture isn't just container isolation — it's the separation of concerns that isolation enforces. Your web tier, your database tier, and your reverse proxy have completely different scaling characteristics, different restart policies, different secrets, and different upgrade cadences. Packaging all of that into a single container is a pattern I'd encourage you to avoid. It creates fat, opaque images that are hard to debug and impossible to scale selectively.
With Compose, you can scale the web tier independently while leaving the database untouched. You can rebuild and replace the application image without touching the Nginx config. You can roll out a certificate rotation on Nginx without restarting Flask. Each service becomes an independently operable unit — and the Compose file is living documentation of how they're supposed to connect. That documentation value alone is worth the investment.
For teams running on a single host — a small VPS, a dedicated bare-metal box at 10.10.10.50 — Compose is often exactly the right level of orchestration complexity. Kubernetes solves different problems at significantly higher operational cost. Don't reach for Kubernetes just because it sounds more serious. If your traffic fits on one host and your team is small, Compose on a well-configured VM is a defensible and maintainable production choice. Match the tool to the actual problem in front of you.
Real-World Examples
Adding a Redis Cache Service
Let's say you need to add Redis for session caching. Here's how you extend the existing stack with a new service entry:
  redis:
    image: redis:7-alpine
    container_name: redis-cache
    command: redis-server --requirepass redissecret --maxmemory 256mb --maxmemory-policy allkeys-lru
    volumes:
      - redisdata:/data
    networks:
      - backend
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "redissecret", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
You'd add redisdata to the top-level volumes block, add redis with condition: service_healthy to the web service's depends_on, and update the Flask environment to include REDIS_URL=redis://:redissecret@redis:6379/0. Three targeted changes, and you've added a caching layer to a running stack. That's the composability this architecture is designed for.
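Spelled out, those three changes look like this — only the touched sections are shown, and the new redis dependency sits alongside the existing db entry:

```yaml
services:
  web:
    environment:
      - REDIS_URL=redis://:redissecret@redis:6379/0
    depends_on:
      redis:
        condition: service_healthy

volumes:
  redisdata:
    driver: local
```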
Running a Background Worker
A common pattern is running a Celery worker alongside a Flask application. Both services share the same application image — they just run different commands. This is one of those patterns that looks obvious in hindsight but takes a while to arrive at organically.
  worker:
    build:
      context: ./app
      dockerfile: Dockerfile
    container_name: celery-worker
    command: celery -A app.celery worker --loglevel=info --concurrency=4
    environment:
      - DATABASE_URL=postgresql://infrarunbook-admin:securepass@db:5432/appdb
      - REDIS_URL=redis://:redissecret@redis:6379/0
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - backend
    restart: unless-stopped
The worker image stays in sync with the web image automatically — they're built from the same Dockerfile. You're not maintaining separate codebases for what is essentially the same application running in a different mode. When you run docker compose up -d --build, both get rebuilt from the same context.
Environment-Specific Overrides
Compose supports override files, which is how you handle environment differences without duplicating your entire stack definition. You maintain a base docker-compose.yml and layer environment-specific settings on top with a named file you pass explicitly at invocation time.
# docker-compose.dev.yml — development overrides
version: "3.9"

services:
  web:
    environment:
      - FLASK_ENV=development
      - FLASK_DEBUG=1
    volumes:
      - ./app:/usr/src/app   # live code reload
    ports:
      - "5000:5000"          # expose Flask directly in dev

  db:
    ports:
      - "5432:5432"          # expose Postgres for local tooling
You invoke this with docker compose -f docker-compose.yml -f docker-compose.dev.yml up. In production, you just run docker compose up -d without the dev overlay. The base file never contains dev shortcuts like exposed database ports or live code mounts. That discipline prevents dev misconfigurations from accidentally reaching production — a failure mode I've had to explain to teams more than once.
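Conceptually, the overlay is a recursive merge: mappings from the later file win key by key, while most lists (ports, volumes) are appended. A simplified Python model of that behavior — real Compose applies extra per-key rules (environment entries, for instance, merge by variable name), so treat this purely as an illustration:

```python
def merge_compose(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto `base`, Compose-style.

    Dicts merge key by key; lists are concatenated; scalar values
    from the override file replace the base values outright.
    """
    merged = dict(base)
    for key, value in override.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_compose(merged[key], value)
        elif key in merged and isinstance(merged[key], list) and isinstance(value, list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged

base = {"services": {"web": {"image": "app:1.0", "ports": ["80:80"]}}}
dev = {"services": {"web": {"ports": ["5000:5000"], "environment": ["FLASK_DEBUG=1"]}}}
merged = merge_compose(base, dev)
# The dev ports append to the base ports; the image is untouched.
```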
Pinning Network Subnets on sw-infrarunbook-01
On sw-infrarunbook-01, I run several Compose projects alongside other services that use the 10.10.0.0/16 range. Docker's default behavior is to allocate bridge subnets from 172.16.0.0/12 automatically, but after a certain number of Compose projects, you'll hit subnet exhaustion or routing conflicts. You can lock a specific subnet for any network:
networks:
  backend:
    driver: bridge
    internal: true
    ipam:
      driver: default
      config:
        - subnet: 172.28.0.0/24
          gateway: 172.28.0.1
Pinning subnets explicitly is the right call in production environments where you care about network determinism. It also makes firewall rules predictable — you know exactly what CIDR to reference in your iptables rules without having to inspect Docker's runtime state.
Common Misconceptions
"Docker Compose is not for production"
This is the one I push back on most often. The origin of this belief is a combination of old documentation and Kubernetes marketing. Docker Compose is used in production — by small teams, startups, and infrastructure engineers who understand what they're deploying. It lacks automatic failover, cross-host scheduling, and rolling update orchestration out of the box. Those are real gaps. But if your application runs happily on a single host with restart: unless-stopped policies and a process supervisor like systemd watching Docker itself, Compose is a perfectly legitimate production tool. Know what you need before you rule something out.
"depends_on guarantees the service is ready"
Already touched on this, but it deserves its own call-out because it's such a reliable source of late-night incidents. depends_on without a condition only waits for the container process to start — not for the application inside it to be accepting connections. Always pair it with a meaningful healthcheck on the dependency if you need startup-ordering guarantees. A healthcheck that just checks if a port is open is better than nothing, but a healthcheck that runs an actual readiness probe is better still.
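For the web tier, a real readiness probe might look like the sketch below — it assumes the image ships curl and that the app exposes a /healthz route (both assumptions, not givens for every Flask image):

```yaml
services:
  web:
    healthcheck:
      # An application-level probe beats a bare port check: it only
      # passes once the app can actually serve a request.
      test: ["CMD", "curl", "-fsS", "http://localhost:5000/healthz"]
      interval: 10s
      timeout: 3s
      retries: 5
      start_period: 15s
```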
"All containers in a Compose project share the same network"
By default, Compose creates a single default network and attaches all services to it. Every service can talk to every other service unless you explicitly segment them. This is convenient for small stacks but becomes a security liability as your stack grows. In my experience, it's better to define your networks explicitly from the start — a frontend network for public-facing services, a backend network for internal services — and mark database-tier networks as internal: true. Retrofitting network segmentation onto an existing Compose file is messy work; getting it right from the beginning takes five minutes.
"Rebuilding an image automatically updates a running container"
Running docker compose build rebuilds the image locally, but it does not restart or recreate the container. You need docker compose up -d --build to rebuild and recreate affected containers in one step. There's also a subtlety worth knowing: when you run docker compose up, Compose compares the current container's configuration against what's in the Compose file. If nothing relevant has changed, it won't recreate the container even if the underlying image has changed on disk. This preserves container state intentionally, but it surprises engineers who expect an image rebuild to automatically roll out.
"Named volumes are backed up automatically"
They're not. A named volume is just a directory on the Docker host under /var/lib/docker/volumes/. It persists across container restarts and even docker compose down, but it has no backup, no replication, and no snapshot capability built in. For anything stateful in production — your Postgres data volume, your Redis persistence data — you need to implement a backup strategy yourself. That might mean a cron job on sw-infrarunbook-01 running pg_dump into a mounted backup directory, or a dedicated backup container in your Compose stack that handles it on a schedule. Volume persistence does not equal data safety. Never conflate the two.
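As a sketch of the dedicated-container approach — the schedule, paths, and retention window here are illustrative choices, not prescriptions:

```yaml
services:
  db-backup:
    image: postgres:15-alpine
    environment:
      PGPASSWORD: securepass
    volumes:
      - ./backups:/backups   # bind mount so the host owns the dumps
    networks:
      - backend
    depends_on:
      db:
        condition: service_healthy
    # Dump once a day; prune anything older than seven days.
    # ($$ escapes $ in Compose files, so the shell sees $(date +%F).)
    entrypoint: >
      sh -c 'while true; do
      pg_dump -h db -U infrarunbook-admin appdb > /backups/appdb-$$(date +%F).sql;
      find /backups -name "*.sql" -mtime +7 -delete;
      sleep 86400;
      done'
```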
Docker Compose is one of those tools that rewards the time you invest in understanding its internals. The YAML syntax is approachable, but how networks, volumes, healthchecks, and service dependencies interact is where the real depth lives. Get those right and you'll have a stack that's reliable, reproducible, and easy for teammates to reason about. Get them wrong and you'll be chasing startup race conditions and mysterious network failures at 2am — which is about as fun as it sounds.
Start with explicit networks. Always healthcheck your stateful services. Use override files for environment-specific configuration. Keep secrets out of your Compose files — .env files or Docker secrets are the right pattern for credentials. None of this is complicated, but it all requires deliberate choices up front. The stacks that hold up under pressure are the ones where those choices were made before the first deployment, not in response to the first outage.
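For instance, the hard-coded POSTGRES_PASSWORD from the earlier example can move into an .env file that Compose substitutes at parse time — a sketch, with the variable name chosen for illustration:

```yaml
# .env (kept next to docker-compose.yml, never committed):
#   POSTGRES_PASSWORD=securepass

services:
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: infrarunbook-admin
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: appdb
```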
