What Are Deployments and ReplicaSets?
If you've spent any time running workloads in Kubernetes, you've used a Deployment. It's arguably the most common workload primitive in the ecosystem — and yet I find that a surprisingly large number of engineers interact with Deployments every day without really understanding what's happening underneath. The ReplicaSet sitting below the Deployment is usually invisible, treated as an implementation detail not worth thinking about. That's a mistake.
A ReplicaSet is the controller responsible for ensuring that a specified number of Pod replicas are running at any given time. Give it a template, tell it you want three copies, and it will create and maintain exactly three Pods matching that template. If one dies, it spawns a replacement. If there are too many, it terminates the extras. Simple, declarative, ruthlessly reliable.
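That control loop is simple enough to sketch in a few lines of Python. This is a toy model with illustrative names, not the real controller:

```python
def reconcile(desired, actual_pods):
    """Toy ReplicaSet reconcile step (illustrative, not the real controller).

    Compares the desired replica count against the Pods that currently
    exist and returns the create/delete actions needed to converge.
    """
    diff = desired - len(actual_pods)
    if diff > 0:
        # Too few Pods: create replacements from the template.
        return [("create", i) for i in range(diff)]
    if diff < 0:
        # Too many Pods: terminate the extras.
        return [("delete", pod) for pod in actual_pods[:-diff]]
    return []

print(reconcile(3, ["pod-a", "pod-b"]))                    # one Pod missing -> create
print(reconcile(3, ["pod-a", "pod-b", "pod-c", "pod-d"]))  # one extra -> delete
```

The real controller runs this comparison continuously against the cluster's actual state, which is why a deleted Pod reappears within seconds.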
A Deployment wraps ReplicaSets and adds the ability to manage changes over time. It owns the lifecycle of one or more ReplicaSets and orchestrates rolling updates, rollbacks, and pauses. When you update a Deployment's Pod template, the Deployment controller doesn't modify the existing ReplicaSet — it creates a brand new one and gradually shifts replicas from the old one to the new. The old ReplicaSet isn't deleted immediately; it's kept around with its replica count scaled to zero to support rollbacks.
Think of it this way: a ReplicaSet is a snapshot of your desired state at a point in time. A Deployment is the manager that decides which snapshot should be active and how to transition between them.
How It Works Under the Hood
Let's walk through what actually happens when you apply a Deployment manifest. Here's a typical Deployment for a web application:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: production
  labels:
    app: web-frontend
    team: platform
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: frontend
        image: registry.solvethenetwork.com/web-frontend:v1.4.2
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
When you run kubectl apply on this manifest, here's the chain of events. The API server accepts the Deployment object and stores it in etcd. The Deployment controller, which is part of kube-controller-manager, is watching for Deployment events. It picks up the new object and computes the desired state: three Pods running the web-frontend:v1.4.2 image.
The Deployment controller creates a ReplicaSet with a name like web-frontend-6d8f9b7c4 — that hash suffix is derived from the Pod template spec. The ReplicaSet controller then takes over, creating three Pods to satisfy the replica count. The Pods get scheduled, pulled, and started. At this point you have one Deployment, one ReplicaSet, and three Pods.
Now let's update the image to v1.5.0:
kubectl set image deployment/web-frontend \
  frontend=registry.solvethenetwork.com/web-frontend:v1.5.0 \
  -n production
The Deployment controller detects that the Pod template has changed. Because the Pod template hash will differ, it creates a new ReplicaSet — let's say web-frontend-9c3a1e2b7. With maxSurge: 1 and maxUnavailable: 0, the rollout proceeds like this: scale the new ReplicaSet up to 1 Pod, wait for it to become Ready, scale the old ReplicaSet down by 1. Repeat until the new ReplicaSet has 3 Pods and the old one has 0. The old ReplicaSet still exists in the cluster but sits dormant — this is your rollback anchor.
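That scale-up, scale-down sequencing is easy to model. Here's a Python sketch of the loop, with the simplifying assumption that every new Pod becomes Ready instantly (the real controller waits on readiness probes between steps):

```python
def simulate_rolling_update(replicas, max_surge, max_unavailable):
    """Toy model of the Deployment controller's rollout loop.

    Returns the (old, new) ReplicaSet sizes after each step, assuming
    every new Pod becomes Ready immediately.
    """
    assert max_surge > 0 or max_unavailable > 0, "rollout could never progress"
    old, new = replicas, 0
    steps = [(old, new)]
    while new < replicas or old > 0:
        # Scale up the new ReplicaSet, capped by the surge budget.
        new += min(replicas - new, replicas + max_surge - (old + new))
        # Scale down the old ReplicaSet, keeping enough Pods available.
        old -= min(old, (old + new) - (replicas - max_unavailable))
        steps.append((old, new))
    return steps

# maxSurge: 1, maxUnavailable: 0 with 3 replicas: one in, one out
print(simulate_rolling_update(3, 1, 0))  # [(3, 0), (2, 1), (1, 2), (0, 3)]
```

The assertion at the top mirrors a real Kubernetes validation rule: maxSurge and maxUnavailable cannot both resolve to zero, or the rollout would deadlock.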
You can inspect both ReplicaSets directly:
kubectl get replicasets -n production -l app=web-frontend
NAME                     DESIRED   CURRENT   READY   AGE
web-frontend-6d8f9b7c4   0         0         0       2d
web-frontend-9c3a1e2b7   3         3         3       5m
That older ReplicaSet is what makes kubectl rollout undo so fast. Instead of re-pulling images and recreating everything from scratch, the Deployment controller swaps which ReplicaSet is active. The old image is already cached on your nodes. In my experience, a rollback completes in under 30 seconds for most workloads — which is exactly why you should practice rollbacks before you need them in an incident.
The Selector Immutability Problem
One thing that trips up engineers who are new to Deployments: the spec.selector field is immutable after creation. Once you've defined which labels a Deployment uses to match its Pods, you cannot change them. I've seen this cause real pain during label refactoring efforts — someone wants to add a version label to the selector and suddenly kubectl apply returns a validation error. The only way out is to delete and recreate the Deployment, which means a gap in availability unless you plan carefully.
The reason this constraint exists goes back to the ReplicaSet ownership model. A ReplicaSet uses its selector to claim Pods. Changing the selector mid-flight would cause the controller to lose track of which Pods it owns, leading to orphaned Pods or runaway scaling. The immutability is a feature, not an oversight — it protects you from that class of chaos.
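You can see why by modeling matchLabels semantics in a few lines of Python. This is a simplified sketch, not the real label-selector library:

```python
def matches(selector, labels):
    """Minimal matchLabels semantics: every selector key/value pair
    must be present in the object's labels."""
    return all(labels.get(k) == v for k, v in selector.items())

pods = [
    {"app": "web-frontend"},                   # existing Pod, no version label
    {"app": "web-frontend", "version": "v2"},  # Pod from a newer template
]

old_selector = {"app": "web-frontend"}
new_selector = {"app": "web-frontend", "version": "v2"}

# The old selector claims both Pods; tightening it mid-flight would
# silently orphan the Pod that lacks the version label.
print([matches(old_selector, p) for p in pods])  # [True, True]
print([matches(new_selector, p) for p in pods])  # [False, True]
```

An orphaned Pod keeps running but no longer counts toward the replica total, so the controller would immediately create replacements alongside it.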
Why It Matters: Deployment Strategy Choices
Kubernetes gives you two built-in deployment strategies: RollingUpdate and Recreate. The default is RollingUpdate, and for most stateless services it's the right choice. Recreate scales the old ReplicaSet to zero before bringing up the new one, which means deliberate downtime. I only reach for Recreate when I'm dealing with a workload that absolutely cannot have two versions running simultaneously — say, a job that holds an exclusive database lock or a singleton process that would corrupt shared state if two copies ran concurrently.
The two knobs on RollingUpdate — maxSurge and maxUnavailable — are worth tuning deliberately rather than accepting the defaults. maxSurge controls how many extra Pods above the desired replica count can exist during the rollout. maxUnavailable controls how many Pods can be out of service simultaneously. The default for both is 25%, which is usually fine. For latency-sensitive services I'll often set maxUnavailable: 0 to ensure no capacity is lost during the transition. For large fleets with hundreds of replicas, setting maxSurge: 50% and maxUnavailable: 25% can dramatically speed up rollouts at the cost of temporarily over-provisioning capacity.
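When these values are given as percentages, Kubernetes converts them to absolute Pod counts with asymmetric rounding: maxSurge rounds up and maxUnavailable rounds down, which guarantees the rollout can always make progress. A small Python sketch of that conversion (the function name is mine, not a Kubernetes API):

```python
import math

def absolute_rollout_bounds(replicas, max_surge, max_unavailable):
    """Convert maxSurge / maxUnavailable (int or '25%') to absolute Pod counts.

    Mirrors the documented Kubernetes rounding rules: a percentage
    maxSurge rounds up, a percentage maxUnavailable rounds down.
    """
    def to_absolute(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            raw = replicas * int(value[:-1]) / 100
            return math.ceil(raw) if round_up else math.floor(raw)
        return int(value)

    return (to_absolute(max_surge, round_up=True),
            to_absolute(max_unavailable, round_up=False))

# 10 replicas with the 25% defaults: surge 3 (rounded up), unavailable 2 (rounded down)
print(absolute_rollout_bounds(10, "25%", "25%"))   # (3, 2)
print(absolute_rollout_bounds(400, "50%", "25%"))  # (200, 100)
```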
Revision History and revisionHistoryLimit
By default, Kubernetes keeps the last 10 dormant ReplicaSets for each Deployment. That's 10 rollback points. You can tune this with spec.revisionHistoryLimit. In practice, I usually drop this to 3 or 5 for clusters with many Deployments — keeping 10 old ReplicaSets around for every workload adds up in etcd storage and clutters kubectl get rs output considerably.
spec:
  revisionHistoryLimit: 3
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
Be careful going too low. Setting it to 0 means no rollback history at all. Setting it to 1 means you can roll back exactly one step — fine if you only care about undoing the most recent change, but blind to anything older. For anything running in production that you care about, 3 is a reasonable floor.
Real-World Example: Manual Canary with Multiple Deployments
Kubernetes Deployments don't natively support canary releases — that's what Argo Rollouts or Flagger are for. But you can approximate a canary pattern manually using two Deployments that share the same Service selector, and I've used this approach more times than I'd like to admit when I needed a quick-and-dirty canary without adding progressive delivery tooling to the stack.
The setup: you have a Service that selects Pods with the label app: api-gateway. You have your stable Deployment with 9 replicas and a canary Deployment with 1 replica, both serving Pods that carry that label. The Service load-balances across all 10 Pods, so roughly 10% of traffic hits the canary version.
# Stable deployment - 9 replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway-stable
  namespace: production
spec:
  replicas: 9
  selector:
    matchLabels:
      app: api-gateway
      track: stable
  template:
    metadata:
      labels:
        app: api-gateway
        track: stable
    spec:
      containers:
      - name: api-gateway
        image: registry.solvethenetwork.com/api-gateway:v2.8.1
---
# Canary deployment - 1 replica
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway-canary
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-gateway
      track: canary
  template:
    metadata:
      labels:
        app: api-gateway
        track: canary
    spec:
      containers:
      - name: api-gateway
        image: registry.solvethenetwork.com/api-gateway:v2.9.0-rc1
---
# Service selects across both tracks via shared label
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: production
spec:
  selector:
    app: api-gateway
  ports:
  - port: 80
    targetPort: 8080
This pattern is coarse-grained — your traffic split is controlled by replica counts, not by request headers or weights. But for batch services or internal APIs where a 10% replica-based split is acceptable, it works and it requires zero additional tooling. When you're ready to promote, scale the canary up to 9, scale the stable down to 0, then do a proper rollout on the stable Deployment to the new image and delete the canary.
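The arithmetic for picking replica counts is trivial but worth making explicit. This is a hypothetical helper, not anything Kubernetes provides:

```python
def canary_split(total_replicas, canary_fraction):
    """Replica counts for a replica-based canary split (illustrative helper).

    A Service spreads traffic roughly evenly across Ready Pods, so the
    canary's traffic share is approximately canary / total.
    """
    canary = max(1, round(total_replicas * canary_fraction))
    return total_replicas - canary, canary

print(canary_split(10, 0.10))  # (9, 1): the setup described above
print(canary_split(40, 0.05))  # (38, 2)
```

Note the floor of one canary Pod: with small fleets, the smallest split you can express is 1/total, which is why this pattern gets coarser as replica counts shrink.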
Real-World Example: Pausing and Resuming a Rollout
One underused Deployment feature is the ability to pause a rollout mid-flight. This is genuinely useful for staged deployments where you want to push to a subset of Pods, check your observability dashboards, and then continue — all without additional tooling.
# Trigger the rollout
kubectl set image deployment/web-frontend \
  frontend=registry.solvethenetwork.com/web-frontend:v1.5.0 \
  -n production
# Immediately pause it
kubectl rollout pause deployment/web-frontend -n production
# Check the current state - some pods on new version, some on old
kubectl rollout status deployment/web-frontend -n production
# Check error rates and latency in your observability stack, then resume
kubectl rollout resume deployment/web-frontend -n production
During the pause, the Deployment controller stops creating new Pods on the new ReplicaSet. The old and new ReplicaSets coexist in whatever ratio they were at when you hit pause. This gives you a real traffic split while you validate behavior in production, without configuring anything special. I've used this to catch a bad release before it fully rolled out more than once.
Debugging Stuck Rollouts
A Deployment that's stuck mid-rollout is one of the most common support scenarios I deal with. The new Pods never become Ready, the rollout stalls, and everyone is watching the same terminal output waiting for something to change. The first thing to check is the Pod events:
kubectl describe pod -n production -l app=web-frontend | grep -A 20 Events
In my experience it's usually one of four things: a failing readiness probe (the app is starting but reporting unhealthy), an image pull error (wrong registry credentials or a tag that doesn't exist), a resource quota exhaustion (the namespace is out of CPU and new Pods can't be scheduled), or a misconfigured liveness probe that's killing the Pod before it finishes initializing. The events output will tell you which one almost immediately.
For rollout-level context, the Deployment's status conditions are the canonical source of truth:
kubectl get deployment web-frontend -n production -o yaml | grep -A 30 "status:"
The Progressing condition shows whether the rollout is actively moving forward and includes a human-readable reason string when it's stalled. The Available condition reports how many replicas are Ready and serving. If Progressing carries reason ProgressDeadlineExceeded, the Deployment has given up waiting — check spec.progressDeadlineSeconds, which defaults to 600 seconds. I lower this to 120 or 180 for most services so that a stalled rollout fails loudly and fast rather than sitting in limbo for ten minutes while on-call waits.
spec:
  minReadySeconds: 30
  progressDeadlineSeconds: 180
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
The HPA and Deployment Interaction
Attaching a HorizontalPodAutoscaler to a Deployment is the standard pattern for autoscaling stateless workloads. The HPA watches metrics — CPU utilization, memory, or custom metrics via the Metrics API — and adjusts spec.replicas on the target Deployment. The Deployment controller propagates that change to the active ReplicaSet. Because the HPA targets the Deployment itself rather than any individual ReplicaSet, its scaling decisions carry across rollouts without any reattachment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
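The scaling decision itself follows the formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), clamped to the configured bounds. A simplified Python sketch that ignores the real controller's tolerance window and stabilization behavior:

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas, max_replicas):
    """Core HPA scaling formula, simplified.

    desired = ceil(current * currentMetric / targetMetric), clamped to
    [minReplicas, maxReplicas]. The real controller also applies a
    tolerance window and a scale-down stabilization period.
    """
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 3 replicas running at 140% CPU against a 70% target: scale to 6
print(hpa_desired_replicas(3, 140, 70, 3, 20))    # 6
# A huge spike still clamps at maxReplicas
print(hpa_desired_replicas(20, 200, 70, 3, 20))   # 20
```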
There's a subtle drift problem that catches people off guard. If you have spec.replicas hardcoded in your Deployment manifest and you're also running an HPA, every kubectl apply on that manifest resets the replica count to whatever is in the file — potentially overriding what the HPA set based on real traffic load. The solution is to remove spec.replicas from your Deployment manifest entirely when an HPA is managing it, or to use server-side apply with field management so the HPA's writes don't get overwritten by your CI pipeline's kubectl apply.
Common Misconceptions
The biggest one I hear: "A Deployment manages Pods directly." It doesn't. A Deployment manages ReplicaSets. ReplicaSets manage Pods. This indirect relationship matters when you're debugging — if you delete a ReplicaSet that's owned by a Deployment, the Deployment controller will just recreate it immediately. You can't permanently remove an active ReplicaSet without deleting the Deployment or adjusting the revision history.
Second: "Deleting a Pod created by a ReplicaSet deletes it permanently." The ReplicaSet controller's entire job is to maintain the desired replica count. Delete one of its Pods and it creates a replacement within seconds. The only way to permanently reduce the Pod count is to scale the Deployment down.
Third — and this one has caused real incidents — is the belief that rolling back a Deployment means re-deploying the old image. It doesn't. It means activating the old ReplicaSet, which still has its spec baked in and whose image is almost certainly already cached on the nodes. Rollbacks are fast precisely because no image pulling is required. I've seen engineers budget five minutes for a rollback during an incident, only to discover it completes in under 30 seconds. That's a good surprise, but you should know it ahead of time.
Fourth: the assumption that the Deployment's spec.replicas is always authoritative. As covered above, when an HPA is attached, the HPA owns that field. Write your manifests accordingly — define your scaling boundaries in the HPA, not as a hardcoded replica count in the Deployment.
Deployments and ReplicaSets together form the backbone of how Kubernetes manages stateless workloads. Understanding their relationship — not just how to write a Deployment YAML, but why the two-layer abstraction exists and how the controllers interact — is what separates engineers who can operate Kubernetes clusters confidently from those who are perpetually surprised by what happens when they run kubectl apply. The ReplicaSet isn't just an implementation detail to scroll past. It's the durable record of every version of your workload that Kubernetes keeps on hand, waiting to be activated the moment you need it.
