What Are Deployments and ReplicaSets?
If you've spent any time running workloads in Kubernetes, you've used a Deployment. It's arguably the most common workload primitive in the ecosystem — and yet I find that a surprisingly large number of engineers interact with Deployments every day without really understanding what's happening underneath. The ReplicaSet sitting below the Deployment is usually invisible, treated as an implementation detail not worth thinking about. That's a mistake.
A ReplicaSet is the controller responsible for ensuring that a specified number of Pod replicas are running at any given time. Give it a template, tell it you want three copies, and it will create and maintain exactly three Pods matching that template. If one dies, it spawns a replacement. If there are too many, it terminates the extras. Simple, declarative, ruthlessly reliable.
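That control loop is simple enough to sketch in a few lines of Python. This is a toy model with illustrative names, not the real controller:

```python
def reconcile(desired, actual_pods):
    """Toy ReplicaSet reconcile step (illustrative, not the real controller).

    Compares the desired replica count against the Pods that currently
    exist and returns the create/delete actions needed to converge.
    """
    diff = desired - len(actual_pods)
    if diff > 0:
        # Too few Pods: create replacements from the template.
        return [("create", i) for i in range(diff)]
    if diff < 0:
        # Too many Pods: terminate the extras.
        return [("delete", pod) for pod in actual_pods[:-diff]]
    return []

print(reconcile(3, ["pod-a", "pod-b"]))                    # one Pod missing -> create
print(reconcile(3, ["pod-a", "pod-b", "pod-c", "pod-d"]))  # one extra -> delete
```

The real controller runs this comparison continuously against the cluster's actual state, which is why a deleted Pod reappears within seconds.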
A Deployment wraps ReplicaSets and adds the ability to manage changes over time. It owns the lifecycle of one or more ReplicaSets and orchestrates rolling updates, rollbacks, and pauses. When you update a Deployment's Pod template, the Deployment controller doesn't modify the existing ReplicaSet — it creates a brand new one and gradually shifts replicas from the old one to the new. The old ReplicaSet isn't deleted immediately; it's kept around with its replica count scaled to zero to support rollbacks.
Think of it this way: a ReplicaSet is a snapshot of your desired state at a point in time. A Deployment is the manager that decides which snapshot should be active and how to transition between them.
How It Works Under the Hood
Let's walk through what actually happens when you apply a Deployment manifest. Here's a typical Deployment for a web application:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: production
  labels:
    app: web-frontend
    team: platform
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: frontend
        image: registry.solvethenetwork.com/web-frontend:v1.4.2
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
When you run kubectl apply on this manifest, here's the chain of events. The API server accepts the Deployment object and stores it in etcd. The Deployment controller, which is part of kube-controller-manager, is watching for Deployment events. It picks up the new object and computes the desired state: three Pods running the web-frontend:v1.4.2 image.
The Deployment controller creates a ReplicaSet with a name like web-frontend-6d8f9b7c4 — that hash suffix is derived from the Pod template spec. The ReplicaSet controller then takes over, creating three Pods to satisfy the replica count. The Pods get scheduled, pulled, and started. At this point you have one Deployment, one ReplicaSet, and three Pods.
Now let's update the image to v1.5.0:
kubectl set image deployment/web-frontend \
  frontend=registry.solvethenetwork.com/web-frontend:v1.5.0 \
  -n production
The Deployment controller detects that the Pod template has changed. Because the Pod template hash will differ, it creates a new ReplicaSet — let's say web-frontend-9c3a1e2b7. With maxSurge: 1 and maxUnavailable: 0, the rollout proceeds like this: scale the new ReplicaSet up to 1 Pod, wait for it to become Ready, scale the old ReplicaSet down by 1. Repeat until the new ReplicaSet has 3 Pods and the old one has 0. The old ReplicaSet still exists in the cluster but sits dormant — this is your rollback anchor.
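That scale-up, scale-down sequencing is easy to model. Here's a Python sketch of the loop, with the simplifying assumption that every new Pod becomes Ready instantly (the real controller waits on readiness probes between steps):

```python
def simulate_rolling_update(replicas, max_surge, max_unavailable):
    """Toy model of the Deployment controller's rollout loop.

    Returns the (old, new) ReplicaSet sizes after each step, assuming
    every new Pod becomes Ready immediately.
    """
    assert max_surge > 0 or max_unavailable > 0, "rollout could never progress"
    old, new = replicas, 0
    steps = [(old, new)]
    while new < replicas or old > 0:
        # Scale up the new ReplicaSet, capped by the surge budget.
        new += min(replicas - new, replicas + max_surge - (old + new))
        # Scale down the old ReplicaSet, keeping enough Pods available.
        old -= min(old, (old + new) - (replicas - max_unavailable))
        steps.append((old, new))
    return steps

# maxSurge: 1, maxUnavailable: 0 with 3 replicas: one in, one out
print(simulate_rolling_update(3, 1, 0))  # [(3, 0), (2, 1), (1, 2), (0, 3)]
```

The assertion at the top mirrors a real Kubernetes validation rule: maxSurge and maxUnavailable cannot both resolve to zero, or the rollout would deadlock.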
You can inspect both ReplicaSets directly:
kubectl get replicasets -n production -l app=web-frontend
NAME                     DESIRED   CURRENT   READY   AGE
web-frontend-6d8f9b7c4   0         0         0       2d
web-frontend-9c3a1e2b7   3         3         3       5m
That older ReplicaSet is what makes kubectl rollout undo so fast. Instead of re-pulling images and recreating everything from scratch, the Deployment controller swaps which ReplicaSet is active. The old image is already cached on your nodes. In my experience, a rollback completes in under 30 seconds for most workloads — which is exactly why you should practice rollbacks before you need them in an incident.
The Selector Immutability Problem
One thing that trips up engineers who are new to Deployments: the spec.selector field is immutable after creation. Once you've defined which labels a Deployment uses to match its Pods, you cannot change them. I've seen this cause real pain during label refactoring efforts — someone wants to add a version label to the selector and suddenly kubectl apply returns a validation error. The only way out is to delete and recreate the Deployment, which means a gap in availability unless you plan carefully.
The reason this constraint exists goes back to the ReplicaSet ownership model. A ReplicaSet uses its selector to claim Pods. Changing the selector mid-flight would cause the controller to lose track of which Pods it owns, leading to orphaned Pods or runaway scaling. The immutability is a feature, not an oversight — it protects you from that class of chaos.
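You can see why by modeling matchLabels semantics in a few lines of Python. This is a simplified sketch, not the real label-selector library:

```python
def matches(selector, labels):
    """Minimal matchLabels semantics: every selector key/value pair
    must be present in the object's labels."""
    return all(labels.get(k) == v for k, v in selector.items())

pods = [
    {"app": "web-frontend"},                   # existing Pod, no version label
    {"app": "web-frontend", "version": "v2"},  # Pod from a newer template
]

old_selector = {"app": "web-frontend"}
new_selector = {"app": "web-frontend", "version": "v2"}

# The old selector claims both Pods; tightening it mid-flight would
# silently orphan the Pod that lacks the version label.
print([matches(old_selector, p) for p in pods])  # [True, True]
print([matches(new_selector, p) for p in pods])  # [False, True]
```

An orphaned Pod keeps running but no longer counts toward the replica total, so the controller would immediately create replacements alongside it.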
Why It Matters: Deployment Strategy Choices
Kubernetes gives you two built-in deployment strategies: RollingUpdate and Recreate. The default is RollingUpdate, and for most stateless services it's the right choice. Recreate scales the old ReplicaSet to zero before bringing up the new one, which means deliberate downtime. I only reach for Recreate when I'm dealing with a workload that absolutely cannot have two versions running simultaneously — say, a job that holds an exclusive database lock or a singleton process that would corrupt shared state if two copies ran concurrently.
The two knobs on RollingUpdate — maxSurge and maxUnavailable — are worth tuning deliberately rather than accepting the defaults. maxSurge controls how many extra Pods above the desired replica count can exist during the rollout. maxUnavailable controls how many Pods can be out of service simultaneously. The default for both is 25%, which is usually fine. For latency-sensitive services I'll often set maxUnavailable: 0 to ensure no capacity is lost during the transition. For large fleets with hundreds of replicas, setting maxSurge: 50% and maxUnavailable: 25% can dramatically speed up rollouts at the cost of temporarily over-provisioning capacity.
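When these values are given as percentages, Kubernetes converts them to absolute Pod counts with asymmetric rounding: maxSurge rounds up and maxUnavailable rounds down, which guarantees the rollout can always make progress. A small Python sketch of that conversion (the function name is mine, not a Kubernetes API):

```python
import math

def absolute_rollout_bounds(replicas, max_surge, max_unavailable):
    """Convert maxSurge / maxUnavailable (int or '25%') to absolute Pod counts.

    Mirrors the documented Kubernetes rounding rules: a percentage
    maxSurge rounds up, a percentage maxUnavailable rounds down.
    """
    def to_absolute(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            raw = replicas * int(value[:-1]) / 100
            return math.ceil(raw) if round_up else math.floor(raw)
        return int(value)

    return (to_absolute(max_surge, round_up=True),
            to_absolute(max_unavailable, round_up=False))

# 10 replicas with the 25% defaults: surge 3 (rounded up), unavailable 2 (rounded down)
print(absolute_rollout_bounds(10, "25%", "25%"))   # (3, 2)
print(absolute_rollout_bounds(400, "50%", "25%"))  # (200, 100)
```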
Revision History and revisionHistoryLimit
By default, Kubernetes keeps the last 10 dormant ReplicaSets for each Deployment. That's 10 rollback points. You can tune this with spec.revisionHistoryLimit. In practice, I usually drop this to 3 or 5 for clusters with many Deployments — keeping 10 old ReplicaSets around for every workload adds up in etcd storage and clutters kubectl get rs output considerably.
spec:
  revisionHistoryLimit: 3
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
Be careful going too low. Setting it to 0 means no rollback history at all. Setting it to 1 means you can roll back exactly one step — fine if you only care about undoing the most recent change, but blind to anything older. For anything running in production that you care about, 3 is a reasonable floor.
Real-World Example: Manual Canary with Multiple Deployments
Kubernetes Deployments don't natively support canary releases — that's what Argo Rollouts or Flagger are for. But you can approximate a canary pattern manually using two Deployments that share the same Service selector, and I've used this approach more times than I'd like to admit when I needed a quick-and-dirty canary without adding progressive delivery tooling to the stack.
The setup: you have a Service that selects Pods with the label app: api-gateway. You have your stable Deployment with 9 replicas and a canary Deployment with 1 replica, both serving Pods that carry that label. The Service load-balances across all 10 Pods, so roughly 10% of traffic hits the canary version.
# Stable deployment - 9 replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway-stable
  namespace: production
spec:
  replicas: 9
  selector:
    matchLabels:
      app: api-gateway
      track: stable
  template:
    metadata:
      labels:
        app: api-gateway
        track: stable
    spec:
      containers:
      - name: api-gateway
        image: registry.solvethenetwork.com/api-gateway:v2.8.1
---
# Canary deployment - 1 replica
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway-canary
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-gateway
      track: canary
  template:
    metadata:
      labels:
        app: api-gateway
        track: canary
    spec:
      containers:
      - name: api-gateway
        image: registry.solvethenetwork.com/api-gateway:v2.9.0-rc1
---
# Service selects across both tracks via shared label
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: production
spec:
  selector:
    app: api-gateway
  ports:
  - port: 80
    targetPort: 8080
This pattern is coarse-grained — your traffic split is controlled by replica counts, not by request headers or weights. But for batch services or internal APIs where a 10% replica-based split is acceptable, it works and it requires zero additional tooling. When you're ready to promote, scale the canary up to 9, scale the stable down to 0, then do a proper rollout on the stable Deployment to the new image and delete the canary.
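The arithmetic for picking replica counts is trivial but worth making explicit. This is a hypothetical helper, not anything Kubernetes provides:

```python
def canary_split(total_replicas, canary_fraction):
    """Replica counts for a replica-based canary split (illustrative helper).

    A Service spreads traffic roughly evenly across Ready Pods, so the
    canary's traffic share is approximately canary / total.
    """
    canary = max(1, round(total_replicas * canary_fraction))
    return total_replicas - canary, canary

print(canary_split(10, 0.10))  # (9, 1): the setup described above
print(canary_split(40, 0.05))  # (38, 2)
```

Note the floor of one canary Pod: with small fleets, the smallest split you can express is 1/total, which is why this pattern gets coarser as replica counts shrink.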
Real-World Example: Pausing and Resuming a Rollout
One underused Deployment feature is the ability to pause a rollout mid-flight. This is genuinely useful for staged deployments where you want to push to a subset of Pods, check your observability dashboards, and then continue — all without additional tooling.
# Trigger the rollout
kubectl set image deployment/web-frontend \
  frontend=registry.solvethenetwork.com/web-frontend:v1.5.0 \
  -n production
# Immediately pause it
kubectl rollout pause deployment/web-frontend -n production
# Check the current state - some pods on new version, some on old
kubectl rollout status deployment/web-frontend -n production
# Check error rates and latency in your observability stack, then resume
kubectl rollout resume deployment/web-frontend -n production
During the pause, the Deployment controller stops creating new Pods on the new ReplicaSet. The old and new ReplicaSets coexist in whatever ratio they were at when you hit pause. This gives you a real traffic split while you validate behavior in production, without configuring anything special. I've used this to catch a bad release before it fully rolled out more than once.
Debugging Stuck Rollouts
A Deployment that's stuck mid-rollout is one of the most common support scenarios I deal with. The new Pods never become Ready, the rollout stalls, and everyone is watching the same terminal output waiting for something to change. The first thing to check is the Pod events:
kubectl describe pod -n production -l app=web-frontend | grep -A 20 Events
In my experience it's usually one of four things: a failing readiness probe (the app is starting but reporting unhealthy), an image pull error (wrong registry credentials or a tag that doesn't exist), a resource quota exhaustion (the namespace is out of CPU and new Pods can't be scheduled), or a misconfigured liveness probe that's killing the Pod before it finishes initializing. The events output will tell you which one almost immediately.
For rollout-level context, the Deployment's status conditions are the canonical source of truth:
kubectl get deployment web-frontend -n production -o yaml | grep -A 30 "status:"
The Progressing condition shows whether the rollout is actively moving forward and includes a human-readable reason string when it's stalled. The Available condition reports how many replicas are Ready and serving. If Progressing carries reason ProgressDeadlineExceeded, the Deployment has given up waiting — check spec.progressDeadlineSeconds, which defaults to 600 seconds. I lower this to 120 or 180 for most services so that a stalled rollout fails loudly and fast rather than sitting in limbo for ten minutes while on-call waits.
spec:
  minReadySeconds: 30
  progressDeadlineSeconds: 180
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
The HPA and Deployment Interaction
Attaching a HorizontalPodAutoscaler to a Deployment is the standard pattern for autoscaling stateless workloads. The HPA watches metrics — CPU utilization, memory, or custom metrics via the Metrics API — and adjusts spec.replicas on the target Deployment. The Deployment controller propagates that change to the active ReplicaSet. Because the HPA targets the Deployment itself rather than any individual ReplicaSet, its scaling decisions carry across rollouts without any reattachment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
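The scaling decision itself follows the formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), clamped to the configured bounds. A simplified Python sketch that ignores the real controller's tolerance window and stabilization behavior:

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas, max_replicas):
    """Core HPA scaling formula, simplified.

    desired = ceil(current * currentMetric / targetMetric), clamped to
    [minReplicas, maxReplicas]. The real controller also applies a
    tolerance window and a scale-down stabilization period.
    """
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 3 replicas running at 140% CPU against a 70% target: scale to 6
print(hpa_desired_replicas(3, 140, 70, 3, 20))    # 6
# A huge spike still clamps at maxReplicas
print(hpa_desired_replicas(20, 200, 70, 3, 20))   # 20
```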
There's a subtle drift problem that catches people off guard. If you have spec.replicas hardcoded in your Deployment manifest and you're also running an HPA, every kubectl apply on that manifest resets the replica count to whatever is in the file — potentially overriding what the HPA set based on real traffic load. The solution is to remove spec.replicas from your Deployment manifest entirely when an HPA is managing it, or to use server-side apply with field management so the HPA's writes don't get overwritten by your CI pipeline's kubectl apply.
Common Misconceptions
The biggest one I hear: "A Deployment manages Pods directly." It doesn't. A Deployment manages ReplicaSets. ReplicaSets manage Pods. This indirect relationship matters when you're debugging — if you delete a ReplicaSet that's owned by a Deployment, the Deployment controller will just recreate it immediately. You can't permanently remove an active ReplicaSet without deleting the Deployment or adjusting the revision history.
Second: "Deleting a Pod created by a ReplicaSet deletes it permanently." The ReplicaSet controller's entire job is to maintain the desired replica count. Delete one of its Pods and it creates a replacement within seconds. The only way to permanently reduce the Pod count is to scale the Deployment down.
Third — and this one has caused real incidents — is the belief that rolling back a Deployment means re-deploying the old image. It doesn't. It means activating the old ReplicaSet, which still has its spec baked in and whose image is almost certainly already cached on the nodes. Rollbacks are fast precisely because no image pulling is required. I've seen engineers budget five minutes for a rollback during an incident, only to discover it completes in under 30 seconds. That's a good surprise, but you should know it ahead of time.
Fourth: the assumption that the Deployment's spec.replicas is always authoritative. As covered above, when an HPA is attached, the HPA owns that field. Write your manifests accordingly — define your scaling boundaries in the HPA, not as a hardcoded replica count in the Deployment.
Deployments and ReplicaSets together form the backbone of how Kubernetes manages stateless workloads. Understanding their relationship — not just how to write a Deployment YAML, but why the two-layer abstraction exists and how the controllers interact — is what separates engineers who can operate Kubernetes clusters confidently from those who are perpetually surprised by what happens when they run kubectl apply. The ReplicaSet isn't just an implementation detail to scroll past. It's the durable record of every version of your workload that Kubernetes keeps on hand, waiting to be activated the moment you need it.
