Prerequisites
Before you touch a single subnet CIDR, make sure you have the right access in place. You'll need an AWS IAM user or role with permissions to create VPC resources — at minimum, ec2:* scoped to the relevant actions, or the AWS-managed AdministratorAccess policy if this is a lab or dev account. You'll also need the AWS CLI installed and configured with a named profile. I always work with named profiles rather than the default credential chain, especially across multi-account environments.
Configure your profile now if you haven't:
aws configure --profile infrarunbook-admin
AWS Access Key ID [None]: AKIA...
AWS Secret Access Key [None]: ****
Default region name [None]: us-east-1
Default output format [None]: json
The most important decision you'll make before running a single command is your VPC CIDR block. You can't change the primary CIDR after creation without tearing down and rebuilding the VPC. For this guide we'll use 10.10.0.0/16 — 65,536 addresses, more than enough headroom to carve out subnets per availability zone for compute, databases, and managed services. Make sure this range doesn't conflict with any on-premises networks or other VPCs you might peer with later. In my experience, CIDR overlap is one of the most time-consuming problems to diagnose after the fact because connectivity breaks in non-obvious ways depending on route precedence.
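If you keep an inventory of the ranges already in use, you can check a candidate CIDR against it locally before creating anything. Here's a minimal sketch using Python's standard ipaddress module; the "existing" ranges are hypothetical placeholders for your own inventory:

```python
import ipaddress

# Hypothetical inventory -- substitute the ranges your org actually uses
existing = {
    "on-prem-datacenter": ipaddress.ip_network("10.0.0.0/18"),
    "shared-services-vpc": ipaddress.ip_network("10.20.0.0/16"),
}

candidate = ipaddress.ip_network("10.10.0.0/16")

# overlaps() catches partial and full overlap in either direction
conflicts = [name for name, net in existing.items() if candidate.overlaps(net)]
if conflicts:
    print(f"CIDR conflict with: {', '.join(conflicts)}")
else:
    print(f"{candidate} is clear of all known ranges")
```

Running a check like this before every new VPC takes seconds and rules out the overlap class of problems entirely.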
You'll also need Terraform installed (v1.5+) if you want to follow the IaC path. This guide covers both the CLI approach — because understanding the underlying API calls builds intuition — and a Terraform configuration you can actually commit to a repo.
Step-by-Step Setup
1. Create the VPC
Start by creating the VPC itself. Tag it properly from the start. Tags are how you'll find this thing in Cost Explorer, CloudTrail, and Config rules six months from now.
aws ec2 create-vpc \
--cidr-block 10.10.0.0/16 \
--tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=infrarunbook-vpc},{Key=Environment,Value=production}]' \
--profile infrarunbook-admin \
--region us-east-1
Pull the VpcId from the JSON output and store it as a shell variable immediately. You'll reference it in every command that follows.
VPC_ID=vpc-0abc1234def56789a
Now enable DNS hostname support on the VPC. DNS resolution is on by default for a new VPC, but DNS hostnames are not — and without them, instances won't resolve each other by name, and several AWS managed services — RDS, EFS, PrivateLink endpoints — behave incorrectly or fail entirely.
aws ec2 modify-vpc-attribute \
--vpc-id $VPC_ID \
--enable-dns-hostnames \
--profile infrarunbook-admin
aws ec2 modify-vpc-attribute \
--vpc-id $VPC_ID \
--enable-dns-support \
--profile infrarunbook-admin
2. Create and Attach the Internet Gateway
The Internet Gateway is the on-ramp and off-ramp between your public subnets and the internet. There's no state to manage here and no bandwidth constraints — it scales automatically. Create it and attach it to the VPC before you do anything else.
aws ec2 create-internet-gateway \
--tag-specifications 'ResourceType=internet-gateway,Tags=[{Key=Name,Value=infrarunbook-igw}]' \
--profile infrarunbook-admin
IGW_ID=igw-0abc1234def56789b
aws ec2 attach-internet-gateway \
--internet-gateway-id $IGW_ID \
--vpc-id $VPC_ID \
--profile infrarunbook-admin
3. Create the Subnets
We're building across two availability zones — us-east-1a and us-east-1b. Each AZ gets one public subnet and one private subnet, giving you redundancy for most workloads without the overhead of managing a third AZ.
Here's the addressing layout:
- Public subnet A: 10.10.1.0/24 — us-east-1a
- Public subnet B: 10.10.2.0/24 — us-east-1b
- Private subnet A: 10.10.10.0/24 — us-east-1a
- Private subnet B: 10.10.20.0/24 — us-east-1b
Public subnets use low third-octet values (1, 2) and private subnets use higher ones (10, 20). It's a simple convention, but when you're reading a security group rule or a route at 2am it matters — you should know immediately whether an address is public-facing or internal just by looking at it.
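The layout above can be validated locally before you run a single create-subnet call: every block must nest inside the VPC CIDR, and no two blocks may overlap. A quick standard-library check:

```python
import ipaddress
from itertools import combinations

vpc = ipaddress.ip_network("10.10.0.0/16")
plan = {
    "public-1a":  ipaddress.ip_network("10.10.1.0/24"),
    "public-1b":  ipaddress.ip_network("10.10.2.0/24"),
    "private-1a": ipaddress.ip_network("10.10.10.0/24"),
    "private-1b": ipaddress.ip_network("10.10.20.0/24"),
}

# Every subnet must fall inside the VPC's primary CIDR block
assert all(net.subnet_of(vpc) for net in plan.values())

# No two subnets may overlap; AWS would reject the second create-subnet
# anyway, but failing here is faster than failing mid-provisioning
assert not any(a.overlaps(b) for a, b in combinations(plan.values(), 2))

print("subnet plan is consistent")
```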
# Public subnets
aws ec2 create-subnet \
--vpc-id $VPC_ID \
--cidr-block 10.10.1.0/24 \
--availability-zone us-east-1a \
--tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=infrarunbook-public-1a},{Key=Tier,Value=public}]' \
--profile infrarunbook-admin
aws ec2 create-subnet \
--vpc-id $VPC_ID \
--cidr-block 10.10.2.0/24 \
--availability-zone us-east-1b \
--tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=infrarunbook-public-1b},{Key=Tier,Value=public}]' \
--profile infrarunbook-admin
# Private subnets
aws ec2 create-subnet \
--vpc-id $VPC_ID \
--cidr-block 10.10.10.0/24 \
--availability-zone us-east-1a \
--tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=infrarunbook-private-1a},{Key=Tier,Value=private}]' \
--profile infrarunbook-admin
aws ec2 create-subnet \
--vpc-id $VPC_ID \
--cidr-block 10.10.20.0/24 \
--availability-zone us-east-1b \
--tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=infrarunbook-private-1b},{Key=Tier,Value=private}]' \
--profile infrarunbook-admin
Capture the subnet IDs and store them:
PUB_SUBNET_1A=subnet-0pub1111aaaa
PUB_SUBNET_1B=subnet-0pub2222bbbb
PRIV_SUBNET_1A=subnet-0priv3333cccc
PRIV_SUBNET_1B=subnet-0priv4444dddd
Enable auto-assign public IPs on the public subnets. This ensures any instance launched into these subnets — a bastion host, for example — automatically gets a routable public IP without you having to request one at launch time. (NAT Gateways and load balancers get their addresses separately — the NAT Gateway from the Elastic IP you allocate for it — so this setting mainly matters for plain EC2 instances.)
aws ec2 modify-subnet-attribute \
--subnet-id $PUB_SUBNET_1A \
--map-public-ip-on-launch \
--profile infrarunbook-admin
aws ec2 modify-subnet-attribute \
--subnet-id $PUB_SUBNET_1B \
--map-public-ip-on-launch \
--profile infrarunbook-admin
4. Deploy the NAT Gateway
Private subnet instances need a path out to the internet for package updates, AWS API calls, container image pulls, and similar outbound-only traffic. The NAT Gateway handles that — it lives in a public subnet and masquerades outbound connections from the private subnets behind a static Elastic IP. The key word here is outbound-only. Nothing on the internet can initiate a connection to your private instances through a NAT Gateway.
First, allocate an Elastic IP for the NAT Gateway:
aws ec2 allocate-address \
--domain vpc \
--tag-specifications 'ResourceType=elastic-ip,Tags=[{Key=Name,Value=infrarunbook-nat-eip}]' \
--profile infrarunbook-admin
EIP_ALLOC=eipalloc-0abc1234def56789c
Create the NAT Gateway in the public subnet for us-east-1a. It must be in a public subnet — a point I'll revisit in the common mistakes section because it catches people more often than you'd expect.
aws ec2 create-nat-gateway \
--subnet-id $PUB_SUBNET_1A \
--allocation-id $EIP_ALLOC \
--tag-specifications 'ResourceType=natgateway,Tags=[{Key=Name,Value=infrarunbook-nat-1a}]' \
--profile infrarunbook-admin
NAT_GW_ID=nat-0abc1234def56789d
NAT Gateways take a couple of minutes to become available. Wait before proceeding:
aws ec2 wait nat-gateway-available \
--nat-gateway-ids $NAT_GW_ID \
--profile infrarunbook-admin
For a true high-availability setup in production, you'd deploy a second NAT Gateway in us-east-1b and route each private subnet through its own AZ-local NAT Gateway. That way, an AZ failure doesn't cut off internet access for private instances in the surviving AZ. For staging or dev, a single NAT Gateway is cost-effective and acceptable.
5. Configure Route Tables
Route tables are where the public-versus-private distinction is actually enforced at the network level. The public route table has a default route pointing to the Internet Gateway. The private route table has a default route pointing to the NAT Gateway. That's the entire conceptual difference — the rest is just plumbing.
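Under the hood, route selection is longest-prefix match: the VPC's implicit local route for 10.10.0.0/16 is more specific than 0.0.0.0/0, so intra-VPC traffic never leaves through the IGW or NAT Gateway regardless of the default route. A toy illustration of the selection rule (not AWS code, just the matching logic):

```python
import ipaddress

# The private route table as AWS evaluates it; the local route is implicit
routes = {
    ipaddress.ip_network("10.10.0.0/16"): "local",
    ipaddress.ip_network("0.0.0.0/0"):   "nat-gateway",
}

def next_hop(destination: str) -> str:
    """Return the target of the most specific (longest-prefix) matching route."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    return routes[max(matches, key=lambda net: net.prefixlen)]

print(next_hop("10.10.2.15"))   # a host in another subnet -> "local"
print(next_hop("151.101.1.0"))  # the internet -> "nat-gateway"
```

This is also why you can never accidentally route intra-VPC traffic out through the NAT Gateway: the /16 local route always wins.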
# Create the public route table
aws ec2 create-route-table \
--vpc-id $VPC_ID \
--tag-specifications 'ResourceType=route-table,Tags=[{Key=Name,Value=infrarunbook-rtb-public}]' \
--profile infrarunbook-admin
PUBLIC_RTB=rtb-0pub5555eeee
# Add the default route to the IGW
aws ec2 create-route \
--route-table-id $PUBLIC_RTB \
--destination-cidr-block 0.0.0.0/0 \
--gateway-id $IGW_ID \
--profile infrarunbook-admin
# Associate both public subnets
aws ec2 associate-route-table \
--route-table-id $PUBLIC_RTB \
--subnet-id $PUB_SUBNET_1A \
--profile infrarunbook-admin
aws ec2 associate-route-table \
--route-table-id $PUBLIC_RTB \
--subnet-id $PUB_SUBNET_1B \
--profile infrarunbook-admin
# Create the private route table
aws ec2 create-route-table \
--vpc-id $VPC_ID \
--tag-specifications 'ResourceType=route-table,Tags=[{Key=Name,Value=infrarunbook-rtb-private}]' \
--profile infrarunbook-admin
PRIVATE_RTB=rtb-0priv6666ffff
# Add the default route to the NAT Gateway
aws ec2 create-route \
--route-table-id $PRIVATE_RTB \
--destination-cidr-block 0.0.0.0/0 \
--nat-gateway-id $NAT_GW_ID \
--profile infrarunbook-admin
# Associate both private subnets
aws ec2 associate-route-table \
--route-table-id $PRIVATE_RTB \
--subnet-id $PRIV_SUBNET_1A \
--profile infrarunbook-admin
aws ec2 associate-route-table \
--route-table-id $PRIVATE_RTB \
--subnet-id $PRIV_SUBNET_1B \
--profile infrarunbook-admin
Full Configuration Example
Here's the complete Terraform configuration that builds everything above. I find it useful to have both the CLI walkthrough (for building mental models) and a Terraform config (for managing the lifecycle in production). This is something you can drop into a vpc.tf file and own properly with state.
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = "us-east-1"
profile = "infrarunbook-admin"
}
# VPC
resource "aws_vpc" "main" {
cidr_block = "10.10.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "infrarunbook-vpc"
Environment = "production"
}
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "infrarunbook-igw"
}
}
# Public Subnets
resource "aws_subnet" "public_1a" {
vpc_id = aws_vpc.main.id
cidr_block = "10.10.1.0/24"
availability_zone = "us-east-1a"
map_public_ip_on_launch = true
tags = {
Name = "infrarunbook-public-1a"
Tier = "public"
}
}
resource "aws_subnet" "public_1b" {
vpc_id = aws_vpc.main.id
cidr_block = "10.10.2.0/24"
availability_zone = "us-east-1b"
map_public_ip_on_launch = true
tags = {
Name = "infrarunbook-public-1b"
Tier = "public"
}
}
# Private Subnets
resource "aws_subnet" "private_1a" {
vpc_id = aws_vpc.main.id
cidr_block = "10.10.10.0/24"
availability_zone = "us-east-1a"
tags = {
Name = "infrarunbook-private-1a"
Tier = "private"
}
}
resource "aws_subnet" "private_1b" {
vpc_id = aws_vpc.main.id
cidr_block = "10.10.20.0/24"
availability_zone = "us-east-1b"
tags = {
Name = "infrarunbook-private-1b"
Tier = "private"
}
}
# Elastic IP for NAT Gateway
resource "aws_eip" "nat_1a" {
domain = "vpc"
tags = {
Name = "infrarunbook-nat-eip"
}
}
# NAT Gateway — must be in the public subnet
resource "aws_nat_gateway" "main" {
allocation_id = aws_eip.nat_1a.id
subnet_id = aws_subnet.public_1a.id
tags = {
Name = "infrarunbook-nat-1a"
}
depends_on = [aws_internet_gateway.main]
}
# Public Route Table
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "infrarunbook-rtb-public"
}
}
resource "aws_route_table_association" "public_1a" {
subnet_id = aws_subnet.public_1a.id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "public_1b" {
subnet_id = aws_subnet.public_1b.id
route_table_id = aws_route_table.public.id
}
# Private Route Table
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main.id
}
tags = {
Name = "infrarunbook-rtb-private"
}
}
resource "aws_route_table_association" "private_1a" {
subnet_id = aws_subnet.private_1a.id
route_table_id = aws_route_table.private.id
}
resource "aws_route_table_association" "private_1b" {
subnet_id = aws_subnet.private_1b.id
route_table_id = aws_route_table.private.id
}
# Lock down the default security group
resource "aws_default_security_group" "default" {
vpc_id = aws_vpc.main.id
tags = {
Name = "infrarunbook-sg-default-locked"
}
}
Deploy it with:
terraform init
terraform plan -out=infrarunbook-vpc.tfplan
terraform apply infrarunbook-vpc.tfplan
Verification Steps
Don't assume it worked because Terraform printed "Apply complete." Verify the actual routing behavior before you start deploying workloads into this network.
Check the Internet Gateway attachment
aws ec2 describe-internet-gateways \
--filters "Name=attachment.vpc-id,Values=$VPC_ID" \
--query 'InternetGateways[].{ID:InternetGatewayId,State:Attachments[0].State}' \
--output table \
--profile infrarunbook-admin
You want to see attached. If it shows detached, run the attach-internet-gateway command again — something interrupted it.
Verify route table entries and associations
aws ec2 describe-route-tables \
--filters "Name=vpc-id,Values=$VPC_ID" \
--query 'RouteTables[].{Name:Tags[?Key==`Name`].Value|[0],Routes:Routes[?DestinationCidrBlock==`0.0.0.0/0`]}' \
--output json \
--profile infrarunbook-admin
The public route table's default route should reference your IGW ID. The private route table's default route should reference your NAT Gateway ID. If either points somewhere unexpected — or worse, points to nothing — your subnet associations are wrong.
Test outbound connectivity from a private instance
Launch a t3.micro into a private subnet with no public IP assigned. SSH into a bastion or jump box in the public subnet first, then hop across to the private instance. From there, test outbound internet access:
# On sw-infrarunbook-01 (bastion in 10.10.1.0/24)
ssh -i ~/.ssh/infrarunbook-admin.pem ec2-user@10.10.10.55
# From the private instance at 10.10.10.55
curl -s https://checkip.amazonaws.com
The IP address returned should match your NAT Gateway's Elastic IP — not the private instance's RFC 1918 address. If the curl hangs, it's almost always a route table association problem, or the NAT Gateway isn't yet in the available state.
Confirm private instances have no public IP
aws ec2 describe-instances \
--filters "Name=subnet-id,Values=$PRIV_SUBNET_1A" \
--query 'Reservations[].Instances[].{ID:InstanceId,PublicIP:PublicIpAddress,PrivateIP:PrivateIpAddress}' \
--output table \
--profile infrarunbook-admin
The PublicIP column should be None for every instance in a private subnet. If you see a public IP, you've accidentally enabled map_public_ip_on_launch on a subnet you intended to keep private. Fix it at the subnet attribute level and re-launch any affected instances.
Common Mistakes
I've helped a lot of teams wire up their first VPC, and the same class of problems keeps showing up. Here's what to watch for.
Not explicitly associating subnets with route tables
This is the top offender. When you create a subnet, AWS automatically associates it with the VPC's main route table. If your main route table happens to have a default route to the Internet Gateway — which it often does after people configure things in a non-deliberate order — then your supposedly private subnets are actually publicly routable. Always explicitly associate every subnet with the correct route table and verify the associations rather than trusting the default behavior.
Placing the NAT Gateway in a private subnet
The NAT Gateway must live in a public subnet. It sounds counterintuitive, but the NAT Gateway itself needs a path to the internet — via the Internet Gateway — to actually forward traffic outbound. Put it in a private subnet whose default route points at the NAT Gateway and you've built a traffic loop: the gateway's own outbound packets get routed straight back to it, and every connection through it silently blackholes even though the gateway can still report as available. Public subnet, always, with the IGW-backed route table association confirmed.
Missing the depends_on for NAT Gateway in Terraform
Terraform doesn't always infer the dependency between the NAT Gateway and the Internet Gateway attachment. In my experience, skipping the explicit depends_on = [aws_internet_gateway.main] on your NAT Gateway resource leads to occasional race conditions where the NAT Gateway is provisioned before the IGW finishes attaching to the VPC. The result is a NAT Gateway stuck in a failed or pending state that you then have to delete and recreate. Add the explicit dependency — it costs nothing and prevents a frustrating redeploy.
Leaving the default security group open
Every VPC comes with a default security group that allows all traffic between any resources that share it. If you don't explicitly restrict it, instances may end up in that group by accident — particularly if someone launches a resource through the console without specifying a group. Lock down the default security group by removing all inbound and outbound rules. Use purpose-built, named security groups for every workload. This is a security posture issue, not just a housekeeping preference.
Single NAT Gateway in a multi-AZ production environment
A single NAT Gateway is an availability risk in production. If the AZ hosting it goes down, every private subnet instance in every other AZ loses outbound internet access simultaneously. The cost of a second NAT Gateway is roughly $32/month plus data transfer charges — almost always worth it for production traffic. For staging and dev environments, a single NAT Gateway is a perfectly reasonable tradeoff.
CIDR blocks that conflict with future peers
Once a VPC is created, you cannot change the primary CIDR block without a full rebuild. If you later try to peer this VPC with another one that uses an overlapping range — or connect it to an on-premises network via Direct Connect — the peer simply won't work and AWS will reject the peering configuration. Plan your address space in a spreadsheet or use AWS IPAM if you're running multiple accounts. The five minutes of planning saves days of re-architecture.
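Short of full IPAM, even a few lines of Python can generate a non-overlapping plan from the parent block instead of hand-picking ranges. A hypothetical sketch you could commit alongside the Terraform:

```python
import ipaddress

vpc = ipaddress.ip_network("10.10.0.0/16")

# subnets(new_prefix=24) yields every /24 inside the /16 in address order,
# guaranteed contiguous and non-overlapping
allocator = vpc.subnets(new_prefix=24)

# Allocate in a fixed order; the first /24 is held back as reserved space
plan = {name: next(allocator)
        for name in ["reserved", "public-1a", "public-1b", "public-1c"]}

for name, block in plan.items():
    print(f"{name:>10}  {block}")
```

Sequential allocation like this trades the mnemonic gaps used earlier in this guide (1, 2 for public; 10, 20 for private) for guaranteed density. Either convention works; the point is to generate and record the plan rather than type ranges by hand.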
A VPC built deliberately the first time — with intentional CIDR planning, explicit route table associations, and a locked-down default security group — saves you from having to re-architect under production load. Spend the extra thirty minutes verifying routing behavior before workloads land. The fundamentals here don't change; get them right once and you won't revisit them.
