What Is the TCP Three-Way Handshake?
The Transmission Control Protocol (TCP) is a connection-oriented transport layer protocol defined in RFC 793. Before any application data is exchanged between two hosts, TCP requires a formal connection establishment procedure known as the three-way handshake. This mechanism ensures both endpoints are reachable, agree on initial sequence numbers, and negotiate fundamental connection parameters before the first byte of payload is ever transmitted.
The three-way handshake is the foundation of reliable, ordered, and error-checked communication. Every HTTPS session to solvethenetwork.com, every SSH login to sw-infrarunbook-01 as infrarunbook-admin, and every database query sent from 192.168.10.50 to 10.0.1.20 begins with this exact exchange. Understanding it deeply is not optional for infrastructure engineers — it is the baseline for diagnosing latency anomalies, firewall misconfigurations, half-open connection accumulation, and denial-of-service conditions.
How the Three-Way Handshake Works
The handshake involves three distinct packet exchanges between a client and a server. Each packet carries specific TCP control flags and sequence numbers that establish synchronized state on both sides of the connection.
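As a preview of the three steps walked through in this section, the sequence and acknowledgment arithmetic can be sketched in a few lines of Python. This is an illustrative model only, not a network implementation: the `Segment` class and the `three_way_handshake` function are hypothetical names, and the ISNs passed in are arbitrary example values.

```python
from dataclasses import dataclass

SEQ_MOD = 2**32  # sequence numbers are 32-bit values and wrap around


@dataclass(frozen=True)
class Segment:
    sender: str
    flags: str
    seq: int
    ack: int


def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """Return the three handshake segments for the given ISNs."""
    syn = Segment("client", "SYN", client_isn, 0)
    # The SYN flag consumes one sequence number, so the server acknowledges ISN + 1.
    syn_ack = Segment("server", "SYN,ACK", server_isn, (client_isn + 1) % SEQ_MOD)
    # The client acknowledges the server's SYN the same way.
    ack = Segment("client", "ACK", (client_isn + 1) % SEQ_MOD, (server_isn + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


for seg in three_way_handshake(1000, 5000):
    print(f"{seg.sender:6} {seg.flags:8} seq={seg.seq} ack={seg.ack}")
```

The modulo keeps the model honest about wraparound: an ISN of 2^32 - 1 is acknowledged with 0, not 2^32.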
Step 1 — SYN (Synchronize)
The client initiates the connection by sending a TCP segment with the SYN flag set. This segment carries the client's Initial Sequence Number (ISN), a randomly generated 32-bit value chosen to reduce the risk of sequence number collision or connection hijacking. The client transitions from the CLOSED state to the SYN_SENT state and starts a retransmission timer.
Client (192.168.1.100:54321) --> Server (10.0.1.20:443)
Flags : SYN
Seq : 1000
Ack : 0
Window: 65535
Options: mss 1460, sackOK, TS val 100000 ecr 0, nop, wscale 7
Step 2 — SYN-ACK (Synchronize-Acknowledge)
The server receives the SYN, records the client's ISN, and responds with a segment carrying both the SYN flag and the ACK flag. The ACK number is set to the client's ISN plus one, because the SYN flag itself consumes one sequence number even though it carries no payload. The server also includes its own ISN in the Seq field. The server moves from the LISTEN state to the SYN_RECEIVED state and stores the half-open connection in the SYN backlog queue.
Server (10.0.1.20:443) --> Client (192.168.1.100:54321)
Flags : SYN, ACK
Seq : 5000
Ack : 1001
Window: 64240
Options: mss 1460, sackOK, TS val 200000 ecr 100000, nop, wscale 7
Step 3 — ACK (Acknowledge)
The client receives the SYN-ACK, records the server's ISN, and sends a final ACK segment. The ACK number is set to the server's ISN plus one. No application data is required in this segment, though the client is permitted to piggyback data on it. Upon sending this ACK, the client transitions to the ESTABLISHED state. When the server receives this final ACK, it moves the connection from the SYN backlog to the accept queue and transitions to ESTABLISHED. The connection is now ready for bidirectional data transfer.
Client (192.168.1.100:54321) --> Server (10.0.1.20:443)
Flags : ACK
Seq : 1001
Ack : 5001
Window: 65535
TCP Connection States Explained
TCP is a state machine. At any point in the lifecycle of a connection, each endpoint is in exactly one well-defined state. The complete set of TCP states is defined in RFC 793 and is critical for correctly interpreting output from tools like ss, netstat, and Wireshark packet captures.
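One way to make the state-machine view concrete is a lookup table keyed by (state, event). The sketch below transcribes only the transitions listed in the lifecycle that follows; the event names (`send_syn`, `recv_fin`, and so on) are hypothetical labels chosen for this example, and less common paths such as simultaneous open or the CLOSING state are deliberately left out.

```python
# (current state, event) -> next state, limited to the transitions in this article
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "send_syn"): "SYN_SENT",
    ("LISTEN", "recv_syn"): "SYN_RECEIVED",
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",
    ("SYN_RECEIVED", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "send_fin"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send_fin"): "LAST_ACK",
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def run(start: str, events: list[str]) -> str:
    """Apply events in order and return the final state."""
    state = start
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {event}")
        state = TRANSITIONS[key]
    return state


# Client side of a full connection: open, then active close.
print(run("CLOSED", ["send_syn", "recv_syn_ack", "send_fin",
                     "recv_ack", "recv_fin", "timeout_2msl"]))
```

Rejecting unknown (state, event) pairs mirrors how a real stack treats unexpected segments as errors rather than silently changing state.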
Full State Lifecycle
--- Connection Establishment ---
CLOSED --> LISTEN (server binds socket and calls listen())
CLOSED --> SYN_SENT (client sends SYN)
LISTEN --> SYN_RECEIVED (server receives SYN, sends SYN-ACK)
SYN_SENT --> ESTABLISHED (client receives SYN-ACK, sends ACK)
SYN_RECEIVED --> ESTABLISHED (server receives final ACK)
--- Data Transfer ---
ESTABLISHED <--> ESTABLISHED
--- Teardown: Active Closer (initiates FIN) ---
ESTABLISHED --> FIN_WAIT_1 (sends FIN)
FIN_WAIT_1 --> FIN_WAIT_2 (receives ACK of FIN)
FIN_WAIT_2 --> TIME_WAIT (receives FIN from peer, sends final ACK)
TIME_WAIT --> CLOSED (waits 2xMSL, then socket is released)
--- Teardown: Passive Closer (receives FIN first) ---
ESTABLISHED --> CLOSE_WAIT (receives FIN, sends ACK)
CLOSE_WAIT --> LAST_ACK (application calls close(), sends FIN)
LAST_ACK --> CLOSED (receives final ACK)
Key States for Infrastructure Engineers
- LISTEN — The server socket is bound and waiting for incoming SYN segments. A process such as nginx on sw-infrarunbook-01 will show this state on port 443.
- SYN_RECEIVED — A SYN has been received and a SYN-ACK has been sent, but the final ACK has not yet arrived. Connections accumulate here during SYN flood attacks and when the accept queue is full.
- ESTABLISHED — The connection is fully open and data flows bidirectionally. The normal operating state for active connections.
- TIME_WAIT — The local side completed an active close and waits to ensure the final ACK reached the peer. On Linux this lasts 60 seconds. High TIME_WAIT counts are normal and expected on busy servers.
- CLOSE_WAIT — The remote peer sent a FIN. The local kernel has acknowledged it, but the local application has not yet called close(). Accumulation of CLOSE_WAIT sockets almost always indicates an application-layer bug.
- FIN_WAIT_2 — The local FIN was acknowledged but the peer's FIN has not arrived yet. Can persist indefinitely if the peer application delays closing; Linux enforces a timeout on orphaned sockets via net.ipv4.tcp_fin_timeout.
Why the Three-Way Handshake Matters
The handshake is not ceremonial overhead — it simultaneously achieves several goals that are impossible to accomplish in fewer exchanges:
- Bidirectional reachability verification — A two-step exchange only confirms one direction. The third step (client ACK) proves the server's SYN-ACK successfully reached the client, confirming the return path is functional.
- ISN synchronization on both sides — Each endpoint exchanges and acknowledges the other's initial sequence number. This establishes the ordering reference for all subsequent data segments and retransmissions.
- Receive window advertisement — Both sides advertise their initial receive window, establishing the flow control baseline before any data is sent.
- TCP option negotiation — Maximum Segment Size (MSS), window scaling (RFC 1323), Selective Acknowledgment (SACK, RFC 2018), and timestamps are negotiated exclusively in SYN and SYN-ACK segments. They cannot be added or changed once the connection is established.
Critical operational note: TCP options absent from the SYN and SYN-ACK packets cannot be used for the lifetime of that connection, regardless of what either host kernel supports. A firewall or middlebox that strips the window scaling option from SYN packets permanently caps TCP throughput for every connection it touches — often manifesting as mysterious performance degradation that disappears when the firewall is bypassed.
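The cost of a stripped window scaling option can be estimated with simple arithmetic: a sender can have at most one receive window of data in flight per round trip, so throughput is bounded by window size divided by RTT. The sketch below assumes a 100 ms RTT (an illustrative figure, not taken from any measurement) and compares the unscaled 16-bit maximum with the same field shifted by the wscale 7 value used in the examples above.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


UNSCALED_MAX = 65535        # largest value the 16-bit window field can hold
SCALED_MAX = 65535 << 7     # same field with a window scale shift of 7
RTT = 0.100                 # assumed 100 ms round-trip time

for label, window in (("no window scaling", UNSCALED_MAX), ("wscale 7", SCALED_MAX)):
    mbps = max_throughput_bps(window, RTT) / 1e6
    print(f"{label:18} window={window:>8} bytes  ceiling={mbps:8.2f} Mbit/s")
```

Under these assumptions the unscaled ceiling is roughly 5 Mbit/s regardless of link speed, which is why the symptom looks like unexplained slowness rather than an outright failure.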
Observing the Handshake in Practice
On sw-infrarunbook-01, the following tools give direct visibility into handshake mechanics and connection state. All examples use RFC 1918 addressing consistent with a real infrastructure environment.
Capturing the Handshake with tcpdump
# Logged in as infrarunbook-admin on sw-infrarunbook-01
# Capture TCP traffic to and from 10.0.1.20:443. Nearly every segment after the
# initial SYN carries ACK, so this filter also matches data segments; remove
# tcp-ack from the mask to see only SYN, FIN, and RST segments.
# -S prints absolute rather than relative sequence numbers.
tcpdump -i eth0 -nn -S 'host 10.0.1.20 and port 443 and (tcp[tcpflags] & (tcp-syn|tcp-ack|tcp-fin|tcp-rst) != 0)'
# Representative output:
10:42:01.001234 IP 192.168.1.100.54321 > 10.0.1.20.443: Flags [S],
    seq 1234567890, win 65535,
    options [mss 1460,sackOK,TS val 100000 ecr 0,nop,wscale 7], length 0
10:42:01.002100 IP 10.0.1.20.443 > 192.168.1.100.54321: Flags [S.],
    seq 3876543210, ack 1234567891, win 64240,
    options [mss 1460,sackOK,TS val 200000 ecr 100000,nop,wscale 7], length 0
10:42:01.002300 IP 192.168.1.100.54321 > 10.0.1.20.443: Flags [.],
    ack 3876543211, win 512, length 0
Flag notation in tcpdump: [S] = SYN, [S.] = SYN-ACK, [.] = pure ACK, [F.] = FIN-ACK, [R] = RST, [P.] = PSH-ACK (data segment).
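When post-processing capture text, the flag notation above can be expanded mechanically. The following is a small sketch with a hypothetical helper, `decode_flags`, that covers only the five flag characters listed in this section; any other character is reported as unknown rather than guessed.

```python
# Single-character codes as printed by tcpdump inside "Flags [...]"
FLAG_NAMES = {"S": "SYN", "F": "FIN", "R": "RST", "P": "PSH", ".": "ACK"}


def decode_flags(token: str) -> list[str]:
    """Expand a tcpdump flag token such as "[S.]" into flag names."""
    chars = token.strip("[]")
    return [FLAG_NAMES.get(c, f"unknown({c})") for c in chars]


for token in ("[S]", "[S.]", "[.]", "[F.]", "[R]", "[P.]"):
    print(token, "->", "+".join(decode_flags(token)))
```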
Checking Connection States with ss
# View all TCP connections with state on sw-infrarunbook-01
ss -tan
# Output (abbreviated):
State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
LISTEN     0      128    0.0.0.0:443          0.0.0.0:*
ESTAB      0      0      10.0.1.20:443        192.168.1.100:54321
TIME-WAIT  0      0      10.0.1.20:443        192.168.1.200:51234
SYN-RECV   0      0      10.0.1.20:443        172.16.0.5:60001
CLOSE-WAIT 0      0      10.0.1.20:443        192.168.2.10:49812
Counting States for Capacity Planning
# Summarize all TCP states on sw-infrarunbook-01
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn
# Healthy output on a moderately loaded web server
# (ss abbreviates ESTABLISHED as ESTAB):
1842 ESTAB
214 TIME-WAIT
3 LISTEN
1 SYN-RECV
The SYN Backlog and Half-Open Connections
When a SYN arrives at a listening socket, the kernel must hold connection state while awaiting the final ACK. Linux maintains two separate queues for this process:
- SYN backlog (incomplete queue) — Holds connections in SYN_RECEIVED state. Its depth is governed by net.ipv4.tcp_max_syn_backlog.
- Accept queue (complete queue) — Holds fully established connections waiting to be retrieved by the application via accept(). Governed by the application's listen() backlog parameter, capped at net.core.somaxconn.
SYN cookies are the primary defense against SYN flood exhaustion. When the SYN backlog fills, the kernel encodes connection state cryptographically into the ISN of the SYN-ACK rather than allocating a state table entry. This makes SYN processing stateless. If the final ACK arrives later with the encoded cookie, the kernel reconstructs the connection state and moves it directly to ESTABLISHED without ever having used the backlog.
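The idea of encoding state into the ISN can be illustrated with a deliberately simplified sketch. It is not the Linux implementation: the bit layout (5-bit time counter, 3-bit MSS index, 24-bit keyed hash) loosely follows the commonly described classic design, and `SECRET`, `MSS_TABLE`, `make_cookie`, and `check_cookie` are all hypothetical names and values chosen for this example.

```python
import hashlib
import hmac

SECRET = b"hypothetical-per-boot-secret"   # assumption: any server-only secret
MSS_TABLE = [536, 1300, 1440, 1460]        # illustrative MSS values only


def _hash24(four_tuple, counter, client_isn) -> int:
    """24-bit keyed hash binding the cookie to this connection attempt."""
    msg = repr((four_tuple, counter, client_isn)).encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(four_tuple, client_isn, mss_index, counter) -> int:
    """Build the server ISN: counter (5 bits) | MSS index (3 bits) | hash (24 bits)."""
    slot = counter % 32
    return (slot << 27) | (mss_index << 24) | _hash24(four_tuple, slot, client_isn)


def check_cookie(four_tuple, client_isn, ack_number, counter):
    """Validate the final ACK; return the negotiated MSS, or None if invalid."""
    cookie = (ack_number - 1) % 2**32      # the ACK acknowledges cookie + 1
    slot, mss_index = cookie >> 27, (cookie >> 24) & 0x7
    # Accept only cookies minted in the current or previous time slot.
    if slot not in (counter % 32, (counter - 1) % 32):
        return None
    expected = _hash24(four_tuple, slot, client_isn).to_bytes(3, "big")
    if not hmac.compare_digest(expected, (cookie & 0xFFFFFF).to_bytes(3, "big")):
        return None
    return MSS_TABLE[mss_index] if mss_index < len(MSS_TABLE) else None


conn = ("192.168.1.100", 54321, "10.0.1.20", 443)
isn = make_cookie(conn, client_isn=1000, mss_index=3, counter=7)
print(check_cookie(conn, 1000, (isn + 1) % 2**32, counter=7))
```

The point of the sketch is that the server stores nothing between the SYN-ACK and the final ACK; everything needed to rebuild the connection is recomputed from the ACK number and the secret.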
# Check and tune SYN protection on sw-infrarunbook-01
sysctl net.ipv4.tcp_syncookies
sysctl net.ipv4.tcp_max_syn_backlog
sysctl net.core.somaxconn
# Recommended production values:
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 1024
TCP Connection Teardown: The Four-Way Exchange
While connection establishment uses three segments, normal teardown requires four because each direction is closed independently. Either side may initiate the close by sending a FIN, which signals it has no more data to send while leaving the opposite direction open (half-close).
Client (192.168.1.100) --> Server (10.0.1.20)
Flags: FIN, ACK | Seq: 2000 | Ack: 6000
# Client: ESTABLISHED --> FIN_WAIT_1
Server (10.0.1.20) --> Client (192.168.1.100)
Flags: ACK | Seq: 6000 | Ack: 2001
# Client: FIN_WAIT_1 --> FIN_WAIT_2
# Server: ESTABLISHED --> CLOSE_WAIT
Server (10.0.1.20) --> Client (192.168.1.100)
Flags: FIN, ACK | Seq: 6000 | Ack: 2001
# Server: CLOSE_WAIT --> LAST_ACK
Client (192.168.1.100) --> Server (10.0.1.20)
Flags: ACK | Seq: 2001 | Ack: 6001
# Client: FIN_WAIT_2 --> TIME_WAIT (waits 60s)
# Server: LAST_ACK --> CLOSED
The TIME_WAIT state ensures the final ACK reaches the server. If that ACK is lost in transit, the server will retransmit its FIN. The client, still in TIME_WAIT with its four-tuple remembered, can re-send the ACK. Without TIME_WAIT, a new connection on the same port tuple could receive the stale FIN retransmission and be immediately reset.
Real-World Infrastructure Scenarios
Scenario 1: Asymmetric Routing Breaking Stateful Firewalls
Consider sw-infrarunbook-01 at 10.0.1.20 sitting behind a pair of stateful firewalls in an active-active configuration. If the SYN from 192.168.1.100 traverses Firewall A (which creates a state entry) but the SYN-ACK returns via Firewall B (which has no state for this connection), Firewall B drops the SYN-ACK. The client retransmits SYN up to tcp_syn_retries times (Linux default: 6, with exponential backoff reaching over 2 minutes total) before failing. The symptom is connections that never complete establishment, with the client perpetually in SYN_SENT.
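The "over 2 minutes" figure follows from the backoff arithmetic. The sketch below assumes an initial retransmission timeout of 1 second that doubles after every attempt, with no upper cap reached; `syn_retry_schedule` is a hypothetical helper written for this illustration.

```python
def syn_retry_schedule(retries: int = 6, initial_rto: float = 1.0):
    """Return (retransmit times, give-up time) in seconds after the first SYN."""
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto          # wait one timeout, then retransmit
        times.append(elapsed)
        rto *= 2                # exponential backoff
    # After the last retransmission the client waits one more timeout.
    return times, elapsed + rto


times, give_up = syn_retry_schedule()
print("retransmits at:", times)
print("connect() fails after:", give_up, "seconds")
```

With six retries the final wait alone is 64 seconds, which is why lowering tcp_syn_retries by even one or two has a large effect on how quickly clients fail over.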
Scenario 2: MTU Black Hole After Successful Handshake
The MSS value in the SYN is derived from the interface MTU minus 40 bytes of headers (20 bytes of IPv4 header plus 20 bytes of TCP header, without options). If a transit path has a lower MTU and ICMP Type 3 Code 4 (Fragmentation Needed) messages are blocked by a firewall, large TCP segments are silently dropped after the handshake succeeds. Small packets like SYN (typically under 100 bytes) pass through, so the connection establishes successfully. Data transfer then stalls as soon as the first full-sized segment is sent. This is a Path MTU Discovery (PMTUD) black hole. The usual workaround is TCP MSS clamping on the router interface facing the reduced-MTU path, for example with ip tcp adjust-mss on Cisco IOS.
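The MSS arithmetic, and what clamping does to it, can be shown in a few lines. This sketch assumes option-free headers (20 bytes IPv4, 40 bytes IPv6, 20 bytes TCP); the 1400-byte path MTU in the usage lines is a made-up example, and both function names are hypothetical.

```python
IPV4_HEADER = 20   # bytes, without IP options
IPV6_HEADER = 40   # bytes, fixed header only
TCP_HEADER = 20    # bytes, without TCP options


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - TCP_HEADER


def clamp_mss(advertised_mss: int, path_mtu: int, ipv6: bool = False) -> int:
    """What an MSS-clamping router would rewrite the SYN's MSS option to."""
    return min(advertised_mss, mss_for_mtu(path_mtu, ipv6))


print(mss_for_mtu(1500))                 # standard Ethernet, IPv4
print(mss_for_mtu(1500, ipv6=True))      # standard Ethernet, IPv6
print(clamp_mss(1460, path_mtu=1400))    # tunnel with an assumed 1400-byte MTU
```

Because the clamp rewrites the option in the SYN itself, both endpoints agree on the smaller segment size from the start and never depend on the blocked ICMP messages.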
Scenario 3: TIME_WAIT Port Exhaustion on High-Throughput Services
A service on sw-infrarunbook-01 making many short-lived outbound connections to 10.0.2.50 can exhaust the ephemeral port range. With the default Linux range of 32768–60999 (28,232 ports) and a 60-second TIME_WAIT, sustained connection rates above approximately 470 per second will cause connect() to fail with EADDRNOTAVAIL.
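The 470 connections per second threshold is the port count divided by the TIME_WAIT duration. A short sketch of that calculation, assuming every connection goes to the same destination address and port and that no TIME_WAIT reuse is enabled (`max_sustained_rate` is a hypothetical helper):

```python
def max_sustained_rate(low: int = 32768, high: int = 60999, time_wait_s: int = 60):
    """Return (usable ports, max new connections per second) for one destination."""
    ports = high - low + 1          # the range is inclusive at both ends
    return ports, ports / time_wait_s


ports, rate = max_sustained_rate()
print(f"default range : {ports} ports, about {rate:.0f} connections/s")

ports, rate = max_sustained_rate(1024, 65535)
print(f"expanded range: {ports} ports, about {rate:.0f} connections/s")
```

Expanding the range a little more than doubles the ceiling, which helps but does not change the underlying linear relationship; connection pooling or keep-alive reduces the rate itself.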
# Diagnose port exhaustion on sw-infrarunbook-01
ss -tan state time-wait | wc -l
sysctl net.ipv4.ip_local_port_range
# Mitigation 1: Expand the ephemeral range
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# Mitigation 2: Enable TIME_WAIT reuse for outbound connections
# Safe only when TCP timestamps are enabled on both sides
sysctl -w net.ipv4.tcp_tw_reuse=1
# Verify TCP timestamps are active
sysctl net.ipv4.tcp_timestamps
Common Misconceptions
Misconception 1: RST and FIN Are Equivalent
A TCP RST segment is an abrupt, immediate connection abort: it discards any data buffered in the kernel and destroys all state on both sides. An application receiving a RST gets an ECONNRESET error, not a clean EOF. A FIN is a graceful half-close that allows buffered data to drain before the connection winds down. RST is generated when a segment arrives for a non-existent connection, when a socket is closed with an SO_LINGER timeout of zero, or when a firewall actively rejects a connection.
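The difference is easy to observe on a loopback connection. The sketch below is a minimal demonstration, assuming a Linux-like socket stack: it closes a client socket either normally (FIN) or with SO_LINGER set to a zero timeout (RST) and reports what the server side's recv() sees. The `observe_close` function is a hypothetical name used only here.

```python
import socket
import struct


def observe_close(abortive: bool) -> str:
    """Close a loopback client socket and report what the server side sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    server.settimeout(5)
    try:
        if abortive:
            # l_onoff=1, l_linger=0: close() discards buffered data and sends RST.
            client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                              struct.pack("ii", 1, 0))
        client.close()
        try:
            data = server.recv(1024)
        except ConnectionResetError:
            return "ECONNRESET"
        return "EOF" if data == b"" else "data"
    finally:
        server.close()
        listener.close()


print("graceful close:", observe_close(abortive=False))
print("abortive close:", observe_close(abortive=True))
```

An application that treats ECONNRESET the same as EOF can silently lose the tail of a response, which is why the two cases are worth distinguishing in error handling.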
Misconception 2: The Handshake Adds Significant Latency on LANs
The handshake adds one round-trip time (RTT) to connection establishment before the client can send its first data byte. On a LAN between sw-infrarunbook-01 at 10.0.1.20 and a client at 192.168.1.100 with a 0.5ms RTT, this overhead is negligible in practice. The concern is valid for high-latency WAN paths, where a 150ms RTT means 150ms of overhead before the first data byte. This is the fundamental motivation behind TLS session resumption, HTTP/2 multiplexing, HTTP keep-alive, and QUIC, all of which are designed to eliminate or amortize the cost of repeated handshakes.
Misconception 3: High TIME_WAIT Counts Indicate a Problem
TIME_WAIT is not a bug — it is a protocol correctness guarantee. A web server processing thousands of short-lived HTTP/1.0 requests will accumulate TIME_WAIT entries proportional to its request rate. This is expected behavior. The concern arises only when TIME_WAIT entries exhaust port space or memory. A server with 15,000 TIME_WAIT entries and an ephemeral range of 60,000 ports is operating within normal parameters.
Misconception 4: SYN-ACK Proves the Application Is Running
The SYN-ACK is generated by the kernel TCP stack, not by the application. A server can respond with SYN-ACK even if the application process has crashed or stopped processing requests, as long as the listening socket remains open in the kernel. Connection establishment succeeding to port 443 on 10.0.1.20 only proves the kernel is alive and the socket is still bound — not that nginx is processing requests. Application health must be verified at Layer 7.
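This behavior can be reproduced on loopback with a listener whose application never calls accept(). The sketch below assumes a typical Linux or BSD-style socket stack; `connect_without_accept` is a hypothetical helper for this demonstration.

```python
import socket


def connect_without_accept() -> bool:
    """Return True if connect() succeeds against a listener that never accepts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(5)                # the "application" does nothing after this
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    try:
        # connect() returns once the kernel-to-kernel handshake completes.
        client.connect(listener.getsockname())
        return True
    except OSError:
        return False
    finally:
        client.close()
        listener.close()


print("handshake completed without accept():", connect_without_accept())
```

A TCP-connect health check would report this listener as healthy, which is the practical argument for probing at Layer 7 instead.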
Misconception 5: TCP Options Can Be Changed Mid-Connection
TCP options negotiated during the handshake — window scaling multiplier, SACK support, timestamp echo — are immutable for the lifetime of the connection. A load balancer or NAT device that strips window scaling from SYN packets permanently caps the bandwidth-delay product for every connection it handles, potentially reducing throughput to a fraction of the available bandwidth on high-latency paths.
Frequently Asked Questions
Q: Why does TCP use three steps instead of two to establish a connection?
A: Two steps only confirm one direction of reachability. The SYN proves the client reached the server. The SYN-ACK proves the server can respond. But without the third step, the server cannot confirm its SYN-ACK reached the client — meaning the server's ISN has not been acknowledged and cannot be used as a reliable reference. The third step proves bidirectionality and simultaneously acknowledges the server's ISN, synchronizing both sides before data flows.
Q: What happens if the final ACK of the three-way handshake is lost in transit?
A: The server remains in SYN_RECEIVED and retransmits the SYN-ACK after a timeout (with exponential backoff). The client, having already sent the ACK and moved to ESTABLISHED, will likely send application data. When the server receives a data segment with an ACK that matches its expected sequence, it processes the embedded acknowledgment and advances to ESTABLISHED. This self-healing behavior is by design — TCP does not require a separate mechanism to recover from a lost final ACK.
Q: What is TCP Fast Open and how does it bypass the handshake overhead?
A: TCP Fast Open (TFO, RFC 7413) allows application data to be included in the SYN packet on repeat connections to the same server. On the first connection, the server issues a cryptographic cookie in the SYN-ACK. The client stores it. On subsequent connections, the client sends this cookie plus data in the SYN itself, and the server can process the data before the handshake completes. This saves one full RTT for repeat connections. Enable on Linux with net.ipv4.tcp_fastopen = 3 (client and server mode).
Q: How do I identify a SYN flood attack in progress on sw-infrarunbook-01?
A: A SYN flood manifests as an abnormally large number of SYN_RECEIVED connections combined with a high SYN packet arrival rate from many source addresses. Use ss -tan state syn-recv | wc -l to count half-open connections. Inspect /proc/net/netstat for rising TCPSynRetrans and ListenDrops counters. If SYN cookies are not already active, enable them immediately: sysctl -w net.ipv4.tcp_syncookies=1. Rate-limit SYN packets at the perimeter using stateless firewall rules targeting the SYN flag without ACK.
Q: Why do connections get stuck in CLOSE_WAIT and how is it resolved?
A: CLOSE_WAIT means the remote peer sent a FIN (it is done sending data), the kernel acknowledged it, but the local application has not yet called close() on the socket. The kernel is waiting for the application. This is an application-layer defect: the application is failing to detect EOF on the socket and close it, typically due to a code bug, blocked I/O operation, or resource leak. The permanent fix is in the application. To identify the offending process, run ss -tanp state close-wait and inspect the process column.
Q: What is the difference between tcp_tw_reuse and the removed tcp_tw_recycle?
A: net.ipv4.tcp_tw_reuse=1 allows the kernel to reuse a TIME_WAIT socket for a new outbound connection when TCP timestamps (RFC 1323) are enabled on both sides; the timestamps allow the kernel to verify that no segments from the old connection remain in flight. This is safe. net.ipv4.tcp_tw_recycle was an aggressive option that reduced TIME_WAIT duration using per-host timestamp tracking, but it broke connections from clients behind NAT by discarding packets with timestamps that appeared to go backwards. It was removed entirely from the Linux kernel in version 4.12 and must never be referenced in tuning guides for modern systems.
Q: Does the three-way handshake provide security against attackers?
A: Minimally. Randomized ISNs (mandated by RFC 6528) prevent blind sequence number injection by making it computationally infeasible to forge in-window segments without observing actual traffic. SYN cookies defend against resource exhaustion. However, the handshake provides no authentication, no encryption, and no integrity verification for application data. An on-path attacker who can observe packets can read ISNs and inject or hijack the connection. These protections are provided by TLS, which establishes its own authenticated and encrypted channel on top of the established TCP connection.
Q: What does a RST received in response to a SYN indicate?
A: A RST in response to a SYN means the server actively refused the connection. Common causes: no process is listening on the target port, a host firewall rule (iptables REJECT with TCP RST) is configured, or a load balancer has removed the backend but the client has stale DNS. This is distinct from a SYN that receives no response at all: silence indicates a firewall DROP rule, packet loss, or the host being unreachable. RST means the port is reachable but closed; timeout means the path is broken or filtered.
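From the client's side, the two outcomes surface as different exceptions, so a probe can tell them apart. The following is a rough sketch using only the standard socket module; `probe` is a hypothetical helper, and the 2-second timeout is an arbitrary choice that is much shorter than the kernel's own SYN retry budget.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt a TCP connection and classify the result."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open (handshake completed)"
    except ConnectionRefusedError:
        return "closed (RST received)"
    except (socket.timeout, TimeoutError):
        return "no response (filtered, lost, or unreachable)"
    except OSError as exc:
        return f"error: {exc}"      # for example, an ICMP unreachable message
    finally:
        s.close()


# Demonstrate against loopback: first a listening port, then the same port closed.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
print(probe("127.0.0.1", port))
listener.close()
print(probe("127.0.0.1", port))
```

A result of "no response" only says the probe's own timeout expired; it cannot distinguish a DROP rule from packet loss without repeating the test or capturing traffic.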
Q: How does the accept queue relate to the handshake and what happens when it fills?
A: After the three-way handshake completes, the kernel places the established connection into the accept queue, where it waits for the application to call accept(). The queue depth is limited by min(listen_backlog, net.core.somaxconn). When the accept queue is full, Linux by default drops new SYNs and the final ACKs of in-progress handshakes rather than sending a RST (unless net.ipv4.tcp_abort_on_overflow is enabled). From the client's perspective, connections appear to time out even though the server port is open and the server is alive. Monitor accept queue depth with ss -lnt. On a LISTEN socket, Recv-Q shows the current accept queue length and Send-Q shows its limit, so a persistently non-zero Recv-Q value indicates backlog pressure and an application that cannot keep up with incoming connections.
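Checking Recv-Q on listeners is easy to automate. The sketch below parses text in the column layout that ss -lnt prints and returns listeners with a non-empty accept queue. The `SAMPLE` text is fabricated for illustration (it is not captured from a real host), and `backlogged_listeners` is a hypothetical helper; in practice the input would come from running ss -lnt.

```python
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128    0.0.0.0:22         0.0.0.0:*
LISTEN 129    128    0.0.0.0:443        0.0.0.0:*
"""


def backlogged_listeners(ss_output: str):
    """Return (local address, queued, limit) for listeners with Recv-Q > 0."""
    result = []
    for line in ss_output.splitlines()[1:]:     # skip the header row
        fields = line.split()
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue
        recv_q, send_q, local = int(fields[1]), int(fields[2]), fields[3]
        if recv_q > 0:
            result.append((local, recv_q, send_q))
    return result


for local, queued, limit in backlogged_listeners(SAMPLE):
    print(f"{local}: {queued} connections waiting for accept() (limit {limit})")
```

A single non-zero sample can be a momentary burst; sampling repeatedly and alerting on sustained values near the limit gives a more reliable signal.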
Q: Can I observe TCP connection states without root privileges on sw-infrarunbook-01?
A: Yes. By default, the ss -tn and netstat -tn commands show connection states for all TCP sockets on the host without elevated privileges. However, capturing raw packets with tcpdump requires root or the CAP_NET_RAW capability. To see socket-to-process mappings (ss -tanp or netstat -tanp), you must be root or own the process. For application-level tracing, strace -e connect,accept,close -p PID traces socket syscalls for a specific process that you own (subject to the system's ptrace restrictions) without requiring raw network access.
Q: What is the significance of the MSL (Maximum Segment Lifetime) in TIME_WAIT?
A: MSL is the theoretical maximum time any TCP segment can remain in the network before being discarded. RFC 793 defines it as 2 minutes, though modern networks typically deliver or drop segments within milliseconds. TIME_WAIT lasts 2×MSL to guarantee that all segments from the old connection have expired before the port tuple can be reused. This prevents a new connection from receiving a stale duplicate segment from the previous connection, which could corrupt the data stream. On Linux, the TIME_WAIT duration is hardcoded at 60 seconds by the kernel constant TCP_TIMEWAIT_LEN, which corresponds to an effective MSL of 30 seconds. This value cannot be changed without recompiling the kernel.
