Overview
Every HAProxy deployment eventually faces the same question: what happens when backend servers get overwhelmed? Misconfigured connection limits let traffic pile up silently until sockets are exhausted. Poorly tuned timeouts leave connections open far longer than necessary, consuming file descriptors and memory. A server brought back online abruptly receives full production load immediately, often crashing again before caches warm or connection pools stabilize.
HAProxy provides a layered system of connection controls — global maxconn, frontend maxconn, backend maxconn, per-server maxconn, queue depth, and timeout directives — that together determine exactly how connections are accepted, queued, forwarded, and cleaned up. This article covers production tuning at every layer of that hierarchy, including queue behavior under saturation, slow-start ramp-up for recovering servers, and graceful draining strategies for zero-downtime deploys.
The HAProxy Connection Limit Hierarchy
HAProxy enforces connection limits at four distinct levels:
- Global maxconn — absolute ceiling on simultaneous connections across the entire HAProxy process
- Frontend maxconn — per-listener cap controlling how many connections a single bind point accepts from clients
- Backend maxconn — aggregate cap on simultaneous connections forwarded to all servers in a backend combined
- Server maxconn — per-server concurrency limit; connections exceeding this value are queued at HAProxy
These limits operate independently and stack hierarchically. A frontend may accept 30,000 connections while each individual backend server is capped at 250 concurrent requests. Excess requests wait in per-server queues inside HAProxy, not in the kernel's TCP backlog. Understanding this separation is critical for diagnosing saturation.
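As a compact sketch of how three of these levels combine in one configuration (addresses and values below are illustrative placeholders, not recommendations):

```
global
    maxconn 50000                    # 1. process-wide ceiling

frontend http_in
    bind 192.0.2.10:80
    maxconn 30000                    # 2. this listener accepts at most 30k clients
    default_backend web_servers

backend web_servers
    server web01 192.0.2.21:8080 maxconn 250   # per-server cap; excess requests queue inside HAProxy
    server web02 192.0.2.22:8080 maxconn 250
```

Each layer only constrains its own scope: the frontend can hold far more client connections than the servers can concurrently process, with the difference absorbed by HAProxy's queues.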
Global maxconn and File Descriptor Tuning
The global section sets the absolute process-level ceiling:
```
global
    maxconn 50000
    ulimit-n 102400
    user haproxy
    group haproxy
    daemon
    stats socket /var/run/haproxy/admin.sock mode 660 level admin expose-fd listeners
```

When the global maxconn ceiling is reached, HAProxy stops calling accept() on its listening sockets. New TCP connections accumulate in the kernel's accept backlog until an existing connection closes and HAProxy resumes accepting. If the backlog also fills, the kernel silently drops incoming SYN packets.
ulimit-n must be large enough to support all connections simultaneously. Each connection requires at least two file descriptors (one for the client socket, one for the server socket), plus additional descriptors for the stats socket, logging pipes, and control channels. A safe formula is (maxconn * 2) + 64; with maxconn 50000 that works out to 100,064 descriptors, comfortably under the configured ulimit-n of 102400. Verify the actual running limit with:

```
cat /proc/$(pidof haproxy)/limits | grep "open files"
```

If HAProxy is managed by systemd, set LimitNOFILE in a unit override file rather than relying on ulimit-n in haproxy.cfg — systemd applies its own resource limits independently:
```
# /etc/systemd/system/haproxy.service.d/limits.conf
[Service]
LimitNOFILE=131072
```

Frontend maxconn and Kernel Backlog
The frontend maxconn directive limits concurrent established connections from clients to the HAProxy listener:
```
frontend http_in
    bind 10.10.1.10:80 backlog 16384
    maxconn 20000
    default_backend web_servers
```

Once the frontend maxconn limit is reached, HAProxy stops accepting new connections. Incoming SYNs queue in the kernel's TCP accept backlog up to the value specified on the bind directive. Beyond that, the kernel drops SYNs or resets connections depending on socket settings.
On Linux, the kernel enforces its own cap on the backlog value. Tune the following sysctls to allow large backlog values to take effect:
```
sysctl -w net.core.somaxconn=32768
sysctl -w net.ipv4.tcp_max_syn_backlog=32768
```

Setting a large backlog (16384 or higher) gives HAProxy a buffer during traffic bursts. During a spike, connections accumulate in the kernel queue rather than being dropped outright while HAProxy drains its active connection count.
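To make those sysctls survive reboots, they can be persisted in a drop-in file (the path below is a conventional example, not a required location) and applied with `sysctl --system`:

```
# /etc/sysctl.d/90-haproxy.conf
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 32768
```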
Per-Server maxconn and Queue Behavior
The most operationally important connection limit is per-server maxconn. When a server hits its limit, HAProxy queues additional requests internally rather than forwarding them:
```
backend web_servers
    balance leastconn
    timeout queue 30s
    server web01 10.10.1.21:8080 maxconn 200 check inter 3s fall 2 rise 3
    server web02 10.10.1.22:8080 maxconn 200 check inter 3s fall 2 rise 3
    server web03 10.10.1.23:8080 maxconn 200 check inter 3s fall 2 rise 3
```

When web01 holds 200 active connections, the 201st request enters a server-level queue inside HAProxy, which dispatches it to web01 as soon as a slot opens. If option redispatch is enabled and the queue timeout expires before a slot opens, HAProxy may route the request to another server instead of returning a 503.
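The queue mechanics can be sketched with a toy model. This is a hypothetical simulation to illustrate the FIFO behavior, not HAProxy's actual implementation:

```python
from collections import deque

MAXCONN = 3        # per-server cap (tiny value for illustration)
queue = deque()    # HAProxy-side per-server queue
active = 0         # connections currently held by the server

def dispatch(req):
    """Forward the request if a slot is free, otherwise queue it."""
    global active
    if active < MAXCONN:
        active += 1
        return "forwarded"
    queue.append(req)
    return "queued"

def release():
    """A server slot frees up; the oldest queued request is promoted."""
    global active
    active -= 1
    if queue:
        queue.popleft()
        active += 1

results = [dispatch(r) for r in range(5)]
print(results)     # first 3 forwarded, last 2 queued
release()          # a slot opens, one queued request is dispatched
print(len(queue))  # 1 request still waiting
```

In the real proxy, a queued request that waits longer than timeout queue is abandoned with a 503 instead of waiting forever.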
Backend-level maxconn without a per-server value caps aggregate concurrency across all servers combined:
```
backend api_servers
    maxconn 600
    server api01 10.10.1.31:9000 check
    server api02 10.10.1.32:9000 check
    server api03 10.10.1.33:9000 check
```

Here, no single server is individually capped, but the backend will not forward more than 600 connections total regardless of how many servers exist.
timeout queue and Queue Depth Monitoring
timeout queue defines how long a connection waits in the per-server queue before HAProxy abandons it with a 503. This is a critical tuning knob for application SLA compliance:

```
backend app_servers
    timeout queue 20s
    server app01 10.10.1.41:8080 maxconn 150 check
    server app02 10.10.1.42:8080 maxconn 150 check
```

For interactive HTTP APIs, 5–15 seconds is typically the upper bound of user tolerance. For asynchronous background job processors, 60–120 seconds may be acceptable. Never leave timeout queue unset — it defaults to inheriting from timeout connect, which is usually far too short and can cause unexpected 503 bursts during transient load spikes.
Monitor live queue depth via the Runtime API socket:
echo "show stat" | socat stdio /var/run/haproxy/admin.sock | \
cut -d',' -f1,2,3,4,6 | column -t -s','The key CSV fields are
qcur(current queue depth) and
qmax(maximum queue depth observed since last reset). Any sustained non-zero
qcurindicates backend server saturation and should trigger scaling or alerting.
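For an exporter or periodic check, those fields can be pulled out of the CSV programmatically. A minimal sketch, with made-up sample data standing in for the socket output (the real header is prefixed with "# ", which would need stripping):

```python
import csv
import io

# Made-up, trimmed sample of "show stat" output; in practice this
# comes from the admin socket command shown above.
raw = """pxname,svname,qcur,qmax,scur,smax
web_servers,web01,4,12,200,200
web_servers,web02,0,3,117,189
"""

alerts = []
for row in csv.DictReader(io.StringIO(raw)):
    if int(row["qcur"]) > 0:   # sustained non-zero qcur means saturation
        alerts.append(f"{row['pxname']}/{row['svname']} qcur={row['qcur']} qmax={row['qmax']}")

print(alerts)   # ['web_servers/web01 qcur=4 qmax=12']
```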
option redispatch and Retries
option redispatch allows HAProxy to re-route a connection to a different server if the originally selected server's queue times out or if connection attempts fail after exhausting retries:

```
backend web_servers
    option redispatch
    retries 3
    timeout queue 20s
    server web01 10.10.1.21:8080 maxconn 150 check
    server web02 10.10.1.22:8080 maxconn 150 check
    server web03 10.10.1.23:8080 maxconn 150 check
```

retries controls how many times HAProxy retries a failed TCP connection attempt to a specific server before treating the attempt as failed. With option redispatch, after exhausting retries, HAProxy selects a different server from the pool rather than returning an error to the client.
Exercise caution with option redispatch for non-idempotent POST or PUT requests. If the initial server partially processed a request before the connection failed, redispatching to another server may cause duplicate processing. For stateless, read-heavy APIs this is generally safe.
Timeout Directives: Full Reference
HAProxy maintains separate timeout clocks for each phase of a connection lifecycle. Misconfiguring any of these is a frequent source of production incidents involving half-open connections, zombie sessions, and file descriptor exhaustion.
```
defaults
    timeout connect 5s
    timeout client 30s
    timeout server 60s
    timeout queue 30s
    timeout tunnel 1h
    timeout http-request 10s
    timeout http-keep-alive 5s
```

- timeout connect: Time HAProxy allows to establish a TCP connection to a backend server. Set to 3–5 seconds for LAN backends. Fast failure here enables quick failover when a server is unreachable.
- timeout client: Inactivity timeout on the client-facing socket. If the client sends no data for this duration, HAProxy closes the connection. For long-polling or streaming, increase this value or override per-frontend.
- timeout server: Inactivity timeout on the backend server socket. If the server sends no data for this duration, HAProxy terminates the connection. Should be set longer than the slowest expected application response.
- timeout queue: Maximum wait time for a connection queued behind a full server maxconn. Expiry returns a 503 to the client.
- timeout tunnel: Replaces client and server timeouts once a tunnel is established (WebSocket, HTTP CONNECT). Must always be set explicitly — without it, tunneled connections can persist indefinitely, exhausting connection slots.
- timeout http-request: Maximum time to receive a complete HTTP request header from the client. Protects against slowloris-style attacks that open connections and trickle headers slowly. Values of 5–15 seconds are typical.
- timeout http-keep-alive: Idle time between HTTP requests on a persistent keep-alive connection. Short values (2–5 seconds) prevent server-side socket accumulation under high concurrency by recycling idle keep-alive connections quickly.
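As the timeout client note suggests, any of these can be overridden per section. For example, a hypothetical listener dedicated to long-polling or WebSocket clients might relax only its own client timeout while the defaults above continue to protect everything else:

```
frontend ws_in
    bind 192.0.2.10:8443
    timeout client 1h        # long-lived idle clients on this listener only
    default_backend ws_servers
```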
Slow-Start: Graduated Traffic Ramp-Up for Recovering Servers
When a server transitions from DOWN to UP after a health check failure, the slowstart parameter prevents it from immediately receiving its full configured weight of traffic. Instead, HAProxy linearly ramps the server's effective weight from zero to its configured value over the specified time window:
```
backend app_servers
    balance leastconn
    server app01 10.10.1.41:8080 maxconn 200 check inter 3s fall 2 rise 3 slowstart 60s weight 100
    server app02 10.10.1.42:8080 maxconn 200 check inter 3s fall 2 rise 3 slowstart 60s weight 100
    server app03 10.10.1.43:8080 maxconn 200 check inter 3s fall 2 rise 3 slowstart 60s weight 100
```

Over the 60-second slow-start window, a server with weight 100 will start at effective weight 0 and reach weight 100 gradually. This prevents a freshly restarted application instance — whose JVM is not yet warmed, whose database connection pool is not full, or whose in-memory cache is cold — from being immediately overwhelmed with the same traffic volume it was handling before it failed.
slowstart only activates on UP transitions following a DOWN state. It does not apply when HAProxy first starts if the server is already healthy at startup. Inspect current effective weights with the Runtime API:

```
echo "show servers state app_servers" | \
    socat stdio /var/run/haproxy/admin.sock
```

The cur_weight column shows the interpolated weight in real time as the slow-start ramp progresses.
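The linear interpolation is easy to reason about numerically. The sketch below is a simplified model of the ramp (HAProxy's internal rounding and minimum-weight handling may differ slightly):

```python
def effective_weight(configured: int, elapsed_s: float, slowstart_s: float) -> int:
    """Approximate effective weight during a slow-start ramp: a linear
    interpolation from 0 at recovery to the configured weight at the
    end of the slowstart window."""
    if elapsed_s >= slowstart_s:
        return configured
    return int(configured * elapsed_s / slowstart_s)

# weight 100, slowstart 60s, sampled every 15 seconds
ramp = [effective_weight(100, t, 60) for t in (0, 15, 30, 45, 60)]
print(ramp)   # [0, 25, 50, 75, 100]
```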
Graceful Server Draining for Zero-Downtime Deployments
Before taking a backend server offline for a deployment or maintenance window, set it to DRAIN state via the Runtime API. In drain mode, HAProxy stops routing new requests to the server but allows all active sessions to complete naturally:
echo "set server app_servers/app01 state drain" | \
socat stdio /var/run/haproxy/admin.sockPoll the active session count until it reaches zero:
```
watch -n 2 'echo "show stat" | socat stdio /var/run/haproxy/admin.sock | \
    awk -F, "/app_servers.*app01/ {print \"scur=\" \$5, \"qcur=\" \$3}"'
```

Once both scur (active sessions) and qcur (queued connections) are zero, the server has fully drained and can be safely stopped or redeployed. After the operation, restore it to service:
echo "set server app_servers/app01 state ready" | \
socat stdio /var/run/haproxy/admin.sockIf
slowstartis configured, the server will ramp up gradually after returning to ready state — a natural pairing with drain-based deploys.
Weight Manipulation for Canary Deployments
HAProxy supports dynamic per-server weight changes at runtime without configuration reloads. This enables traffic splitting for canary or blue-green deployments:
```
# Bring up the canary server with a small share of traffic
echo "set server app_servers/app04 weight 10" | \
    socat stdio /var/run/haproxy/admin.sock

# Verify effective weights
echo "show servers state app_servers" | \
    socat stdio /var/run/haproxy/admin.sock
```

With three servers at weight 100 and one canary at weight 10, approximately 1 in 31 requests routes to the canary. Gradually increment the canary's weight while monitoring error rates and latency. If the canary is stable, promote it to full weight and drain the old servers.
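The traffic split follows directly from the weight ratios; a quick check of the arithmetic:

```python
weights = {"app01": 100, "app02": 100, "app03": 100, "app04": 10}  # app04 is the canary
total = sum(weights.values())      # 310
share = weights["app04"] / total   # 10/310, roughly 3.2% of requests
print(f"canary receives ~1 in {total // weights['app04']} requests ({share:.1%})")
```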
Complete Production Configuration Reference
The following configuration combines all the controls discussed in this article into a production-ready HAProxy setup for a web application backend:
```
global
    log 127.0.0.1 local0 info
    maxconn 60000
    ulimit-n 122880
    user haproxy
    group haproxy
    daemon
    stats socket /var/run/haproxy/admin.sock mode 660 level admin expose-fd listeners

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option redispatch
    retries 3
    timeout connect 5s
    timeout client 30s
    timeout server 60s
    timeout queue 30s
    timeout tunnel 1h
    timeout http-request 10s
    timeout http-keep-alive 5s

frontend https_in
    bind 10.10.1.10:443 ssl crt /etc/haproxy/certs/solvethenetwork.com.pem \
        alpn h2,http/1.1 backlog 16384
    maxconn 30000
    default_backend app_servers

backend app_servers
    balance leastconn
    option httpchk GET /health HTTP/1.1\r\nHost:\ solvethenetwork.com
    http-check expect status 200
    timeout queue 30s
    server app01 10.10.1.41:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
    server app02 10.10.1.42:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
    server app03 10.10.1.43:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
    server app04 10.10.1.44:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
```

This configuration accepts up to 60,000 simultaneous connections globally, caps each application server at 250 concurrent requests before queuing, queues excess requests for up to 30 seconds before returning a 503, ramps recovered servers up over 60 seconds, and retries failed connections on alternate servers before giving up.
Key Metrics to Monitor in Production
Export the following HAProxy statistics metrics to your monitoring stack (Prometheus, Datadog, or similar) and alert on sustained anomalies:
- qcur > 0 on any server: active queue pressure; consider scaling horizontally
- scur approaching server maxconn: individual server saturation imminent
- dreq increasing: frontend maxconn ceiling being hit; connections dropped by HAProxy
- eresp increasing: backend returning more errors; correlate with application logs
- wredis increasing: connections being redispatched after queue timeout; indicates sustained backend overload
- conn_rate spiking: connection rate spike may indicate a traffic surge or a connection leak from clients
