Overview
Every HAProxy deployment eventually faces the same question: what happens when backend servers get overwhelmed? Misconfigured connection limits let traffic pile up silently until sockets are exhausted. Poorly tuned timeouts leave connections open far longer than necessary, consuming file descriptors and memory. A server brought back online abruptly receives full production load immediately, often crashing again before caches warm or connection pools stabilize.
HAProxy provides a layered system of connection controls — global maxconn, frontend maxconn, backend maxconn, per-server maxconn, queue depth, and timeout directives — that together determine exactly how connections are accepted, queued, forwarded, and cleaned up. This article covers production tuning at every layer of that hierarchy, including queue behavior under saturation, slow-start ramp-up for recovering servers, and graceful draining strategies for zero-downtime deploys.
The HAProxy Connection Limit Hierarchy
HAProxy enforces connection limits at four distinct levels:
- Global maxconn — absolute ceiling on simultaneous connections across the entire HAProxy process
- Frontend maxconn — per-listener cap controlling how many connections a single bind point accepts from clients
- Backend maxconn — aggregate cap on simultaneous connections forwarded to all servers in a backend combined
- Server maxconn — per-server concurrency limit; connections exceeding this value are queued at HAProxy
These limits operate independently and stack hierarchically. A frontend may accept 30,000 connections while each individual backend server is capped at 250 concurrent requests. Excess requests wait in per-server queues inside HAProxy, not in the kernel's TCP backlog. Understanding this separation is critical for diagnosing saturation.
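As a compact sketch of how three of these levels combine in one configuration (addresses and values below are illustrative placeholders, not recommendations):

```
global
    maxconn 50000                    # 1. process-wide ceiling

frontend http_in
    bind 192.0.2.10:80
    maxconn 30000                    # 2. this listener accepts at most 30k clients
    default_backend web_servers

backend web_servers
    server web01 192.0.2.21:8080 maxconn 250   # per-server cap; excess requests queue inside HAProxy
    server web02 192.0.2.22:8080 maxconn 250
```

Each layer only constrains its own scope: the frontend can hold far more client connections than the servers can concurrently process, with the difference absorbed by HAProxy's queues.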
Global maxconn and File Descriptor Tuning
The global section sets the absolute process-level ceiling:
```
global
    maxconn 50000
    ulimit-n 102400
    user haproxy
    group haproxy
    daemon
    stats socket /var/run/haproxy/admin.sock mode 660 level admin expose-fd listeners
```

When the global maxconn ceiling is reached, HAProxy stops calling accept() on its listening sockets. New TCP connections accumulate in the kernel's accept backlog until an existing connection closes and HAProxy resumes accepting. If the backlog also fills, the kernel silently drops incoming SYN packets.
ulimit-n must be large enough to support all connections simultaneously. Each connection requires at least two file descriptors (one for the client socket, one for the server socket), plus additional descriptors for the stats socket, logging pipes, and control channels. A safe formula is (maxconn * 2) + 64; with maxconn 50000 that works out to 100,064 descriptors, comfortably under the configured ulimit-n of 102400. Verify the actual running limit with:

```
cat /proc/$(pidof haproxy)/limits | grep "open files"
```

If HAProxy is managed by systemd, set LimitNOFILE in a unit override file rather than relying on ulimit-n in haproxy.cfg — systemd applies its own resource limits independently:
```
# /etc/systemd/system/haproxy.service.d/limits.conf
[Service]
LimitNOFILE=131072
```

Frontend maxconn and Kernel Backlog
The frontend maxconn directive limits concurrent established connections from clients to the HAProxy listener:
```
frontend http_in
    bind 10.10.1.10:80 backlog 16384
    maxconn 20000
    default_backend web_servers
```

Once the frontend maxconn limit is reached, HAProxy stops accepting new connections. Incoming SYNs queue in the kernel's TCP accept backlog up to the value specified on the bind directive. Beyond that, the kernel drops SYNs or resets connections depending on socket settings.
On Linux, the kernel enforces its own cap on the backlog value. Tune the following sysctls to allow large backlog values to take effect:
```
sysctl -w net.core.somaxconn=32768
sysctl -w net.ipv4.tcp_max_syn_backlog=32768
```

Setting a large backlog (16384 or higher) gives HAProxy a buffer during traffic bursts. During a spike, connections accumulate in the kernel queue rather than being dropped outright while HAProxy drains its active connection count.
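To make those sysctls survive reboots, they can be persisted in a drop-in file (the path below is a conventional example, not a required location) and applied with `sysctl --system`:

```
# /etc/sysctl.d/90-haproxy.conf
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 32768
```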
Per-Server maxconn and Queue Behavior
The most operationally important connection limit is per-server maxconn. When a server hits its limit, HAProxy queues additional requests internally rather than forwarding them:
```
backend web_servers
    balance leastconn
    timeout queue 30s
    server web01 10.10.1.21:8080 maxconn 200 check inter 3s fall 2 rise 3
    server web02 10.10.1.22:8080 maxconn 200 check inter 3s fall 2 rise 3
    server web03 10.10.1.23:8080 maxconn 200 check inter 3s fall 2 rise 3
```

When web01 holds 200 active connections, the 201st request enters a server-level queue inside HAProxy, which dispatches it to web01 as soon as a slot opens. If option redispatch is enabled and the queue timeout expires before a slot opens, HAProxy may route the request to another server instead of returning a 503.
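The queue mechanics can be sketched with a toy model. This is a hypothetical simulation to illustrate the FIFO behavior, not HAProxy's actual implementation:

```python
from collections import deque

MAXCONN = 3        # per-server cap (tiny value for illustration)
queue = deque()    # HAProxy-side per-server queue
active = 0         # connections currently held by the server

def dispatch(req):
    """Forward the request if a slot is free, otherwise queue it."""
    global active
    if active < MAXCONN:
        active += 1
        return "forwarded"
    queue.append(req)
    return "queued"

def release():
    """A server slot frees up; the oldest queued request is promoted."""
    global active
    active -= 1
    if queue:
        queue.popleft()
        active += 1

results = [dispatch(r) for r in range(5)]
print(results)     # first 3 forwarded, last 2 queued
release()          # a slot opens, one queued request is dispatched
print(len(queue))  # 1 request still waiting
```

In the real proxy, a queued request that waits longer than timeout queue is abandoned with a 503 instead of waiting forever.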
Backend-level maxconn without a per-server value caps aggregate concurrency across all servers combined:
```
backend api_servers
    maxconn 600
    server api01 10.10.1.31:9000 check
    server api02 10.10.1.32:9000 check
    server api03 10.10.1.33:9000 check
```

Here, no single server is individually capped, but the backend will not forward more than 600 connections total regardless of how many servers exist.
timeout queue and Queue Depth Monitoring
timeout queue defines how long a connection waits in the per-server queue before HAProxy abandons it with a 503. This is a critical tuning knob for application SLA compliance:

```
backend app_servers
    timeout queue 20s
    server app01 10.10.1.41:8080 maxconn 150 check
    server app02 10.10.1.42:8080 maxconn 150 check
```

For interactive HTTP APIs, 5–15 seconds is typically the upper bound of user tolerance. For asynchronous background job processors, 60–120 seconds may be acceptable. Never leave timeout queue unset — it defaults to inheriting from timeout connect, which is usually far too short and can cause unexpected 503 bursts during transient load spikes.
Monitor live queue depth via the Runtime API socket:
echo "show stat" | socat stdio /var/run/haproxy/admin.sock | \
cut -d',' -f1,2,3,4,6 | column -t -s','The key CSV fields are
qcur(current queue depth) and
qmax(maximum queue depth observed since last reset). Any sustained non-zero
qcurindicates backend server saturation and should trigger scaling or alerting.
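For an exporter or periodic check, those fields can be pulled out of the CSV programmatically. A minimal sketch, with made-up sample data standing in for the socket output (the real header is prefixed with "# ", which would need stripping):

```python
import csv
import io

# Made-up, trimmed sample of "show stat" output; in practice this
# comes from the admin socket command shown above.
raw = """pxname,svname,qcur,qmax,scur,smax
web_servers,web01,4,12,200,200
web_servers,web02,0,3,117,189
"""

alerts = []
for row in csv.DictReader(io.StringIO(raw)):
    if int(row["qcur"]) > 0:   # sustained non-zero qcur means saturation
        alerts.append(f"{row['pxname']}/{row['svname']} qcur={row['qcur']} qmax={row['qmax']}")

print(alerts)   # ['web_servers/web01 qcur=4 qmax=12']
```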
option redispatch and Retries
option redispatch allows HAProxy to re-route a connection to a different server if the originally selected server's queue times out or if connection attempts fail after exhausting retries:

```
backend web_servers
    option redispatch
    retries 3
    timeout queue 20s
    server web01 10.10.1.21:8080 maxconn 150 check
    server web02 10.10.1.22:8080 maxconn 150 check
    server web03 10.10.1.23:8080 maxconn 150 check
```

retries controls how many times HAProxy retries a failed TCP connection attempt to a specific server before treating the attempt as failed. With option redispatch, after exhausting retries, HAProxy selects a different server from the pool rather than returning an error to the client.
Exercise caution with option redispatch for non-idempotent POST or PUT requests. If the initial server partially processed a request before the connection failed, redispatching to another server may cause duplicate processing. For stateless, read-heavy APIs this is generally safe.
Timeout Directives: Full Reference
HAProxy maintains separate timeout clocks for each phase of a connection lifecycle. Misconfiguring any of these is a frequent source of production incidents involving half-open connections, zombie sessions, and file descriptor exhaustion.
```
defaults
    timeout connect 5s
    timeout client 30s
    timeout server 60s
    timeout queue 30s
    timeout tunnel 1h
    timeout http-request 10s
    timeout http-keep-alive 5s
```

- timeout connect: Time HAProxy allows to establish a TCP connection to a backend server. Set to 3–5 seconds for LAN backends. Fast failure here enables quick failover when a server is unreachable.
- timeout client: Inactivity timeout on the client-facing socket. If the client sends no data for this duration, HAProxy closes the connection. For long-polling or streaming, increase this value or override per-frontend.
- timeout server: Inactivity timeout on the backend server socket. If the server sends no data for this duration, HAProxy terminates the connection. Should be set longer than the slowest expected application response.
- timeout queue: Maximum wait time for a connection queued behind a full server maxconn. Expiry returns a 503 to the client.
- timeout tunnel: Replaces client and server timeouts once a tunnel is established (WebSocket, HTTP CONNECT). Must always be set explicitly — without it, tunneled connections can persist indefinitely, exhausting connection slots.
- timeout http-request: Maximum time to receive a complete HTTP request header from the client. Protects against slowloris-style attacks that open connections and trickle headers slowly. Values of 5–15 seconds are typical.
- timeout http-keep-alive: Idle time between HTTP requests on a persistent keep-alive connection. Short values (2–5 seconds) prevent server-side socket accumulation under high concurrency by recycling idle keep-alive connections quickly.
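As the timeout client note suggests, any of these can be overridden per section. For example, a hypothetical listener dedicated to long-polling or WebSocket clients might relax only its own client timeout while the defaults above continue to protect everything else:

```
frontend ws_in
    bind 192.0.2.10:8443
    timeout client 1h        # long-lived idle clients on this listener only
    default_backend ws_servers
```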
Slow-Start: Graduated Traffic Ramp-Up for Recovering Servers
When a server transitions from DOWN to UP after a health check failure, the slowstart parameter prevents it from immediately receiving its full configured weight of traffic. Instead, HAProxy linearly ramps the server's effective weight from zero to its configured value over the specified time window:
```
backend app_servers
    balance leastconn
    server app01 10.10.1.41:8080 maxconn 200 check inter 3s fall 2 rise 3 slowstart 60s weight 100
    server app02 10.10.1.42:8080 maxconn 200 check inter 3s fall 2 rise 3 slowstart 60s weight 100
    server app03 10.10.1.43:8080 maxconn 200 check inter 3s fall 2 rise 3 slowstart 60s weight 100
```

Over the 60-second slow-start window, a server with weight 100 will start at effective weight 0 and reach weight 100 gradually. This prevents a freshly restarted application instance — whose JVM is not yet warmed, whose database connection pool is not full, or whose in-memory cache is cold — from being immediately overwhelmed with the same traffic volume it was handling before it failed.
slowstart only activates on UP transitions following a DOWN state. It does not apply when HAProxy first starts if the server is already healthy at startup. Inspect current effective weights with the Runtime API:

```
echo "show servers state app_servers" | \
    socat stdio /var/run/haproxy/admin.sock
```

The cur_weight column shows the interpolated weight in real time as the slow-start ramp progresses.
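The linear interpolation is easy to reason about numerically. The sketch below is a simplified model of the ramp (HAProxy's internal rounding and minimum-weight handling may differ slightly):

```python
def effective_weight(configured: int, elapsed_s: float, slowstart_s: float) -> int:
    """Approximate effective weight during a slow-start ramp: a linear
    interpolation from 0 at recovery to the configured weight at the
    end of the slowstart window."""
    if elapsed_s >= slowstart_s:
        return configured
    return int(configured * elapsed_s / slowstart_s)

# weight 100, slowstart 60s, sampled every 15 seconds
ramp = [effective_weight(100, t, 60) for t in (0, 15, 30, 45, 60)]
print(ramp)   # [0, 25, 50, 75, 100]
```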
Graceful Server Draining for Zero-Downtime Deployments
Before taking a backend server offline for a deployment or maintenance window, set it to DRAIN state via the Runtime API. In drain mode, HAProxy stops routing new requests to the server but allows all active sessions to complete naturally:
echo "set server app_servers/app01 state drain" | \
socat stdio /var/run/haproxy/admin.sockPoll the active session count until it reaches zero:
```
watch -n 2 'echo "show stat" | socat stdio /var/run/haproxy/admin.sock | \
    awk -F, "/app_servers.*app01/ {print \"scur=\" \$5, \"qcur=\" \$3}"'
```

Once both scur (active sessions) and qcur (queued connections) are zero, the server has fully drained and can be safely stopped or redeployed. After the operation, restore it to service:
echo "set server app_servers/app01 state ready" | \
socat stdio /var/run/haproxy/admin.sockIf
slowstartis configured, the server will ramp up gradually after returning to ready state — a natural pairing with drain-based deploys.
Weight Manipulation for Canary Deployments
HAProxy supports dynamic per-server weight changes at runtime without configuration reloads. This enables traffic splitting for canary or blue-green deployments:
```
# Bring up the canary server with a small share of traffic
echo "set server app_servers/app04 weight 10" | \
    socat stdio /var/run/haproxy/admin.sock

# Verify effective weights
echo "show servers state app_servers" | \
    socat stdio /var/run/haproxy/admin.sock
```

With three servers at weight 100 and one canary at weight 10, approximately 1 in 31 requests routes to the canary. Gradually increment the canary's weight while monitoring error rates and latency. If the canary is stable, promote it to full weight and drain the old servers.
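The traffic split follows directly from the weight ratios; a quick check of the arithmetic:

```python
weights = {"app01": 100, "app02": 100, "app03": 100, "app04": 10}  # app04 is the canary
total = sum(weights.values())      # 310
share = weights["app04"] / total   # 10/310, roughly 3.2% of requests
print(f"canary receives ~1 in {total // weights['app04']} requests ({share:.1%})")
```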
Complete Production Configuration Reference
The following configuration combines all the controls discussed in this article into a production-ready HAProxy setup for a web application backend:
```
global
    log 127.0.0.1 local0 info
    maxconn 60000
    ulimit-n 122880
    user haproxy
    group haproxy
    daemon
    stats socket /var/run/haproxy/admin.sock mode 660 level admin expose-fd listeners

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option redispatch
    retries 3
    timeout connect 5s
    timeout client 30s
    timeout server 60s
    timeout queue 30s
    timeout tunnel 1h
    timeout http-request 10s
    timeout http-keep-alive 5s

frontend https_in
    bind 10.10.1.10:443 ssl crt /etc/haproxy/certs/solvethenetwork.com.pem \
        alpn h2,http/1.1 backlog 16384
    maxconn 30000
    default_backend app_servers

backend app_servers
    balance leastconn
    option httpchk GET /health HTTP/1.1\r\nHost:\ solvethenetwork.com
    http-check expect status 200
    timeout queue 30s
    server app01 10.10.1.41:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
    server app02 10.10.1.42:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
    server app03 10.10.1.43:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
    server app04 10.10.1.44:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
```

This configuration accepts up to 60,000 simultaneous connections globally, caps each application server at 250 concurrent requests before queuing, queues excess requests for up to 30 seconds before returning a 503, ramps recovered servers up over 60 seconds, and retries failed connections on alternate servers before giving up.
Key Metrics to Monitor in Production
Export the following HAProxy statistics metrics to your monitoring stack (Prometheus, Datadog, or similar) and alert on sustained anomalies:
- qcur > 0 on any server: active queue pressure; consider scaling horizontally
- scur approaching server maxconn: individual server saturation imminent
- dreq increasing: frontend maxconn ceiling being hit; connections dropped by HAProxy
- eresp increasing: backend returning more errors; correlate with application logs
- wredis increasing: connections being redispatched after queue timeout; indicates sustained backend overload
- conn_rate spiking: connection rate spike may indicate a traffic surge or a connection leak from clients
