InfraRunBook

    HAProxy Connection Limits, Timeouts, and Queue Management: maxconn, Slow-Start, and Graceful Draining

    HAProxy
    Published: Mar 25, 2026
    Updated: Mar 25, 2026

    A production-focused deep dive into HAProxy connection limit hierarchy, timeout directives, queue behavior, slow-start ramp-up, and graceful server draining for zero-downtime deployments.


    Overview

    Every HAProxy deployment eventually faces the same question: what happens when backend servers get overwhelmed? Misconfigured connection limits let traffic pile up silently until sockets are exhausted. Poorly tuned timeouts leave connections open far longer than necessary, consuming file descriptors and memory. A server brought back online abruptly receives full production load immediately, often crashing again before caches warm or connection pools stabilize.

    HAProxy provides a layered system of connection controls — global maxconn, frontend maxconn, backend maxconn, per-server maxconn, queue depth, and timeout directives — that together determine exactly how connections are accepted, queued, forwarded, and cleaned up. This article covers production tuning at every layer of that hierarchy, including queue behavior under saturation, slow-start ramp-up for recovering servers, and graceful draining strategies for zero-downtime deploys.


    The HAProxy Connection Limit Hierarchy

    HAProxy enforces connection limits at four distinct levels:

    1. Global maxconn — absolute ceiling on simultaneous connections across the entire HAProxy process
    2. Frontend maxconn — per-listener cap controlling how many connections a single bind point accepts from clients
    3. Backend maxconn — aggregate cap on simultaneous connections forwarded to all servers in a backend combined
    4. Server maxconn — per-server concurrency limit; connections exceeding this value are queued at HAProxy

    These limits are independent: each is evaluated at its own layer, and several can be active at once. A frontend may accept 30,000 connections while each individual backend server is capped at 250 concurrent requests. Excess requests wait in per-server queues inside HAProxy, not back at the kernel TCP stack. Understanding this separation is critical for diagnosing saturation.
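    To make the hierarchy concrete, here is a minimal sketch with all four levels present in one configuration. The addresses and numbers are illustrative assumptions, not a recommendation:

```
global
    maxconn 50000                    # 1. process-wide ceiling

frontend http_in
    bind 192.0.2.10:80
    maxconn 30000                    # 2. per-listener cap on client connections
    default_backend web_servers

backend web_servers
    maxconn 10000                    # 3. aggregate cap across all servers in this backend
    timeout queue 20s                # how long excess requests may wait in the per-server queue
    server web01 192.0.2.21:8080 maxconn 250 check   # 4. per-server cap; overflow queues in HAProxy
```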


    Global maxconn and File Descriptor Tuning

    The global section sets the absolute process-level ceiling:

    global
        maxconn 50000
        ulimit-n 102400
        user haproxy
        group haproxy
        daemon
        stats socket /var/run/haproxy/admin.sock mode 660 level admin expose-fd listeners

    When the global maxconn ceiling is reached, HAProxy stops calling accept() on its listening sockets. New TCP connections accumulate in the kernel's accept backlog until an existing connection closes and HAProxy resumes accepting. If the backlog also fills, the kernel silently drops incoming SYN packets.

    ulimit-n must be large enough to support all connections simultaneously. Each connection requires at least two file descriptors (one for the client socket, one for the server socket), plus additional descriptors for the stats socket, logging pipes, and control channels. A safe formula is (maxconn * 2) + 64. Verify the actual running limit with:

    cat /proc/$(pidof haproxy)/limits | grep "open files"
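    The (maxconn * 2) + 64 rule is easy to sanity-check with shell arithmetic; the maxconn value below is the one from the global example above:

```shell
# minimum file descriptors for maxconn 50000, using the (maxconn * 2) + 64 rule of thumb
maxconn=50000
echo $(( maxconn * 2 + 64 ))
```

    This prints 100064, which is why the ulimit-n values in this article comfortably exceed twice the configured maxconn.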

    If HAProxy is managed by systemd, set LimitNOFILE in the unit override file rather than relying on ulimit-n in haproxy.cfg, because systemd applies its own resource limits independently:

    # /etc/systemd/system/haproxy.service.d/limits.conf
    [Service]
    LimitNOFILE=131072

    Frontend maxconn and Kernel Backlog

    The frontend maxconn directive limits concurrent established connections from clients to the HAProxy listener:

    frontend http_in
        bind 10.10.1.10:80 backlog 16384
        maxconn 20000
        default_backend web_servers

    Once the frontend maxconn limit is reached, HAProxy stops accepting new connections. Incoming SYNs queue in the kernel's TCP accept backlog up to the value specified on the bind directive. Beyond that, the kernel drops SYNs or resets connections depending on socket settings.

    On Linux, the kernel enforces its own cap on the backlog value. Tune the following sysctls to allow large backlog values to take effect:

    sysctl -w net.core.somaxconn=32768
    sysctl -w net.ipv4.tcp_max_syn_backlog=32768

    Setting a large backlog (16384 or higher) gives HAProxy a buffer during traffic bursts. During a spike, connections accumulate in the kernel queue rather than being dropped outright while HAProxy drains its active connection count.
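    Note that values set with sysctl -w do not survive a reboot. To persist them, drop a file under /etc/sysctl.d (the filename below is just an example) and load it with sysctl --system:

```
# /etc/sysctl.d/90-haproxy.conf  (example filename)
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 32768
```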


    Per-Server maxconn and Queue Behavior

    The most operationally important connection limit is per-server maxconn. When a server hits its limit, HAProxy queues additional requests internally rather than forwarding them:

    backend web_servers
        balance leastconn
        timeout queue 30s
        server web01 10.10.1.21:8080 maxconn 200 check inter 3s fall 2 rise 3
        server web02 10.10.1.22:8080 maxconn 200 check inter 3s fall 2 rise 3
        server web03 10.10.1.23:8080 maxconn 200 check inter 3s fall 2 rise 3

    When web01 holds 200 active connections, the 201st request enters a server-level queue inside HAProxy. HAProxy dispatches it to web01 as soon as a slot opens. If option redispatch is enabled and the queue timeout expires before a slot opens, HAProxy may route the request to another server instead of returning a 503.

    Backend-level maxconn without a per-server value caps aggregate concurrency across all servers combined:

    backend api_servers
        maxconn 600
        server api01 10.10.1.31:9000 check
        server api02 10.10.1.32:9000 check
        server api03 10.10.1.33:9000 check

    Here, no single server is individually capped but the backend will not forward more than 600 connections total regardless of how many servers exist.


    timeout queue and Queue Depth Monitoring

    timeout queue defines how long a connection waits in the per-server queue before HAProxy abandons it with a 503. This is a critical tuning knob for application SLA compliance:

    backend app_servers
        timeout queue 20s
        server app01 10.10.1.41:8080 maxconn 150 check
        server app02 10.10.1.42:8080 maxconn 150 check

    For interactive HTTP APIs, 5–15 seconds is typically the upper bound of user tolerance. For asynchronous background job processors, 60–120 seconds may be acceptable. Never leave timeout queue unset: it defaults to inheriting from timeout connect, which is usually far too short and can cause unexpected 503 bursts during transient load spikes.

    Monitor live queue depth via the Runtime API socket:

    echo "show stat" | socat stdio /var/run/haproxy/admin.sock | \
      cut -d',' -f1,2,3,4,5 | column -t -s','

    The key CSV fields are qcur (current queue depth) and qmax (maximum queue depth observed since last reset). Any sustained non-zero qcur indicates backend server saturation and should trigger scaling or alerting.
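    A queue-pressure check along these lines can be scripted. The sketch below runs against a captured sample of show stat CSV rows (the two data lines are invented for illustration; in production the input would come from the socket command above) and prints any server with a non-zero qcur:

```shell
# sample `show stat` rows, reduced to: pxname,svname,qcur,qmax,scur
stat='app_servers,app01,3,7,150
app_servers,app02,0,2,120'

# report backend/server pairs whose current queue depth (field 3) is non-zero
printf '%s\n' "$stat" | awk -F, '$3 > 0 { print $1 "/" $2 " qcur=" $3 }'
```

    For the sample above this prints app_servers/app01 qcur=3; piping the real socket output through the same awk filter makes a serviceable alerting probe.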


    option redispatch and Retries

    option redispatch allows HAProxy to re-route a connection to a different server if the originally selected server's queue times out or if connection attempts fail after exhausting retries:

    backend web_servers
        option redispatch
        retries 3
        timeout queue 20s
        server web01 10.10.1.21:8080 maxconn 150 check
        server web02 10.10.1.22:8080 maxconn 150 check
        server web03 10.10.1.23:8080 maxconn 150 check

    retries controls how many times HAProxy retries a failed TCP connection attempt to a specific server before treating the attempt as failed. With option redispatch, after exhausting retries, HAProxy selects a different server from the pool rather than returning an error to the client.

    Exercise caution with option redispatch for non-idempotent POST or PUT requests. If the initial server partially processed a request before the connection failed, redispatching to another server may cause duplicate processing. For stateless read-heavy APIs this is generally safe.


    Timeout Directives: Full Reference

    HAProxy maintains separate timeout clocks for each phase of a connection lifecycle. Misconfiguring any of these is a frequent source of production incidents involving half-open connections, zombie sessions, and file descriptor exhaustion.

    defaults
        timeout connect      5s
        timeout client      30s
        timeout server      60s
        timeout queue       30s
        timeout tunnel      1h
        timeout http-request 10s
        timeout http-keep-alive 5s

    • timeout connect: Time HAProxy allows to establish a TCP connection to a backend server. Set to 3–5 seconds for LAN backends. Fast failure here enables quick failover when a server is unreachable.
    • timeout client: Inactivity timeout on the client-facing socket. If the client sends no data for this duration, HAProxy closes the connection. For long-polling or streaming, increase this value or override per-frontend.
    • timeout server: Inactivity timeout on the backend server socket. If the server sends no data for this duration, HAProxy terminates the connection. Should be set longer than the slowest expected application response.
    • timeout queue: Maximum wait time for a connection queued behind a full server maxconn. Expiry returns a 503 to the client.
    • timeout tunnel: Replaces both the client and server timeouts once a tunnel is established (WebSocket, HTTP CONNECT). Set it explicitly for tunnel workloads: without it, the ordinary client and server inactivity timeouts apply to tunnels and will cut idle but legitimately connected clients.
    • timeout http-request: Maximum time to receive a complete HTTP request header from the client. Protects against slowloris-style attacks that open connections and trickle headers slowly. Values of 5–15 seconds are typical.
    • timeout http-keep-alive: Idle time between HTTP requests on a persistent keep-alive connection. Short values (2–5 seconds) prevent server-side socket accumulation under high concurrency by recycling idle keep-alive connections quickly.
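    As one example of how timeout tunnel fits alongside the others, a WebSocket-oriented proxy pair might look like the following sketch (the names and addresses are illustrative, not part of this article's reference setup):

```
frontend ws_in
    bind 192.0.2.10:8443
    timeout client 30s           # applies during the initial HTTP phase
    default_backend ws_servers

backend ws_servers
    timeout server 60s
    timeout tunnel 1h            # takes over for both sides once the WebSocket upgrade completes
    server ws01 192.0.2.51:8081 check
```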

    Slow-Start: Graduated Traffic Ramp-Up for Recovering Servers

    When a server transitions from DOWN to UP after a health check failure, the slowstart parameter prevents it from immediately receiving its full configured weight of traffic. Instead, HAProxy linearly ramps the server's effective weight from zero to its configured value over the specified time window:

    backend app_servers
        balance leastconn
        server app01 10.10.1.41:8080 maxconn 200 check inter 3s fall 2 rise 3 slowstart 60s weight 100
        server app02 10.10.1.42:8080 maxconn 200 check inter 3s fall 2 rise 3 slowstart 60s weight 100
        server app03 10.10.1.43:8080 maxconn 200 check inter 3s fall 2 rise 3 slowstart 60s weight 100

    Over the 60-second slow-start window, a server with weight 100 will start at effective weight 0 and reach weight 100 gradually. A freshly restarted application instance, whose JVM is not yet warmed, whose database connection pool is not full, or whose in-memory cache is cold, is therefore not immediately overwhelmed with the same traffic volume it was handling before it failed.
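    The linear ramp is easy to reason about with back-of-envelope arithmetic. Assuming a perfectly linear ramp (HAProxy recomputes the weight periodically, so real values move in coarser steps), the effective weight after a given number of elapsed seconds is roughly:

```shell
# approximate effective weight during a 60s slowstart for a server with weight 100
weight=100
slowstart=60
for elapsed in 15 30 45 60; do
    echo "t=${elapsed}s effective_weight=$(( weight * elapsed / slowstart ))"
done
```

    At the midpoint of the window the server carries roughly half its configured weight, reaching full weight only as the window closes.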

    slowstart only activates on UP transitions following a DOWN state. It does not apply when HAProxy first starts if the server is already healthy at startup. Inspect current effective weights with the Runtime API:

    echo "show servers state app_servers" | \
      socat stdio /var/run/haproxy/admin.sock

    The cur_weight column shows the interpolated weight in real time as the slow-start ramp progresses.


    Graceful Server Draining for Zero-Downtime Deployments

    Before taking a backend server offline for a deployment or maintenance window, set it to DRAIN state via the Runtime API. In drain mode, HAProxy stops routing new requests to the server but allows all active sessions to complete naturally:

    echo "set server app_servers/app01 state drain" | \
      socat stdio /var/run/haproxy/admin.sock

    Poll the active session count until it reaches zero:

    watch -n 2 'echo "show stat" | socat stdio /var/run/haproxy/admin.sock | \
      awk -F, "/^app_servers,app01,/ {print \"scur=\" \$5, \"qcur=\" \$3}"'

    Once both scur (active sessions) and qcur (queued connections) are zero, the server has fully drained and can be safely stopped or redeployed. After the operation, restore it to service:

    echo "set server app_servers/app01 state ready" | \
      socat stdio /var/run/haproxy/admin.sock

    If slowstart is configured, the server will ramp up gradually after returning to ready state, a natural pairing with drain-based deploys.


    Weight Manipulation for Canary Deployments

    HAProxy supports dynamic per-server weight changes at runtime without configuration reloads. This enables traffic splitting for canary or blue-green deployments:

    # Bring up canary server with 10% of traffic
    echo "set server app_servers/app04 weight 10" | \
      socat stdio /var/run/haproxy/admin.sock
    
    # Verify effective weights
    echo "show servers state app_servers" | \
      socat stdio /var/run/haproxy/admin.sock

    With three servers at weight 100 and one canary at weight 10, approximately 1 in 31 requests routes to the canary. Gradually increment the canary's weight while monitoring error rates and latency. If the canary is stable, promote it to full weight and drain the old servers.
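    The 1-in-31 figure comes straight from the weight arithmetic: the canary's share is its weight divided by the pool's total weight. A quick check:

```shell
# canary share with three servers at weight 100 and one canary at weight 10
awk 'BEGIN { canary = 10; total = 3 * 100 + canary; printf "share=%.4f (1 in %d)\n", canary / total, total / canary }'
```

    This prints share=0.0323 (1 in 31). The same arithmetic tells you what each weight increment will do before you apply it.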


    Complete Production Configuration Reference

    The following configuration combines all the controls discussed in this article into a production-ready HAProxy setup for a web application backend:

    global
        log 127.0.0.1 local0 info
        maxconn 60000
        ulimit-n 122880
        user haproxy
        group haproxy
        daemon
        stats socket /var/run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    
    defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        option  redispatch
        retries 3
        timeout connect      5s
        timeout client      30s
        timeout server      60s
        timeout queue       30s
        timeout http-request 10s
        timeout http-keep-alive 5s
    
    frontend https_in
    bind 10.10.1.10:443 ssl crt /etc/haproxy/certs/solvethenetwork.com.pem alpn h2,http/1.1 backlog 16384
        maxconn 30000
        default_backend app_servers
    
    backend app_servers
        balance leastconn
        option  httpchk GET /health HTTP/1.1\r\nHost:\ solvethenetwork.com
        http-check expect status 200
        timeout queue 30s
    
        server app01 10.10.1.41:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
        server app02 10.10.1.42:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
        server app03 10.10.1.43:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100
        server app04 10.10.1.44:8080 maxconn 250 check inter 3s fall 2 rise 3 slowstart 60s weight 100

    This configuration accepts up to 60,000 simultaneous connections globally, caps each application server at 250 concurrent requests before queuing, queues excess requests for up to 30 seconds before returning a 503, ramps recovered servers up over 60 seconds, and retries failed connections on alternate servers before giving up.


    Key Metrics to Monitor in Production

    Export the following HAProxy statistics metrics to your monitoring stack (Prometheus, Datadog, or similar) and alert on sustained anomalies:

    • qcur > 0 on any server: active queue pressure; consider scaling horizontally
    • scur approaching server maxconn: individual server saturation imminent
    • CurrConns (from show info) approaching the global Maxconn: process-wide ceiling imminent; new connections will queue in the kernel backlog
    • eresp increasing: backend returning more errors; correlate with application logs
    • wredis increasing: connections being redispatched after queue timeout; indicates sustained backend overload
    • conn_rate spiking: connection rate spike may indicate a traffic surge or a connection leak from clients

    Frequently Asked Questions

    What happens when HAProxy reaches its global maxconn limit?

    When the global maxconn ceiling is hit, HAProxy stops calling accept() on its listening sockets. New TCP connections pile up in the kernel's accept backlog (controlled by the backlog parameter on each bind directive). Once the kernel backlog also fills, incoming SYN packets are silently dropped by the kernel. No error is sent to connecting clients; their connections simply time out. Detect this condition by comparing CurrConns against Maxconn in the output of show info on the Runtime API socket, and by watching the kernel's listen-queue drop counters (for example, ListenOverflows in nstat output).

    What is the difference between frontend maxconn and server maxconn?

    Frontend maxconn limits how many client connections HAProxy holds open simultaneously on a given listener. When this limit is reached, HAProxy stops accepting connections from the kernel queue. Server maxconn limits how many concurrent connections HAProxy forwards to a specific backend server. Connections exceeding server maxconn are queued inside HAProxy and dispatched when a slot opens. The two limits operate independently and can both be active simultaneously.

    How does timeout queue differ from timeout server?

    timeout server governs inactivity on an established connection between HAProxy and a backend server — if the server stops sending data for that duration, HAProxy terminates the connection. timeout queue governs how long a connection waits in the per-server queue inside HAProxy before a forwarding slot becomes available. If the queue timeout expires before the server accepts the connection, HAProxy returns a 503 to the client without the connection ever reaching the server.

    When should I use option redispatch?

    Enable option redispatch when your backends are stateless or when requests are idempotent (GET, HEAD, idempotent PUT). It allows HAProxy to reroute queued requests to alternative servers when the preferred server's queue times out or when connection attempts fail after exhausting retries. Avoid enabling it blindly for POST or stateful workloads, since a failed partial request may be replayed on a different server, potentially causing duplicate side effects.

    How does slow-start work and when does it activate?

    When a server with slowstart configured transitions from DOWN to UP (after passing the rise threshold of health checks), HAProxy begins linearly interpolating its effective weight from 0 to its configured weight value over the slow-start duration. This prevents a freshly recovered server from receiving full traffic load before it is ready — for example, before a JVM finishes warming up or before database connection pools are fully established. Slow-start does not activate on the initial HAProxy startup if the server is already healthy.

    How do I drain a server without dropping active connections?

    Use the Runtime API to set the server to drain state: echo "set server backend/server state drain" | socat stdio /var/run/haproxy/admin.sock. In drain mode, HAProxy stops routing new sessions to that server but keeps all existing sessions alive until they complete naturally. Monitor scur (current sessions) via show stat until it reaches zero, then safely take the server offline. Restore it with state ready when ready to accept traffic again.

    How do I correctly set ulimit-n for HAProxy?

    Use the formula (maxconn * 2) + 64 as a minimum. For a maxconn of 50000, set ulimit-n to at least 100064. If HAProxy runs under systemd, the ulimit-n directive in haproxy.cfg may be overridden by the systemd service unit. Add LimitNOFILE to a systemd override file at /etc/systemd/system/haproxy.service.d/limits.conf and run systemctl daemon-reload to apply it. Verify the live limit with cat /proc/$(pidof haproxy)/limits.

    What is timeout tunnel and when do I need it?

    timeout tunnel overrides timeout client and timeout server once a connection has been upgraded to a tunnel — this includes WebSocket connections and HTTP CONNECT proxied connections. Without it, the default behavior applies client and server inactivity timeouts to tunneled connections, which will disconnect WebSocket clients that are idle but legitimately connected. Always set timeout tunnel explicitly (commonly 1h to 24h for WebSocket workloads) to avoid unexpected disconnects on persistent tunnels.

    What is timeout http-request used for?

    timeout http-request limits how long HAProxy waits for a complete HTTP request header from a client after the TCP connection is established. It protects against slowloris-style attacks where an attacker opens many connections and slowly trickles request headers, never completing them, thereby exhausting HAProxy connection slots. Values between 5 and 15 seconds are typical for public-facing HTTP frontends. For internal service-to-service traffic, the value can be relaxed.

    Can I change server weights at runtime without a configuration reload?

    Yes. Use the Runtime API socket: echo "set server backend_name/server_name weight 50" | socat stdio /var/run/haproxy/admin.sock. Weight changes take effect immediately for new connection assignments and persist until HAProxy is reloaded or the weight is changed again. This enables gradual canary deployments by starting a new server at low weight and incrementing it as confidence builds, without any configuration file changes or service disruption.
