InfraRunBook

    HAProxy Backend Server Marked Down

    HAProxy
    Published: Apr 4, 2026
    Updated: Apr 4, 2026

    Learn why HAProxy marks backend servers as DOWN and how to diagnose and fix the most common causes, including misconfigured health checks, port mismatches, SSL errors, timeouts, and backend crashes.


    Symptoms

    When HAProxy marks one or more backend servers as DOWN, the effects are immediate and often visible to end users. Understanding what to look for is the first step toward a fast resolution. Common symptoms include:

    • Clients receive HTTP 503 Service Unavailable responses, particularly when all backend pool members are DOWN
    • The HAProxy statistics page shows affected servers highlighted in red with a DOWN status label
    • Monitoring and alerting systems fire notifications for backend availability or upstream connection failures
    • HAProxy logs in /var/log/haproxy.log contain entries referencing Layer4 or Layer7 check failures
    • Traffic concentrates onto a reduced set of healthy servers, potentially overloading them and triggering cascading failures
    • Application logs on upstream services report connection errors, ECONNREFUSED, or upstream timeout messages

    A typical log entry on sw-infrarunbook-01 when a backend is marked DOWN looks like this:

    Apr  4 09:14:22 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 1ms, 0 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

    On the stats page, accessible at http://192.168.10.100:8080/haproxy?stats, the affected backend row appears as:

    web01   192.168.10.20:80   DOWN   0/3   L4CON   -   -   -

    The 0/3 column shows zero successful checks out of the last three attempts, and L4CON identifies the failure at the TCP connection layer. Different failure codes point to different root causes, which the following sections address in full.
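    As a first triage step, tallying DOWN events by failure reason shows at a glance whether you are facing one systemic cause or several. A minimal sketch; the heredoc lines are stand-in samples for real /var/log/haproxy.log content:

```shell
#!/bin/sh
# Count backend DOWN events by failure reason.
# The sample lines below are stand-ins for real /var/log/haproxy.log content.
cat <<'EOF' > /tmp/haproxy-sample.log
Apr  4 09:14:22 lb haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer4 connection problem, info: "Connection refused"
Apr  4 09:15:02 lb haproxy[12345]: Server app_backend/web02 is DOWN, reason: Layer7 wrong status, info: "HTTP status check returned code 404"
Apr  4 09:16:10 lb haproxy[12345]: Server app_backend/web03 is DOWN, reason: Layer4 connection problem, info: "Connection refused"
EOF
# Most frequent failure reason first.
grep -oE 'reason: Layer[0-9] [a-z ]+' /tmp/haproxy-sample.log | sort | uniq -c | sort -rn
```

    Against the real log, point the grep at /var/log/haproxy.log instead of the sample file.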


    Root Cause 1: Health Check Misconfigured

    Why It Happens

    HAProxy uses health checks to determine whether a backend server is capable of handling traffic. When the health check configuration does not match what the backend actually serves — wrong URI path, unexpected HTTP response code, or wrong check type — HAProxy will repeatedly fail checks and eventually mark the server as DOWN, even when the backend is perfectly healthy and serving real traffic without issue.

    Common misconfiguration scenarios include:

    • Using a plain TCP check (check) when the application expects an HTTP probe with a specific Host header
    • Sending an HTTP health check to the wrong URI path (e.g., /health returns 404 instead of 200)
    • Expecting a specific HTTP status code such as 200 when the server legitimately returns 204 or 302
    • Setting rise and fall thresholds too aggressively, causing a server to be marked DOWN on the first transient slowdown

    How to Identify It

    Inspect the backend stanza in your HAProxy configuration:

    grep -A 20 'backend app_backend' /etc/haproxy/haproxy.cfg
    backend app_backend
        balance roundrobin
        option httpchk GET /status
        http-check expect status 200
        server web01 192.168.10.20:80 check inter 2s fall 3 rise 2

    Now verify what the backend actually returns at that path:

    curl -v http://192.168.10.20:80/status
    * Connected to 192.168.10.20 (192.168.10.20) port 80
    > GET /status HTTP/1.1
    > Host: 192.168.10.20
    < HTTP/1.1 204 No Content
    < Connection: keep-alive

    The server returns 204 No Content but HAProxy is configured to expect 200 OK. Every health check fails, and after three consecutive failures HAProxy marks the server DOWN.

    How to Fix It

    Update the http-check expect directive to match the real response code. Using a regex range is more resilient to minor application changes:

    backend app_backend
        balance roundrobin
        option httpchk GET /status
        http-check expect rstatus (2|3)[0-9][0-9]
        server web01 192.168.10.20:80 check inter 2s fall 3 rise 2
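    The rstatus pattern above accepts any 2xx or 3xx response. Before reloading, you can sanity-check the pattern locally against the code your backend actually returns (204 in the example above); the anchors here are an addition for a strict whole-code match in this local test:

```shell
#!/bin/sh
# Check a status code against the 2xx/3xx pattern the health check will apply.
# The ^...$ anchors are added here for a strict whole-code match.
code=204
if echo "$code" | grep -Eq '^(2|3)[0-9][0-9]$'; then
    echo "health check would pass ($code)"
else
    echo "health check would fail ($code)"
fi
```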

    If the application has no dedicated health endpoint, use a HEAD request to the root path with a wide status code range:

    backend app_backend
        balance roundrobin
        option httpchk HEAD / HTTP/1.1\r\nHost:\ 192.168.10.20
        http-check expect status 200-404
        server web01 192.168.10.20:80 check inter 2s fall 3 rise 2

    Validate the configuration and reload HAProxy without dropping live connections:

    haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl reload haproxy

    Tail the log to confirm the server recovers:

    tail -f /var/log/haproxy.log | grep 'web01'
    Apr  4 09:22:10 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is UP, reason: Layer7 check passed, code: 204, check duration: 3ms, 1 active and 0 backup servers online.

    Root Cause 2: Port Mismatch

    Why It Happens

    A port mismatch occurs when HAProxy is configured to connect to a backend server on a port that is not actively listening. This is common after application deployments where a service migrates from one port to another (for example, from port 80 to port 8080 after containerization), or when a new team member deploys a service on a non-standard port without updating the HAProxy configuration. HAProxy receives a TCP RST from the kernel or a Connection refused error and immediately fails the health check.

    How to Identify It

    The log entry is unmistakable — the check duration is near zero because the TCP connection is rejected instantly:

    grep 'web01' /var/log/haproxy.log | tail -5
    Apr  4 10:05:33 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms

    Confirm what the backend server is actually listening on:

    ssh infrarunbook-admin@192.168.10.20 "ss -tlnp | grep -E 'State|80|443|8080|8443'"
    State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
    LISTEN  0       511     0.0.0.0:8080        0.0.0.0:*     users:(("node",pid=4421,fd=22))

    The application is now listening on port 8080 but HAProxy is configured for port 80. A quick test from sw-infrarunbook-01 confirms the mismatch:

    nc -zv 192.168.10.20 80
    nc -zv 192.168.10.20 8080
    nc: connect to 192.168.10.20 port 80 (tcp) failed: Connection refused
    Connection to 192.168.10.20 8080 port [tcp/*] succeeded!

    How to Fix It

    Update the server line in the HAProxy backend stanza to use the correct port:

    # Before
    server web01 192.168.10.20:80 check inter 2s fall 3 rise 2
    
    # After
    server web01 192.168.10.20:8080 check inter 2s fall 3 rise 2

    Validate and reload:

    haproxy -c -f /etc/haproxy/haproxy.cfg
    Configuration file is valid
    
    systemctl reload haproxy

    Root Cause 3: SSL Mismatch

    Why It Happens

    SSL/TLS mismatches between HAProxy and a backend are a subtle but frequent cause of servers being marked DOWN. Unlike a port mismatch that fails instantly at Layer 4, SSL failures manifest at Layer 6 during the TLS handshake. This can occur when:

    • HAProxy is configured with ssl on the server line but the backend does not accept TLS at all
    • The backend uses a self-signed or internally-signed certificate and HAProxy is set to verify required against the system CA bundle
    • There is a protocol version incompatibility (e.g., the backend enforces TLSv1.3 but HAProxy negotiates TLSv1.1)
    • The SNI hostname HAProxy sends does not match the certificate CN or Subject Alternative Name on the backend
    • HAProxy omits ssl on the server line but the backend only accepts HTTPS connections

    How to Identify It

    SSL failures appear in logs as Layer6 errors:

    Apr  4 11:30:12 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer6 invalid response, info: "SSL handshake failure", check duration: 12ms

    For certificate verification failures specifically:

    Apr  4 11:30:14 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer6 invalid response, info: "SSL certificate verification failure", check duration: 15ms

    Test the TLS connection manually from sw-infrarunbook-01 using the same parameters HAProxy would use:

    openssl s_client -connect 192.168.10.20:443 -servername api.solvethenetwork.com
    CONNECTED(00000003)
    depth=0 CN = api.solvethenetwork.com
    verify error:num=18:self signed certificate
    verify return:1
    ---
    Certificate chain
     0 s:CN = api.solvethenetwork.com
       i:CN = api.solvethenetwork.com
    ---
    SSL handshake has read 1338 bytes and written 444 bytes

    Now review the HAProxy server line:

    grep 'server web01' /etc/haproxy/haproxy.cfg
    server web01 192.168.10.20:443 check ssl verify required ca-file /etc/ssl/certs/ca-certificates.crt inter 2s fall 3 rise 2

    The backend presents a self-signed certificate. Because verify required checks against the system CA bundle (which does not include the self-signed cert), every check fails at the SSL layer.

    How to Fix It

    For internal backends using self-signed certificates on a trusted private network, disable certificate verification. For production setups, provide the specific internal CA certificate instead:

    # Option 1: Disable verification (acceptable for internal private networks only)
    server web01 192.168.10.20:443 check ssl verify none inter 2s fall 3 rise 2
    
    # Option 2: Provide internal CA cert
    server web01 192.168.10.20:443 check ssl verify required ca-file /etc/haproxy/certs/internal-ca.crt inter 2s fall 3 rise 2

    If the backend does not accept SSL at all and the server line incorrectly includes ssl, simply remove it:

    # Before
    server web01 192.168.10.20:80 check ssl verify none inter 2s fall 3 rise 2
    
    # After
    server web01 192.168.10.20:80 check inter 2s fall 3 rise 2

    Validate and reload:

    haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl reload haproxy

    Root Cause 4: Timeout Too Low

    Why It Happens

    Every HAProxy health check is governed by a timeout. If the backend does not respond within that window, HAProxy treats the check as a failure and increments the fall counter. This is especially problematic for:

    • Backends under heavy load that respond slowly to health check probes
    • Application health endpoints that perform database connectivity queries, cache warm-up, or dependency checks as part of the response
    • Virtualized or containerized backends that experience CPU steal, I/O contention, or garbage collection pauses
    • Cold-start scenarios where an application has just been (re)started and its connection pools are not yet warmed up

    How to Identify It

    The key indicator in the log is that the check duration equals the configured timeout value exactly:

    Apr  4 12:15:44 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer4 timeout, check duration: 2000ms

    A check duration of 2000ms, exactly matching the configured timeout, confirms the check hit the ceiling rather than receiving a genuine failure response. Inspect the current timeout settings:

    grep -E 'timeout|inter|fall|rise' /etc/haproxy/haproxy.cfg
        timeout connect     2s
        timeout client      30s
        timeout server      30s
        timeout check       2s

    Now measure the actual backend health endpoint response time under realistic load:

    time curl -s -o /dev/null -w "%{http_code}" http://192.168.10.20:80/health
    200
    real    0m3.421s

    The backend legitimately returns 200 but takes 3.4 seconds. The 2-second check timeout causes every probe to time out before the response arrives.

    How to Fix It

    Increase timeout check to accommodate the backend's real response time. Set it to at least 2x the 99th percentile response time of your health endpoint under load:

    defaults
        timeout connect     5s
        timeout client      30s
        timeout server      30s
        timeout check       8s

    To override only for a specific backend without changing global defaults:

    backend app_backend
        timeout check 8s
        option httpchk GET /health
        http-check expect status 200
        server web01 192.168.10.20:80 check inter 10s fall 3 rise 2
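    The 2x-p99 sizing rule can be computed directly from measured probe latencies. A sketch, assuming one latency sample in milliseconds per line (the values below are illustrative):

```shell
#!/bin/sh
# Derive 'timeout check' as 2x the 99th percentile of sampled latencies (ms).
# Sample values are illustrative; with few samples, p99 is effectively the max.
cat <<'EOF' > /tmp/probe-latencies.txt
120
140
180
150
130
3400
160
170
110
100
EOF
sort -n /tmp/probe-latencies.txt | awk '
    { a[NR] = $1 }
    END {
        idx = int(NR * 0.99); if (idx < NR * 0.99) idx++   # ceiling index
        printf "p99=%dms  suggested timeout check=%dms\n", a[idx], 2 * a[idx]
    }'
```

    With these samples the script prints p99=3400ms and a suggested timeout check of 6800ms, which you would round up to a sensible value such as 7s or 8s.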

    Simultaneously, work to reduce the health endpoint's response time. If it queries a database for connectivity verification, ensure the query uses a connection pool and a short statement timeout. Ideally a health endpoint should respond in under 100ms. Reload after changes:

    systemctl reload haproxy

    Root Cause 5: Backend Crash

    Why It Happens

    Sometimes HAProxy marks a server DOWN because it genuinely is down. The application process may have exited due to an unhandled exception, a failed dependency check at startup, an out-of-memory (OOM) kill, a deployment rollout failure, or a kernel panic. In these cases, HAProxy is performing exactly as designed — protecting users from routing traffic to a dead server.

    How to Identify It

    SSH to the backend and check the service status immediately:

    ssh infrarunbook-admin@192.168.10.20 "systemctl status app-service"
    ● app-service.service - Application Service
       Loaded: loaded (/etc/systemd/system/app-service.service; enabled)
       Active: failed (Result: exit-code) since Fri 2026-04-04 12:30:00 UTC; 5min ago
      Process: 9812 ExecStart=/usr/bin/node /opt/app/server.js (code=exited, status=1/FAILURE)
     Main PID: 9812 (code=exited, status=1/FAILURE)

    Review the application logs for the crash root cause:

    ssh infrarunbook-admin@192.168.10.20 "journalctl -u app-service --since '30 minutes ago' | tail -30"
    Apr 04 12:29:58 192.168.10.20 node[9812]: Error: connect ECONNREFUSED 192.168.10.50:5432
    Apr 04 12:29:58 192.168.10.20 node[9812]: Unhandled promise rejection -- exiting
    Apr 04 12:30:00 192.168.10.20 systemd[1]: app-service.service: Main process exited, code=exited, status=1/FAILURE

    Check for OOM kills, which often leave no application-level trace:

    ssh infrarunbook-admin@192.168.10.20 "dmesg | grep -i 'oom\|killed process' | tail -10"
    [185432.123456] Out of memory: Killed process 9812 (node) total-vm:1524432kB, anon-rss:1020212kB, file-rss:4kB, shmem-rss:0kB
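    When correlating an OOM kill with the DOWN timestamp, pulling the PID, process name, and resident memory out of the kernel line makes the record easier to file in an incident report. A sketch over the sample line above:

```shell
#!/bin/sh
# Extract pid, process name, and anon RSS from an OOM-kill kernel log line.
line='[185432.123456] Out of memory: Killed process 9812 (node) total-vm:1524432kB, anon-rss:1020212kB, file-rss:4kB, shmem-rss:0kB'
echo "$line" | sed -E 's/.*Killed process ([0-9]+) \(([^)]+)\).*anon-rss:([0-9]+)kB.*/pid=\1 name=\2 rss=\3kB/'
# prints: pid=9812 name=node rss=1020212kB
```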

    How to Fix It

    The resolution depends entirely on the crash cause. For a database connection failure (ECONNREFUSED to 192.168.10.50:5432), restore connectivity to the PostgreSQL instance and restart:

    ssh infrarunbook-admin@192.168.10.20 "systemctl restart app-service"
    ● app-service.service - Application Service
       Active: active (running) since Fri 2026-04-04 12:45:00 UTC; 3s ago
     Main PID: 10234 (node)

    For OOM kills, increase the host memory, reduce the application heap limit, or add a memory limit with an automatic restart policy in the service unit:

    # /etc/systemd/system/app-service.service
    [Service]
    MemoryMax=1G
    Restart=on-failure
    RestartSec=5s

    Once the application is back up, HAProxy detects recovery on the next successful health check cycle and automatically marks the server UP:

    Apr  4 12:45:10 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is UP, reason: Layer7 check passed, code: 200, check duration: 4ms, 1 active and 0 backup servers online.

    Root Cause 6: Firewall Blocking Health Check Traffic

    Why It Happens

    HAProxy sends health check packets originating from the load balancer's IP address to the backend's IP and port. If a host-based firewall on the backend, an intermediate security group, or a network ACL drops packets from the HAProxy source IP, the health check fails even though the application is running normally and may even be serving real user traffic via a different path. This produces a situation where the application is healthy but HAProxy cannot determine that.

    How to Identify It

    The log entry looks identical to a genuine connection refusal. The distinguishing test is that the application works from the backend itself but not from the HAProxy host. Test from sw-infrarunbook-01:

    nc -zv 192.168.10.20 80
    nc: connect to 192.168.10.20 port 80 (tcp) failed: No route to host

    Compare with a test from the backend host itself:

    ssh infrarunbook-admin@192.168.10.20 "curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:80/health"
    200

    The discrepancy confirms a network or firewall issue rather than an application problem. Inspect iptables rules on the backend:

    ssh infrarunbook-admin@192.168.10.20 "iptables -L INPUT -n -v --line-numbers | grep -E '80|DROP|REJECT'"
    5    0     0 DROP  tcp  --  *  *  0.0.0.0/0  0.0.0.0/0  tcp dpt:80

    A blanket DROP rule on port 80 is blocking health check probes from sw-infrarunbook-01 (192.168.10.100).

    How to Fix It

    Insert an ACCEPT rule for health check traffic from the HAProxy source IP before the DROP rule:

    ssh infrarunbook-admin@192.168.10.20 "iptables -I INPUT 1 -s 192.168.10.100 -p tcp --dport 80 -j ACCEPT"

    Persist the rule across reboots:

    ssh infrarunbook-admin@192.168.10.20 "iptables-save > /etc/iptables/rules.v4"

    Root Cause 7: Backend Resource Exhaustion

    Why It Happens

    A backend server that is alive but saturated will fail to accept new TCP connections even though the process is running. This can occur when the application has exhausted its open file descriptor limit (since every socket consumes a file descriptor), the OS listen backlog queue is full, or the application's thread pool or connection pool is completely occupied with long-running requests. HAProxy sees the connection refusal or timeout and marks the server DOWN. This is self-reinforcing: marking one server DOWN pushes more load onto the remaining servers, potentially triggering further failures.

    How to Identify It

    Check the application process's file descriptor usage on the backend:

    ssh infrarunbook-admin@192.168.10.20 "cat /proc/\$(pgrep -f server.js)/limits | grep 'open files'"
    Max open files            1024                 1024                 files
    ssh infrarunbook-admin@192.168.10.20 "ls /proc/\$(pgrep -f server.js)/fd | wc -l"
    1021

    The process is at 1021 out of a maximum of 1024 open file descriptors — three away from saturation. Any new connection attempt (including health check probes) will be refused. Check system-wide socket stats too:

    ssh infrarunbook-admin@192.168.10.20 "ss -s"
    Total: 4096 (kernel 4100)
    TCP:   4090 (estab 3980, closed 110, orphaned 0, synrecv 0, timewait 98/0), ports 0
    
    Transport Total     IP        IPv6
    *         4100      -         -
    RAW       0         0         0
    UDP       4         4         0
    TCP       3980      3980      0
    INET      3984      3984      0
    FRAG      0         0         0
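    The file descriptor check can be wrapped in a simple threshold alert for cron or a monitoring agent. A sketch with used and limit hardcoded from the example output; in practice, read them from /proc/<pid>/fd and /proc/<pid>/limits as shown above:

```shell
#!/bin/sh
# Warn when fd usage exceeds 90% of the process soft limit.
# 'used' and 'limit' are hardcoded from the example output above;
# derive them from /proc/<pid>/fd and /proc/<pid>/limits in practice.
used=1021
limit=1024
pct=$(( used * 100 / limit ))
if [ "$pct" -ge 90 ]; then
    echo "WARN: fd usage at ${pct}% (${used}/${limit})"
fi
# prints: WARN: fd usage at 99% (1021/1024)
```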

    How to Fix It

    Increase the open file descriptor limit for the application service unit:

    # /etc/systemd/system/app-service.service
    [Service]
    LimitNOFILE=65535

    Then reload the unit definition and restart the service:

    ssh infrarunbook-admin@192.168.10.20 "systemctl daemon-reload && systemctl restart app-service"

    On the HAProxy side, use maxconn per server to cap the connection count before the backend reaches saturation, giving you a controlled queue rather than an abrupt refusal:

    server web01 192.168.10.20:80 check inter 2s fall 3 rise 2 maxconn 200

    Prevention

    Preventing unexpected backend server downtime in HAProxy requires proactive configuration discipline, robust monitoring, and codified operational procedures.

    • Use application-level health checks: Always prefer option httpchk with a dedicated health endpoint over raw TCP checks. The endpoint should verify the application, its database connections, and any critical dependencies, and it must respond in under 200ms under normal load.
    • Set appropriate timeouts based on measurement: Baseline your backend response times under realistic load and set timeout check to at least 2x the 99th percentile response time of your health endpoint. Do not rely on defaults.
    • Tune rise and fall thresholds carefully: Use fall 3 to require three consecutive failures before marking a server DOWN, avoiding false positives from transient network glitches. Use rise 2 to require two consecutive successes before marking a server back UP, preventing flapping.
    • Enable the admin socket: Configure the UNIX socket interface so you can enable, disable, and drain servers dynamically without a configuration reload: stats socket /var/run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    • Ship logs to a centralized system and alert: Forward /var/log/haproxy.log to your log aggregation platform and alert immediately on any "is DOWN" event so engineers can investigate before all backends fail.
    • Version-control haproxy.cfg: Store /etc/haproxy/haproxy.cfg in a git repository and run haproxy -c -f /etc/haproxy/haproxy.cfg in your CI/CD pipeline on every proposed change to catch misconfigurations before they reach production.
    • Document port and TLS configuration per service: Maintain a service registry that records each backend's expected port, protocol, and certificate authority. Update the HAProxy configuration as an explicit step in every backend deployment runbook.
    • Cap per-server connections with maxconn: Use the maxconn directive per server to apply back-pressure before resource exhaustion causes health checks to fail. A queued request is better than a refused connection.
    • Test health checks independently and regularly: Schedule periodic tests from sw-infrarunbook-01 using curl and openssl s_client against each backend, verifying the health check path is reachable and returns the expected response independent of real user traffic paths.
    • Use slowstart for freshly recovered servers: Add slowstart 30s to server definitions so that a server returning to the UP state receives gradually increasing traffic rather than an immediate flood, reducing the risk of it being overwhelmed and marked DOWN again.
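    Pulling these practices into one place, a hardened backend stanza might look like the following sketch. The specific values (the /health path, the 8s timeout, the 200-connection cap) are illustrative and should come from your own measurements:

```
backend app_backend
    balance roundrobin
    option httpchk GET /health
    option log-health-checks
    http-check expect status 200
    timeout check 8s
    server web01 192.168.10.20:8080 check inter 10s fall 3 rise 2 maxconn 200 slowstart 30s
```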

    Frequently Asked Questions

    Q: What does "L4CON" mean in the HAProxy stats page?

    A: L4CON stands for Layer 4 Connection problem. It means HAProxy attempted a TCP connection to the backend server and received either a connection refused (RST) or a no-route-to-host error. The server's port is either not listening or is blocked by a firewall. This is distinct from L4TOUT (timeout at Layer 4) and L7STS (Layer 7 HTTP status failure).

    Q: How do I manually re-enable a server that HAProxy has marked DOWN?

    A: Use the HAProxy admin socket. With the socket enabled, run:

    echo "set server app_backend/web01 state ready" | socat stdio /var/run/haproxy/admin.sock

    This immediately marks the server as UP and resumes health checks. Note that if the underlying issue is not fixed, HAProxy will mark it DOWN again after the next failed check cycle.

    Q: What is the difference between fall and rise thresholds in HAProxy?

    A: The fall threshold is the number of consecutive failed health checks required before HAProxy marks a server as DOWN. The rise threshold is the number of consecutive successful checks required before a DOWN server is marked back UP. Setting fall 3 rise 2 means the server must fail three times in a row to go DOWN, and must succeed twice in a row to come back UP. This prevents flapping caused by transient network issues.

    Q: Can I check which backend servers are currently DOWN from the command line without using the stats web page?

    A: Yes. Query the admin socket directly:

    echo "show servers state" | socat stdio /var/run/haproxy/admin.sock

    This returns a table with each server's current state, check result, and timing data. You can also parse the HAProxy stats CSV:

    echo "show stat" | socat stdio /var/run/haproxy/admin.sock | cut -d',' -f1,2,18,19

    to extract backend/server names and their status fields.

    Q: What is the default health check behavior if I do not explicitly configure one?

    A: If you add check to a server line without any option httpchk or similar directive, HAProxy performs a basic TCP connect check: it opens a TCP connection to the server's IP and port, and the check passes if the TCP handshake completes. It does not send any data or validate the HTTP response, so the check passes even if the application inside the TCP listener has crashed, as long as something is accepting connections on that port.

    Q: Why does HAProxy mark a server DOWN even though I can ping it successfully?

    A: ICMP ping (Layer 3) and HAProxy health checks (TCP Layer 4 or HTTP Layer 7) are completely independent. A server can respond to pings while its application process is crashed, its listening port is firewalled, or its file descriptors are exhausted. HAProxy does not use ICMP at any point. A successful ping tells you only that the host's network interface is alive and routing is working — not that the application is healthy.

    Q: How do I configure HAProxy to use a backup server when the primary backend is DOWN?

    A: Add the backup keyword to the server line you want to act as a standby:

    server web01-backup 192.168.10.21:80 check inter 2s fall 3 rise 2 backup

    HAProxy will only send traffic to backup servers when all non-backup servers in the pool are marked DOWN. The backup server participates in normal health checks but receives no traffic while primary servers are UP.

    Q: Can HAProxy drain active sessions from a server before marking it DOWN during a planned maintenance?

    A: Yes. Use the admin socket to put the server in DRAIN state:

    echo "set server app_backend/web01 state drain" | socat stdio /var/run/haproxy/admin.sock

    In DRAIN mode, the server stops receiving new sessions but continues serving existing ones until they close naturally. Once traffic drops to zero, you can safely take the server offline. This is preferable to an abrupt DOWN state during maintenance windows.

    Q: How do I debug health checks in real time to see exactly what HAProxy is sending and receiving?

    A: Increase the HAProxy log verbosity by setting option log-health-checks in the backend stanza. This logs every individual health check result, both successes and failures, to syslog. You can also use tcpdump on sw-infrarunbook-01 to capture the raw health check traffic:

    tcpdump -i any -nn host 192.168.10.20 and port 80 -A

    This shows exactly what HAProxy sends and what the backend responds with at the packet level.

    Q: What is the slowstart option and when should I use it?

    A: The slowstart option causes HAProxy to gradually increase the weight (and therefore the traffic share) of a server transitioning from DOWN to UP over a specified time period. For example, server web01 192.168.10.20:80 check slowstart 60s ramps the server from 0% to 100% of its configured weight over 60 seconds. Use it for backends that require a warm-up period, such as JVM applications building JIT-compiled code caches or services pre-loading data into memory, to avoid immediately overwhelming them and triggering a rapid DOWN/UP cycle.

    Q: How do I ensure HAProxy health check configuration changes do not cause a service interruption when reloading?

    A: Use systemctl reload haproxy rather than restart. A reload starts new worker processes with the updated configuration, hands over the listening sockets (seamlessly when expose-fd listeners is set on the stats socket), and lets the old processes finish serving existing sessions before exiting, so established connections are preserved. Always validate the configuration with haproxy -c -f /etc/haproxy/haproxy.cfg before reloading to ensure no syntax errors are introduced.

    Q: Can I configure different health check parameters for individual servers within the same backend pool?

    A: Yes. While backend-level directives like option httpchk and timeout check apply to all servers in the pool, per-server attributes such as inter (check interval), fall, rise, maxconn, and weight can be set individually on each server line. This allows you to, for example, check a high-capacity server more frequently or apply a stricter fall threshold to a server known to be less stable.
