Symptoms
When HAProxy marks one or more backend servers as DOWN, the effects are immediate and often visible to end users. Understanding what to look for is the first step toward a fast resolution. Common symptoms include:
- Clients receive HTTP 503 Service Unavailable responses, particularly when all backend pool members are DOWN
- The HAProxy statistics page shows affected servers highlighted in red with a DOWN status label
- Monitoring and alerting systems fire notifications for backend availability or upstream connection failures
- HAProxy logs in /var/log/haproxy.log contain entries referencing Layer4 or Layer7 check failures
- Traffic concentrates onto a reduced set of healthy servers, potentially overloading them and triggering cascading failures
- Application logs on upstream services report connection errors, ECONNREFUSED, or upstream timeout messages
A typical log entry on sw-infrarunbook-01 when a backend is marked DOWN looks like this:
Apr 4 09:14:22 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 1ms, 0 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
On the stats page accessible at http://192.168.10.100:8080/haproxy?stats, the affected backend row appears as:
web01 192.168.10.20:80 DOWN 0/3 L4CON - - -
The 0/3 column shows zero successful checks out of the last three attempts, and L4CON identifies a failure at the TCP connection layer. Different failure codes point to different root causes, which the following sections address in full.
Root Cause 1: Health Check Misconfigured
Why It Happens
HAProxy uses health checks to determine whether a backend server is capable of handling traffic. When the health check configuration does not match what the backend actually serves — wrong URI path, unexpected HTTP response code, or wrong check type — HAProxy will repeatedly fail checks and eventually mark the server as DOWN, even when the backend is perfectly healthy and serving real traffic without issue.
Common misconfiguration scenarios include:
- Using a plain TCP check (check) when the application expects an HTTP probe with a specific Host header
- Sending an HTTP health check to the wrong URI path (e.g., /health returns 404 instead of 200)
- Expecting a specific HTTP status code such as 200 when the server legitimately returns 204 or 302
- Setting rise and fall thresholds too aggressively, causing a server to be marked DOWN on the first transient slowdown
How to Identify It
Inspect the backend stanza in your HAProxy configuration:
grep -A 20 'backend app_backend' /etc/haproxy/haproxy.cfg
backend app_backend
balance roundrobin
option httpchk GET /status
http-check expect status 200
server web01 192.168.10.20:80 check inter 2s fall 3 rise 2
Now verify what the backend actually returns at that path:
curl -v http://192.168.10.20:80/status
* Connected to 192.168.10.20 (192.168.10.20) port 80
> GET /status HTTP/1.1
> Host: 192.168.10.20
< HTTP/1.1 204 No Content
< Connection: keep-alive
The server returns 204 No Content but HAProxy is configured to expect 200 OK. Every health check fails, and after three consecutive failures HAProxy marks the server DOWN.
How to Fix It
Update the http-check expect directive to match the real response code. Using a regex range is more resilient to minor application changes:
backend app_backend
balance roundrobin
option httpchk GET /status
http-check expect rstatus (2|3)[0-9][0-9]
server web01 192.168.10.20:80 check inter 2s fall 3 rise 2
If the application has no dedicated health endpoint, use a HEAD request to the root path with a wide status code range:
backend app_backend
balance roundrobin
option httpchk HEAD / HTTP/1.1\r\nHost:\ 192.168.10.20
http-check expect status 200-404
server web01 192.168.10.20:80 check inter 2s fall 3 rise 2
Validate the configuration and reload HAProxy without dropping live connections:
haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl reload haproxy
Tail the log to confirm the server recovers:
tail -f /var/log/haproxy.log | grep 'web01'
Apr 4 09:22:10 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is UP, reason: Layer7 check passed, code: 204, check duration: 3ms, 1 active and 0 backup servers online.
Root Cause 2: Port Mismatch
Why It Happens
A port mismatch occurs when HAProxy is configured to connect to a backend server on a port that is not actively listening. This is common after application deployments where a service migrates from one port to another (for example, from port 80 to port 8080 after containerization), or when a new team member deploys a service on a non-standard port without updating the HAProxy configuration. HAProxy receives a TCP RST from the kernel or a Connection refused error and immediately fails the health check.
How to Identify It
The log entry is unmistakable — the check duration is near zero because the TCP connection is rejected instantly:
grep 'web01' /var/log/haproxy.log | tail -5
Apr 4 10:05:33 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms
Confirm what the backend server is actually listening on:
ssh infrarunbook-admin@192.168.10.20 "ss -tlnp | grep -E 'State|80|443|8080|8443'"
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 511 0.0.0.0:8080 0.0.0.0:* users:(("node",pid=4421,fd=22))
The application is now listening on port 8080 but HAProxy is configured for port 80. A quick test from sw-infrarunbook-01 confirms the mismatch:
nc -zv 192.168.10.20 80
nc: connect to 192.168.10.20 port 80 (tcp) failed: Connection refused
nc -zv 192.168.10.20 8080
Connection to 192.168.10.20 8080 port [tcp/*] succeeded!
How to Fix It
Update the server line in the HAProxy backend stanza to use the correct port:
# Before
server web01 192.168.10.20:80 check inter 2s fall 3 rise 2
# After
server web01 192.168.10.20:8080 check inter 2s fall 3 rise 2
Validate and reload:
haproxy -c -f /etc/haproxy/haproxy.cfg
Configuration file is valid
systemctl reload haproxy
Root Cause 3: SSL Mismatch
Why It Happens
SSL/TLS mismatches between HAProxy and a backend are a subtle but frequent cause of servers being marked DOWN. Unlike a port mismatch that fails instantly at Layer 4, SSL failures manifest at Layer 6 during the TLS handshake. This can occur when:
- HAProxy is configured with ssl on the server line but the backend does not accept TLS at all
- The backend uses a self-signed or internally-signed certificate and HAProxy is set to verify required against the system CA bundle
- There is a protocol version incompatibility (e.g., the backend enforces TLSv1.3 but HAProxy negotiates TLSv1.1)
- The SNI hostname HAProxy sends does not match the certificate CN or Subject Alternative Name on the backend
- HAProxy omits ssl on the server line but the backend only accepts HTTPS connections
How to Identify It
SSL failures appear in logs as Layer6 errors:
Apr 4 11:30:12 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer6 invalid response, info: "SSL handshake failure", check duration: 12ms
For certificate verification failures specifically:
Apr 4 11:30:14 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer6 invalid response, info: "SSL certificate verification failure", check duration: 15ms
Test the TLS connection manually from sw-infrarunbook-01 using the same parameters HAProxy would use:
openssl s_client -connect 192.168.10.20:443 -servername api.solvethenetwork.com
CONNECTED(00000003)
depth=0 CN = api.solvethenetwork.com
verify error:num=18:self signed certificate
verify return:1
---
Certificate chain
0 s:CN = api.solvethenetwork.com
i:CN = api.solvethenetwork.com
---
SSL handshake has read 1338 bytes and written 444 bytes
Now review the HAProxy server line:
grep 'server web01' /etc/haproxy/haproxy.cfg
server web01 192.168.10.20:443 check ssl verify required ca-file /etc/ssl/certs/ca-certificates.crt inter 2s fall 3 rise 2
The backend presents a self-signed certificate. Because verify required checks against the system CA bundle (which does not include the self-signed cert), every check fails at the SSL layer.
How to Fix It
For internal backends using self-signed certificates on a trusted private network, disable certificate verification. For production setups, provide the specific internal CA certificate instead:
# Option 1: Disable verification (acceptable for internal private networks only)
server web01 192.168.10.20:443 check ssl verify none inter 2s fall 3 rise 2
# Option 2: Provide internal CA cert
server web01 192.168.10.20:443 check ssl verify required ca-file /etc/haproxy/certs/internal-ca.crt inter 2s fall 3 rise 2
If the backend does not accept SSL at all and the server line incorrectly includes ssl, simply remove it:
# Before
server web01 192.168.10.20:80 check ssl verify none inter 2s fall 3 rise 2
# After
server web01 192.168.10.20:80 check inter 2s fall 3 rise 2
Validate and reload:
haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl reload haproxy
Root Cause 4: Timeout Too Low
Why It Happens
Every HAProxy health check is governed by a timeout. If the backend does not respond within that window, HAProxy treats the check as a failure and increments the fall counter. This is especially problematic for:
- Backends under heavy load that respond slowly to health check probes
- Application health endpoints that perform database connectivity queries, cache warm-up, or dependency checks as part of the response
- Virtualized or containerized backends that experience CPU steal, I/O contention, or garbage collection pauses
- Cold-start scenarios where an application has just been (re)started and its connection pools are not yet warmed up
How to Identify It
The key indicator in the log is that the check duration equals the configured timeout value exactly:
Apr 4 12:15:44 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is DOWN, reason: Layer4 timeout, check duration: 2000ms
A check duration of 2000ms that exactly matches your configured timeout confirms the check hit the ceiling rather than receiving a genuine failure response. Inspect the current timeout settings:
grep -E 'timeout|inter|fall|rise' /etc/haproxy/haproxy.cfg
timeout connect 2s
timeout client 30s
timeout server 30s
timeout check 2s
Now measure the actual backend health endpoint response time under realistic load:
time curl -s -o /dev/null -w "%{http_code}" http://192.168.10.20:80/health
200
real 0m3.421s
The backend legitimately returns 200 but takes 3.4 seconds. The 2-second check timeout causes every probe to time out before the response arrives.
How to Fix It
Increase timeout check to accommodate the backend's real response time. Set it to at least 2x the 99th percentile response time of your health endpoint under load:
defaults
timeout connect 5s
timeout client 30s
timeout server 30s
timeout check 8s
To override only for a specific backend without changing global defaults:
backend app_backend
timeout check 8s
option httpchk GET /health
http-check expect status 200
server web01 192.168.10.20:80 check inter 10s fall 3 rise 2
Simultaneously, work to reduce the health endpoint's response time. If it queries a database for connectivity verification, ensure the query uses a connection pool and a short statement timeout. Ideally a health endpoint should respond in under 100ms. Reload after changes:
systemctl reload haproxy
Root Cause 5: Backend Crash
Why It Happens
Sometimes HAProxy marks a server DOWN because it genuinely is down. The application process may have exited due to an unhandled exception, a failed dependency check at startup, an out-of-memory (OOM) kill, a deployment rollout failure, or a kernel panic. In these cases, HAProxy is performing exactly as designed — protecting users from routing traffic to a dead server.
How to Identify It
SSH to the backend and check the service status immediately:
ssh infrarunbook-admin@192.168.10.20 "systemctl status app-service"
● app-service.service - Application Service
Loaded: loaded (/etc/systemd/system/app-service.service; enabled)
Active: failed (Result: exit-code) since Fri 2026-04-04 12:30:00 UTC; 5min ago
Process: 9812 ExecStart=/usr/bin/node /opt/app/server.js (code=exited, status=1/FAILURE)
Main PID: 9812 (code=exited, status=1/FAILURE)
Review the application logs for the crash root cause:
ssh infrarunbook-admin@192.168.10.20 "journalctl -u app-service --since '30 minutes ago' | tail -30"
Apr 04 12:29:58 192.168.10.20 node[9812]: Error: connect ECONNREFUSED 192.168.10.50:5432
Apr 04 12:29:58 192.168.10.20 node[9812]: Unhandled promise rejection -- exiting
Apr 04 12:30:00 192.168.10.20 systemd[1]: app-service.service: Main process exited, code=exited, status=1/FAILURE
Check for OOM kills, which often leave no application-level trace:
ssh infrarunbook-admin@192.168.10.20 "dmesg | grep -i 'oom\|killed process' | tail -10"
[185432.123456] Out of memory: Killed process 9812 (node) total-vm:1524432kB, anon-rss:1020212kB, file-rss:4kB, shmem-rss:0kB
How to Fix It
The resolution depends entirely on the crash cause. For a database connection failure (ECONNREFUSED to 192.168.10.50:5432), restore connectivity to the PostgreSQL instance and restart:
ssh infrarunbook-admin@192.168.10.20 "systemctl restart app-service"
● app-service.service - Application Service
Active: active (running) since Fri 2026-04-04 12:45:00 UTC; 3s ago
Main PID: 10234 (node)
For OOM kills, increase the host memory, reduce the application heap limit, or add a memory limit with an automatic restart policy in the service unit:
# /etc/systemd/system/app-service.service
[Service]
MemoryMax=1G
Restart=on-failure
RestartSec=5s
Once the application is back up, HAProxy detects recovery on the next successful health check cycle and automatically marks the server UP:
Apr 4 12:45:10 sw-infrarunbook-01 haproxy[12345]: Server app_backend/web01 is UP, reason: Layer7 check passed, code: 200, check duration: 4ms, 1 active and 0 backup servers online.
Root Cause 6: Firewall Blocking Health Check Traffic
Why It Happens
HAProxy sends health check packets originating from the load balancer's IP address to the backend's IP and port. If a host-based firewall on the backend, an intermediate security group, or a network ACL drops packets from the HAProxy source IP, the health check fails even though the application is running normally and may even be serving real user traffic via a different path. This produces a situation where the application is healthy but HAProxy cannot determine that.
How to Identify It
The log entry looks identical to a genuine connection refusal. The distinguishing test is that the application works from the backend itself but not from the HAProxy host. Test from sw-infrarunbook-01:
nc -zv 192.168.10.20 80
nc: connect to 192.168.10.20 port 80 (tcp) failed: No route to host
Compare with a test from the backend host itself:
ssh infrarunbook-admin@192.168.10.20 "curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:80/health"
200
The discrepancy confirms a network or firewall issue rather than an application problem. Inspect iptables rules on the backend:
ssh infrarunbook-admin@192.168.10.20 "iptables -L INPUT -n -v --line-numbers | grep -E '80|DROP|REJECT'"
5 0 0 REJECT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 reject-with icmp-host-prohibited
A blanket REJECT rule on port 80 is blocking health check probes from sw-infrarunbook-01 (192.168.10.100); rejecting the TCP SYN with icmp-host-prohibited is what produces the "No route to host" error seen above.
How to Fix It
Insert an ACCEPT rule for health check traffic from the HAProxy source IP before the REJECT rule:
ssh infrarunbook-admin@192.168.10.20 "iptables -I INPUT 1 -s 192.168.10.100 -p tcp --dport 80 -j ACCEPT"
Persist the rule across reboots:
ssh infrarunbook-admin@192.168.10.20 "iptables-save > /etc/iptables/rules.v4"
Root Cause 7: Backend Resource Exhaustion
Why It Happens
A backend server that is alive but saturated will fail to accept new TCP connections even though the process is running. This can occur when the application has exhausted its open file descriptor limit (since every socket consumes a file descriptor), the OS listen backlog queue is full, or the application's thread pool or connection pool is completely occupied with long-running requests. HAProxy sees the connection refusal or timeout and marks the server DOWN. This is self-reinforcing: marking one server DOWN pushes more load onto the remaining servers, potentially triggering further failures.
How to Identify It
Check the application process's file descriptor usage on the backend:
ssh infrarunbook-admin@192.168.10.20 "cat /proc/\$(pgrep -f server.js)/limits | grep 'open files'"
Max open files 1024 1024 files
ssh infrarunbook-admin@192.168.10.20 "ls /proc/\$(pgrep -f server.js)/fd | wc -l"
1021
The process is at 1021 out of a maximum of 1024 open file descriptors — three away from saturation. Any new connection attempt (including health check probes) will be refused. Check system-wide socket stats too:
ssh infrarunbook-admin@192.168.10.20 "ss -s"
Total: 4096 (kernel 4100)
TCP: 4090 (estab 3980, closed 110, orphaned 0, synrecv 0, timewait 98/0), ports 0
Transport Total IP IPv6
* 4100 - -
RAW 0 0 0
UDP 4 4 0
TCP 3980 3980 0
INET 3984 3984 0
FRAG 0 0 0
How to Fix It
Increase the open file descriptor limit for the application service unit:
# /etc/systemd/system/app-service.service
[Service]
LimitNOFILE=65535
ssh infrarunbook-admin@192.168.10.20 "systemctl daemon-reload && systemctl restart app-service"
On the HAProxy side, use maxconn per server to cap the connection count before the backend reaches saturation, giving you a controlled queue rather than an abrupt refusal:
server web01 192.168.10.20:80 check inter 2s fall 3 rise 2 maxconn 200
Prevention
Preventing unexpected backend server downtime in HAProxy requires proactive configuration discipline, robust monitoring, and codified operational procedures.
- Use application-level health checks: Always prefer option httpchk with a dedicated health endpoint over raw TCP checks. The endpoint should verify the application, its database connections, and any critical dependencies simultaneously. It must respond in under 200ms under normal load.
- Set appropriate timeouts based on measurement: Baseline your backend response times under realistic load and set timeout check to at least 2x the 99th percentile response time of your health endpoint. Do not rely on defaults.
- Tune rise and fall thresholds carefully: Use fall 3 to require three consecutive failures before marking a server DOWN, avoiding false positives from transient network glitches. Use rise 2 to require two consecutive successes before marking a server back UP, preventing flapping.
- Enable the admin socket: Configure the UNIX socket interface so you can enable, disable, and drain servers dynamically without a configuration reload:
stats socket /var/run/haproxy/admin.sock mode 660 level admin expose-fd listeners
- Ship logs to a centralized system and alert: Forward /var/log/haproxy.log to your log aggregation platform. Alert immediately on any is DOWN event so engineers investigate before all backends fail.
- Version-control haproxy.cfg: Store /etc/haproxy/haproxy.cfg in a git repository. Run haproxy -c -f /etc/haproxy/haproxy.cfg in your CI/CD pipeline on every proposed change to catch misconfigurations before they reach production.
- Document port and TLS configuration per service: Maintain a service registry that records each backend's expected port, protocol, and certificate authority. Update the HAProxy configuration as an explicit step in every backend deployment runbook.
- Cap per-server connections with maxconn: Use the maxconn directive per server to apply back-pressure before resource exhaustion causes health checks to fail. A queued request is better than a refused connection.
- Test health checks independently and regularly: Schedule periodic tests from sw-infrarunbook-01 using curl and openssl s_client against each backend, verifying the health check path is reachable and returns the expected response independent of real user traffic paths.
- Use slowstart for freshly recovered servers: Add slowstart 30s to server definitions so that a server returning to the UP state receives gradually increasing traffic rather than being flooded at once, reducing the risk of it being overwhelmed and marked DOWN again.
Frequently Asked Questions
Q: What does "L4CON" mean in the HAProxy stats page?
A: L4CON stands for Layer 4 Connection problem. It means HAProxy attempted a TCP connection to the backend server and received either a connection refused (RST) or a no-route-to-host error. The server's port is either not listening or is blocked by a firewall. This is distinct from L4TOUT (timeout at Layer 4) and L7STS (Layer 7 HTTP status failure).
Q: How do I manually re-enable a server that HAProxy has marked DOWN?
A: Use the HAProxy admin socket. If the server was administratively disabled (MAINT or DRAIN state), running echo "set server app_backend/web01 state ready" | socat stdio /var/run/haproxy/admin.sock returns it to normal operation. If it is DOWN because health checks are failing, you can force its health state with echo "set server app_backend/web01 health up" | socat stdio /var/run/haproxy/admin.sock, but if the underlying issue is not fixed, HAProxy will mark it DOWN again after the next failed check cycle.
Q: What is the difference between fall and rise thresholds in HAProxy?
A: The fall threshold is the number of consecutive failed health checks required before HAProxy marks a server as DOWN. The rise threshold is the number of consecutive successful checks required before a DOWN server is marked back UP. Setting fall 3 rise 2 means the server must fail three times in a row to go DOWN, and must succeed twice in a row to come back UP. This prevents flapping caused by transient network issues.
Q: Can I check which backend servers are currently DOWN from the command line without using the stats web page?
A: Yes. Query the admin socket directly: echo "show servers state" | socat stdio /var/run/haproxy/admin.sock. This returns a table with each server's current state, check result, and timing data. You can also parse the HAProxy stats CSV via echo "show stat" | socat stdio /var/run/haproxy/admin.sock | cut -d',' -f1,2,18,19 to extract backend/server names and their status fields.
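The CSV can also be filtered with awk to list only real servers and their status. A sketch with a trimmed sample of the CSV inlined; in production, pipe from the admin socket instead:

```shell
# Filter "show stat" CSV down to backend/server name plus status (field 18).
# A trimmed sample is inlined so the sketch runs anywhere; in production,
# replace the heredoc with:
#   echo "show stat" | socat stdio /var/run/haproxy/admin.sock
cat <<'EOF' | awk -F',' 'NR > 1 && $2 != "FRONTEND" && $2 != "BACKEND" { print $1 "/" $2, $18 }'
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight
app_backend,web01,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1
app_backend,web02,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1
app_backend,BACKEND,0,0,0,0,200,0,0,0,0,0,,0,0,0,0,DOWN,1
EOF
# Prints:
# app_backend/web01 DOWN
# app_backend/web02 UP
```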
Q: What is the default health check behavior if I do not explicitly configure one?
A: If you add check to a server line without any option httpchk or similar directive, HAProxy performs a basic TCP connect check: it opens a TCP connection to the server's IP and port, and if the TCP handshake completes, the check passes. It does not send any data or validate the HTTP response. This means the check passes even if the application inside the TCP listener has crashed, as long as something is accepting connections on that port.
Q: Why does HAProxy mark a server DOWN even though I can ping it successfully?
A: ICMP ping (Layer 3) and HAProxy health checks (TCP Layer 4 or HTTP Layer 7) are completely independent. A server can respond to pings while its application process is crashed, its listening port is firewalled, or its file descriptors are exhausted. HAProxy does not use ICMP at any point. A successful ping tells you only that the host's network interface is alive and routing is working — not that the application is healthy.
Q: How do I configure HAProxy to use a backup server when the primary backend is DOWN?
A: Add the backup keyword to the server line you want to act as a standby: server web01-backup 192.168.10.21:80 check inter 2s fall 3 rise 2 backup. HAProxy will only send traffic to backup servers when all non-backup servers in the pool are marked DOWN. The backup server participates in normal health checks but receives no traffic while primary servers are UP.
Q: Can HAProxy drain active sessions from a server before marking it DOWN during a planned maintenance?
A: Yes. Use the admin socket to put the server in DRAIN state:
echo "set server app_backend/web01 state drain" | socat stdio /var/run/haproxy/admin.sock. In DRAIN mode, the server stops receiving new sessions but continues serving existing ones until they close naturally. Once traffic drops to zero, you can safely take the server offline. This is preferable to an abrupt DOWN state during maintenance windows.
Q: How do I debug health checks in real time to see exactly what HAProxy is sending and receiving?
A: Increase the HAProxy log verbosity by setting option log-health-checks in the backend stanza. This logs every individual health check result, both successes and failures, to syslog. You can also use tcpdump on sw-infrarunbook-01 to capture the raw health check traffic: tcpdump -i any -nn -A host 192.168.10.20 and port 80. This shows exactly what HAProxy sends and what the backend responds with at the packet level.
Q: What is the slowstart option and when should I use it?
A: The slowstart option causes HAProxy to gradually increase the weight (and therefore the traffic share) of a server transitioning from DOWN to UP over a specified time period. For example, server web01 192.168.10.20:80 check slowstart 60s ramps the server from 0% to 100% of its configured weight over 60 seconds. Use it for backends that require a warm-up period, such as JVM applications building JIT-compiled code caches or services pre-loading data into memory, to avoid immediately overwhelming them and triggering a rapid DOWN/UP cycle.
Q: How do I ensure HAProxy health check configuration changes do not cause a service interruption when reloading?
A: Use systemctl reload haproxy rather than restart. A reload spawns a new HAProxy process with the new configuration to accept new connections, while the old process continues serving existing sessions until they complete, then exits gracefully. Existing sessions are preserved. Always validate the configuration with haproxy -c -f /etc/haproxy/haproxy.cfg before reloading to ensure no syntax errors are introduced.
Q: Can I configure different health check parameters for individual servers within the same backend pool?
A: Yes. While backend-level directives like option httpchk and timeout check apply to all servers in the pool, per-server attributes such as inter (check interval), fall, rise, maxconn, and weight can be set individually on each server line. This allows you to, for example, check a high-capacity server more frequently or apply a stricter fall threshold to a server known to be less stable.
