InfraRunBook

    HAProxy 503 Service Unavailable Error

    HAProxy
    Published: Apr 5, 2026
    Updated: Apr 5, 2026

    A practical troubleshooting guide for HAProxy 503 Service Unavailable errors, covering all backends down, health check misconfigurations, maxconn limits, queue saturation, and server state problems with real CLI commands.


    A 503 Service Unavailable response from HAProxy is one of the most disruptive errors in a load-balanced infrastructure. Unlike a 503 returned by an upstream application, HAProxy generates this error itself when it cannot forward a request to any backend server. Understanding the distinction between an application-generated 503 and an HAProxy-generated 503 is the first step in effective troubleshooting — and the two are resolved in completely different ways.

    This article walks through the most common root causes of HAProxy 503 errors, how to identify each one using real diagnostic commands and log output, and exactly how to resolve them before they cascade into larger outages.

    Symptoms

    When HAProxy itself generates a 503, clients typically observe:

    • HTTP response 503 Service Unavailable with the body:
      No server is available to handle this request.
    • The HAProxy stats page shows all servers in a backend as DOWN or MAINT
    • Application servers show no incoming requests while HAProxy logs show a flood of 503 responses
    • HAProxy log lines similar to the following, where the server slot shows a dash instead of a server name:
    Apr  3 14:22:01 sw-infrarunbook-01 haproxy[2847]: 10.0.1.55:54321 [03/Apr/2026:14:22:01.042] web-frontend web-backend/- 0/-1/-1/-1/0 503 212 - - SC-- 3/3/0/0/0 0/0 "GET /api/status HTTP/1.1"
    • Monitoring alerts fire on 5xx error rate spikes across the entire backend pool
    • curl returns immediately with a 503 rather than hanging until a timeout

    The key indicator is that HAProxy itself is up and responding — it just has nowhere to send your traffic.
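    That triage (a dash in the server slot means HAProxy generated the 503 itself) can be scripted. The helper below is a sketch, not an official tool: it reads access-log lines on stdin, and the field position assumes the default syslog-prefixed HTTP log format shown above.

```shell
# classify_503s: read HAProxy access-log lines on stdin and count how many
# 503s HAProxy generated itself (server slot ends in "/-") versus how many
# were passed through from a backend. Field 9 is the backend/server slot in
# the default syslog + HTTP log format; adjust if your format differs.
classify_503s() {
  awk '$0 ~ / 503 / {
         if ($9 ~ /\/-$/) self++; else passed++
       }
       END { printf "haproxy-generated: %d, passed-through: %d\n", self + 0, passed + 0 }'
}

# Example: classify_503s < /var/log/haproxy.log
```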


    Root Cause 1: All Backends Down

    Why It Happens

    HAProxy continuously monitors the health of each backend server via its configured health checks. When every server in a backend pool fails its health checks or is manually set to MAINT, HAProxy has no viable destination for incoming requests and returns a 503 immediately. This can happen due to a mass deployment gone wrong, a network partition between the HAProxy host and the backend tier, or an upstream database failure that causes all application servers to crash simultaneously within the same health check interval window.
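    A first-pass summary of how many servers each backend has lost can be sketched from the same stats CSV. This helper is an illustration, not part of HAProxy; it relies on field 18 being the status column in the standard show stat layout.

```shell
# down_summary: read HAProxy "show stat" CSV on stdin and report, per backend,
# how many real servers (not the FRONTEND/BACKEND aggregate rows) are DOWN
# or in MAINT. Field 18 is the status column in the standard CSV layout.
down_summary() {
  awk -F',' '$1 !~ /^#/ && $2 != "FRONTEND" && $2 != "BACKEND" {
    total[$1]++
    if ($18 ~ /DOWN|MAINT/) down[$1]++
  } END {
    for (b in total) printf "%s: %d/%d down\n", b, down[b] + 0, total[b]
  }'
}

# Example: echo "show stat" | socat stdio /var/run/haproxy/admin.sock | down_summary
```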

    How to Identify It

    Query the HAProxy runtime API via its stats socket to check the current state of every server:

    echo "show stat" | socat stdio /var/run/haproxy/admin.sock | cut -d',' -f1,2,18,19,20

    Output when all backends are down:

    # pxname,svname,status,weight,act
    web-frontend,FRONTEND,OPEN,,
    web-backend,web-01,DOWN,1,1
    web-backend,web-02,DOWN,1,1
    web-backend,web-03,DOWN,1,1
    web-backend,BACKEND,DOWN,0,0

    Confirm in the HAProxy log. The - after web-backend/ (where a server name should appear) confirms HAProxy had no server to select:

    tail -n 50 /var/log/haproxy.log | grep "503"
    Apr  3 14:22:01 sw-infrarunbook-01 haproxy[2847]: 10.0.1.55:54321 [03/Apr/2026:14:22:01.042] web-frontend web-backend/- 0/-1/-1/-1/0 503 212 - - SC-- 3/3/0/0/0 0/0 "GET /api/status HTTP/1.1"

    How to Fix It

    First, verify connectivity to the backend servers directly from the HAProxy host:

    curl -v http://10.10.1.11:8080/
    curl -v http://10.10.1.12:8080/
    curl -v http://10.10.1.13:8080/

    If servers respond but HAProxy still shows them DOWN, force a re-evaluation using the runtime API:

    echo "set server web-backend/web-01 state ready" | socat stdio /var/run/haproxy/admin.sock
    echo "set server web-backend/web-02 state ready" | socat stdio /var/run/haproxy/admin.sock

    If the servers are genuinely down, bring them back online and HAProxy will automatically re-add them once they pass their configured number of consecutive successful health checks (rise). Configure a backup server to serve a maintenance page in the interim so users see a graceful error instead of a raw 503:

    backend web-backend
        balance roundrobin
        option httpchk GET /health
        server web-01 10.10.1.11:8080 check inter 5s fall 3 rise 2
        server web-02 10.10.1.12:8080 check inter 5s fall 3 rise 2
        server maintenance 10.10.1.99:8080 backup

    Root Cause 2: Health Check Path Wrong

    Why It Happens

    HAProxy marks a server as DOWN when its health checks fail. If option httpchk is configured with an incorrect path (one that returns a non-2xx or non-3xx HTTP response, or one that simply does not exist on the backend), HAProxy will fail the check on every interval and mark the server DOWN, even if the application is otherwise serving traffic correctly on other paths. This is one of the most common misconfigurations, and it frequently surfaces after an application refactor that changes endpoint paths without a corresponding HAProxy config update.
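    A quick way to catch this drift is to extract the configured check path from the config and probe it directly. The helper below is a sketch; it assumes a single option httpchk GET <path> line in the file fed to it.

```shell
# httpchk_path: read a haproxy.cfg on stdin and print the path configured on
# the first "option httpchk" line (e.g. "option httpchk GET /healthz").
httpchk_path() {
  awk '$1 == "option" && $2 == "httpchk" { print $4; exit }'
}

# Probe each backend with the configured path (IPs are this article's examples):
#   path=$(httpchk_path < /etc/haproxy/haproxy.cfg)
#   for ip in 10.10.1.11 10.10.1.12; do
#     curl -s -o /dev/null -w "$ip$path -> %{http_code}\n" "http://$ip:8080$path"
#   done
```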

    How to Identify It

    Inspect the health check configuration in /etc/haproxy/haproxy.cfg:

    grep -A 10 "backend web-backend" /etc/haproxy/haproxy.cfg
    backend web-backend
        balance roundrobin
        option httpchk GET /healthz
        http-check expect status 200
        server web-01 10.10.1.11:8080 check inter 5s fall 3 rise 2

    Now test that path manually against the backend from the HAProxy host:

    curl -v http://10.10.1.11:8080/healthz
    *   Trying 10.10.1.11:8080...
    * Connected to 10.10.1.11 (10.10.1.11) port 8080
    > GET /healthz HTTP/1.1
    ...
    < HTTP/1.1 404 Not Found
    < Content-Type: text/plain
    404 page not found

    The endpoint returns 404, so HAProxy fails the check and marks the server DOWN. Add option log-health-checks to the backend to log every check result, then watch the log for the exact failure reason:

    grep "Health check" /var/log/haproxy.log | tail -2
    Health check for server web-backend/web-01 failed, reason: Layer7 wrong status, code: 404, info: "Not Found", check duration: 2ms, status: 0/2 DOWN.
    Health check for server web-backend/web-02 failed, reason: Layer7 wrong status, code: 404, info: "Not Found", check duration: 3ms, status: 0/2 DOWN.

    How to Fix It

    Identify the correct health endpoint on the running application:

    curl -s -o /dev/null -w "%{http_code}" http://10.10.1.11:8080/health
    curl -s -o /dev/null -w "%{http_code}" http://10.10.1.11:8080/ping
    curl -s -o /dev/null -w "%{http_code}" http://10.10.1.11:8080/ready

    Once confirmed, update /etc/haproxy/haproxy.cfg with the correct path:

    backend web-backend
        balance roundrobin
        option httpchk GET /health
        http-check expect status 200
        server web-01 10.10.1.11:8080 check inter 5s fall 3 rise 2
        server web-02 10.10.1.12:8080 check inter 5s fall 3 rise 2

    Validate the configuration and reload without dropping connections:

    haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl reload haproxy

    Watch the stats socket to confirm servers transition from DOWN to UP:

    watch -n 2 "echo 'show stat' | socat stdio /var/run/haproxy/admin.sock | cut -d',' -f1,2,18"

    Root Cause 3: Max Connection Limit Reached

    Why It Happens

    HAProxy enforces connection limits at multiple levels: globally via maxconn in the global section, per-frontend, per-backend, and per individual server. When any of these limits is saturated, HAProxy cannot open new connections to backends. Requests that cannot be queued are rejected with a 503 immediately. This typically manifests under sudden traffic spikes, during a slow-response incident where connections pile up faster than they are released, or when the limits were sized for historical traffic patterns that no longer match reality.
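    To see how close each server sits to its ceiling, the scur and slim columns can be turned into a utilization figure. This is a sketch against the standard stats CSV layout, where scur is field 5 and slim is field 7.

```shell
# conn_utilization: read "show stat" CSV on stdin and print each real
# server's current sessions versus its session limit as a percentage.
# scur is CSV field 5 and slim is field 7 in the standard layout.
conn_utilization() {
  awk -F',' '$1 !~ /^#/ && $2 != "FRONTEND" && $2 != "BACKEND" && $7 > 0 {
    printf "%s/%s: %d/%d (%.0f%%)\n", $1, $2, $5, $7, 100 * $5 / $7
  }'
}

# Example: echo "show stat" | socat stdio /var/run/haproxy/admin.sock | conn_utilization
```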

    How to Identify It

    Query current connection counts via the stats socket:

    echo "show info" | socat stdio /var/run/haproxy/admin.sock | grep -E "MaxConn|CurrConns|MaxConnRate"
    Maxconn: 2000
    CurrConns: 1998
    MaxConnRate: 450

    Nearly at the global limit of 2000. Inspect per-server current vs limit values:

    echo "show stat" | socat stdio /var/run/haproxy/admin.sock | cut -d',' -f1,2,5,7,18
    # pxname,svname,scur,slim,status
    web-backend,web-01,250,250,UP
    web-backend,web-02,250,250,UP
    web-backend,BACKEND,500,500,UP

    Both servers are at their per-server maxconn ceiling of 250. In the HAProxy log, look for the SC-- termination flag combined with high current connection counts:

    grep "SC--" /var/log/haproxy.log | tail -10
    Apr  3 14:35:11 sw-infrarunbook-01 haproxy[2847]: 10.0.1.55:54400 [03/Apr/2026:14:35:11.100] web-frontend web-backend/- 0/-1/-1/-1/0 503 212 - - SC-- 2000/1999/0/0/3 0/0 "POST /api/submit HTTP/1.1"

    How to Fix It

    Increase the global maxconn in /etc/haproxy/haproxy.cfg and ensure the OS file descriptor limit can support it:

    global
        maxconn 8000
        ulimit-n 20000
    
    defaults
        maxconn 4000

    Increase per-server connection limits as well:

    backend web-backend
        balance leastconn
        server web-01 10.10.1.11:8080 check maxconn 1000
        server web-02 10.10.1.12:8080 check maxconn 1000

    Verify and raise the OS-level file descriptor limit for the haproxy process:

    ulimit -n
    # If too low, add to /etc/security/limits.conf:
    haproxy soft nofile 65535
    haproxy hard nofile 65535

    Reload HAProxy after updating the configuration:

    haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl reload haproxy

    Root Cause 4: Backend Queue Full

    Why It Happens

    When all backend servers in a pool are at their per-server maxconn limit, HAProxy can queue incoming requests and wait for a connection slot to become available. This queue has a finite depth (governed by the backend maxconn and fullconn settings) and a finite patience (governed by timeout queue). When queued requests exceed the wait time or the queue depth is exhausted, HAProxy drops those requests with a 503. The primary trigger for queue buildup is slow backend responses: if each request holds a connection open for several seconds, the queue fills rapidly under moderate load.
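    The arithmetic behind that buildup is worth making explicit. The numbers below are illustrative assumptions, not measurements: with 2 servers at maxconn 250 and a 2-second average service time, sustainable capacity is 250 req/s, so 300 req/s of offered load grows the queue by 50 req/s.

```shell
# queue_fill_time SERVERS MAXCONN AVG_SERVICE_S OFFERED_RPS QUEUE_DEPTH
# Back-of-envelope estimate of how long until the backend queue overflows
# under steady load. Simple model only: capacity = slots / service time.
queue_fill_time() {
  awk -v n="$1" -v mc="$2" -v st="$3" -v rps="$4" -v q="$5" 'BEGIN {
    capacity = n * mc / st        # max sustainable requests per second
    growth   = rps - capacity     # net queue growth per second
    if (growth <= 0) { print "queue does not grow"; exit }
    printf "capacity=%.0f req/s, queue fills in %.0f s\n", capacity, q / growth
  }'
}

queue_fill_time 2 250 2 300 100   # assumed numbers, not from a live system
```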

    How to Identify It

    Look at queue depth columns in the stats output. The qcur (current queue) and qmax (maximum queue observed) columns are telling:

    echo "show stat" | socat stdio /var/run/haproxy/admin.sock | cut -d',' -f1,2,3,4,5,18
    # pxname,svname,qcur,qmax,scur,status
    web-backend,web-01,45,120,250,UP
    web-backend,web-02,38,115,250,UP
    web-backend,BACKEND,83,235,500,UP

    In the HAProxy log, queue timeout events carry the sQ-- termination flag and show the queue position:

    Apr  3 14:41:05 sw-infrarunbook-01 haproxy[2847]: 10.0.1.55:54501 [03/Apr/2026:14:41:04.800] web-frontend web-backend/- 5009/-1/-1/-1/5009 503 212 - - sQ-- 2001/2001/45/0/0 45/100 "GET /dashboard HTTP/1.1"

    The sQ-- flag, the 5009 ms total time, and the 45/100 queue position confirm the request waited in queue for ~5 seconds and was then evicted with a 503 because timeout queue expired. Check current queue settings:

    grep -E "timeout queue|fullconn|maxconn" /etc/haproxy/haproxy.cfg
    timeout queue 5s
    fullconn 2000

    How to Fix It

    First profile actual backend response times to understand the root cause of the queue buildup:

    curl -w "\nDNS: %{time_namelookup}s  Connect: %{time_connect}s  Total: %{time_total}s\n" \
         -o /dev/null -s http://10.10.1.11:8080/api/dashboard

    Increase the queue timeout and backend pool capacity to tolerate legitimate slow responses:

    defaults
        timeout queue 30s
    
    backend web-backend
        maxconn 3000
        fullconn 5000
        balance leastconn
        server web-01 10.10.1.11:8080 check maxconn 750
        server web-02 10.10.1.12:8080 check maxconn 750
        server web-03 10.10.1.13:8080 check maxconn 750
        server web-04 10.10.1.14:8080 check maxconn 750

    Adding more backend capacity directly reduces per-server load and drains the queue faster. The leastconn balance algorithm is preferable to roundrobin when response times are variable, as it routes new requests to whichever server has the fewest active connections.


    Root Cause 5: No Server in Ready State

    Why It Happens

    HAProxy tracks each server across three separate state dimensions: the administrative state (READY, DRAIN, or MAINT), the operational state (UP or DOWN), and the health check result. A server can be operationally UP — passing all health checks — yet administratively placed in DRAIN or MAINT mode, which prevents HAProxy from routing any new traffic to it. If all servers in a backend are simultaneously in DRAIN or MAINT, HAProxy returns 503 for every new request even though the backend servers are alive and healthy. This scenario is most commonly triggered by a rolling deployment script that places servers into maintenance mode but fails midway through, or by runbook-driven manual maintenance steps executed across the entire pool without checking remaining capacity first.

    How to Identify It

    Check all server states explicitly using the show servers state command:

    echo "show servers state web-backend" | socat stdio /var/run/haproxy/admin.sock
    # be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight
    2 web-backend 1 web-01 10.10.1.11 2 1 1
    2 web-backend 2 web-02 10.10.1.12 2 1 1
    2 web-backend 3 web-03 10.10.1.13 2 2 1

    In srv_admin_state, 0 means READY, while nonzero values carry the administrative DRAIN and MAINT flags; all three servers here are removed from active routing. For a simpler summary view:

    echo "show stat" | socat stdio /var/run/haproxy/admin.sock | awk -F',' 'NR==1 || $1=="web-backend" {print $1,$2,$18,$19}'
    web-backend web-01 DRAIN 1
    web-backend web-02 DRAIN 1
    web-backend web-03 MAINT 1
    web-backend BACKEND DOWN 0

    How to Fix It

    Restore servers to the READY state immediately via the HAProxy runtime API — no reload or restart required:

    echo "set server web-backend/web-01 state ready" | socat stdio /var/run/haproxy/admin.sock
    echo "set server web-backend/web-02 state ready" | socat stdio /var/run/haproxy/admin.sock
    echo "set server web-backend/web-03 state ready" | socat stdio /var/run/haproxy/admin.sock

    Confirm the state change took effect:

    echo "show stat" | socat stdio /var/run/haproxy/admin.sock | awk -F',' '/web-backend/ {print $1,$2,$18}'
    web-backend web-01 UP
    web-backend web-02 UP
    web-backend web-03 UP
    web-backend BACKEND UP

    To prevent this during future deployments, enforce a safeguard in your deployment script that verifies at least one backend server remains in READY/UP state before draining another:

    #!/bin/bash
    SERVERS=("web-01" "web-02" "web-03")
    BACKEND="web-backend"
    SOCK="/var/run/haproxy/admin.sock"
    
    for SRV in "${SERVERS[@]}"; do
      # Verify at least one other server is UP before draining this one
      ACTIVE=$(echo "show stat" | socat stdio $SOCK | awk -F',' -v b="$BACKEND" -v s="$SRV" \
        '$1==b && $2!=s && $18=="UP" {count++} END {print count+0}')
      if [ "$ACTIVE" -lt 1 ]; then
        echo "ERROR: No other servers are UP. Aborting to prevent full outage."
        exit 1
      fi
      echo "set server ${BACKEND}/${SRV} state drain" | socat stdio $SOCK
      sleep 15
      ssh infrarunbook-admin@${SRV}.solvethenetwork.com "systemctl restart app"
      echo "set server ${BACKEND}/${SRV} state ready" | socat stdio $SOCK
      sleep 10
    done

    Root Cause 6: Backend Connection Timeout Misconfiguration

    Why It Happens

    HAProxy has several timeout values governing how long it waits when connecting to or receiving data from a backend. If timeout connect is set too low for the network path between HAProxy and the backend servers (for example, 100ms on a cross-datacenter link with 200ms latency), HAProxy will time out every connection attempt, fail the health check, and eventually mark the server DOWN. This leads directly to a 503.
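    A rule of thumb is to keep timeout connect at a comfortable multiple of the worst observed connect time. The check below is a sketch of that comparison, using an assumed 2x safety margin; feed it a measured connect time and the configured timeout, both in milliseconds.

```shell
# connect_budget MEASURED_MS TIMEOUT_MS
# Flags a "timeout connect" value that leaves less than 2x headroom over the
# measured TCP connect time. The 2x margin is a rule of thumb, not a standard.
connect_budget() {
  awk -v m="$1" -v t="$2" 'BEGIN {
    if (m * 2 > t)
      printf "RISK: connect %dms vs timeout %dms (want >= %dms)\n", m, t, m * 2
    else
      printf "OK: connect %dms vs timeout %dms\n", m, t
  }'
}

connect_budget 312 100   # the 312ms cross-datacenter path vs a 100ms timeout
```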

    How to Identify It

    Check your current timeout configuration:

    grep -E "timeout (connect|server|client|queue)" /etc/haproxy/haproxy.cfg
    timeout connect 100ms
    timeout client  10s
    timeout server  10s

    Measure the actual TCP connect time to a backend from the HAProxy host (curl's time_connect isolates the TCP handshake from total request time):

    curl -s -o /dev/null -w 'connect: %{time_connect}s  total: %{time_total}s\n' http://10.10.1.11:8080/health
    connect: 0.312s  total: 0.735s

    A 312ms connect time will consistently fail a 100ms timeout connect. Confirm via the health check log:

    grep "L4TOUT\|L4CON" /var/log/haproxy.log | tail -10
    Apr  3 14:52:01 sw-infrarunbook-01 haproxy[2847]: Health check for server web-backend/web-01 failed, reason: Layer4 timeout, check duration: 101ms, status: ->DOWN.

    How to Fix It

    Set timeouts that are realistic for your actual network path and application response profile:

    defaults
        timeout connect 5s
        timeout client  30s
        timeout server  30s
        timeout queue   30s

    Reload and confirm servers return to UP status:

    haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl reload haproxy

    Root Cause 7: Firewall Blocking Backend Connections

    Why It Happens

    A firewall rule added between the HAProxy host and the backend servers can silently drop connection attempts, causing HAProxy health checks to time out and eventually mark the server as DOWN. This is especially common when network ACLs are tightened as part of a security hardening exercise without coordination across teams — the application team marks the server healthy, but HAProxy cannot reach it because a new iptables or security group rule is blocking the port.

    How to Identify It

    Test TCP connectivity directly from the HAProxy host to the backend on the application port:

    nc -zv 10.10.1.11 8080
    nc: connect to 10.10.1.11 port 8080 (tcp) failed: No route to host

    If nc is unavailable, telnet 10.10.1.11 8080 provides the same connectivity test.

    Check for DROP or REJECT rules in iptables on both the HAProxy host (outbound) and the backend server (inbound):

    iptables -L OUTPUT -n -v | grep -E "DROP|REJECT"
    ssh infrarunbook-admin@10.10.1.11 "iptables -L INPUT -n -v | grep -E 'DROP|REJECT'"

    How to Fix It

    Allow traffic from the HAProxy host subnet to the backend application port. On the backend servers:

    iptables -I INPUT -s 10.10.1.0/24 -p tcp --dport 8080 -j ACCEPT
    iptables-save > /etc/iptables/rules.v4

    Or if using firewalld:

    firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.10.1.0/24" port port="8080" protocol="tcp" accept'
    firewall-cmd --reload

    Root Cause 8: SSL/TLS Issues on HTTPS Backends

    Why It Happens

    When HAProxy is configured to forward traffic to backends over HTTPS using the ssl keyword on server lines, TLS handshake failures, caused by expired backend certificates, CA verification failures, or TLS protocol version mismatches, will cause the connection to fail, the health check to fail, and the server to be marked DOWN. This is increasingly common as organizations enforce internal certificate authorities and mutual TLS between services.
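    Certificate expiry is the most common of these failures, and it is easy to check ahead of time. The helper below is a sketch using openssl x509 plus GNU date for parsing; it reads a PEM certificate on stdin and prints whole days until expiry.

```shell
# cert_days_left: read a PEM certificate on stdin, print days until expiry.
# Uses "openssl x509 -enddate" and GNU date; a negative result means expired.
cert_days_left() {
  local end
  end=$(openssl x509 -noout -enddate | cut -d= -f2)
  awk -v e="$(date -d "$end" +%s)" -v n="$(date +%s)" \
      'BEGIN { printf "%d\n", (e - n) / 86400 }'
}

# Probe a live backend (this article's example address):
#   echo | openssl s_client -connect 10.10.1.11:8443 2>/dev/null | cert_days_left
```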

    How to Identify It

    Inspect the backend SSL configuration:

    grep -E "ssl|ca-file|verify" /etc/haproxy/haproxy.cfg
    server web-01 10.10.1.11:8443 check ssl verify required ca-file /etc/ssl/certs/internal-ca.pem

    Test the TLS connection manually and check for certificate errors:

    openssl s_client -connect 10.10.1.11:8443 -CAfile /etc/ssl/certs/internal-ca.pem 2>&1 | grep -E "verify|expire|error"
    verify error:num=10:certificate has expired
    notAfter=Mar 15 00:00:00 2026 GMT

    The HAProxy health check log will report the TLS failure:

    Health check for server web-backend/web-01 failed, reason: SSL handshake failure, status: ->DOWN.

    How to Fix It

    Renew the expired backend certificate. During the renewal window, you can temporarily disable strict verification to restore service, but this must be reverted as soon as the certificate is replaced:

    server web-01 10.10.1.11:8443 check ssl verify none

    Warning: verify none disables certificate validation entirely. It should only be used as a temporary measure during a certificate renewal incident and must be removed immediately after the certificate is replaced.

    Prevention

    Preventing HAProxy 503 errors requires a combination of correct initial configuration, ongoing monitoring, and disciplined operational procedures. The following practices address the root causes covered above:

    • Use realistic health check intervals: Set inter, fall, and rise values that balance fast failure detection with tolerance for brief hiccups. A common production setting is check inter 5s fall 3 rise 2, which marks a server DOWN after three consecutive failures over 15 seconds and restores it after two consecutive successes.
    • Always configure a backup server: Use the backup keyword on a dedicated server that serves a maintenance or error page. When all primary servers are DOWN, users receive a graceful degraded response instead of a raw 503.
    • Instrument your health check endpoint properly: Ensure the /health endpoint (or whatever path you configure) actually validates the full application stack, including database connectivity, cache availability, and downstream service reachability. A health check that returns 200 unconditionally is worse than no health check because it masks real failures.
    • Enable verbose health check logging: Add option log-health-checks to each backend to log every check result, not only state transitions. This provides the data needed to diagnose intermittent health check failures before they cause a full pool outage.
    • Size maxconn with headroom: Profile your application under peak load, determine the maximum concurrent connections needed, and configure maxconn values with at least 25% headroom. Revisit these values as traffic patterns change.
    • Monitor queue depth continuously: Alert on sustained queue depth greater than zero for more than 30 seconds — this is an early warning sign of connection saturation before 503s begin appearing in client-facing metrics.
    • Enforce safe rolling deployments: Never place more than N-1 servers in DRAIN or MAINT simultaneously. Codify this constraint in your deployment scripts with an automated guard as shown in Root Cause 5 above.
    • Enable the HAProxy stats page on a management interface: Real-time visibility into server status, connection counts, queue depth, and error rates is invaluable during incidents:
    frontend stats
        bind 10.10.1.1:8404
        stats enable
        stats uri /haproxy-stats
        stats refresh 10s
        stats auth infrarunbook-admin:ChangeMeNow123
        stats hide-version
    • Use option redispatch with retries: This allows HAProxy to retry a failed request on a different server rather than immediately returning 503 on the first backend failure, which adds resilience during partial outages:
    defaults
        option redispatch
        retries 3
    • Validate firewall changes in staging first: Any network ACL modification in the path between HAProxy and its backends must be tested in a non-production environment that mirrors the production network topology before it is applied to production systems.
    • Automate certificate expiry monitoring: Check backend certificate expiry dates and alert well in advance. A 30-day warning prevents surprise TLS-induced 503 incidents:
    echo | openssl s_client -connect 10.10.1.11:8443 2>/dev/null | openssl x509 -noout -dates

    Frequently Asked Questions

    Q: How do I tell if the 503 is coming from HAProxy itself or from my application?

    A: Check the HAProxy access log. If the server slot in the log line shows a dash instead of a real server name, for example web-backend/-, then HAProxy generated the 503 itself and never reached a backend. If a real server name appears, the backend server returned the 503 to HAProxy, which passed it through. You can also check response headers: application-generated 503s usually include application-specific headers that HAProxy-generated ones lack.

    Q: What does the termination state flag SC-- mean in HAProxy logs?

    A: HAProxy log termination flags are four characters describing why and how a session ended. S in the first position means the session was aborted because of a server-side problem (HAProxy could not connect to, or was rejected by, the backend). C in the second position means the session was still in the connect phase, waiting for a connection to the server to be established, when it ended. The last two characters carry persistence-cookie information and are dashes when not applicable. Together, SC-- typically indicates HAProxy could not establish a connection to any backend server and returned 503 to the client.

    Q: Can I serve a custom 503 error page instead of the default HAProxy message?

    A: Yes. Use the errorfile directive to point HAProxy at a pre-built HTTP response file including headers:

    defaults
        errorfile 503 /etc/haproxy/errors/503.http

    The file must contain a complete HTTP response starting with the status line, for example HTTP/1.1 503 Service Unavailable\r\nContent-Type: text/html\r\n\r\n<html>...</html>. You can also use errorloc 503 http://10.10.1.99/maintenance to redirect users to a maintenance page on a backup server.
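    As a sketch, such a file can be generated with printf. HAProxy serves the bytes verbatim, so the file needs the status line, headers, a blank line, and the body, all with CRLF endings; the path and body below are illustrative.

```shell
# Build a complete 503 response file for the errorfile directive. HAProxy
# sends the file as-is, so include the status line, headers, a blank line,
# and the body, all CRLF-terminated. Path and body are illustrative.
printf '%s\r\n' \
  'HTTP/1.1 503 Service Unavailable' \
  'Content-Type: text/html' \
  'Cache-Control: no-cache' \
  '' \
  '<html><body><h1>Service temporarily unavailable</h1></body></html>' \
  > /tmp/503.http            # move to /etc/haproxy/errors/503.http when ready
```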

    Q: How do I remove a server from rotation for maintenance without triggering 503 errors?

    A: Use the DRAIN administrative state rather than MAINT. DRAIN stops HAProxy from assigning new sessions to the server but allows all existing sessions to complete naturally:

    echo "set server web-backend/web-01 state drain" | socat stdio /var/run/haproxy/admin.sock

    Monitor scur (current sessions) in the stats output and wait for it to reach zero before taking the server offline for maintenance.

    Q: What is the difference between the DRAIN and MAINT states in HAProxy?

    A: DRAIN prevents new sessions from being routed to the server while allowing existing sessions to finish gracefully — health checks continue and the server still appears in stats as active. MAINT immediately stops all new traffic, suspends health checks, and in some configurations can abort existing sessions. Use DRAIN for planned maintenance windows and MAINT only in emergencies where you need to pull a server from rotation immediately regardless of in-flight traffic.

    Q: How do I increase the retry count before HAProxy gives up and returns 503?

    A: Set retries 3 in your defaults or backend section, and enable option redispatch to allow retries on a different backend server rather than always retrying the same one. Be cautious with non-idempotent requests such as POST, since retrying them can cause duplicate transactions. HAProxy only retries connections that failed before any data was sent, so the risk is limited but worth understanding.

    Q: Why does HAProxy show a server as UP in the stats page but I am still getting 503 errors?

    A: Several scenarios can cause this. The most common is that the server is UP but has hit its per-server maxconn limit and the backend queue has filled up or timed out; check scur vs slim and the qcur column. Another cause is ACL routing misconfiguration where an incoming request matches no ACL rule and there is no default backend configured; HAProxy returns 503 when it cannot determine where to route a request even if all backends are healthy.

    Q: How can I monitor the number of 503 errors HAProxy has returned over time?

    A: The stats socket exposes cumulative error counters per frontend and backend. Field ereq (CSV column 13) counts request errors and econ (column 14) counts connection errors to backends:

    echo "show stat" | socat stdio /var/run/haproxy/admin.sock | cut -d',' -f1,2,13,14

    For time-series monitoring, use the HAProxy Prometheus exporter and track the haproxy_server_connection_errors_total and haproxy_backend_http_responses_total{code="5xx"} metrics.

    Q: How do I set up an alert when a backend server goes DOWN in HAProxy?

    A: HAProxy emits a syslog message whenever a server state changes. Ensure log 127.0.0.1 local0 notice is set in the global section, then configure your syslog aggregator or SIEM to alert on lines matching Server web-backend/.* is DOWN. For metrics-based alerting, the HAProxy Prometheus exporter exposes haproxy_server_up; alert when this metric equals 0 for any server.

    Q: Does HAProxy support TCP-level health checks for non-HTTP backends?

    A: Yes. For TCP backends, simply omit option httpchk and add the check keyword to the server line; HAProxy will perform a basic TCP connect check to verify the port is open. For more thorough validation, use option tcp-check with a scripted sequence of tcp-check send and tcp-check expect directives to validate the actual protocol handshake.

    Q: What happens to requests that are currently queued when I reload HAProxy?

    A: HAProxy's graceful reload mechanism (systemctl reload haproxy, or sending SIGUSR2) starts a new process while the old process finishes its existing sessions and queued requests normally. The old process exits once all connections are closed. New requests go directly to the new process. There is no hard cutoff for in-flight work, and no downtime occurs during a reload; this is one of HAProxy's most valuable operational characteristics for zero-downtime configuration changes.

    Q: Can HAProxy return 503 due to an ACL misconfiguration even when backends are healthy?

    A: Yes. If you have ACL-based routing configured, a request matches no ACL rule, and no default_backend directive is defined, HAProxy has no target for the request and returns 503. Always define a default_backend as a safety net in your frontend block, and test ACL rules thoroughly using haproxy -c validation and request tracing before deploying routing changes to production.
