Nginx 502 Bad Gateway Deep Dive

Symptoms

When Nginx returns a 502 Bad Gateway, the browser displays an HTTP 502 status code. The default Nginx error page reads:

502 Bad Gateway
nginx/1.24.0

In the access log you will see the 502 recorded for the request. The default combined format, shown here, does not include the upstream address:

192.168.10.45 - infrarunbook-admin [04/Apr/2026:08:12:33 +0000] "GET /api/status HTTP/1.1" 502 157 "-" "Mozilla/5.0"

The actual cause is almost always visible in the Nginx error log at the error level. A representative entry looks like:

2026/04/04 08:12:33 [error] 1234#1234: *42 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.10.45, server: solvethenetwork.com, request: "GET /api/status HTTP/1.1", upstream: "http://10.10.10.20:8080/api/status", host: "solvethenetwork.com"

A 502 is always an Nginx-to-upstream communication failure. Nginx received the client request, attempted to proxy it to the backend, and the backend either refused the connection, never responded, or returned something Nginx could not parse as a valid HTTP response. The error log is the primary diagnostic tool — read it before doing anything else.
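As a rough triage aid, the error-log signatures discussed in this article can be mapped to their likely root cause with a small shell function. This is a sketch, not an exhaustive classifier: the function name classify_502 and the category labels are made up for illustration, and the Permission denied signature is an assumption about what a socket ownership problem logs.

```shell
# Map one Nginx error-log line to the root cause it most likely indicates.
# classify_502 and the labels it prints are hypothetical names for this sketch.
classify_502() {
    case "$1" in
        *"(111: Connection refused)"*)
            echo "upstream down or wrong port" ;;
        *"host not found in upstream"*|*"no resolver defined"*)
            echo "dns resolution failure" ;;
        *"upstream timed out"*)
            echo "proxy timeout" ;;
        *"upstream sent invalid header"*|*"upstream prematurely closed connection"*)
            echo "invalid upstream response" ;;
        *"(2: No such file or directory)"*|*"(13: Permission denied)"*)
            echo "unix socket problem" ;;
        *"worker_connections are not enough"*)
            echo "connection limit exhausted" ;;
        *)
            echo "unknown" ;;
    esac
}
```

A typical use would be to feed it the most recent error line, for example classify_502 "$(tail -n 1 /var/log/nginx/error.log)".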

Root Cause 1: Upstream Server Is Down

Why It Happens

The most common cause of a 502 is that the backend application — Node.js, Gunicorn, PHP-FPM, Tomcat, or any other upstream process — has crashed, been stopped, or is not yet listening on the expected port. When Nginx attempts a TCP connection to the upstream address, the OS kernel responds with a TCP RST (connection refused) because no process is bound to that port.

How to Identify It

Tail the Nginx error log in real time:

tail -f /var/log/nginx/error.log

The key phrase to look for is connect() failed (111: Connection refused):

2026/04/04 08:14:55 [error] 2210#2210: *88 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.10.101, server: solvethenetwork.com, request: "POST /login HTTP/1.1", upstream: "http://127.0.0.1:3000/login", host: "solvethenetwork.com"

Confirm nothing is listening on the expected port using ss:

ss -tlnp | grep 3000

When the app is down, the command produces no output. When the app is running correctly:

LISTEN 0 511 127.0.0.1:3000 0.0.0.0:* users:(("node",pid=4502,fd=22))
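A plain grep for the port number can also match a PID or file descriptor that happens to contain the same digits. The sketch below checks only the local address column instead. It assumes the column layout of the sample output above (local address and port in the fourth field), and has_listener is a hypothetical helper name.

```shell
# Succeed if `ss -tln` output on stdin shows a TCP listener on the given port.
# Assumes Local Address:Port is the fourth column, as in the sample output.
has_listener() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")   # the port is the text after the last colon
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

Usage would look like ss -tln | has_listener 3000 && echo "backend is listening".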

Check the upstream service status directly:

systemctl status app-backend

A crashed service shows:

● app-backend.service - Application Backend
   Loaded: loaded (/etc/systemd/system/app-backend.service; enabled)
   Active: failed (Result: exit-code) since Fri 2026-04-04 08:10:01 UTC; 5min ago
  Process: 4490 ExecStart=/usr/bin/node /opt/app/server.js (code=exited, status=1/FAILURE)

How to Fix It

Start or restart the upstream service and verify it is listening:

systemctl restart app-backend
ss -tlnp | grep 3000

If the service fails to start, inspect the journal for application errors:

journalctl -u app-backend -n 50 --no-pager

For PHP-FPM:

systemctl restart php8.2-fpm
ss -tlnp | grep 9000

No Nginx reload is required. Once the upstream process is listening, Nginx will successfully proxy on the next request.

Root Cause 2: Wrong proxy_pass Directive

Why It Happens

A misconfigured proxy_pass URL points Nginx at a host, port, or path that does not match what the backend actually serves. Common mistakes include a wrong port number, an HTTP vs HTTPS mismatch, a trailing slash that rewrites the URI unintentionally, or a hostname typo in the config file.

How to Identify It

Dump all proxy_pass values from the configuration Nginx would load:

nginx -T | grep proxy_pass

Output:

        proxy_pass http://10.10.10.20:8081/;

Then check what port the backend actually listens on from sw-infrarunbook-01:

ssh infrarunbook-admin@10.10.10.20 "ss -tlnp | grep -E '8080|8081'"

Output:

LISTEN 0 511 0.0.0.0:8080 0.0.0.0:* users:(("java",pid=7812,fd=45))

The app is on port 8080 but proxy_pass targets port 8081. The error log confirms connection refused on 8081. Trailing slash bugs are subtler: with proxy_pass http://10.10.10.20:8080/; inside location /api/, a request for /api/users forwards as /users, stripping the /api prefix. That normally produces backend 404s rather than 502s, but the status the client sees can differ if proxy_intercept_errors on is combined with an error_page mapping, so check it in the same pass.
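The URI rewriting rule for a prefix location can be sketched as a small function: when proxy_pass carries a URI part, the matched location prefix is replaced by that URI; when it carries none, the request URI is forwarded unchanged. The name proxied_uri and its argument order are hypothetical, and the sketch assumes the request already matched the given prefix location.

```shell
# Print the URI Nginx would forward upstream for a prefix location.
# $1 = location prefix, $2 = URI part of proxy_pass ("" if none), $3 = request URI
proxied_uri() {
    location=$1
    pass_uri=$2
    request=$3
    if [ -z "$pass_uri" ]; then
        # proxy_pass without a URI: the request URI is passed through as-is
        printf '%s\n' "$request"
    else
        # proxy_pass with a URI: the matched prefix is replaced by that URI
        printf '%s%s\n' "$pass_uri" "${request#"$location"}"
    fi
}
```

For example, proxied_uri /api/ / /api/users shows the prefix being stripped, while proxied_uri /api/ /api/ /api/users shows it being preserved.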

How to Fix It

Edit /etc/nginx/sites-available/solvethenetwork.com. Before:

location /api/ {
    proxy_pass http://10.10.10.20:8081/;
}

After:

location /api/ {
    proxy_pass http://10.10.10.20:8080/api/;
}

Validate syntax and reload:

nginx -t && systemctl reload nginx

Expected output:

nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Root Cause 3: DNS Resolution Failure

Why It Happens

When proxy_pass uses a hostname instead of an IP (e.g., proxy_pass http://backend.solvethenetwork.com:8080), Nginx resolves the hostname once at startup or reload by default, not per request. If the name cannot be resolved at that moment, the configuration fails to load: Nginx refuses to start, or on a reload keeps running with the old configuration. If DNS was available at startup but the record later changes (a backend host is replaced or rotated), Nginx keeps using the stale IP until the next reload, causing 502s for every request routed to that dead IP. The same applies in an upstream {} block: a single server hostname that fails to resolve at reload time invalidates the whole configuration, not just that one peer.

How to Identify It

A DNS failure at reload time produces an emerg log entry:

nginx: [emerg] host not found in upstream "backend.solvethenetwork.com" in /etc/nginx/sites-available/solvethenetwork.com:12

A runtime DNS failure (no resolver configured for a variable-based upstream) looks like:

2026/04/04 10:30:44 [error] 2210#2210: *204 no resolver defined to resolve backend.solvethenetwork.com, client: 192.168.10.77, server: solvethenetwork.com, request: "GET / HTTP/1.1", upstream: "http://backend.solvethenetwork.com:8080/", host: "solvethenetwork.com"

Test DNS resolution from sw-infrarunbook-01 directly:

dig backend.solvethenetwork.com @10.10.10.1

A broken DNS zone returns an empty answer section:

;; ANSWER SECTION:
(empty)

;; Query time: 2 msec
;; SERVER: 10.10.10.1#53(10.10.10.1)

Verify the nameserver in /etc/resolv.conf is reachable:

cat /etc/resolv.conf
nc -zv 10.10.10.1 53

How to Fix It

Option A — Use IP addresses directly (best for static infrastructure):

location / {
    proxy_pass http://10.10.10.20:8080;
}

Option B — Declare a resolver and use a variable to force per-request DNS lookup:

http {
    resolver 10.10.10.1 10.10.10.2 valid=30s;
    resolver_timeout 5s;

    server {
        location / {
            set $backend "backend.solvethenetwork.com";
            proxy_pass http://$backend:8080;
        }
    }
}

The variable forces Nginx to resolve the name at request time through the configured resolver instead of only once at startup. Answers are still cached between requests. The valid=30s parameter overrides the TTL of the DNS record, so a cached answer is re-resolved after 30 seconds.

nginx -t && systemctl reload nginx
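To find which proxy_pass directives are exposed to startup-only resolution, the configuration dump can be filtered for targets that are literal hostnames. The following is a sketch under simple assumptions: one directive per line, and a hypothetical function name hostname_proxy_targets. Names of upstream {} blocks will also be printed, since they look like hostnames in proxy_pass.

```shell
# Read Nginx configuration on stdin and print proxy_pass targets that are
# hostnames (not IPv4/IPv6 literals, unix sockets, or variables).
hostname_proxy_targets() {
    awk '
        $1 == "proxy_pass" {
            target = $2
            sub(/;$/, "", target)            # drop the trailing semicolon
            sub("^https?://", "", target)    # drop the scheme
            sub("[/:].*$", "", target)       # drop port and URI
            if (index(target, "$") > 0) next # variable: resolved at request time
            if (target ~ /^[0-9.]+$/) next   # IPv4 literal
            if (target == "unix" || target == "" || substr(target, 1, 1) == "[") next
            print target
        }
    '
}
```

It would typically be run as nginx -T 2>/dev/null | hostname_proxy_targets | sort -u.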

Root Cause 4: Proxy Timeout

Why It Happens

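Nginx enforces several timeout values during upstream communication. If a backend is slow to accept connections, slow to begin sending response headers, or transmits a large response body in very slow chunks, Nginx gives up on the request. A timeout that Nginx itself detects is normally reported to the client as a 504 Gateway Timeout; it shows up as a 502 when something closes or resets the connection first, for example an overloaded backend. Because the two are easy to confuse, rule timeouts out here. The critical directives are: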

proxy_connect_timeout — maximum time to establish a TCP connection to the upstream (default: 60s)
proxy_read_timeout — maximum idle time between two successive read operations from the upstream response (default: 60s)
proxy_send_timeout — maximum idle time between two successive write operations to the upstream (default: 60s)

A connect timeout is common when proxy_connect_timeout is set aggressively low (e.g., 1s) or when a backend is overloaded and not accepting new TCP connections within that window.

How to Identify It

The error log will say upstream timed out:

2026/04/04 11:45:02 [error] 2210#2210: *310 upstream timed out (110: Connection timed out) while connecting to upstream, client: 192.168.10.99, server: solvethenetwork.com, request: "GET /reports/generate HTTP/1.1", upstream: "http://10.10.10.20:8080/reports/generate", host: "solvethenetwork.com"

The phrase "while connecting to upstream" indicates a connect timeout; "while reading response header from upstream" indicates a read timeout. Measure actual backend response time from sw-infrarunbook-01:

curl -o /dev/null -s -w "connect: %{time_connect}s\nttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n" http://10.10.10.20:8080/reports/generate

Output revealing a slow backend:

connect: 0.002s
ttfb: 65.481s
total: 65.499s

The backend takes 65 seconds to send the first byte; with a 60-second proxy_read_timeout, Nginx always gives up first. Check the current timeout configuration:

nginx -T | grep -E 'proxy_(connect|read|send)_timeout'
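The comparison between the measured time to first byte and proxy_read_timeout can be scripted. This is a minimal sketch with a hypothetical helper name, exceeds_read_timeout. It relies on the fact that no bytes arrive before the first byte, so a TTFB above the read timeout is guaranteed to trip it.

```shell
# Succeed if the measured time-to-first-byte exceeds the read timeout.
# $1 = TTFB in seconds (e.g. from curl's %{time_starttransfer}), $2 = proxy_read_timeout in seconds
exceeds_read_timeout() {
    awk -v ttfb="$1" -v limit="$2" 'BEGIN { exit !(ttfb + 0 > limit + 0) }'
}
```

For instance, exceeds_read_timeout 65.481 60 && echo "backend is slower than proxy_read_timeout".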

How to Fix It

Short-term — raise timeouts on the specific location where slow endpoints live:

location /reports/ {
    proxy_connect_timeout 10s;
    proxy_read_timeout    120s;
    proxy_send_timeout    120s;
    proxy_pass http://10.10.10.20:8080;
}

Long-term — profile and fix the backend. Higher timeouts are a workaround, not a solution. If the connect timeout fires, also verify basic network reachability:

ping -c 3 10.10.10.20
traceroute 10.10.10.20
nc -zv 10.10.10.20 8080

Root Cause 5: Backend Sending an Invalid HTTP Response

Why It Happens

Nginx is an HTTP proxy with a strict parser. When the upstream returns data that does not conform to HTTP/1.1, Nginx cannot forward it and returns 502. Common invalid response scenarios include:

An empty response — the backend accepted the TCP connection then closed it without writing any bytes
The response begins with binary data rather than an HTTP status line
Malformed headers — missing the blank line separating headers from the body, or invalid header syntax
A Content-Length value that does not match the actual body length
An SSL/TLS handshake error when proxy_pass uses HTTPS but the backend presents an expired or self-signed certificate that Nginx is configured to verify

How to Identify It

The error log will contain one of these signatures:

2026/04/04 13:10:22 [error] 2210#2210: *415 upstream sent invalid header while reading response header from upstream, client: 192.168.10.33, server: solvethenetwork.com, request: "GET /healthz HTTP/1.1", upstream: "http://10.10.10.30:9000/healthz", host: "solvethenetwork.com"

2026/04/04 13:11:58 [error] 2210#2210: *420 upstream prematurely closed connection while reading response header from upstream, client: 192.168.10.33, server: solvethenetwork.com, request: "GET /healthz HTTP/1.1", upstream: "http://10.10.10.30:9000/healthz", host: "solvethenetwork.com"

Bypass Nginx and curl the backend directly from sw-infrarunbook-01 to see the raw response:

curl -v http://10.10.10.30:9000/healthz 2>&1 | head -40

A healthy backend returns:

< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 15
<
{"status":"ok"}

A broken backend shows:

* Connected to 10.10.10.30 (10.10.10.30) port 9000 (#0)
> GET /healthz HTTP/1.1
> Host: 10.10.10.30:9000
>
* Empty reply from server
curl: (52) Empty reply from server

For HTTPS upstreams, inspect the TLS certificate directly:

openssl s_client -connect 10.10.10.30:443 -servername backend.solvethenetwork.com 2>&1 | grep -E 'Verify|subject|issuer|error'

How to Fix It

The authoritative fix is always to correct the backend application so it emits valid HTTP. For an empty response, check the backend for crashes mid-response:

ssh infrarunbook-admin@10.10.10.30 "journalctl -u backend-app -n 100 --no-pager | grep -E 'error|panic|segfault'"

For SSL certificate verification failures on HTTPS upstreams where the backend uses a private CA, either add the CA to the system trust store or configure Nginx to trust it explicitly:

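location / {
    proxy_pass https://10.10.10.30:443;
    # CA bundle used to verify the backend certificate
    proxy_ssl_trusted_certificate /etc/ssl/certs/internal-ca.crt;
    proxy_ssl_verify       on;
    proxy_ssl_verify_depth 2;
    # proxy_pass targets an IP, so state the name the certificate is issued for
    proxy_ssl_name         backend.solvethenetwork.com;
    proxy_ssl_server_name  on;
}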

Disabling verification entirely (proxy_ssl_verify off) is only acceptable on a private network segment and should be treated as a temporary workaround.

Root Cause 6: Unix Domain Socket Misconfiguration

Why It Happens

When Nginx proxies to a local process over a Unix domain socket — the standard pattern for PHP-FPM, Gunicorn, and uWSGI — a 502 occurs if the socket file does not exist, has incorrect ownership, or the backend process is not listening on it. The socket file disappears when the backend process stops and is recreated only when it starts again.

How to Identify It

The error log clearly states the socket path and the OS error:

2026/04/04 14:00:11 [error] 2210#2210: *511 connect() to unix:/run/php/php8.2-fpm.sock failed (2: No such file or directory) while connecting to upstream, client: 192.168.10.20, server: solvethenetwork.com, request: "GET /index.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php8.2-fpm.sock:", host: "solvethenetwork.com"

Verify whether the socket file exists and inspect its permissions:

ls -la /run/php/php8.2-fpm.sock

When missing:

ls: cannot access '/run/php/php8.2-fpm.sock': No such file or directory

When present but with wrong ownership (the Nginx user cannot connect, and the error log shows 13: Permission denied instead):

srw-rw---- 1 www-data www-data 0 Apr  4 14:02 /run/php/php8.2-fpm.sock
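Whether a given user can connect to the socket follows from the mode string, owner, and group in that listing, since connecting to a Unix socket requires write permission on it. The check can be sketched as below. The function name socket_writable_by is hypothetical, and the sketch ignores directory permissions on the path leading to the socket as well as ACLs.

```shell
# Succeed if the user may write to (and therefore connect to) the socket.
# $1 = one `ls -l` line, $2 = user name, $3 = space-separated groups of that user
socket_writable_by() {
    ls_line=$1
    user=$2
    user_groups=$3
    # Split the listing into fields: mode, link count, owner, group, ...
    set -- $ls_line
    mode=$1
    owner=$3
    group=$4
    if [ "$user" = "$owner" ]; then
        pos=3                         # owner write bit
    else
        case " $user_groups " in
            *" $group "*) pos=6 ;;    # group write bit
            *)            pos=9 ;;    # other write bit
        esac
    fi
    [ "$(printf '%s' "$mode" | cut -c"$pos")" = "w" ]
}
```

A call such as socket_writable_by "$(ls -l /run/php/php8.2-fpm.sock)" nginx "$(id -Gn nginx)" would report whether the nginx user can reach the socket.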

How to Fix It

Restart PHP-FPM to recreate the socket:

systemctl restart php8.2-fpm
ls -la /run/php/php8.2-fpm.sock

If the Nginx worker user does not match the socket group, edit the FPM pool configuration at /etc/php/8.2/fpm/pool.d/www.conf. Before:

listen.owner = www-data
listen.group = www-data
listen.mode  = 0660

After (if Nginx runs as the nginx user):

listen.owner = nginx
listen.group = nginx
listen.mode  = 0660

Restart FPM to apply:

systemctl restart php8.2-fpm

Root Cause 7: Upstream Connection Limit Exhausted

Why It Happens

Under sustained load, Nginx workers or backend servers can exhaust their maximum concurrent connection counts. If worker_connections is set too low in nginx.conf, Nginx cannot open new upstream connections even if the backend is healthy. If the upstream process has its own concurrency cap (e.g., Gunicorn's --workers count or a thread pool limit), excess requests will be refused and result in 502s.

How to Identify It

The Nginx error log will emit worker connection alerts:

2026/04/04 15:22:44 [alert] 2210#2210: 512 worker_connections are not enough while connecting to upstream

Check live connections from sw-infrarunbook-01 to the upstream:

ss -Htn dst 10.10.10.20 | wc -l

Inspect Nginx stub status (requires the stub_status module to be enabled):

curl http://127.0.0.1/nginx_status

Output indicating saturation:

Active connections: 512
server accepts handled requests
 18293 18293 24100
Reading: 0 Writing: 512 Waiting: 0

Writing: 512 matching the worker_connections value, together with Waiting: 0, means all connections are active and the pool is exhausted.
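The same check can be automated by parsing the stub status output and comparing the active connection count against the configured limit. This is a sketch with a hypothetical function name, connections_saturated. The 90 percent default threshold is an arbitrary assumption, and the limit argument is expected to be the total capacity you want to compare against.

```shell
# Read stub_status output on stdin; succeed if active connections have reached
# the given percentage of the limit.
# $1 = connection limit, $2 = threshold in percent (optional, defaults to 90)
connections_saturated() {
    awk -v limit="$1" -v pct="${2:-90}" '
        /^Active connections:/ { active = $3 }
        END { exit !(active * 100 >= limit * pct) }
    '
}
```

Usage might look like curl -s http://127.0.0.1/nginx_status | connections_saturated 512 && echo "connection pool nearly exhausted".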

How to Fix It

Increase worker_connections in /etc/nginx/nginx.conf. Before:

events {
    worker_connections 512;
}

After:

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

Raise the OS file descriptor limit for the Nginx user. Add to /etc/security/limits.conf:

nginx   soft   nofile   65535
nginx   hard   nofile   65535

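limits.conf is applied through PAM to login sessions and does not affect services started by systemd, so also set the per-process limit with LimitNOFILE= in the Nginx systemd unit or with this directive in /etc/nginx/nginx.conf: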

worker_rlimit_nofile 65535;

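For upstream keepalive tuning to reduce connection churn (keepalive only takes effect when the proxying location also sets proxy_http_version 1.1 and proxy_set_header Connection ""):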

upstream backend_pool {
    server 10.10.10.20:8080;
    server 10.10.10.21:8080;
    keepalive         64;
    keepalive_requests 1000;
    keepalive_timeout  60s;
}

Reload Nginx after all changes:

nginx -t && systemctl reload nginx

Prevention

Most 502 errors are preventable with a combination of monitoring, health checks, and conservative configuration defaults.

Enable upstream health checks. Nginx Plus supports active health checks natively. In open-source Nginx, use passive health checks via the max_fails and fail_timeout parameters on upstream server blocks:

upstream backend_pool {
    server 10.10.10.20:8080 max_fails=3 fail_timeout=30s;
    server 10.10.10.21:8080 max_fails=3 fail_timeout=30s;
    server 10.10.10.22:8080 backup;
}

After three failed attempts within 30 seconds, Nginx marks that upstream peer as unavailable for the next 30 seconds and routes traffic to surviving peers. The backup server only receives traffic when all primary peers are marked down.

Always tail the error log during deployments. Configure log aggregation so errors are visible in a central dashboard. Use a log format that includes upstream response time:

log_format upstream_timing '$remote_addr - $remote_user [$time_local] '
    '"$request" $status $body_bytes_sent '
    'upstream=$upstream_addr '
    'upstream_status=$upstream_status '
    'upstream_response_time=$upstream_response_time '
    'request_time=$request_time';
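With that log format in place, 502s can be attributed to individual upstream peers straight from the access log. The sketch below counts 502 responses per upstream address. The name count_502_by_upstream is hypothetical, and it assumes each request was tried against a single peer, because $upstream_addr and $upstream_status become comma-separated lists when Nginx retries on another peer.

```shell
# Read access-log lines in the upstream_timing format on stdin and print
# "<upstream address> <number of 502s>" per peer, sorted by address.
count_502_by_upstream() {
    awk '
        {
            addr = ""; status = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^upstream=/)        addr = substr($i, 10)    # after "upstream="
                if ($i ~ /^upstream_status=/) status = substr($i, 17)  # after "upstream_status="
            }
            if (status == "502" && addr != "") hits[addr]++
        }
        END { for (a in hits) print a, hits[a] }
    ' | sort
}
```

It would be run against the log file, for example count_502_by_upstream < /var/log/nginx/access.log.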

Use IP addresses in proxy_pass for static backends. Avoid hostname-based proxy_pass without a resolver directive. DNS changes will not be reflected until the next Nginx reload, silently routing traffic to dead hosts.

Set realistic timeout values per location. Global timeouts should be conservative. Endpoints that legitimately need more time (report generation, file exports) should have their own location block with higher timeout values, keeping the global defaults strict for all other paths.

Monitor upstream processes with systemd and alerting. Configure Restart=on-failure in your upstream service unit files so crashed backends automatically restart:

[Service]
Restart=on-failure
RestartSec=3s

Test configuration before every reload. Never skip nginx -t. A failed reload leaves the previous config in place and silently discards your changes, or worse, if it is the first load, Nginx does not start at all. Automate the test in any deployment pipeline that touches Nginx configuration files.

Size worker_connections to your expected traffic. A conservative starting formula is worker_processes * worker_connections >= peak concurrent connections * 2, because each proxied request holds one client connection and one upstream connection. Monitor Nginx stub status continuously and alert before the pool saturates.
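The sizing rule worker_processes * worker_connections >= peak concurrent connections * 2 can be turned into a quick calculation of the smallest worker_connections value that satisfies it. This is a sketch only; min_worker_connections is a hypothetical helper, and the result is a floor to round up from, not a recommended setting.

```shell
# Print the smallest worker_connections value that satisfies
# worker_processes * worker_connections >= peak * 2.
# $1 = peak concurrent client connections, $2 = worker_processes
min_worker_connections() {
    peak=$1
    procs=$2
    # Integer ceiling of (peak * 2) / procs
    echo $(( ($peak * 2 + $procs - 1) / $procs ))
}
```

For example, min_worker_connections 1000 4 gives the per-worker minimum for 1000 peak connections spread over four workers.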
