Symptoms
When Nginx returns a 502 Bad Gateway, the browser displays an HTTP 502 status code. The default Nginx error page reads:
502 Bad Gateway
nginx/1.24.0
In the access log you will see the 502 recorded, though the access log line alone does not reveal the cause:
192.168.10.45 - infrarunbook-admin [04/Apr/2026:08:12:33 +0000] "GET /api/status HTTP/1.1" 502 157 "-" "Mozilla/5.0"
The actual cause is almost always visible in the Nginx error log at the error level. A representative entry looks like:
2026/04/04 08:12:33 [error] 1234#1234: *42 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.10.45, server: solvethenetwork.com, request: "GET /api/status HTTP/1.1", upstream: "http://10.10.10.20:8080/api/status", host: "solvethenetwork.com"
A 502 is always an Nginx-to-upstream communication failure. Nginx received the client request, attempted to proxy it to the backend, and the backend either refused the connection, never responded, or returned something Nginx could not parse as a valid HTTP response. The error log is the primary diagnostic tool — read it before doing anything else.
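Each root cause below leaves a distinctive phrase in the error log, so a quick classification pass often points straight at the right section. A sketch, with a sample line standing in for the live log; in production, pipe /var/log/nginx/error.log through the same grep:

```shell
# Count occurrences of each characteristic upstream-failure phrase.
# The sample line stands in for the real error log.
sample='2026/04/04 08:12:33 [error] 1234#1234: *42 connect() failed (111: Connection refused) while connecting to upstream'
printf '%s\n' "$sample" |
  grep -oE 'connect\(\) failed \([0-9]+: [^)]+\)|upstream timed out|upstream sent invalid header|upstream prematurely closed connection' |
  sort | uniq -c | sort -rn
```

The counts tell you which section of this article to read first.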
Root Cause 1: Upstream Server Is Down
Why It Happens
The most common cause of a 502 is that the backend application — Node.js, Gunicorn, PHP-FPM, Tomcat, or any other upstream process — has crashed, been stopped, or is not yet listening on the expected port. When Nginx attempts a TCP connection to the upstream address, the OS kernel responds with a TCP RST (connection refused) because no process is bound to that port.
How to Identify It
Tail the Nginx error log in real time:
tail -f /var/log/nginx/error.log
The key phrase to look for is connect() failed (111: Connection refused):
2026/04/04 08:14:55 [error] 2210#2210: *88 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.10.101, server: solvethenetwork.com, request: "POST /login HTTP/1.1", upstream: "http://127.0.0.1:3000/login", host: "solvethenetwork.com"
Confirm nothing is listening on the expected port using ss:
ss -tlnp | grep 3000
When the app is down, the command produces no output. When the app is running correctly:
LISTEN 0 511 127.0.0.1:3000 0.0.0.0:* users:(("node",pid=4502,fd=22))
Check the upstream service status directly:
systemctl status app-backend
A crashed service shows:
● app-backend.service - Application Backend
Loaded: loaded (/etc/systemd/system/app-backend.service; enabled)
Active: failed (Result: exit-code) since Fri 2026-04-04 08:10:01 UTC; 5min ago
Process: 4490 ExecStart=/usr/bin/node /opt/app/server.js (code=exited, status=1/FAILURE)
How to Fix It
Start or restart the upstream service and verify it is listening:
systemctl restart app-backend
ss -tlnp | grep 3000
If the service fails to start, inspect the journal for application errors:
journalctl -u app-backend -n 50 --no-pager
For PHP-FPM:
systemctl restart php8.2-fpm
ss -tlnp | grep 9000
No Nginx reload is required. Once the upstream process is listening, Nginx will successfully proxy on the next request.
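The restart-and-verify step can be scripted for deployment pipelines. A minimal sketch that decides from `ss -tln` output whether anything has the port open; the sample printf stands in for the live command, and port 3000 is the article's example:

```shell
# Return success if some socket in `ss -tln` output listens on the port.
listening() {  # usage: ss -tln | listening PORT
  awk -v p=":$1\$" '$4 ~ p { found = 1 } END { exit !found }'
}
# Sample line standing in for: ss -tln | listening 3000
printf 'LISTEN 0 511 127.0.0.1:3000 0.0.0.0:*\n' | listening 3000 \
  && echo "backend listening on 3000"
```

In a pipeline you would call it after `systemctl restart app-backend`, retrying a few times before declaring the deploy failed.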
Root Cause 2: Wrong proxy_pass Directive
Why It Happens
A misconfigured proxy_pass URL points Nginx at a host, port, or path that does not match what the backend actually serves. Common mistakes include a wrong port number, an HTTP vs HTTPS mismatch, a trailing slash that rewrites the URI unintentionally, or a hostname typo in the config file.
How to Identify It
Dump all proxy_pass values from the running configuration:
nginx -T | grep proxy_pass
Output:
proxy_pass http://10.10.10.20:8081/;
Then check what port the backend actually listens on from sw-infrarunbook-01:
ssh infrarunbook-admin@10.10.10.20 "ss -tlnp | grep -E '8080|8081'"
Output:
LISTEN 0 511 0.0.0.0:8080 0.0.0.0:* users:(("java",pid=7812,fd=45))
The app is on port 8080 but proxy_pass targets port 8081. The error log confirms connection refused on 8081. Trailing slash bugs are subtler: with proxy_pass http://10.10.10.20:8080/; inside location /api/, a request for /api/users forwards as /users, stripping the /api prefix. The backend then returns 404s, and with proxy_intercept_errors on, Nginx substitutes its own error pages, further obscuring the real cause.
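Port mismatches like this can be caught mechanically by extracting every proxy_pass target port from the config dump and comparing the list against what each backend reports via `ss -tlnp`. A sketch, with a sample line standing in for `nginx -T` output:

```shell
# Extract the target port from each proxy_pass directive in a config dump.
# The printf stands in for: nginx -T
printf '    proxy_pass http://10.10.10.20:8081/;\n' |
  grep -oE 'proxy_pass https?://[^;]+' |
  sed -E 's#.*:([0-9]+).*#\1#'
```

The output (one port per directive) can be diffed against the listening ports on the backend hosts.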
How to Fix It
Edit /etc/nginx/sites-available/solvethenetwork.com. Before:
location /api/ {
proxy_pass http://10.10.10.20:8081/;
}
After:
location /api/ {
proxy_pass http://10.10.10.20:8080/api/;
}
Validate syntax and reload:
nginx -t && systemctl reload nginx
Expected output:
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
Root Cause 3: DNS Resolution Failure
Why It Happens
When proxy_pass uses a hostname instead of an IP (e.g., proxy_pass http://backend.solvethenetwork.com:8080), Nginx resolves the hostname once at startup by default — not per request. If the DNS resolver is unavailable at startup, the upstream block fails to initialize. If DNS was available at startup but the record later changes (a backend host is replaced or rotated), Nginx keeps using the stale IP indefinitely until the next reload, causing 502s for every request routed to that dead IP. In an upstream {} block, any server FQDN that fails to resolve at reload time can mark the entire pool invalid.
How to Identify It
A DNS failure at reload time produces an emerg log entry:
nginx: [emerg] host not found in upstream "backend.solvethenetwork.com" in /etc/nginx/sites-available/solvethenetwork.com:12
A runtime DNS failure (no resolver configured for a variable-based upstream) looks like:
2026/04/04 10:30:44 [error] 2210#2210: *204 no resolver defined to resolve backend.solvethenetwork.com, client: 192.168.10.77, server: solvethenetwork.com, request: "GET / HTTP/1.1", upstream: "http://backend.solvethenetwork.com:8080/", host: "solvethenetwork.com"
Test DNS resolution from sw-infrarunbook-01 directly:
dig backend.solvethenetwork.com @10.10.10.1
A broken DNS zone returns an empty answer section:
;; ANSWER SECTION:
(empty)
;; Query time: 2 msec
;; SERVER: 10.10.10.1#53(10.10.10.1)
Verify the nameserver in /etc/resolv.conf is reachable:
cat /etc/resolv.conf
nc -zv 10.10.10.1 53
How to Fix It
Option A — Use IP addresses directly (best for static infrastructure):
location / {
proxy_pass http://10.10.10.20:8080;
}
Option B — Declare a resolver and use a variable to force per-request DNS lookup:
http {
resolver 10.10.10.1 10.10.10.2 valid=30s;
resolver_timeout 5s;
server {
location / {
set $backend "backend.solvethenetwork.com";
proxy_pass http://$backend:8080;
}
}
}
The variable assignment forces Nginx to consult the resolver on every request rather than caching the result from startup. The valid=30s parameter overrides the TTL of the returned records, so Nginx re-resolves at most every 30 seconds and stale records are discarded quickly.
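Before reloading, it can be worth confirming that the hostname actually resolves from this host. A minimal sketch using getent; note the nuance that Nginx's startup resolution goes through the system resolver (so /etc/hosts entries count), while the resolver directive queries the named DNS servers directly, bypassing /etc/hosts. backend.solvethenetwork.com is the article's example name:

```shell
# Check a hostname through the system resolver (NSS: /etc/hosts, then DNS).
resolves() { getent hosts "$1" > /dev/null; }
resolves localhost && echo "localhost resolves"
# For the real backend you would run:
# resolves backend.solvethenetwork.com || echo "fix DNS before reloading"
```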
nginx -t && systemctl reload nginx
Root Cause 4: Proxy Timeout
Why It Happens
Nginx enforces several timeout values during upstream communication. If a backend is slow to accept connections, slow to begin sending response headers, or transmits a large response body in very slow chunks, Nginx gives up and returns a 502 (or a 504, depending on where in the cycle the timeout fires). The critical directives are:
- proxy_connect_timeout — maximum time to establish a TCP connection to the upstream (default: 60s)
- proxy_read_timeout — maximum idle time between two successive read operations from the upstream response (default: 60s)
- proxy_send_timeout — maximum idle time between two successive write operations to the upstream (default: 60s)
A 502 from a connection timeout is common when proxy_connect_timeout is set aggressively low (e.g., 1s) or when a backend is overloaded and not accepting new TCP connections within that window.
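Whether a given endpoint can ever fit inside the read timeout is simple arithmetic on its measured time-to-first-byte. A sketch with illustrative sample numbers (in practice the ttfb value would come from curl's %{time_starttransfer}):

```shell
# Compare a measured time-to-first-byte against the read-timeout budget.
ttfb=65.481   # sample; e.g. curl -o /dev/null -s -w '%{time_starttransfer}' <url>
limit=60      # current proxy_read_timeout in seconds
awk -v t="$ttfb" -v l="$limit" \
  'BEGIN { print ((t > l) ? "will time out" : "within budget") }'
```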
How to Identify It
The error log will say upstream timed out:
2026/04/04 11:45:02 [error] 2210#2210: *310 upstream timed out (110: Connection timed out) while connecting to upstream, client: 192.168.10.99, server: solvethenetwork.com, request: "GET /reports/generate HTTP/1.1", upstream: "http://10.10.10.20:8080/reports/generate", host: "solvethenetwork.com"
The phrase "while connecting to upstream" indicates a connect timeout; "while reading response header from upstream" indicates a read timeout. Measure actual backend response time from sw-infrarunbook-01:
curl -o /dev/null -s -w "connect: %{time_connect}s\nttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n" http://10.10.10.20:8080/reports/generate
Output revealing a slow backend:
connect: 0.002s
ttfb: 65.481s
total: 65.499s
The backend takes 65 seconds to send the first byte; with a 60-second proxy_read_timeout, Nginx always kills the request first. Check the current timeout configuration:
nginx -T | grep -E 'proxy_(connect|read|send)_timeout'
How to Fix It
Short-term — raise timeouts on the specific location where slow endpoints live:
location /reports/ {
proxy_connect_timeout 10s;
proxy_read_timeout 120s;
proxy_send_timeout 120s;
proxy_pass http://10.10.10.20:8080;
}
Long-term — profile and fix the backend. Higher timeouts are a workaround, not a solution. If the connect timeout fires, also verify basic network reachability:
ping -c 3 10.10.10.20
traceroute 10.10.10.20
nc -zv 10.10.10.20 8080
Root Cause 5: Backend Sending an Invalid HTTP Response
Why It Happens
Nginx is an HTTP proxy with a strict parser. When the upstream returns data that does not conform to HTTP/1.1, Nginx cannot forward it and returns 502. Common invalid response scenarios include:
- An empty response — the backend accepted the TCP connection then closed it without writing any bytes
- The response begins with binary data rather than an HTTP status line
- Malformed headers — missing the blank line separating headers from the body, or invalid header syntax
- A Content-Length value that does not match the actual body length
- An SSL/TLS handshake error when proxy_pass uses HTTPS but the backend presents an expired or self-signed certificate that Nginx is configured to verify
How to Identify It
The error log will contain one of these signatures:
2026/04/04 13:10:22 [error] 2210#2210: *415 upstream sent invalid header while reading response header from upstream, client: 192.168.10.33, server: solvethenetwork.com, request: "GET /healthz HTTP/1.1", upstream: "http://10.10.10.30:9000/healthz", host: "solvethenetwork.com"
2026/04/04 13:11:58 [error] 2210#2210: *420 upstream prematurely closed connection while reading response header from upstream, client: 192.168.10.33, server: solvethenetwork.com, request: "GET /healthz HTTP/1.1", upstream: "http://10.10.10.30:9000/healthz", host: "solvethenetwork.com"
Bypass Nginx and curl the backend directly from sw-infrarunbook-01 to see the raw response:
curl -v http://10.10.10.30:9000/healthz 2>&1 | head -40
A healthy backend returns:
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 15
<
{"status":"ok"}
A broken backend shows:
* Connected to 10.10.10.30 (10.10.10.30) port 9000 (#0)
> GET /healthz HTTP/1.1
> Host: 10.10.10.30:9000
>
* Empty reply from server
curl: (52) Empty reply from server
For HTTPS upstreams, inspect the TLS certificate directly:
openssl s_client -connect 10.10.10.30:443 -servername backend.solvethenetwork.com 2>&1 | grep -E 'Verify|subject|issuer|error'
How to Fix It
The authoritative fix is always to correct the backend application so it emits valid HTTP. For an empty response, check the backend for crashes mid-response:
ssh infrarunbook-admin@10.10.10.30 "journalctl -u backend-app -n 100 --no-pager | grep -E 'error|panic|segfault'"
For SSL certificate verification failures on HTTPS upstreams where the backend uses a private CA, either add the CA to the system trust store or configure Nginx to trust it explicitly:
location / {
proxy_pass https://10.10.10.30:443;
proxy_ssl_trusted_certificate /etc/ssl/certs/internal-ca.crt;
proxy_ssl_verify on;
proxy_ssl_verify_depth 2;
}
Disabling verification entirely (proxy_ssl_verify off) is only acceptable on a private network segment and should be treated as a temporary workaround.
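When curl's output is ambiguous, the first line of a captured response can be checked against the HTTP/1.x status-line shape directly. A sketch; the sample string stands in for bytes grabbed with curl or nc:

```shell
# Return success if the argument looks like a valid HTTP/1.x status line,
# e.g. "HTTP/1.1 200 OK". Anything else is what Nginx calls invalid.
valid_status_line() {
  case $1 in
    HTTP/1.[01]\ [0-9][0-9][0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}
valid_status_line 'HTTP/1.1 200 OK' && echo "parses as HTTP"
```

An empty string here corresponds to the "Empty reply from server" case above.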
Root Cause 6: Unix Domain Socket Misconfiguration
Why It Happens
When Nginx proxies to a local process over a Unix domain socket — the standard pattern for PHP-FPM, Gunicorn, and uWSGI — a 502 occurs if the socket file does not exist, has incorrect ownership, or the backend process is not listening on it. The socket file disappears when the backend process stops and is recreated only when it starts again.
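A quick existence-and-ownership probe covers the common failure modes in one step. A sketch; the path is the article's PHP-FPM example, and `stat -c` assumes GNU coreutils:

```shell
# Report a Unix socket's owner/group/mode, or flag that it is missing.
check_socket() {
  if [ -S "$1" ]; then
    stat -c 'owner=%U group=%G mode=%a' "$1"   # GNU stat format flags
  else
    echo "socket missing - is the backend running?"
  fi
}
check_socket /run/php/php8.2-fpm.sock
```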
How to Identify It
The error log clearly states the socket path and the OS error:
2026/04/04 14:00:11 [error] 2210#2210: *511 connect() to unix:/run/php/php8.2-fpm.sock failed (2: No such file or directory) while connecting to upstream, client: 192.168.10.20, server: solvethenetwork.com, request: "GET /index.php HTTP/1.1", upstream: "fastcgi_pass unix:/run/php/php8.2-fpm.sock", host: "solvethenetwork.com"
Verify whether the socket file exists and inspect its permissions:
ls -la /run/php/php8.2-fpm.sock
When missing:
ls: cannot access '/run/php/php8.2-fpm.sock': No such file or directory
When present but with wrong ownership (the Nginx user cannot connect):
srw-rw---- 1 www-data www-data 0 Apr 4 14:02 /run/php/php8.2-fpm.sock
How to Fix It
Restart PHP-FPM to recreate the socket:
systemctl restart php8.2-fpm
ls -la /run/php/php8.2-fpm.sock
If the Nginx worker user does not match the socket group, edit the FPM pool configuration at /etc/php/8.2/fpm/pool.d/www.conf. Before:
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
After (if Nginx runs as the nginx user):
listen.owner = nginx
listen.group = nginx
listen.mode = 0660
Restart FPM to apply:
systemctl restart php8.2-fpm
Root Cause 7: Upstream Connection Limit Exhausted
Why It Happens
Under sustained load, Nginx workers or backend servers can exhaust their maximum concurrent connection counts. If worker_connections is set too low in nginx.conf, Nginx cannot open new upstream connections even if the backend is healthy. If the upstream process has its own concurrency cap (e.g., Gunicorn's --workers count or a thread pool limit), excess requests will be refused and result in 502s.
How to Identify It
The Nginx error log will emit worker connection warnings:
2026/04/04 15:22:44 [warn] 2210#2210: 512 worker_connections are not enough while connecting to upstream
Check live connections from sw-infrarunbook-01 to the upstream:
ss -tn dst 10.10.10.20 | wc -l
Inspect Nginx stub status (requires the stub_status module to be enabled):
curl http://127.0.0.1/nginx_status
Output indicating saturation:
Active connections: 512
server accepts handled requests
18293 18293 24100
Reading: 0 Writing: 512 Waiting: 0
A Writing count of 512 matching the worker_connections value, combined with Waiting: 0, means every available connection is actively in use and the pool is exhausted.
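The saturation check can be automated by parsing stub_status output. A sketch; the sample text stands in for `curl -s http://127.0.0.1/nginx_status`, and the 90% threshold is illustrative:

```shell
# Flag when active connections approach the worker_connections limit.
status='Active connections: 512
server accepts handled requests
 18293 18293 24100
Reading: 0 Writing: 512 Waiting: 0'
limit=512   # current worker_connections
active=$(printf '%s\n' "$status" | awk '/^Active connections/ { print $3 }')
if [ "$active" -ge $((limit * 9 / 10)) ]; then
  echo "connection pool near limit ($active/$limit)"
fi
```

Wired into a cron job or monitoring agent, this gives warning before the pool actually fills.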
How to Fix It
Increase worker_connections in /etc/nginx/nginx.conf. Before:
events {
worker_connections 512;
}
After:
events {
worker_connections 4096;
use epoll;
multi_accept on;
}
Raise the OS file descriptor limit for the Nginx user. Add to /etc/security/limits.conf:
nginx soft nofile 65535
nginx hard nofile 65535
Also raise the per-process limit, either with LimitNOFILE= in the Nginx systemd unit or with worker_rlimit_nofile in the main context of /etc/nginx/nginx.conf:
worker_rlimit_nofile 65535;
For upstream keepalive tuning to reduce connection churn:
upstream backend_pool {
server 10.10.10.20:8080;
server 10.10.10.21:8080;
keepalive 64;
keepalive_requests 1000;
keepalive_timeout 60s;
}
Reload Nginx after all changes:
nginx -t && systemctl reload nginx
Prevention
Most 502 errors are preventable with a combination of monitoring, health checks, and conservative configuration defaults.
Enable upstream health checks. Nginx Plus supports active health checks natively. In open-source Nginx, use passive health checks via the max_fails and fail_timeout parameters on upstream server blocks:
upstream backend_pool {
server 10.10.10.20:8080 max_fails=3 fail_timeout=30s;
server 10.10.10.21:8080 max_fails=3 fail_timeout=30s;
server 10.10.10.22:8080 backup;
}
After three failures within the 30-second window, Nginx marks that upstream peer as unavailable and routes traffic to surviving peers. The backup server only receives traffic when all primary peers are marked down.
Always tail the error log during deployments. Configure log aggregation so errors are visible in a central dashboard. Use a log format that includes upstream response time:
log_format upstream_timing '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'upstream=$upstream_addr '
'upstream_status=$upstream_status '
'upstream_response_time=$upstream_response_time '
'request_time=$request_time';
Use IP addresses in proxy_pass for static backends. Avoid hostname-based proxy_pass without a resolver directive. DNS changes will not be reflected until the next Nginx reload, silently routing traffic to dead hosts.
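Once a key=value log format like upstream_timing is in place, slow or failing peers can be pulled straight out of the access log with awk. A sketch; the sample line mimics that format, and the 1-second threshold is illustrative:

```shell
# Print the upstream address and response time for any slow request.
line='192.168.10.45 - - [04/Apr/2026:08:12:33 +0000] "GET /api/status HTTP/1.1" 502 157 upstream=10.10.10.20:8080 upstream_status=502 upstream_response_time=61.204 request_time=61.210'
printf '%s\n' "$line" | awk '{
  up = ""; t = 0
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^upstream=/)               { split($i, a, "="); up = a[2] }
    if ($i ~ /^upstream_response_time=/) { split($i, b, "="); t = b[2] + 0 }
  }
  if (t > 1) print up, t   # threshold: 1 second
}'
```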
Set realistic timeout values per location. Global timeouts should be conservative. Endpoints that legitimately need more time (report generation, file exports) should have their own location block with higher timeout values, keeping the global defaults strict for all other paths.
Monitor upstream processes with systemd and alerting. Configure Restart=on-failure in your upstream service unit files so crashed backends automatically restart:
[Service]
Restart=on-failure
RestartSec=3s
Test configuration before every reload. Never skip nginx -t. A failed reload leaves the previous config in place and silently discards your changes — or worse, if it is the first load, Nginx does not start at all. Automate the test in any deployment pipeline that touches Nginx configuration files.
Size worker_connections to your expected traffic. A conservative starting formula is:
worker_processes * worker_connections >= peak concurrent connections * 2. Monitor Nginx stub status continuously and alert before the pool saturates.
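The sizing formula is easy to evaluate inline. A sketch with sample numbers: 4 worker processes with 4096 connections each, against a 6000-connection peak:

```shell
# Check worker_processes * worker_connections >= peak * 2 (sample numbers).
workers=4 conns=4096 peak=6000
if [ $((workers * conns)) -ge $((peak * 2)) ]; then
  echo "headroom ok ($((workers * conns)) >= $((peak * 2)))"
else
  echo "undersized"
fi
```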
Frequently Asked Questions
Q: What is the fastest way to find the root cause of a 502?
A: Run tail -f /var/log/nginx/error.log and reproduce the request. The error log entry will almost always identify the upstream address, the OS error code, and the phase of communication that failed (connecting, reading, writing). That single log line narrows the cause to one of the categories in this article.
Q: What is the difference between a 502 Bad Gateway and a 504 Gateway Timeout?
A: Both originate from upstream communication failures. A 502 means Nginx received an invalid or empty response, or the backend actively refused the connection. A 504 means Nginx waited the full timeout period without receiving a complete response — the backend was reachable but too slow. In practice, a very short proxy_connect_timeout (1–2 seconds) can produce a 502 when the backend is simply busy, which is often mistaken for a genuine connectivity failure.
Q: Does a 502 always mean the backend is down?
A: No. The backend process can be running and healthy while Nginx still returns 502 due to a misconfigured proxy_pass port, a DNS resolution failure, an expired TLS certificate on an HTTPS upstream, or a Unix socket permission mismatch. Always check the error log message to distinguish connectivity failures from configuration errors.
Q: How do I tell if Nginx itself is the problem versus the backend?
A: Bypass Nginx and connect directly to the upstream from the same host: curl -v http://10.10.10.20:8080/endpoint. If the direct request succeeds, Nginx's configuration or networking is the issue. If the direct request also fails, the backend itself is the problem.
Q: Why does my 502 only happen intermittently under load?
A: Intermittent 502s under load usually indicate one of: an exhausted upstream connection pool (check worker_connections and stub status), overloaded backend workers timing out, or a subset of upstream servers in the pool being down while others are healthy. Enable detailed upstream logging (including $upstream_addr and $upstream_status) to identify which specific backend peer is failing.
Q: Can I configure Nginx to retry failed upstream requests automatically?
A: Yes. The proxy_next_upstream directive controls which error conditions trigger a retry on the next peer in the upstream pool. By default it retries on error and timeout. You can extend it:
proxy_next_upstream error timeout http_502 http_503;
proxy_next_upstream_tries 3;
proxy_next_upstream_timeout 10s;
Be careful with non-idempotent requests (POST, PUT, PATCH) — retrying them can cause duplicate operations on the backend, which is why Nginx (since 1.9.13) does not retry them unless you explicitly add the non_idempotent parameter to proxy_next_upstream.
Q: How do I suppress the default Nginx 502 error page and show my own?
A: Use the error_page directive in your server block:
error_page 502 /errors/502.html;
location = /errors/502.html {
root /var/www/solvethenetwork.com;
internal;
}
The internal directive prevents external direct access to the error page URL.
Q: My backend uses a self-signed TLS certificate. Why does Nginx return 502?
A: By default, proxy_ssl_verify is off, so self-signed certificates are typically accepted. If you have explicitly enabled proxy_ssl_verify on without providing a trusted CA bundle, Nginx will reject the certificate and return 502. Either provide the CA certificate via proxy_ssl_trusted_certificate or, for fully private internal traffic where network security controls already protect the path, disable verification with proxy_ssl_verify off.
Q: How do I check which upstream server in a pool is generating 502 errors?
A: Log $upstream_addr and $upstream_status as key=value pairs in your access log format (as in the upstream_timing format shown earlier). Then query the logs:
grep 'upstream_status=502' /var/log/nginx/access.log | grep -oE 'upstream=[^ ]+' | sort | uniq -c | sort -rn
This reveals which specific backend IP is responsible for the majority of failures, letting you target that host for investigation without touching the healthy peers.
Q: What does "no live upstreams while connecting to upstream" mean?
A: This error means every server in the upstream pool has been marked as unavailable by Nginx's passive health check logic (i.e., each peer exceeded max_fails within fail_timeout). Nginx has no healthy peer to route to. This is most often caused by a deployment outage where all backend instances went down simultaneously, or by setting max_fails=1 with a high fail_timeout so a single transient failure blacklists a peer for a long time.
Q: Should I use an upstream block or proxy_pass directly to an IP?
A: Use an upstream block whenever you have more than one backend server, need load balancing, want passive health check parameters (max_fails, fail_timeout), or plan to enable keepalive connections. For a single static backend that will never scale, a direct proxy_pass http://10.10.10.20:8080 is simpler and has identical performance. The upstream block gives you more control and visibility at the cost of a few extra config lines.
Q: How can I confirm Nginx is picking up a config change without a full restart?
A: Always run nginx -t first to validate syntax, then systemctl reload nginx (or nginx -s reload). A reload performs a graceful configuration swap — new worker processes start with the new config, in-flight requests on old workers complete normally, and old workers exit cleanly. Existing upstream connections are not dropped mid-request. Only use systemctl restart nginx when a reload fails or when changing module configurations that require a full process restart.
