InfraRunBook

    Nginx 502 Bad Gateway Deep Dive

    Nginx
    Published: Apr 4, 2026
    Updated: Apr 4, 2026

    A thorough technical breakdown of every root cause behind Nginx 502 Bad Gateway errors, with real error log signatures, diagnostic commands, and targeted fixes for each scenario.

    Symptoms

    When Nginx returns a 502 Bad Gateway, the browser displays an HTTP 502 status code. The default Nginx error page reads:

    502 Bad Gateway
    nginx/1.24.0

    In the access log the request is recorded with status 502. The default combined format shown here does not include the upstream address:

    192.168.10.45 - infrarunbook-admin [04/Apr/2026:08:12:33 +0000] "GET /api/status HTTP/1.1" 502 157 "-" "Mozilla/5.0"

    The actual cause is almost always visible in the Nginx error log at the error level. A representative entry looks like:

    2026/04/04 08:12:33 [error] 1234#1234: *42 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.10.45, server: solvethenetwork.com, request: "GET /api/status HTTP/1.1", upstream: "http://10.10.10.20:8080/api/status", host: "solvethenetwork.com"

    A 502 is always an Nginx-to-upstream communication failure. Nginx received the client request, attempted to proxy it to the backend, and the backend either refused the connection, never responded, or returned something Nginx could not parse as a valid HTTP response. The error log is the primary diagnostic tool — read it before doing anything else.
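
    As a rough triage aid, the error-log signatures covered in this article can be mapped to their likely root cause with a small shell function. This is a minimal sketch: classify_upstream_error and the category labels are hypothetical names chosen for illustration, not part of Nginx, and the patterns only cover the messages quoted in this article.

```shell
# Map one Nginx error-log line to the likely root cause discussed in this article.
# classify_upstream_error is a hypothetical helper name, not an Nginx tool.
classify_upstream_error() {
    local line="$1"
    case "$line" in
        # Socket errors are checked first because they also carry an errno.
        *"connect() to unix:"*"failed"*)
            echo "unix-socket" ;;
        *"(111: Connection refused)"*)
            echo "upstream-down-or-wrong-port" ;;
        *"no resolver defined"*|*"host not found in upstream"*)
            echo "dns" ;;
        *"upstream timed out"*)
            echo "timeout" ;;
        *"upstream sent invalid header"*|*"upstream prematurely closed connection"*)
            echo "invalid-response" ;;
        *"worker_connections are not enough"*)
            echo "connection-limit" ;;
        *"no live upstreams"*)
            echo "all-peers-marked-down" ;;
        *)
            echo "unknown" ;;
    esac
}
```

    One possible use is to feed it the most recent log entry, for example classify_upstream_error "$(tail -n 1 /var/log/nginx/error.log)", and then jump to the matching section below.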


    Root Cause 1: Upstream Server Is Down

    Why It Happens

    The most common cause of a 502 is that the backend application — Node.js, Gunicorn, PHP-FPM, Tomcat, or any other upstream process — has crashed, been stopped, or is not yet listening on the expected port. When Nginx attempts a TCP connection to the upstream address, the OS kernel responds with a TCP RST (connection refused) because no process is bound to that port.

    How to Identify It

    Tail the Nginx error log in real time:

    tail -f /var/log/nginx/error.log

    The key phrase to look for is connect() failed (111: Connection refused):

    2026/04/04 08:14:55 [error] 2210#2210: *88 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.10.101, server: solvethenetwork.com, request: "POST /login HTTP/1.1", upstream: "http://127.0.0.1:3000/login", host: "solvethenetwork.com"

    Confirm nothing is listening on the expected port using ss:

    ss -tlnp | grep 3000

    When the app is down, the command produces no output. When the app is running correctly:

    LISTEN 0 511 127.0.0.1:3000 0.0.0.0:* users:(("node",pid=4502,fd=22))
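
    A plain grep 3000 also matches port 30001 or a PID that happens to contain 3000. The sketch below checks the local port column exactly. It assumes the column layout of ss -tln shown above (state first, local address fourth); port_is_listening is a hypothetical helper name.

```shell
# Reads `ss -tln` output on stdin and succeeds only if a LISTEN socket
# is bound to exactly the given port. port_is_listening is a hypothetical name.
port_is_listening() {
    local port="$1"
    awk -v port="$port" '
        $1 == "LISTEN" {
            # Column 4 is the local address; the port follows the last colon,
            # which also works for IPv6 forms such as [::]:3000.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

    Typical use would be ss -tln | port_is_listening 3000 && echo "backend is listening".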

    Check the upstream service status directly:

    systemctl status app-backend

    A crashed service shows:

    ● app-backend.service - Application Backend
       Loaded: loaded (/etc/systemd/system/app-backend.service; enabled)
       Active: failed (Result: exit-code) since Fri 2026-04-04 08:10:01 UTC; 5min ago
      Process: 4490 ExecStart=/usr/bin/node /opt/app/server.js (code=exited, status=1/FAILURE)

    How to Fix It

    Start or restart the upstream service and verify it is listening:

    systemctl restart app-backend
    ss -tlnp | grep 3000

    If the service fails to start, inspect the journal for application errors:

    journalctl -u app-backend -n 50 --no-pager

    For PHP-FPM:

    systemctl restart php8.2-fpm
    ss -tlnp | grep 9000

    No Nginx reload is required. Once the upstream process is listening, Nginx will successfully proxy on the next request.


    Root Cause 2: Wrong proxy_pass Directive

    Why It Happens

    A misconfigured proxy_pass URL points Nginx at a host, port, or path that does not match what the backend actually serves. Common mistakes include a wrong port number, an HTTP vs HTTPS mismatch, a trailing slash that rewrites the URI unintentionally, or a hostname typo in the config file.

    How to Identify It

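    Dump all proxy_pass values from the configuration on disk (nginx -T parses and prints the files, which can differ from what is running if they changed since the last reload):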

    nginx -T | grep proxy_pass

    Output:

            proxy_pass http://10.10.10.20:8081/;

    Then check what port the backend actually listens on from sw-infrarunbook-01:

    ssh infrarunbook-admin@10.10.10.20 "ss -tlnp | grep -E '8080|8081'"

    Output:

    LISTEN 0 511 0.0.0.0:8080 0.0.0.0:* users:(("java",pid=7812,fd=45))

    The app is on port 8080 but proxy_pass targets port 8081. The error log confirms connection refused on 8081. Trailing slash bugs are subtler: with proxy_pass http://10.10.10.20:8080/; inside location /api/, a request for /api/users is forwarded as /users, stripping the /api prefix. The backend then answers 404, which Nginx passes through as a 404 rather than a 502 unless proxy_intercept_errors on and an error_page rule remap it, so check the URI mapping while the directive is open for editing.
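
    The URI rewriting rule can be sketched as a small function. It models only the simple case described here: a prefix location (not a regex), a request that already matched it, and no variables in proxy_pass. forwarded_uri is a hypothetical helper name, and its second argument is the URI part of proxy_pass (everything after host:port, possibly empty).

```shell
# forwarded_uri LOCATION_PREFIX PROXY_PASS_URI REQUEST_URI
# Prints the URI Nginx would send upstream in the simple prefix-location case.
# forwarded_uri is a hypothetical helper, not an Nginx command.
forwarded_uri() {
    local location="$1" pass_uri="$2" request="$3"
    if [ -z "$pass_uri" ]; then
        # proxy_pass has no URI part: the request URI is passed unchanged.
        printf '%s\n' "$request"
    else
        # proxy_pass has a URI part: it replaces the matched location prefix.
        printf '%s\n' "${pass_uri}${request#"$location"}"
    fi
}
```

    For example, forwarded_uri /api/ / /api/users prints /users, while forwarded_uri /api/ "" /api/users and forwarded_uri /api/ /api/ /api/users both print /api/users.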

    How to Fix It

    Edit /etc/nginx/sites-available/solvethenetwork.com. Before:

    location /api/ {
        proxy_pass http://10.10.10.20:8081/;
    }

    After:

    location /api/ {
        proxy_pass http://10.10.10.20:8080/api/;
    }

    Validate syntax and reload:

    nginx -t && systemctl reload nginx

    Expected output:

    nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
    nginx: configuration file /etc/nginx/nginx.conf test is successful

    Root Cause 3: DNS Resolution Failure

    Why It Happens

    When proxy_pass uses a hostname instead of an IP (e.g., proxy_pass http://backend.solvethenetwork.com:8080), Nginx resolves the hostname once when the configuration is loaded, not per request. If the name cannot be resolved at that moment, Nginx refuses to start, or rejects the reload and keeps running the old configuration. If DNS was available at load time but the record later changes (a backend host is replaced or rotated), Nginx keeps using the stale IP until the next reload, causing 502s for every request routed to that dead IP. In an upstream {} block, a single server FQDN that fails to resolve at load time invalidates the whole configuration, not just that pool.

    How to Identify It

    A DNS failure at reload time produces an emerg log entry:

    nginx: [emerg] host not found in upstream "backend.solvethenetwork.com" in /etc/nginx/sites-available/solvethenetwork.com:12

    A runtime DNS failure (no resolver configured for a variable-based upstream) looks like:

    2026/04/04 10:30:44 [error] 2210#2210: *204 no resolver defined to resolve backend.solvethenetwork.com, client: 192.168.10.77, server: solvethenetwork.com, request: "GET / HTTP/1.1", upstream: "http://backend.solvethenetwork.com:8080/", host: "solvethenetwork.com"

    Test DNS resolution from sw-infrarunbook-01 directly:

    dig backend.solvethenetwork.com @10.10.10.1

    A broken DNS zone returns no ANSWER SECTION at all. The header reports ANSWER: 0, typically with status NXDOMAIN or SERVFAIL, and only the footer follows:

    ;; Query time: 2 msec
    ;; SERVER: 10.10.10.1#53(10.10.10.1)

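    Verify the nameserver in /etc/resolv.conf is reachable (nc tests TCP port 53 here; most queries use UDP, so treat the dig result above as the stronger signal):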

    cat /etc/resolv.conf
    nc -zv 10.10.10.1 53
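
    To see which proxy_pass targets depend on load-time DNS at all, the configuration dump can be filtered for targets whose host is a name rather than an IP literal, a variable, or a Unix socket. This is a sketch under assumptions: hostname_proxy_targets is a hypothetical helper, it expects one directive per line as printed by nginx -T, and it cannot tell a DNS name from the name of an upstream {} group, so cross-check the results.

```shell
# Reads Nginx configuration text on stdin and prints proxy_pass targets whose
# host part is a name. hostname_proxy_targets is a hypothetical helper name.
hostname_proxy_targets() {
    awk '
        $1 == "proxy_pass" {
            target = $2
            sub(";$", "", target)                  # drop the trailing semicolon
            host = target
            sub("^https?://", "", host)            # drop the scheme
            sub("[/:].*$", "", host)               # drop port and URI
            if (substr(host, 1, 1) == "$") next    # variable: resolved at request time
            if (host == "unix") next               # unix: socket path
            if (substr(host, 1, 1) == "[") next    # IPv6 literal
            if (host ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) next   # IPv4 literal
            print target
        }
    '
}
```

    A possible invocation is nginx -T 2>/dev/null | hostname_proxy_targets; each line printed is a target worth checking with dig.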

    How to Fix It

    Option A — Use IP addresses directly (best for static infrastructure):

    location / {
        proxy_pass http://10.10.10.20:8080;
    }

    Option B — Declare a resolver and use a variable to force per-request DNS lookup:

    http {
        resolver 10.10.10.1 10.10.10.2 valid=30s;
        resolver_timeout 5s;
    
        server {
            location / {
                set $backend "backend.solvethenetwork.com";
                proxy_pass http://$backend:8080;
            }
        }
    }

    Using a variable in proxy_pass makes Nginx resolve the name at request time through the configured resolver instead of once at configuration load. Answers are cached, so not every request triggers a lookup. The valid=30s parameter overrides the record's TTL and makes Nginx re-resolve after 30 seconds.

    nginx -t && systemctl reload nginx

    Root Cause 4: Proxy Timeout

    Why It Happens

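    Nginx enforces several timeout values during upstream communication. If a backend is slow to accept connections, slow to begin sending response headers, or transmits a large response body in very slow chunks, Nginx gives up on the request. When one of Nginx's own timers expires, the client normally receives a 504 Gateway Timeout; a 502 appears in the same slow-backend situations when the backend, or a device in between, closes or resets the connection before Nginx's timer fires. The critical directives are: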

    • proxy_connect_timeout — maximum time to establish a TCP connection to the upstream (default: 60s)
    • proxy_read_timeout — maximum idle time between two successive read operations from the upstream response (default: 60s)
    • proxy_send_timeout — maximum idle time between two successive write operations to the upstream (default: 60s)

    Connect-phase failures are common when proxy_connect_timeout is set aggressively low (e.g., 1s) or when a backend is overloaded and not accepting new TCP connections within that window. Check the status code in the access log to see whether a given failure surfaced as a 502 or a 504.

    How to Identify It

    The error log will say upstream timed out:

    2026/04/04 11:45:02 [error] 2210#2210: *310 upstream timed out (110: Connection timed out) while connecting to upstream, client: 192.168.10.99, server: solvethenetwork.com, request: "GET /reports/generate HTTP/1.1", upstream: "http://10.10.10.20:8080/reports/generate", host: "solvethenetwork.com"

    The phrase "while connecting to upstream" indicates a connect timeout; "while reading response header from upstream" indicates a read timeout. Measure actual backend response time from sw-infrarunbook-01:

    curl -o /dev/null -s -w "connect: %{time_connect}s\nttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n" http://10.10.10.20:8080/reports/generate

    Output revealing a slow backend:

    connect: 0.002s
    ttfb: 65.481s
    total: 65.499s

    The backend takes 65 seconds to send the first byte; with a 60-second proxy_read_timeout, Nginx always kills it first. Check the current timeout configuration:

    nginx -T | grep -E 'proxy_(connect|read|send)_timeout'
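
    The comparison between the measured time to first byte and the configured read timeout can be scripted. The sketch below is only that comparison: exceeds_read_timeout is a hypothetical helper, and both arguments are plain seconds (a trailing "s", as in the curl output above, is tolerated).

```shell
# exceeds_read_timeout TTFB_SECONDS TIMEOUT_SECONDS
# Succeeds when the measured time to first byte is longer than the timeout.
# exceeds_read_timeout is a hypothetical helper name.
exceeds_read_timeout() {
    # awk handles the floating-point comparison; adding 0 coerces values
    # such as "65.481s" to their numeric prefix.
    awk -v ttfb="$1" -v limit="$2" 'BEGIN { exit((ttfb + 0 > limit + 0) ? 0 : 1) }'
}
```

    With the figures above, exceeds_read_timeout 65.481 60 succeeds, which matches the timeout seen in the error log.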

    How to Fix It

    Short-term — raise timeouts on the specific location where slow endpoints live:

    location /reports/ {
        proxy_connect_timeout 10s;
        proxy_read_timeout    120s;
        proxy_send_timeout    120s;
        proxy_pass http://10.10.10.20:8080;
    }

    Long-term — profile and fix the backend. Higher timeouts are a workaround, not a solution. If the connect timeout fires, also verify basic network reachability:

    ping -c 3 10.10.10.20
    traceroute 10.10.10.20
    nc -zv 10.10.10.20 8080

    Root Cause 5: Backend Sending an Invalid HTTP Response

    Why It Happens

    Nginx is an HTTP proxy with a strict parser. When the upstream returns data that does not conform to HTTP/1.1, Nginx cannot forward it and returns 502. Common invalid response scenarios include:

    • An empty response — the backend accepted the TCP connection then closed it without writing any bytes
    • The response begins with binary data rather than an HTTP status line
    • Malformed headers — missing the blank line separating headers from the body, or invalid header syntax
    • A Content-Length value that does not match the actual body length
    • An SSL/TLS handshake error when proxy_pass uses HTTPS but the backend presents an expired or self-signed certificate that Nginx is configured to verify

    How to Identify It

    The error log will contain one of these signatures:

    2026/04/04 13:10:22 [error] 2210#2210: *415 upstream sent invalid header while reading response header from upstream, client: 192.168.10.33, server: solvethenetwork.com, request: "GET /healthz HTTP/1.1", upstream: "http://10.10.10.30:9000/healthz", host: "solvethenetwork.com"
    2026/04/04 13:11:58 [error] 2210#2210: *420 upstream prematurely closed connection while reading response header from upstream, client: 192.168.10.33, server: solvethenetwork.com, request: "GET /healthz HTTP/1.1", upstream: "http://10.10.10.30:9000/healthz", host: "solvethenetwork.com"

    Bypass Nginx and curl the backend directly from sw-infrarunbook-01 to see the raw response:

    curl -v http://10.10.10.30:9000/healthz 2>&1 | head -40

    A healthy backend returns:

    < HTTP/1.1 200 OK
    < Content-Type: application/json
    < Content-Length: 15
    <
    {"status":"ok"}

    A broken backend shows:

    * Connected to 10.10.10.30 (10.10.10.30) port 9000 (#0)
    > GET /healthz HTTP/1.1
    > Host: 10.10.10.30:9000
    >
    * Empty reply from server
    curl: (52) Empty reply from server
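
    A quick sanity check on the first line of a raw backend response can be sketched as follows. It is a rough approximation of what a valid HTTP/1.x status line looks like, not a reimplementation of Nginx's parser, and is_http_status_line is a hypothetical helper name.

```shell
# Succeeds if the first line of the given text looks like an HTTP/1.x status
# line such as "HTTP/1.1 200 OK". is_http_status_line is a hypothetical name.
is_http_status_line() {
    printf '%s\n' "$1" \
        | head -n 1 \
        | tr -d '\r' \
        | grep -Eq '^HTTP/[0-9]\.[0-9] [0-9]{3}( |$)'
}
```

    An empty reply or a response that starts with HTML or binary data fails the check, which corresponds to the empty and malformed cases listed above.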

    For HTTPS upstreams, inspect the TLS certificate directly:

    openssl s_client -connect 10.10.10.30:443 -servername backend.solvethenetwork.com 2>&1 | grep -E 'Verify|subject|issuer|error'

    How to Fix It

    The authoritative fix is always to correct the backend application so it emits valid HTTP. For an empty response, check the backend for crashes mid-response:

    ssh infrarunbook-admin@10.10.10.30 "journalctl -u backend-app -n 100 --no-pager | grep -E 'error|panic|segfault'"

    For SSL certificate verification failures on HTTPS upstreams where the backend uses a private CA, either add the CA to the system trust store or configure Nginx to trust it explicitly:

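    location / {
        proxy_pass https://10.10.10.30:443;
        proxy_ssl_trusted_certificate /etc/ssl/certs/internal-ca.crt;
        proxy_ssl_verify       on;
        proxy_ssl_verify_depth 2;
        # proxy_pass targets an IP, so name the host the certificate was issued for
        proxy_ssl_name         backend.solvethenetwork.com;
        proxy_ssl_server_name  on;
    }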

    Disabling verification entirely (proxy_ssl_verify off) is only acceptable on a private network segment and should be treated as a temporary workaround.


    Root Cause 6: Unix Domain Socket Misconfiguration

    Why It Happens

    When Nginx proxies to a local process over a Unix domain socket — the standard pattern for PHP-FPM, Gunicorn, and uWSGI — a 502 occurs if the socket file does not exist, has incorrect ownership, or the backend process is not listening on it. The socket file disappears when the backend process stops and is recreated only when it starts again.

    How to Identify It

    The error log clearly states the socket path and the OS error:

    2026/04/04 14:00:11 [error] 2210#2210: *511 connect() to unix:/run/php/php8.2-fpm.sock failed (2: No such file or directory) while connecting to upstream, client: 192.168.10.20, server: solvethenetwork.com, request: "GET /index.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php8.2-fpm.sock:", host: "solvethenetwork.com"

    Verify whether the socket file exists and inspect its permissions:

    ls -la /run/php/php8.2-fpm.sock

    When missing:

    ls: cannot access '/run/php/php8.2-fpm.sock': No such file or directory

    When present but owned by a user and group that exclude the Nginx worker user (assumed here to run as nginx), the connect fails with (13: Permission denied) instead:

    srw-rw---- 1 www-data www-data 0 Apr  4 14:02 /run/php/php8.2-fpm.sock
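
    Whether a given user can connect follows the usual Unix permission order: owner bits if the user owns the socket, otherwise group bits if the user is in the socket's group, otherwise the "other" bits. On Linux, connecting requires write permission on the socket. The sketch below encodes just that rule; can_connect and mode_char are hypothetical helpers, and the check ignores directory permissions on the path and ACLs.

```shell
# Print the Nth character (1-based) of a mode string such as srw-rw----.
mode_char() { printf '%s\n' "$1" | cut -c "$2"; }

# can_connect MODE OWNER GROUP USER [USER_GROUPS...]
# Succeeds if USER, a member of USER_GROUPS, has write permission on the socket.
# Both function names are hypothetical helpers for illustration.
can_connect() {
    local mode="$1" owner="$2" group="$3" user="$4" g
    shift 4
    if [ "$user" = "$owner" ]; then
        [ "$(mode_char "$mode" 3)" = "w" ]      # owner write bit
        return
    fi
    for g in "$@"; do
        if [ "$g" = "$group" ]; then
            [ "$(mode_char "$mode" 6)" = "w" ]  # group write bit
            return
        fi
    done
    [ "$(mode_char "$mode" 9)" = "w" ]          # other write bit
}
```

    For the listing above, can_connect srw-rw---- www-data www-data nginx nginx fails, while adding www-data to the list of the user's groups makes it succeed. The real values can be read with stat -c '%A %U %G' on the socket and id -Gn for the worker user.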

    How to Fix It

    Restart PHP-FPM to recreate the socket:

    systemctl restart php8.2-fpm
    ls -la /run/php/php8.2-fpm.sock

    If the Nginx worker user is neither the socket owner nor a member of the socket group, edit the FPM pool configuration at /etc/php/8.2/fpm/pool.d/www.conf. Before:

    listen.owner = www-data
    listen.group = www-data
    listen.mode  = 0660

    After (if Nginx runs as the nginx user):

    listen.owner = nginx
    listen.group = nginx
    listen.mode  = 0660

    Restart FPM to apply:

    systemctl restart php8.2-fpm

    Root Cause 7: Upstream Connection Limit Exhausted

    Why It Happens

    Under sustained load, Nginx workers or backend servers can exhaust their maximum concurrent connection counts. If worker_connections is set too low in nginx.conf, Nginx cannot open new upstream connections even if the backend is healthy; the limit applies per worker process and counts upstream connections as well as client connections. If the upstream process has its own concurrency cap (e.g., Gunicorn's --workers count or a thread pool limit), excess requests queue up and may eventually be refused, which can result in 502s.

    How to Identify It

    The Nginx error log will emit a worker_connections alert:

    2026/04/04 15:22:44 [alert] 2210#2210: 512 worker_connections are not enough while connecting to upstream

    Check live connections from sw-infrarunbook-01 to the upstream:

    ss -tn dst 10.10.10.20 | wc -l

    Inspect Nginx stub status (requires the stub_status module to be enabled):

    curl http://127.0.0.1/nginx_status

    Output indicating saturation:

    Active connections: 512
    server accepts handled requests
     18293 18293 24100
    Reading: 0 Writing: 512 Waiting: 0

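    Writing: 512 matching the worker_connections value, together with Waiting: 0, indicates that every connection is busy and the pool is exhausted. Because each proxied request also holds an upstream connection that stub_status does not count, a worker can reach the limit at a lower client count than this.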
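
    For alerting, the active connection count can be turned into a utilization percentage. This is a sketch with stated assumptions: connection_utilization is a hypothetical helper, the input is stub_status text in the layout shown above, and LIMIT is a client-connection budget you choose yourself (stub_status counts client connections across all workers, whereas worker_connections is per worker and includes upstream connections).

```shell
# Reads stub_status output on stdin and prints active connections as an
# integer percentage of LIMIT. connection_utilization is a hypothetical name.
connection_utilization() {
    awk -v limit="$1" '
        /^Active connections:/ { active = $3 }
        END {
            if (limit + 0 <= 0) exit 1          # refuse a zero or missing limit
            printf "%d\n", (active * 100) / limit
        }
    '
}
```

    A possible use is curl -s http://127.0.0.1/nginx_status | connection_utilization 512, alerting once the result crosses a threshold such as 80.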

    How to Fix It

    Increase worker_connections in /etc/nginx/nginx.conf. Before:

    events {
        worker_connections 512;
    }

    After:

    events {
        worker_connections 4096;
        use epoll;
        multi_accept on;
    }

    Raise the OS file descriptor limit for the Nginx user. For processes started through a PAM session, add to /etc/security/limits.conf (systemd-managed services do not read this file):

    nginx   soft   nofile   65535
    nginx   hard   nofile   65535

    For a systemd-managed Nginx, set LimitNOFILE= in the unit file, or let Nginx raise the per-worker limit itself in the main context of /etc/nginx/nginx.conf:

    worker_rlimit_nofile 65535;

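    For upstream keepalive tuning to reduce connection churn (on versions where proxy_http_version defaults to 1.0, such as the 1.24 release shown earlier, the proxying location must also set proxy_http_version 1.1 and proxy_set_header Connection ""; for keepalive to take effect):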

    upstream backend_pool {
        server 10.10.10.20:8080;
        server 10.10.10.21:8080;
        keepalive         64;
        keepalive_requests 1000;
        keepalive_timeout  60s;
    }

    Reload Nginx after all changes:

    nginx -t && systemctl reload nginx

    Prevention

    Most 502 errors are preventable with a combination of monitoring, health checks, and conservative configuration defaults.

    Enable upstream health checks. Nginx Plus supports active health checks natively. In open-source Nginx, use passive health checks via the max_fails and fail_timeout parameters on the server lines of an upstream block:

    upstream backend_pool {
        server 10.10.10.20:8080 max_fails=3 fail_timeout=30s;
        server 10.10.10.21:8080 max_fails=3 fail_timeout=30s;
        server 10.10.10.22:8080 backup;
    }

    Once three attempts to a peer fail within 30 seconds, Nginx marks that peer unavailable for the next 30 seconds and routes traffic to the surviving peers. The backup server only receives traffic when all primary peers are unavailable.

    Always tail the error log during deployments. Configure log aggregation so errors are visible in a central dashboard. Use a log format that includes upstream response time:

    log_format upstream_timing '$remote_addr - $remote_user [$time_local] '
        '"$request" $status $body_bytes_sent '
        'upstream=$upstream_addr '
        'upstream_status=$upstream_status '
        'upstream_response_time=$upstream_response_time '
        'request_time=$request_time';
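
    With the key=value fields of this format, failing peers can be counted straight from the access log. The sketch below assumes the upstream_timing format above is in use; count_502_by_upstream is a hypothetical helper, and it only counts requests with a single upstream attempt (retried requests log comma-separated lists in both fields and are skipped).

```shell
# Reads access-log lines in the upstream_timing format on stdin and prints
# "COUNT UPSTREAM_ADDR" for requests whose upstream_status is 502, busiest first.
# count_502_by_upstream is a hypothetical helper name.
count_502_by_upstream() {
    awk '
        {
            addr = ""; status = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^upstream=/)        addr   = substr($i, 10)
                if ($i ~ /^upstream_status=/) status = substr($i, 17)
            }
            if (status == "502" && addr != "") count[addr]++
        }
        END { for (a in count) print count[a], a }
    ' | sort -rn
}
```

    A possible invocation is count_502_by_upstream < /var/log/nginx/access.log, which points at the peer to inspect first.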

    Use IP addresses in proxy_pass for static backends. Avoid hostname-based proxy_pass unless it is paired with a resolver directive and a variable-based target. Otherwise DNS changes are not picked up until the next Nginx reload, and traffic keeps flowing to dead hosts.

    Set realistic timeout values per location. Global timeouts should be conservative. Endpoints that legitimately need more time (report generation, file exports) should have their own location block with higher timeout values, keeping the global defaults strict for all other paths.

    Monitor upstream processes with systemd and alerting. Configure Restart=on-failure in your upstream service unit files so crashed backends automatically restart:

    [Service]
    Restart=on-failure
    RestartSec=3s

    Test configuration before every reload. Never skip nginx -t. A failed reload leaves the previous configuration running and logs an emerg entry, so your changes do not take effect; a failed start or restart is worse, because Nginx does not come up at all. Automate the test in any deployment pipeline that touches Nginx configuration files.

    Size worker_connections to your expected traffic. A conservative starting formula is: worker_processes * worker_connections >= peak concurrent connections * 2. Monitor Nginx stub status continuously and alert before the pool saturates.
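
    The formula can be solved for worker_connections with integer arithmetic. This is a minimal sketch: min_worker_connections is a hypothetical helper, and the factor of 2 is the article's rule of thumb (one client connection plus one upstream connection per proxied request).

```shell
# min_worker_connections PEAK_CONCURRENT WORKER_PROCESSES
# Prints the smallest worker_connections value that satisfies
# worker_processes * worker_connections >= peak * 2.
# min_worker_connections is a hypothetical helper name.
min_worker_connections() {
    local peak="$1" workers="$2"
    # Ceiling division of (peak * 2) by workers.
    echo $(( (peak * 2 + workers - 1) / workers ))
}
```

    For example, 8000 peak concurrent connections across 4 worker processes gives min_worker_connections 8000 4, which prints 4000.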


    Frequently Asked Questions

    What is the fastest way to find the root cause of a 502?

    Run tail -f /var/log/nginx/error.log and reproduce the request. The error log entry identifies the upstream address, the OS error code, and the phase of communication that failed (connecting, reading, writing), narrowing the cause immediately.

    What is the difference between a 502 Bad Gateway and a 504 Gateway Timeout?

    A 502 means Nginx received an invalid or empty response, or the backend refused or reset the connection. A 504 means one of Nginx's upstream timers (proxy_connect_timeout, proxy_read_timeout, proxy_send_timeout) expired before the backend finished. A busy backend can produce either: a 504 if it never answers in time, a 502 if it resets connections or closes them early.

    Does a 502 always mean the backend is down?

    No. The backend can be running while Nginx returns 502 due to a misconfigured proxy_pass port, DNS resolution failure, expired TLS certificate on an HTTPS upstream, or Unix socket permission mismatch. Always read the error log message first.

    How do I tell if Nginx itself is the problem versus the backend?

    Bypass Nginx and connect directly: curl -v http://10.10.10.20:8080/endpoint. If the direct request succeeds, Nginx's configuration or networking is the issue. If it also fails, the backend itself is the problem.

    Why does my 502 only happen intermittently under load?

    Intermittent 502s under load typically indicate exhausted upstream connection pools, overloaded backend workers timing out, or a subset of upstream pool servers being down. Enable upstream logging with $upstream_addr and $upstream_status to identify which peer is failing.

    Can I configure Nginx to retry failed upstream requests automatically?

    Yes. Use proxy_next_upstream to define which error conditions trigger retries, proxy_next_upstream_tries to set the retry count, and proxy_next_upstream_timeout for the total retry window. Be careful with non-idempotent methods like POST to avoid duplicate operations.

    How do I show a custom error page instead of the default Nginx 502 page?

    Use the error_page directive: error_page 502 /errors/502.html; then define a location block for that path with the internal flag to prevent direct external access.

    My backend uses a self-signed TLS certificate. Why does Nginx return 502?

    If proxy_ssl_verify is on without a trusted CA bundle, Nginx rejects the certificate and returns 502. Either provide the CA via proxy_ssl_trusted_certificate or disable verification with proxy_ssl_verify off for fully private internal network paths.

    How do I check which upstream server in a pool is generating 502 errors?

    Add $upstream_addr and $upstream_status to your access log format, then filter on those fields. With the key=value upstream_timing format shown in the Prevention section: grep 'upstream_status=502' /var/log/nginx/access.log | grep -o 'upstream=[^ ]*' | sort | uniq -c | sort -rn

    What does 'no live upstreams while connecting to upstream' mean?

    Every server in the upstream pool has been marked unavailable by Nginx's passive health check logic — each peer exceeded max_fails within fail_timeout. This usually means all backend instances went down simultaneously, or max_fails is set too low causing transient errors to blacklist peers for too long.
