InfraRunBook

    Nginx 502 Bad Gateway Deep Dive

    Published: Apr 4, 2026
    Updated: Apr 4, 2026

    A thorough technical breakdown of every root cause behind Nginx 502 Bad Gateway errors, with real error log signatures, diagnostic commands, and targeted fixes for each scenario.


    Symptoms

    When Nginx returns a 502 Bad Gateway, the browser displays an HTTP 502 status code. The default Nginx error page reads:

    502 Bad Gateway
    nginx/1.24.0

    In the access log you will see the 502 recorded alongside the upstream address:

    192.168.10.45 - infrarunbook-admin [04/Apr/2026:08:12:33 +0000] "GET /api/status HTTP/1.1" 502 157 "-" "Mozilla/5.0"

    The actual cause is almost always visible in the Nginx error log at the error level. A representative entry looks like:

    2026/04/04 08:12:33 [error] 1234#1234: *42 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.10.45, server: solvethenetwork.com, request: "GET /api/status HTTP/1.1", upstream: "http://10.10.10.20:8080/api/status", host: "solvethenetwork.com"

    A 502 is always an Nginx-to-upstream communication failure. Nginx received the client request, attempted to proxy it to the backend, and the backend either refused the connection, never responded, or returned something Nginx could not parse as a valid HTTP response. The error log is the primary diagnostic tool — read it before doing anything else.
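    Because the error log names the failure mode directly, a frequency count of the known 502 signatures often points at the dominant root cause before you read a single full line. A minimal sketch (classify_502s is a hypothetical helper name; the log path used in this article is the Debian/Ubuntu default):

```shell
# classify_502s: count the known 502 signatures in an Nginx error log.
# Usage: classify_502s /var/log/nginx/error.log
classify_502s() {
  grep -oE 'connect\(\) failed \(111: Connection refused\)|upstream timed out \(110: Connection timed out\)|no live upstreams|upstream prematurely closed connection|upstream sent invalid header|no resolver defined' "$1" \
    | sort | uniq -c | sort -rn
}
```

    The top line of the output names the signature to chase first; each signature maps to one of the root causes covered below.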


    Root Cause 1: Upstream Server Is Down

    Why It Happens

    The most common cause of a 502 is that the backend application — Node.js, Gunicorn, PHP-FPM, Tomcat, or any other upstream process — has crashed, been stopped, or is not yet listening on the expected port. When Nginx attempts a TCP connection to the upstream address, the OS kernel responds with a TCP RST (connection refused) because no process is bound to that port.

    How to Identify It

    Tail the Nginx error log in real time:

    tail -f /var/log/nginx/error.log

    The key phrase to look for is connect() failed (111: Connection refused):

    2026/04/04 08:14:55 [error] 2210#2210: *88 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.10.101, server: solvethenetwork.com, request: "POST /login HTTP/1.1", upstream: "http://127.0.0.1:3000/login", host: "solvethenetwork.com"

    Confirm nothing is listening on the expected port using ss:

    ss -tlnp | grep 3000

    When the app is down, the command produces no output. When the app is running correctly:

    LISTEN 0 511 127.0.0.1:3000 0.0.0.0:* users:(("node",pid=4502,fd=22))

    Check the upstream service status directly:

    systemctl status app-backend

    A crashed service shows:

    ● app-backend.service - Application Backend
       Loaded: loaded (/etc/systemd/system/app-backend.service; enabled)
       Active: failed (Result: exit-code) since Fri 2026-04-04 08:10:01 UTC; 5min ago
      Process: 4490 ExecStart=/usr/bin/node /opt/app/server.js (code=exited, status=1/FAILURE)

    How to Fix It

    Start or restart the upstream service and verify it is listening:

    systemctl restart app-backend
    ss -tlnp | grep 3000

    If the service fails to start, inspect the journal for application errors:

    journalctl -u app-backend -n 50 --no-pager

    For PHP-FPM:

    systemctl restart php8.2-fpm
    ss -tlnp | grep 9000

    No Nginx reload is required. Once the upstream process is listening, Nginx will successfully proxy on the next request.
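    After a restart it can take a few seconds for the process to bind its port, so a small polling helper avoids declaring failure too early. A sketch using bash's built-in /dev/tcp pseudo-device (wait_for_port is a hypothetical helper name, and this assumes a bash shell):

```shell
# wait_for_port: poll until a local TCP port accepts connections, up to N tries.
# Uses bash's /dev/tcp redirection; pauses one second between attempts.
wait_for_port() {
  local port="$1" tries="${2:-10}"
  for _ in $(seq "$tries"); do
    if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      return 0   # something accepted the connection
    fi
    sleep 1
  done
  return 1
}
# Example: systemctl restart app-backend && wait_for_port 3000 && echo "backend up"
```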


    Root Cause 2: Wrong proxy_pass Directive

    Why It Happens

    A misconfigured proxy_pass URL points Nginx at a host, port, or path that does not match what the backend actually serves. Common mistakes include a wrong port number, an HTTP vs HTTPS mismatch, a trailing slash that rewrites the URI unintentionally, or a hostname typo in the config file.

    How to Identify It

    Dump all proxy_pass values from the running configuration:

    nginx -T | grep proxy_pass

    Output:

            proxy_pass http://10.10.10.20:8081/;

    Then check what port the backend actually listens on from sw-infrarunbook-01:

    ssh infrarunbook-admin@10.10.10.20 "ss -tlnp | grep -E '8080|8081'"

    Output:

    LISTEN 0 511 0.0.0.0:8080 0.0.0.0:* users:(("java",pid=7812,fd=45))

    The app is on port 8080 but proxy_pass targets port 8081, and the error log confirms connection refused on 8081. Trailing slash bugs are subtler: with proxy_pass http://10.10.10.20:8080/;, a request for /api/users forwards as /users, stripping the /api prefix. The resulting backend 404s can surface as 502s if Nginx has proxy_intercept_errors on.
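    The prefix-replacement rule is easier to reason about once written down: when proxy_pass carries a URI part, Nginx replaces the matched location prefix with that part. A sketch of the mapping (map_uri is an illustrative helper for reasoning, not an Nginx facility):

```shell
# map_uri: illustrate how Nginx maps a request URI when proxy_pass has a path
# component: the matched location prefix is replaced by the proxy_pass path.
map_uri() {
  local location_prefix="$1" proxy_path="$2" request_uri="$3"
  printf '%s%s\n' "$proxy_path" "${request_uri#"$location_prefix"}"
}
map_uri /api/ /     /api/users   # proxy_pass http://backend/      -> /users
map_uri /api/ /api/ /api/users   # proxy_pass http://backend/api/  -> /api/users
```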

    How to Fix It

    Edit /etc/nginx/sites-available/solvethenetwork.com. Before:

    location /api/ {
        proxy_pass http://10.10.10.20:8081/;
    }

    After:

    location /api/ {
        proxy_pass http://10.10.10.20:8080/api/;
    }

    Validate syntax and reload:

    nginx -t && systemctl reload nginx

    Expected output:

    nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
    nginx: configuration file /etc/nginx/nginx.conf test is successful

    Root Cause 3: DNS Resolution Failure

    Why It Happens

    When proxy_pass uses a hostname instead of an IP (e.g., proxy_pass http://backend.solvethenetwork.com:8080), Nginx resolves the hostname once at startup by default, not per request. If the DNS resolver is unavailable at startup, the upstream block fails to initialize. If DNS was available at startup but the record later changes (a backend host is replaced or rotated), Nginx keeps using the stale IP indefinitely until the next reload, causing 502s for every request routed to that dead IP. In an upstream {} block, any server FQDN that fails to resolve at reload time can mark the entire pool invalid.

    How to Identify It

    A DNS failure at reload time produces an emerg log entry:

    nginx: [emerg] host not found in upstream "backend.solvethenetwork.com" in /etc/nginx/sites-available/solvethenetwork.com:12

    A runtime DNS failure (no resolver configured for a variable-based upstream) looks like:

    2026/04/04 10:30:44 [error] 2210#2210: *204 no resolver defined to resolve backend.solvethenetwork.com, client: 192.168.10.77, server: solvethenetwork.com, request: "GET / HTTP/1.1", upstream: "http://backend.solvethenetwork.com:8080/", host: "solvethenetwork.com"

    Test DNS resolution from sw-infrarunbook-01 directly:

    dig backend.solvethenetwork.com @10.10.10.1

    A broken DNS zone returns an empty answer section:

    ;; ANSWER SECTION:
    (empty)
    
    ;; Query time: 2 msec
    ;; SERVER: 10.10.10.1#53(10.10.10.1)

    Verify the nameserver in /etc/resolv.conf is reachable:

    cat /etc/resolv.conf
    nc -zv 10.10.10.1 53

    How to Fix It

    Option A — Use IP addresses directly (best for static infrastructure):

    location / {
        proxy_pass http://10.10.10.20:8080;
    }

    Option B — Declare a resolver and use a variable to force per-request DNS lookup:

    http {
        resolver 10.10.10.1 10.10.10.2 valid=30s;
        resolver_timeout 5s;
    
        server {
            location / {
                set $backend "backend.solvethenetwork.com";
                proxy_pass http://$backend:8080;
            }
        }
    }

    The variable assignment forces Nginx to consult the resolver on every request rather than caching the result from startup. The valid=30s parameter overrides the record's TTL, so Nginx discards any cached answer after 30 seconds.

    nginx -t && systemctl reload nginx

    Root Cause 4: Proxy Timeout

    Why It Happens

    Nginx enforces several timeout values during upstream communication. If a backend is slow to accept connections, slow to begin sending response headers, or transmits a large response body in very slow chunks, Nginx gives up and returns a 502 (or a 504, depending on where in the cycle the timeout fires). The critical directives are:

    • proxy_connect_timeout — maximum time to establish a TCP connection to the upstream (default: 60s)
    • proxy_read_timeout — maximum idle time between two successive read operations from the upstream response (default: 60s)
    • proxy_send_timeout — maximum idle time between two successive write operations to the upstream (default: 60s)

    A 502 from a connection timeout is common when proxy_connect_timeout is set aggressively low (e.g., 1s) or when a backend is overloaded and not accepting new TCP connections within that window.

    How to Identify It

    The error log will say upstream timed out:

    2026/04/04 11:45:02 [error] 2210#2210: *310 upstream timed out (110: Connection timed out) while connecting to upstream, client: 192.168.10.99, server: solvethenetwork.com, request: "GET /reports/generate HTTP/1.1", upstream: "http://10.10.10.20:8080/reports/generate", host: "solvethenetwork.com"

    The phrase "while connecting to upstream" indicates a connect timeout; "while reading response header from upstream" indicates a read timeout. Measure actual backend response time from sw-infrarunbook-01:

    curl -o /dev/null -s -w "connect: %{time_connect}s\nttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n" http://10.10.10.20:8080/reports/generate

    Output revealing a slow backend:

    connect: 0.002s
    ttfb: 65.481s
    total: 65.499s

    The backend takes 65 seconds to send the first byte; with a 60-second proxy_read_timeout, Nginx always gives up first. Check the current timeout configuration:

    nginx -T | grep -E 'proxy_(connect|read|send)_timeout'
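    When a slow endpoint is suspected, comparing the measured time-to-first-byte against the read-timeout budget is easier to automate than to eyeball. A minimal sketch (ttfb_check and the 80% warning threshold are assumptions, not Nginx features):

```shell
# ttfb_check: flag a measured time-to-first-byte that is close to a timeout budget.
# Prints WARN above 80% of the budget, OK otherwise.
ttfb_check() {
  awk -v t="$1" -v b="$2" 'BEGIN { if (t > b * 0.8) print "WARN"; else print "OK" }'
}
# Example: measure the endpoint, then compare against proxy_read_timeout=60s:
#   ttfb=$(curl -o /dev/null -s -w '%{time_starttransfer}' http://10.10.10.20:8080/reports/generate)
#   ttfb_check "$ttfb" 60
```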

    How to Fix It

    Short-term — raise timeouts on the specific location where slow endpoints live:

    location /reports/ {
        proxy_connect_timeout 10s;
        proxy_read_timeout    120s;
        proxy_send_timeout    120s;
        proxy_pass http://10.10.10.20:8080;
    }

    Long-term — profile and fix the backend. Higher timeouts are a workaround, not a solution. If the connect timeout fires, also verify basic network reachability:

    ping -c 3 10.10.10.20
    traceroute 10.10.10.20
    nc -zv 10.10.10.20 8080

    Root Cause 5: Backend Sending an Invalid HTTP Response

    Why It Happens

    Nginx is an HTTP proxy with a strict parser. When the upstream returns data that does not conform to HTTP/1.1, Nginx cannot forward it and returns 502. Common invalid response scenarios include:

    • An empty response — the backend accepted the TCP connection then closed it without writing any bytes
    • The response begins with binary data rather than an HTTP status line
    • Malformed headers — missing the blank line separating headers from the body, or invalid header syntax
    • A Content-Length value that does not match the actual body length
    • An SSL/TLS handshake error when proxy_pass uses HTTPS but the backend presents an expired or self-signed certificate that Nginx is configured to verify

    How to Identify It

    The error log will contain one of these signatures:

    2026/04/04 13:10:22 [error] 2210#2210: *415 upstream sent invalid header while reading response header from upstream, client: 192.168.10.33, server: solvethenetwork.com, request: "GET /healthz HTTP/1.1", upstream: "http://10.10.10.30:9000/healthz", host: "solvethenetwork.com"
    2026/04/04 13:11:58 [error] 2210#2210: *420 upstream prematurely closed connection while reading response header from upstream, client: 192.168.10.33, server: solvethenetwork.com, request: "GET /healthz HTTP/1.1", upstream: "http://10.10.10.30:9000/healthz", host: "solvethenetwork.com"

    Bypass Nginx and curl the backend directly from sw-infrarunbook-01 to see the raw response:

    curl -v http://10.10.10.30:9000/healthz 2>&1 | head -40

    A healthy backend returns:

    < HTTP/1.1 200 OK
    < Content-Type: application/json
    < Content-Length: 15
    <
    {"status":"ok"}

    A broken backend shows:

    * Connected to 10.10.10.30 (10.10.10.30) port 9000 (#0)
    > GET /healthz HTTP/1.1
    > Host: 10.10.10.30:9000
    >
    * Empty reply from server
    curl: (52) Empty reply from server
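    When the parser complaint persists, it helps to look at the exact bytes the backend emits, with no HTTP client interpreting them at all. A sketch using nc and od (raw_http_request is an illustrative helper name):

```shell
# raw_http_request: emit a minimal raw HTTP/1.1 request for a host and path.
# Pipe it through nc to see the backend's reply byte-for-byte, unparsed.
raw_http_request() {
  printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$2" "$1"
}
# Example: raw_http_request 10.10.10.30 /healthz | nc -w 5 10.10.10.30 9000 | od -c | head -20
```

    A reply that does not begin with the bytes "HTTP/1." is exactly what triggers the "upstream sent invalid header" log entry.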

    For HTTPS upstreams, inspect the TLS certificate directly:

    openssl s_client -connect 10.10.10.30:443 -servername backend.solvethenetwork.com 2>&1 | grep -E 'Verify|subject|issuer|error'

    How to Fix It

    The authoritative fix is always to correct the backend application so it emits valid HTTP. For an empty response, check the backend for crashes mid-response:

    ssh infrarunbook-admin@10.10.10.30 "journalctl -u backend-app -n 100 --no-pager | grep -E 'error|panic|segfault'"

    For SSL certificate verification failures on HTTPS upstreams where the backend uses a private CA, either add the CA to the system trust store or configure Nginx to trust it explicitly:

    location / {
        proxy_pass https://10.10.10.30:443;
        proxy_ssl_trusted_certificate /etc/ssl/certs/internal-ca.crt;
        proxy_ssl_verify       on;
        proxy_ssl_verify_depth 2;
    }

    Disabling verification entirely (proxy_ssl_verify off) is only acceptable on a private network segment and should be treated as a temporary workaround.


    Root Cause 6: Unix Domain Socket Misconfiguration

    Why It Happens

    When Nginx proxies to a local process over a Unix domain socket — the standard pattern for PHP-FPM, Gunicorn, and uWSGI — a 502 occurs if the socket file does not exist, has incorrect ownership, or the backend process is not listening on it. The socket file disappears when the backend process stops and is recreated only when it starts again.

    How to Identify It

    The error log clearly states the socket path and the OS error:

    2026/04/04 14:00:11 [error] 2210#2210: *511 connect() to unix:/run/php/php8.2-fpm.sock failed (2: No such file or directory) while connecting to upstream, client: 192.168.10.20, server: solvethenetwork.com, request: "GET /index.php HTTP/1.1", upstream: "fastcgi_pass unix:/run/php/php8.2-fpm.sock", host: "solvethenetwork.com"

    Verify whether the socket file exists and inspect its permissions:

    ls -la /run/php/php8.2-fpm.sock

    When missing:

    ls: cannot access '/run/php/php8.2-fpm.sock': No such file or directory

    When present but with wrong ownership (Nginx user cannot connect):

    srw-rw---- 1 www-data www-data 0 Apr  4 14:02 /run/php/php8.2-fpm.sock

    How to Fix It

    Restart PHP-FPM to recreate the socket:

    systemctl restart php8.2-fpm
    ls -la /run/php/php8.2-fpm.sock

    If the Nginx worker user does not match the socket group, edit the FPM pool configuration at /etc/php/8.2/fpm/pool.d/www.conf. Before:

    listen.owner = www-data
    listen.group = www-data
    listen.mode  = 0660

    After (if Nginx runs as the nginx user):

    listen.owner = nginx
    listen.group = nginx
    listen.mode  = 0660

    Restart FPM to apply:

    systemctl restart php8.2-fpm

    Root Cause 7: Upstream Connection Limit Exhausted

    Why It Happens

    Under sustained load, Nginx workers or backend servers can exhaust their maximum concurrent connection counts. If worker_connections is set too low in nginx.conf, Nginx cannot open new upstream connections even when the backend is healthy. If the upstream process has its own concurrency cap (e.g., Gunicorn's --workers count or a thread pool limit), excess requests will be refused and result in 502s.

    How to Identify It

    The Nginx error log will emit worker connection warnings:

    2026/04/04 15:22:44 [warn] 2210#2210: 512 worker_connections are not enough while connecting to upstream

    Check live connections from sw-infrarunbook-01 to the upstream:

    ss -tn dst 10.10.10.20 | wc -l

    Inspect Nginx stub status (requires the stub_status module to be enabled):

    curl http://127.0.0.1/nginx_status

    Output indicating saturation:

    Active connections: 512
    server accepts handled requests
     18293 18293 24100
    Reading: 0 Writing: 512 Waiting: 0

    Writing: 512 matching the worker_connections value, combined with Waiting: 0, means every connection is active and the pool is exhausted.
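    Saturation is easier to alert on as a ratio than by eyeballing counters. A sketch that parses stub_status output (stub_utilization is a hypothetical helper; pass your configured worker_connections value as the limit):

```shell
# stub_utilization: read stub_status output on stdin and report active
# connections against a given worker_connections limit.
stub_utilization() {
  awk -v limit="$1" '/^Active connections:/ {
    printf "%d/%d (%.0f%% of worker_connections)\n", $3, limit, 100 * $3 / limit
  }'
}
# Example: curl -s http://127.0.0.1/nginx_status | stub_utilization 512
```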

    How to Fix It

    Increase worker_connections in /etc/nginx/nginx.conf. Before:

    events {
        worker_connections 512;
    }

    After:

    events {
        worker_connections 4096;
        use epoll;
        multi_accept on;
    }

    Raise the OS file descriptor limit for the Nginx user. Add to /etc/security/limits.conf:

    nginx   soft   nofile   65535
    nginx   hard   nofile   65535

    Also raise the per-worker descriptor limit in the Nginx systemd unit (LimitNOFILE) or directly in /etc/nginx/nginx.conf:

    worker_rlimit_nofile 65535;

    For upstream keepalive tuning to reduce connection churn:

    upstream backend_pool {
        server 10.10.10.20:8080;
        server 10.10.10.21:8080;
        keepalive         64;
        keepalive_requests 1000;
        keepalive_timeout  60s;
    }

    Reload Nginx after all changes:

    nginx -t && systemctl reload nginx

    Prevention

    Most 502 errors are preventable with a combination of monitoring, health checks, and conservative configuration defaults.

    Enable upstream health checks. Nginx Plus supports active health checks natively. In open-source Nginx, use passive health checks via the max_fails and fail_timeout parameters on upstream server blocks:

    upstream backend_pool {
        server 10.10.10.20:8080 max_fails=3 fail_timeout=30s;
        server 10.10.10.21:8080 max_fails=3 fail_timeout=30s;
        server 10.10.10.22:8080 backup;
    }

    After three failed attempts within the 30-second window, Nginx marks that upstream peer as unavailable for the duration of fail_timeout and routes traffic to surviving peers. The backup server only receives traffic when all primary peers are marked down.

    Always tail the error log during deployments. Configure log aggregation so errors are visible in a central dashboard. Use a log format that includes upstream response time:

    log_format upstream_timing '$remote_addr - $remote_user [$time_local] '
        '"$request" $status $body_bytes_sent '
        'upstream=$upstream_addr '
        'upstream_status=$upstream_status '
        'upstream_response_time=$upstream_response_time '
        'request_time=$request_time';
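    With upstream_response_time in the log, latency percentiles fall out of a short pipeline. A rough sketch against the key=value format above (p95_upstream is a hypothetical helper; nearest-rank percentile, no interpolation, and it ignores entries where the value is "-" or a comma-separated retry list):

```shell
# p95_upstream: nearest-rank 95th percentile of upstream_response_time
# extracted from an access log written with a key=value log format.
p95_upstream() {
  grep -oE 'upstream_response_time=[0-9.]+' "$1" | cut -d= -f2 | sort -n \
    | awk '{ a[NR] = $1 } END { if (NR) { i = int(NR * 0.95); if (i < 1) i = 1; print a[i] } }'
}
```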

    Use IP addresses in proxy_pass for static backends. Avoid hostname-based proxy_pass without a resolver directive: DNS changes will not be reflected until the next Nginx reload, silently routing traffic to dead hosts.

    Set realistic timeout values per location. Global timeouts should be conservative. Endpoints that legitimately need more time (report generation, file exports) should get their own location block with higher timeout values, keeping the global defaults strict for all other paths.

    Monitor upstream processes with systemd and alerting. Configure Restart=on-failure in your upstream service unit files so crashed backends restart automatically:

    [Service]
    Restart=on-failure
    RestartSec=3s

    Test configuration before every reload. Never skip nginx -t. A failed reload leaves the previous config in place and silently discards your changes; worse, if it is the first load, Nginx does not start at all. Automate the test in any deployment pipeline that touches Nginx configuration files.

    Size worker_connections to your expected traffic. A conservative starting formula is worker_processes * worker_connections >= peak concurrent connections * 2. Monitor Nginx stub status continuously and alert before the pool saturates.
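    As a worked example of that formula: each proxied request holds two connections (one to the client, one to the upstream), so capacity in concurrent proxied clients is worker_processes * worker_connections / 2 (the figures below are illustrative):

```shell
# Capacity sketch for the sizing formula above: 4 workers with 4096
# connections each can carry roughly (4 * 4096) / 2 proxied clients,
# because every proxied request consumes a client-side and an
# upstream-side connection.
workers=4     # worker_processes (illustrative)
conns=4096    # worker_connections (illustrative)
echo $(( workers * conns / 2 ))   # prints 8192
```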


    Frequently Asked Questions

    Q: What is the fastest way to find the root cause of a 502?

    A: Run tail -f /var/log/nginx/error.log and reproduce the request. The error log entry will almost always identify the upstream address, the OS error code, and the phase of communication that failed (connecting, reading, writing). That single log line narrows the cause to one of the categories in this article.

    Q: What is the difference between a 502 Bad Gateway and a 504 Gateway Timeout?

    A: Both originate from upstream communication failures. A 502 means Nginx received an invalid or empty response, or the backend actively refused the connection. A 504 means Nginx waited the full timeout period without receiving a complete response: the backend was reachable but too slow. In practice, a very short proxy_connect_timeout (1–2 seconds) can produce a 502 when the backend is simply busy, which is often mistaken for a genuine connectivity failure.

    Q: Does a 502 always mean the backend is down?

    A: No. The backend process can be running and healthy while Nginx still returns 502 due to a misconfigured proxy_pass port, a DNS resolution failure, an expired TLS certificate on an HTTPS upstream, or a Unix socket permission mismatch. Always check the error log message to distinguish connectivity failures from configuration errors.

    Q: How do I tell if Nginx itself is the problem versus the backend?

    A: Bypass Nginx and connect directly to the upstream from the same host: curl -v http://10.10.10.20:8080/endpoint. If the direct request succeeds, Nginx's configuration or networking is the issue. If the direct request also fails, the backend itself is the problem.

    Q: Why does my 502 only happen intermittently under load?

    A: Intermittent 502s under load usually indicate one of: an exhausted upstream connection pool (check worker_connections and stub status), overloaded backend workers timing out, or a subset of upstream servers in the pool being down while others are healthy. Enable detailed upstream logging (including $upstream_addr and $upstream_status) to identify which specific backend peer is failing.

    Q: Can I configure Nginx to retry failed upstream requests automatically?

    A: Yes. The proxy_next_upstream directive controls which error conditions trigger a retry on the next peer in the upstream pool. By default it retries on error and timeout. You can extend it:

    proxy_next_upstream error timeout http_502 http_503;
    proxy_next_upstream_tries 3;
    proxy_next_upstream_timeout 10s;

    Be careful with non-idempotent requests (POST, PUT, PATCH) — retrying them can cause duplicate operations on the backend.

    Q: How do I suppress the default Nginx 502 error page and show my own?

    A: Use the error_page directive in your server block:

    error_page 502 /errors/502.html;
    location = /errors/502.html {
        root /var/www/solvethenetwork.com;
        internal;
    }

    The internal flag prevents external direct access to the error page URL.

    Q: My backend uses a self-signed TLS certificate. Why does Nginx return 502?

    A: By default, proxy_ssl_verify is off, so self-signed certificates are accepted. If you have explicitly enabled proxy_ssl_verify on without providing a trusted CA bundle, Nginx will reject the certificate and return 502. Either provide the CA certificate via proxy_ssl_trusted_certificate or, for fully private internal traffic where network security controls already protect the path, disable verification with proxy_ssl_verify off.

    Q: How do I check which upstream server in a pool is generating 502 errors?

    A: Add upstream=$upstream_addr and upstream_status=$upstream_status key=value pairs to your access log format (as in the upstream_timing format shown earlier). Then query the logs:

    grep 'upstream_status=502' /var/log/nginx/access.log | grep -oE 'upstream=[^ ]+' | sort | uniq -c | sort -rn

    This reveals which specific backend IP is responsible for the majority of failures, letting you target that host for investigation without touching the healthy peers.

    Q: What does "no live upstreams while connecting to upstream" mean?

    A: It means every server in the upstream pool has been marked as unavailable by Nginx's passive health check logic (i.e., each peer exceeded max_fails within fail_timeout). Nginx has no healthy peer to route to. This is most often caused by a deployment outage where all backend instances went down simultaneously, or by setting max_fails=1 with a high fail_timeout so that a single transient failure blacklists a peer for a long time.

    Q: Should I use an upstream block or proxy_pass directly to an IP?

    A: Use an upstream block whenever you have more than one backend server, need load balancing, want passive health check parameters (max_fails, fail_timeout), or plan to enable keepalive connections. For a single static backend that will never scale, a direct proxy_pass http://10.10.10.20:8080 is simpler and has identical performance. The upstream block gives you more control and visibility at the cost of a few extra config lines.

    Q: How can I confirm Nginx is picking up a config change without a full restart?

    A: Always run nginx -t first to validate syntax, then systemctl reload nginx (or nginx -s reload). A reload performs a graceful configuration swap: new worker processes start with the new config, in-flight requests on old workers complete normally, and old workers exit cleanly. Existing upstream connections are not dropped mid-request. Only use systemctl restart nginx when a reload fails or when a change requires a full process restart.

