InfraRunBook

    Nginx 502 Bad Gateway Fix

    Nginx
    Published: Apr 13, 2026
    Updated: Apr 13, 2026

    A practical guide to diagnosing and fixing Nginx 502 Bad Gateway errors, covering upstream failures, PHP-FPM issues, timeouts, buffer problems, and Unix socket permissions.


    What a 502 Actually Means

    The 502 Bad Gateway error from Nginx is a proxy error, not a server error in the traditional sense. Nginx itself is alive and accepting connections — it got your request, tried to forward it to the upstream backend, and either got back nothing or got back something it couldn't make sense of. That's the whole story. Nginx is telling the client: "I did my job, the thing behind me didn't."

    I've watched engineers waste twenty minutes restarting Nginx on a 502. It fixes nothing, because Nginx isn't the problem. The upstream is broken, unreachable, overloaded, or misconfigured. That's where you need to look.

    Read the Error Log Before Touching Anything

    The very first thing you do is read the Nginx error log. Not the access log — that'll just show you a wall of 502 responses. The error log tells you why.

    tail -n 100 /var/log/nginx/error.log

    If your server block defines a custom error_log path, check there instead. Look for lines flagged [error]. The upstream field in those lines is the critical piece. You're going to see one of a handful of errno values, and each one points at a different failure mode.

    2024/08/14 09:23:41 [error] 3821#3821: *4821 connect() failed (111: Connection refused) while connecting to upstream, client: 203.0.113.55, server: solvethenetwork.com, request: "GET /api/status HTTP/1.1", upstream: "http://127.0.0.1:8080/api/status", host: "solvethenetwork.com"

    Error 111 is ECONNREFUSED. Nothing is listening on that port. Full stop. Don't overthink it — go check if your upstream process is running.

    2024/08/14 09:31:07 [error] 3821#3821: *5103 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 203.0.113.55, server: solvethenetwork.com, upstream: "http://10.0.1.15:8080/api/slow-query", host: "solvethenetwork.com"

    Error 110 is ETIMEDOUT. Something is listening, it accepted the connection, but it never sent a response header before the timeout fired. That's a slow backend problem — think a database query taking forever, GC pauses, or a thread pool that's completely saturated.

    2024/08/14 09:45:22 [error] 3821#3821: *5891 recv() failed (104: Connection reset by peer) while reading response header from upstream

    Error 104 is ECONNRESET. The upstream accepted the connection and then killed it mid-conversation. This usually signals a crash, a premature socket close, or an upstream in the middle of a graceful shutdown that's still receiving traffic.

    Upstream Not Running: The Most Common Cause

    In my experience, the single most common 502 cause is simply that the upstream process isn't running. It crashed, someone stopped it, or it never came back up after a deployment. Start here every time.

    ss -tlnp | grep 8080

    If that returns nothing, your upstream isn't listening. For a Node.js application on port 8080, check its service status directly:

    systemctl status node-app.service

    Or if it's not managed by systemd:

    ps aux | grep node

    Once you've confirmed the process is dead, check its own logs before restarting it blindly. There's a reason it crashed, and restarting without understanding why means it'll crash again. Start the service, watch it come up, then tail the Nginx error log to confirm 502s stop.

    systemctl start node-app.service
    journalctl -u node-app.service -f

    PHP-FPM: The Classic 502 Source

    If you're running PHP behind Nginx — WordPress, Laravel, any PHP application — then PHP-FPM is almost certainly involved in your 502. FPM either isn't running, ran out of workers, or the socket and port configuration doesn't match what Nginx is trying to connect to.

    First, check if PHP-FPM is running:

    systemctl status php8.2-fpm.service

    Then verify the listener matches what Nginx expects. Your Nginx config might say:

    fastcgi_pass 127.0.0.1:9000;

    But your PHP-FPM pool config (typically under /etc/php/8.2/fpm/pool.d/www.conf) might be configured to listen on a Unix socket:

    listen = /run/php/php8.2-fpm.sock

    That mismatch alone causes a 502. Check both sides and make sure they agree. If FPM is using a socket, Nginx should use:

    fastcgi_pass unix:/run/php/php8.2-fpm.sock;
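    Checking that both sides agree can be made mechanical with a tiny helper. This is a sketch, not a standard tool: same_listener is hypothetical, and the commented-out extraction commands assume typical Debian paths.

```shell
# Hypothetical helper: normalise the two listener spellings so
# "unix:/path" and a bare "/path" compare equal.
same_listener() {
  local a="${1#unix:}" b="${2#unix:}"
  [ "$a" = "$b" ] && echo "match" || echo "MISMATCH"
}

# Pull the real values on a typical Debian layout (adjust paths):
#   fpm=$(sed -nE 's/^[[:space:]]*listen[[:space:]]*=[[:space:]]*//p' /etc/php/8.2/fpm/pool.d/www.conf)
#   ngx=$(grep -rhoE 'fastcgi_pass[[:space:]]+[^;]+' /etc/nginx/sites-enabled/ | awk '{print $2}')
same_listener "unix:/run/php/php8.2-fpm.sock" "/run/php/php8.2-fpm.sock"   # prints: match
```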

    The other PHP-FPM failure mode I see regularly is worker exhaustion. If you have a high-traffic site and FPM's pm.max_children is set too low, all workers get occupied and new requests queue until Nginx times out waiting. Check the FPM log:

    tail -n 50 /var/log/php8.2-fpm.log

    If you see lines like server reached pm.max_children setting (5), consider raising it, that's your problem. Increase pm.max_children to a value that reflects your server's available memory and reload FPM. A rough rule of thumb: divide available RAM by your average PHP process memory footprint, then take 80% of that number to leave headroom.

    ; /etc/php/8.2/fpm/pool.d/www.conf
    pm = dynamic
    pm.max_children = 25
    pm.start_servers = 5
    pm.min_spare_servers = 3
    pm.max_spare_servers = 10
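    The rule of thumb above reduces to quick arithmetic. A sketch, assuming you've already measured the average per-worker footprint (the 4 GB and 64 MB figures below are illustrations, not recommendations):

```shell
# Suggested pm.max_children = 80% of available RAM / avg worker size (MB).
suggest_max_children() {
  local avail_mb=$1 worker_mb=$2
  echo $(( avail_mb * 80 / 100 / worker_mb ))
}

# Measure a real per-worker average first, e.g. (Debian process name assumed):
#   ps -o rss= -C php-fpm8.2 | awk '{sum+=$1; n++} END {print sum/n/1024 " MB"}'
suggest_max_children 4096 64   # 4 GB available, ~64 MB per worker -> 51
```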

    Upstream Timeouts: When the Backend Is Just Too Slow

    If your error log shows ETIMEDOUT and the upstream is definitely running, the backend is taking longer to respond than Nginx's timeout allows. The default proxy_read_timeout is 60 seconds — which sounds like plenty, but report generation endpoints, heavy aggregation queries, and external API calls can blow past that easily.

    You have two options: fix the slow backend (always the correct long-term answer) or increase the timeout for that specific location. Here's how to do it selectively without touching global defaults:

    location /api/reports {
        proxy_pass            http://10.0.1.15:8080;
        proxy_connect_timeout 10s;
        proxy_send_timeout    120s;
        proxy_read_timeout    120s;
    }

    Don't crank timeouts up globally. That masks problems and lets slow requests pile up, eventually exhausting your worker pool and causing a broader outage. Increase timeouts only for endpoints where the slowness is expected and acceptable.

    A quick clarification on what these directives actually control:

    proxy_connect_timeout is how long Nginx waits to establish the TCP connection to the upstream. proxy_read_timeout is the gap between successive read operations on the response — not the total response time. proxy_send_timeout covers the send side. In practice, proxy_read_timeout is the one you'll tune most often for slow backends.
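    Before raising any of these, it helps to see where the time actually goes. curl's -w timing variables split a direct request to the backend into connect time and time-to-first-byte; the URL below is the example upstream from earlier, so substitute your own:

```shell
# connect ~ proxy_connect_timeout territory; ttfb ~ proxy_read_timeout.
curl -s -o /dev/null -m 30 \
     -w 'connect=%{time_connect}s  ttfb=%{time_starttransfer}s  total=%{time_total}s\n' \
     http://10.0.1.15:8080/api/slow-query || echo "request failed"
```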

    Buffer Size Problems Causing 502s

    This one is subtle and I've been bitten by it more than once. If your upstream sends a very large response header — common with applications that set numerous cookies, put JWT tokens in response headers, or emit verbose debug headers — Nginx may fail to buffer it and return a 502.

    The error in the log will say:

    upstream sent too big header while reading response header from upstream

    The fix is to increase the proxy buffer sizes in your server or location block:

    proxy_buffer_size        16k;
    proxy_buffers            8 16k;
    proxy_busy_buffers_size  32k;

    For FastCGI upstreams like PHP-FPM, the equivalent directives are:

    fastcgi_buffer_size        16k;
    fastcgi_buffers            8 16k;
    fastcgi_busy_buffers_size  32k;

    The default proxy_buffer_size is one memory page — typically 4k or 8k depending on your platform. A JWT token in a response header, or a Set-Cookie header with multiple long session values, can easily exceed that. Bumping to 16k resolves the majority of cases I've seen in production environments.
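    A quick way to check whether you're actually in this territory: dump the upstream's response headers directly and count the bytes. A sketch, using the example backend address from earlier; point it at your real upstream and endpoint:

```shell
# Total response header bytes, to compare against proxy_buffer_size (4k/8k default).
curl -s -D - -o /dev/null http://127.0.0.1:8080/api/status | wc -c
```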

    Unix Socket Permission Issues

    If you're using Unix sockets instead of TCP ports — which is the right call for same-host communication since it avoids the loopback overhead — permissions are a trap that catches a lot of people. Nginx runs as www-data or nginx depending on your distribution, and it needs read and write access to the socket file.

    ls -la /run/php/php8.2-fpm.sock
    srw-rw---- 1 www-data www-data 0 Aug 14 09:00 /run/php/php8.2-fpm.sock

    If the socket is owned by a different user and doesn't grant write access to the Nginx user, you'll get a 502 with a permission denied error in the log. Fix it in the FPM pool config:

    ; /etc/php/8.2/fpm/pool.d/www.conf
    listen.owner = www-data
    listen.group = www-data
    listen.mode  = 0660

    Restart FPM and the socket gets recreated on startup with the permissions you've defined. The same principle applies to any upstream that uses a Unix socket — Gunicorn, uWSGI, whatever. The Nginx worker process user must be able to write to that socket.
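    A quick way to inspect the socket before digging through configs. A sketch using GNU stat; the path is the Debian default and www-data is assumed as the Nginx user:

```shell
sock=/run/php/php8.2-fpm.sock   # adjust to your pool's listen path
if [ -S "$sock" ]; then
  stat -c 'owner=%U group=%G mode=%a' "$sock"
  # The definitive check -- can the Nginx user actually open it?
  # sudo -u www-data test -r "$sock" -a -w "$sock" && echo "access OK"
else
  echo "no socket at $sock -- is PHP-FPM running?"
fi
```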

    SSL Between Nginx and the Upstream

    If Nginx is proxying to an HTTPS upstream — common in multi-tier architectures where internal traffic is also encrypted — a certificate validation failure causes a 502. This happens more than it should, usually when a self-signed certificate is used internally and Nginx rejects it because it can't verify the chain.

    Your proxy block might look like this:

    location / {
        proxy_pass                    https://10.0.1.20:8443;
        proxy_ssl_verify              on;
        proxy_ssl_trusted_certificate /etc/nginx/certs/internal-ca.crt;
    }

    If proxy_ssl_verify is on and the upstream cert doesn't validate against that CA bundle, Nginx returns a 502 and logs something like:

    SSL_do_handshake() failed (SSL: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed) while SSL handshaking to upstream

    You either need to provide the correct CA certificate via proxy_ssl_trusted_certificate, or — in a genuinely trusted internal network where you understand the risk — disable verification with proxy_ssl_verify off. I don't love that option, but internal PKI is often a mess in practice and you need to balance security with operational reality.

    Also double-check that the upstream is actually serving TLS. If you point proxy_pass at an HTTPS address but the backend is serving plain HTTP, you'll get an immediate 502 because the TLS handshake fails against a non-TLS listener. Confirm with a direct curl from the server:

    curl -vk https://10.0.1.20:8443/health
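    Note that curl -k tells you whether something answers TLS at all, but it skips verification, so it can succeed while Nginx still rejects the certificate. openssl s_client shows the verdict Nginx's verification would reach; the address and CA path are the examples from above:

```shell
# "Verify return code: 0 (ok)" means the chain validates against that CA.
openssl s_client -connect 10.0.1.20:8443 \
    -CAfile /etc/nginx/certs/internal-ca.crt </dev/null 2>/dev/null \
  | grep -E 'subject=|issuer=|Verify return code' || echo "handshake failed"
```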

    Upstream Group Exhaustion

    If you're using an upstream block with multiple backend servers, Nginx marks individual backends as unavailable when they fail beyond a threshold. The max_fails and fail_timeout parameters control this passive health checking:

    upstream app_backend {
        server 10.0.1.21:8080 max_fails=3 fail_timeout=30s;
        server 10.0.1.22:8080 max_fails=3 fail_timeout=30s;
        server 10.0.1.23:8080 max_fails=3 fail_timeout=30s;
    }

    If all three backends get marked as failed simultaneously — say, a shared database goes down and every app instance starts returning errors — Nginx has no healthy upstream and returns 502 for every incoming request. The error log will show:

    no live upstreams while connecting to upstream

    The fix here is resolving the underlying dependency failure — the database, the auth service, whatever caused the cascade. Once fail_timeout expires, Nginx will probe the marked-down servers again automatically. It's not a permanent blacklist. If you need a fallback during full upstream outages, add a backup server that returns a maintenance response:

    upstream app_backend {
        server 10.0.1.21:8080 max_fails=3 fail_timeout=30s;
        server 10.0.1.22:8080 max_fails=3 fail_timeout=30s;
        server 10.0.1.23:8080 max_fails=3 fail_timeout=30s;
        server 10.0.1.99:8080 backup;
    }
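    The backup server needs something listening on it that degrades gracefully. One hypothetical shape for 10.0.1.99 is a minimal vhost that answers every request with a static 503 maintenance page. A sketch under those assumptions, not a drop-in config:

```nginx
# Hypothetical maintenance vhost running on the backup host (10.0.1.99).
server {
    listen 8080 default_server;

    error_page 503 /maintenance.html;

    location / {
        return 503;                 # every request gets the maintenance page
    }

    location = /maintenance.html {
        root /var/www/maintenance;  # assumed location of the static page
        internal;
    }
}
```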

    The Systematic Debug Sequence on sw-infrarunbook-01

    When I'm debugging a 502 in production, this is the exact sequence I run through on a host like sw-infrarunbook-01. It moves from observation to isolation to confirmation without jumping to conclusions.

    # 1. What upstream address is Nginx trying to reach?
    grep -Ei 'proxy_pass|fastcgi_pass|uwsgi_pass' /etc/nginx/sites-enabled/solvethenetwork.com
    
    # 2. Is anything listening on that address and port?
    ss -tlnp | grep 8080
    
    # 3. What does the error log say right now?
    tail -f /var/log/nginx/error.log
    
    # 4. Is the upstream service up and healthy?
    systemctl status app.service
    
    # 5. Can you reach the upstream directly, bypassing Nginx?
    curl -v http://127.0.0.1:8080/health
    
    # 6. What does the upstream service's own log say?
    journalctl -u app.service --since "10 minutes ago"

    Step 5 is the one engineers most often skip. Curling the upstream directly from the same host Nginx runs on removes Nginx from the equation entirely. If curl http://127.0.0.1:8080/health returns a 200, Nginx should be able to reach it too — and you need to look at Nginx config or permissions. If it hangs, refuses, or errors, that's your upstream problem confirmed without Nginx being involved at all.
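    In a load-balanced setup, extend step 5 to every node in the group so a single bad backend can't hide behind its healthy siblings. The addresses mirror the example upstream block, and a /health endpoint is assumed:

```shell
for node in 10.0.1.21 10.0.1.22 10.0.1.23; do
  # -m 5 bounds each probe; "000" means no HTTP response came back at all
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://$node:8080/health") || true
  printf '%s -> %s\n' "$node" "${code:-000}"
done
```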

    Config Validation and Graceful Reload

    If your fix involved changing Nginx configuration — adjusting timeouts, buffer sizes, proxy addresses, upstream blocks — always test the config before applying it:

    nginx -t

    If it comes back with syntax is ok and test is successful, reload gracefully:

    systemctl reload nginx

    A reload is graceful — existing connections complete normally while new connections pick up the updated configuration. A full restart drops active connections. Use reload unless you specifically need a clean process restart.

    After reloading, watch the error log to confirm 502s stop appearing:

    tail -f /var/log/nginx/error.log | grep -v '\[info\]'

    If 502s stop and your upstream health checks pass, you're done. If they continue, you haven't found the actual root cause yet — go back to the error log, look more carefully at the upstream address and the errno, and work through the list again.

    Instrumenting for Intermittent 502s

    Intermittent 502s — ones that appear occasionally under load rather than constantly — are harder to diagnose. Common causes include upstream worker exhaustion during traffic bursts, memory pressure causing the backend to slow or OOM-crash, and connection pool limits being hit at the database or external API layer.

    For these situations, I add upstream timing fields to Nginx's access log format so there's data to correlate against:

    log_format upstream_timing '$remote_addr - $remote_user [$time_local] '
                               '"$request" $status $body_bytes_sent '
                               'upstream=$upstream_addr '
                               'upstream_status=$upstream_status '
                               'upstream_rt=$upstream_response_time '
                               'request_rt=$request_time';
    
    access_log /var/log/nginx/access.log upstream_timing;

    With that in place, you can grep for 502 responses and see exactly which upstream node returned them and how long the request ran before failing:

    awk '$9 == 502 {print $0}' /var/log/nginx/access.log | tail -30

    Patterns jump out quickly. If all 502s come from 10.0.1.22:8080 specifically, you've got a bad node. If they cluster around specific times of day, you've got a traffic-driven exhaustion problem. If the upstream_rt field consistently shows values just under your timeout threshold right before the 502, the backend is genuinely too slow and needs optimization, not just a longer timeout.
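    With the upstream_timing format in place, aggregation makes the per-node pattern obvious. A sketch: the hypothetical helper below keys on the upstream= and upstream_status= fields defined in the log_format above.

```shell
# Count 502 responses per upstream node from an access log in the
# upstream_timing format (reads files given as arguments, or "-" for stdin).
count_502_by_upstream() {
  awk 'index($0, "upstream_status=502") {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^upstream=/) counts[$i]++
       }
       END { for (u in counts) print counts[u], u }' "$@" | sort -rn
}

# count_502_by_upstream /var/log/nginx/access.log
```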

    A 502 Bad Gateway is always solvable. The error log gives you the errno, the errno tells you the failure class, and from there it's methodical elimination. Read before you restart, isolate before you guess, and fix the actual cause rather than the symptom.

    Frequently Asked Questions

    Why does Nginx return a 502 instead of a 503?

    Nginx returns 502 when it gets an invalid response, or no response at all, from the upstream backend: the proxy layer itself is working but the backend isn't. A 503 Service Unavailable is returned when Nginx deliberately rejects the request, for example when rate or connection limiting (limit_req, limit_conn) is triggered, or when the application itself signals that it's overloaded. In practice, a dead or crashing upstream almost always produces a 502.

    Does restarting Nginx fix a 502 Bad Gateway error?

    Almost never. Nginx is functioning correctly when it returns a 502 — it's the upstream backend that has failed. Restarting Nginx won't bring the upstream back. Check whether the backend process is running, whether it's listening on the expected port or socket, and whether it's responding to direct connections before touching Nginx at all.

    How do I fix a 502 caused by PHP-FPM specifically?

    Start by confirming PHP-FPM is running with systemctl status php-fpm. Then verify that the listener in your FPM pool config (listen =) matches the fastcgi_pass directive in your Nginx server block — both must use either the same TCP address and port or the same Unix socket path. If both sides agree and FPM is running, check the FPM log for worker exhaustion messages and increase pm.max_children if needed.

    What is the difference between proxy_connect_timeout and proxy_read_timeout in Nginx?

    proxy_connect_timeout controls how long Nginx waits to establish the initial TCP connection to the upstream. proxy_read_timeout controls the maximum time between successive read operations on the upstream's response — not the total response time. If a backend is slow to start processing but then responds steadily, you need to increase proxy_read_timeout. If the backend is unreachable or takes too long to accept connections, proxy_connect_timeout is the relevant directive.

    How can I tell which upstream server is causing 502 errors in a load-balanced setup?

    Add $upstream_addr and $upstream_status to your Nginx access log format. The $upstream_addr variable records the specific backend IP and port that Nginx connected to for each request, and $upstream_status shows what that backend returned. Once these fields are in your logs, you can filter for 502 responses and immediately see which backend node is responsible for the failures.
