What a 502 Actually Means
The 502 Bad Gateway error from Nginx is a proxy error, not a server error in the traditional sense. Nginx itself is alive and accepting connections — it got your request, tried to forward it to the upstream backend, and either got back nothing or got back something it couldn't make sense of. That's the whole story. Nginx is telling the client: "I did my job, the thing behind me didn't."
I've watched engineers waste twenty minutes restarting Nginx on a 502. It fixes nothing, because Nginx isn't the problem. The upstream is broken, unreachable, overloaded, or misconfigured. That's where you need to look.
Read the Error Log Before Touching Anything
The very first thing you do is read the Nginx error log. Not the access log — that'll just show you a wall of 502 responses. The error log tells you why.
tail -n 100 /var/log/nginx/error.log
If your server block defines a custom error_log path, check there instead. Look for lines flagged [error]. The upstream field in those lines is the critical piece. You're going to see one of a handful of errno values, and each one points at a different failure mode.
2024/08/14 09:23:41 [error] 3821#3821: *4821 connect() failed (111: Connection refused) while connecting to upstream, client: 203.0.113.55, server: solvethenetwork.com, request: "GET /api/status HTTP/1.1", upstream: "http://127.0.0.1:8080/api/status", host: "solvethenetwork.com"
Error 111 is ECONNREFUSED. Nothing is listening on that port. Full stop. Don't overthink it — go check if your upstream process is running.
2024/08/14 09:31:07 [error] 3821#3821: *5103 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 203.0.113.55, server: solvethenetwork.com, upstream: "http://10.0.1.15:8080/api/slow-query", host: "solvethenetwork.com"
Error 110 is ETIMEDOUT. Something is listening, it accepted the connection, but it never sent a response header before the timeout fired. That's a slow backend problem — think a database query taking forever, GC pauses, or a thread pool that's completely saturated.
2024/08/14 09:45:22 [error] 3821#3821: *5891 recv() failed (104: Connection reset by peer) while reading response header from upstream
Error 104 is ECONNRESET. The upstream accepted the connection and then killed it mid-conversation. This usually signals a crash, a premature socket close, or an upstream in the middle of a graceful shutdown that's still receiving traffic.
Upstream Not Running: The Most Common Cause
In my experience, the single most common 502 cause is simply that the upstream process isn't running. It crashed, someone stopped it, or it never came back up after a deployment. Start here every time.
ss -tlnp | grep 8080
If that returns nothing, your upstream isn't listening. For a Node.js application on port 8080, check its service status directly:
systemctl status node-app.service
Or if it's not managed by systemd:
ps aux | grep node
Once you've confirmed the process is dead, check its own logs before restarting it blindly. There's a reason it crashed, and restarting without understanding why means it'll crash again. Start the service, watch it come up, then tail the Nginx error log to confirm 502s stop.
systemctl start node-app.service
journalctl -u node-app.service -f
PHP-FPM: The Classic 502 Source
If you're running PHP behind Nginx — WordPress, Laravel, any PHP application — then PHP-FPM is almost certainly involved in your 502. FPM either isn't running, ran out of workers, or the socket and port configuration doesn't match what Nginx is trying to connect to.
First, check if PHP-FPM is running:
systemctl status php8.2-fpm.service
Then verify the listener matches what Nginx expects. Your Nginx config might say:
fastcgi_pass 127.0.0.1:9000;
But your PHP-FPM pool config (typically under /etc/php/8.2/fpm/pool.d/www.conf) might be configured to listen on a Unix socket:
listen = /run/php/php8.2-fpm.sock
That mismatch alone causes a 502. Check both sides and make sure they agree. If FPM is using a socket, Nginx should use:
fastcgi_pass unix:/run/php/php8.2-fpm.sock;
The other PHP-FPM failure mode I see regularly is worker exhaustion. If you have a high-traffic site and FPM's pm.max_children is set too low, all workers get occupied and new requests queue until Nginx times out waiting. Check the FPM log:
tail -n 50 /var/log/php8.2-fpm.log
If you see a line like "server reached pm.max_children setting (5), consider raising it", that's your problem. Increase pm.max_children to a value that reflects your server's available memory and reload FPM. A rough rule of thumb: divide available RAM by your average PHP process memory footprint, then take 80% of that number to leave headroom.
; /etc/php/8.2/fpm/pool.d/www.conf
pm = dynamic
pm.max_children = 25
pm.start_servers = 5
pm.min_spare_servers = 3
pm.max_spare_servers = 10
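The rule of thumb above reduces to one line of arithmetic. A sketch with hypothetical numbers — 4096 MB of RAM left for PHP and roughly 60 MB per worker; measure your own footprint before trusting either figure:

```shell
# Hypothetical inputs: 4096 MB available to PHP-FPM, ~60 MB per worker.
# Measure the real per-worker footprint with something like:
#   ps -o rss= -C php-fpm8.2 | awk '{sum+=$1; n++} END {print sum/n/1024}'
avail_mb=4096
per_worker_mb=60
# Take 80% of available RAM, divided by the per-worker footprint:
max_children=$(( avail_mb * 80 / 100 / per_worker_mb ))
echo "pm.max_children = $max_children"
```

With these inputs the result is 54, which would go straight into the pool config above.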
Upstream Timeouts: When the Backend Is Just Too Slow
If your error log shows ETIMEDOUT and the upstream is definitely running, the backend is taking longer to respond than Nginx's timeout allows. The default proxy_read_timeout is 60 seconds — which sounds like plenty, but report generation endpoints, heavy aggregation queries, and external API calls can blow past that easily.
You have two options: fix the slow backend (always the correct long-term answer) or increase the timeout for that specific location. Here's how to do it selectively without touching global defaults:
location /api/reports {
    proxy_pass http://10.0.1.15:8080;
    proxy_connect_timeout 10s;
    proxy_send_timeout 120s;
    proxy_read_timeout 120s;
}
Don't crank timeouts up globally. That masks problems and lets slow requests pile up, eventually exhausting your worker pool and causing a broader outage. Increase timeouts only for endpoints where the slowness is expected and acceptable.
A quick clarification on what these directives actually control: proxy_connect_timeout is how long Nginx waits to establish the TCP connection to the upstream. proxy_read_timeout is the gap between successive read operations on the response — not the total response time. proxy_send_timeout covers the send side. In practice, proxy_read_timeout is the one you'll tune most often for slow backends.
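To confirm the slowness belongs to the backend and not the proxy path, time the upstream directly with curl. A sketch — the throwaway local server on port 8099 is a stand-in so the commands run anywhere; in production you'd target the upstream address from the error log instead:

```shell
# Stand-in upstream so this is runnable anywhere; in production, target the
# address from the error log (e.g. http://10.0.1.15:8080/api/slow-query).
python3 -m http.server 8099 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1
# connect = TCP handshake time, ttfb = time to first response byte.
# A ttfb creeping toward proxy_read_timeout means the backend is the bottleneck.
curl -s -o /dev/null \
  -w 'connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  http://127.0.0.1:8099/
kill "$srv"
```

If ttfb is large while connect is near zero, the problem is the application, not the network path.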
Buffer Size Problems Causing 502s
This one is subtle and I've been bitten by it more than once. If your upstream sends a very large response header — common with applications that set numerous cookies, put JWT tokens in response headers, or emit verbose debug headers — Nginx may fail to buffer it and return a 502.
The error in the log will say:
upstream sent too big header while reading response header from upstream
The fix is to increase the proxy buffer sizes in your server or location block:
proxy_buffer_size 16k;
proxy_buffers 8 16k;
proxy_busy_buffers_size 32k;
For FastCGI upstreams like PHP-FPM, the equivalent directives are:
fastcgi_buffer_size 16k;
fastcgi_buffers 8 16k;
fastcgi_busy_buffers_size 32k;
The default proxy_buffer_size is one memory page — typically 4k or 8k depending on your platform. A JWT token in a response header, or a Set-Cookie header with multiple long session values, can easily exceed that. Bumping to 16k resolves the majority of cases I've seen in production environments.
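Two quick checks, as a sketch: what a page actually is on your platform, and whether a given header payload would fit in a 4k buffer. The run of x's below is a synthetic stand-in for a long JWT:

```shell
# The default proxy_buffer_size is one memory page; see what that is here:
getconf PAGESIZE

# Simulate a response header block carrying a ~5000-byte session cookie:
printf 'HTTP/1.1 200 OK\r\nSet-Cookie: session=%s\r\n\r\n' \
  "$(head -c 5000 /dev/zero | tr '\0' 'x')" > /tmp/headers.txt
wc -c < /tmp/headers.txt   # comfortably over 4096, so a 4k buffer would overflow
```

In the real case you'd capture the actual headers first with curl -s -D headers.txt -o /dev/null against the upstream, then count bytes the same way.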
Unix Socket Permission Issues
If you're using Unix sockets instead of TCP ports — which is the right call for same-host communication since it avoids the loopback overhead — permissions are a trap that catches a lot of people. Nginx runs as www-data or nginx depending on your distribution, and it needs read and write access to the socket file.
ls -la /run/php/php8.2-fpm.sock
srw-rw---- 1 www-data www-data 0 Aug 14 09:00 /run/php/php8.2-fpm.sock
If the socket is owned by a different user and doesn't grant write access to the Nginx user, you'll get a 502 with a permission denied error in the log. Fix it in the FPM pool config:
; /etc/php/8.2/fpm/pool.d/www.conf
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
Restart FPM and the socket gets recreated on startup with the permissions you've defined. The same principle applies to any upstream that uses a Unix socket — Gunicorn, uWSGI, whatever. The Nginx worker process user must be able to write to that socket.
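A small helper makes the comparison mechanical. sock_perms is a name invented here, not a standard tool, and it's demonstrated on a scratch file since the real socket path varies; point it at /run/php/php8.2-fpm.sock on an actual host:

```shell
# Print owner, group, and mode for a path (GNU stat syntax, i.e. Linux):
sock_perms() { stat -c '%U:%G mode=%a' "$1"; }

# Demonstration on a scratch file using the 0660 mode from the pool config.
# Compare the printed owner:group against the user your Nginx workers run as.
scratch=$(mktemp)
chmod 0660 "$scratch"
sock_perms "$scratch"
rm -f "$scratch"
```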
SSL Between Nginx and the Upstream
If Nginx is proxying to an HTTPS upstream — common in multi-tier architectures where internal traffic is also encrypted — a certificate validation failure causes a 502. This happens more than it should, usually when a self-signed certificate is used internally and Nginx rejects it because it can't verify the chain.
Your proxy block might look like this:
location / {
    proxy_pass https://10.0.1.20:8443;
    proxy_ssl_verify on;
    proxy_ssl_trusted_certificate /etc/nginx/certs/internal-ca.crt;
}
If proxy_ssl_verify is on and the upstream cert doesn't validate against that CA bundle, Nginx returns a 502 and logs something like:
SSL_do_handshake() failed (SSL: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed) while SSL handshaking to upstream
You either need to provide the correct CA certificate via proxy_ssl_trusted_certificate, or — in a genuinely trusted internal network where you understand the risk — disable verification with proxy_ssl_verify off. I don't love that option, but internal PKI is often a mess in practice and you need to balance security with operational reality.
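You can rehearse the same chain validation by hand with openssl. This sketch mints a throwaway CA and leaf certificate so it runs anywhere; against the real upstream you'd capture the served certificate with openssl s_client -connect 10.0.1.20:8443 -showcerts instead of generating one:

```shell
# Throwaway internal CA (stand-in for /etc/nginx/certs/internal-ca.crt):
openssl req -x509 -newkey rsa:2048 -nodes -subj '/CN=internal-ca' \
  -keyout /tmp/ca.key -out /tmp/ca.crt -days 1 2>/dev/null
# Leaf cert for the upstream, signed by that CA:
openssl req -newkey rsa:2048 -nodes -subj '/CN=10.0.1.20' \
  -keyout /tmp/leaf.key -out /tmp/leaf.csr 2>/dev/null
openssl x509 -req -in /tmp/leaf.csr -CA /tmp/ca.crt -CAkey /tmp/ca.key \
  -CAcreateserial -out /tmp/leaf.crt -days 1 2>/dev/null
# The check proxy_ssl_verify performs, in miniature:
openssl verify -CAfile /tmp/ca.crt /tmp/leaf.crt
```

If openssl verify prints OK for your real leaf and CA files, Nginx's verification should pass too; if it doesn't, you've found the mismatch without touching the proxy config.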
Also double-check that the upstream is actually serving TLS. If you point proxy_pass at an HTTPS address but the backend is serving plain HTTP, you'll get an immediate 502 because the TLS handshake fails against a non-TLS listener. Confirm with a direct curl from the server:
curl -vk https://10.0.1.20:8443/health
Upstream Group Exhaustion
If you're using an upstream block with multiple backend servers, Nginx marks individual backends as unavailable when they fail beyond a threshold. The max_fails and fail_timeout parameters control this passive health checking:
upstream app_backend {
    server 10.0.1.21:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.22:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.23:8080 max_fails=3 fail_timeout=30s;
}
If all three backends get marked as failed simultaneously — say, a shared database goes down and every app instance starts returning errors — Nginx has no healthy upstream and returns 502 for every incoming request. The error log will show:
no live upstreams while connecting to upstream
The fix here is resolving the underlying dependency failure — the database, the auth service, whatever caused the cascade. Once fail_timeout expires, Nginx will probe the marked-down servers again automatically. It's not a permanent blacklist. If you need a fallback during full upstream outages, add a backup server that returns a maintenance response:
upstream app_backend {
    server 10.0.1.21:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.22:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.23:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.99:8080 backup;
}
The Systematic Debug Sequence on sw-infrarunbook-01
When I'm debugging a 502 in production, this is the exact sequence I run through on a host like sw-infrarunbook-01. It moves from observation to isolation to confirmation without jumping to conclusions.
# 1. What upstream address is Nginx trying to reach?
grep -Ei 'proxy_pass|fastcgi_pass|uwsgi_pass' /etc/nginx/sites-enabled/solvethenetwork.com
# 2. Is anything listening on that address and port?
ss -tlnp | grep 8080
# 3. What does the error log say right now?
tail -f /var/log/nginx/error.log
# 4. Is the upstream service up and healthy?
systemctl status app.service
# 5. Can you reach the upstream directly, bypassing Nginx?
curl -v http://127.0.0.1:8080/health
# 6. What does the upstream service's own log say?
journalctl -u app.service --since "10 minutes ago"
Step 5 is the one engineers most often skip. Curling the upstream directly from the same host Nginx runs on removes Nginx from the equation entirely. If curl http://127.0.0.1:8080/health returns a 200, Nginx should be able to reach it too — and you need to look at Nginx config or permissions instead. If it hangs, refuses, or errors, that's your upstream problem confirmed without Nginx being involved at all.
Config Validation and Graceful Reload
If your fix involved changing Nginx configuration — adjusting timeouts, buffer sizes, proxy addresses, upstream blocks — always test the config before applying it:
nginx -t
If it comes back with syntax is ok and test is successful, reload gracefully:
systemctl reload nginx
A reload is graceful — existing connections complete normally while new connections pick up the updated configuration. A full restart drops active connections. Use reload unless you specifically need a clean process restart.
After reloading, watch the error log to confirm 502s stop appearing:
tail -f /var/log/nginx/error.log | grep -v " info "
If 502s stop and your upstream health checks pass, you're done. If they continue, you haven't found the actual root cause yet — go back to the error log, look more carefully at the upstream address and the errno, and work through the list again.
Instrumenting for Intermittent 502s
Intermittent 502s — ones that appear occasionally under load rather than constantly — are harder to diagnose. Common causes include upstream worker exhaustion during traffic bursts, memory pressure causing the backend to slow or OOM-crash, and connection pool limits being hit at the database or external API layer.
For these situations, I add upstream timing fields to Nginx's access log format so there's data to correlate against:
log_format upstream_timing '$remote_addr - $remote_user [$time_local] '
                           '"$request" $status $body_bytes_sent '
                           'upstream=$upstream_addr '
                           'upstream_status=$upstream_status '
                           'upstream_rt=$upstream_response_time '
                           'request_rt=$request_time';
access_log /var/log/nginx/access.log upstream_timing;
With that in place, you can grep for 502 responses and see exactly which upstream node returned them and how long the request ran before failing:
awk '$9 == 502 {print $0}' /var/log/nginx/access.log | tail -30
Patterns jump out quickly. If all 502s come from 10.0.1.22:8080 specifically, you've got a bad node. If they cluster around specific times of day, you've got a traffic-driven exhaustion problem. If the upstream_rt field consistently shows values just under your timeout threshold right before the 502, the backend is genuinely too slow and needs optimization, not just a longer timeout.
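The bad-node pattern is easy to pull out with one awk pass over that log format. A sketch with three synthetic lines standing in for the real access log:

```shell
# Synthetic sample of the upstream_timing format (stand-in for the real log):
cat > /tmp/access_sample.log <<'EOF'
203.0.113.55 - - [14/Aug/2024:09:23:41 +0000] "GET /api/status HTTP/1.1" 502 150 upstream=10.0.1.22:8080 upstream_status=502 upstream_rt=0.001 request_rt=0.002
203.0.113.55 - - [14/Aug/2024:09:23:44 +0000] "GET /api/status HTTP/1.1" 200 512 upstream=10.0.1.21:8080 upstream_status=200 upstream_rt=0.120 request_rt=0.121
203.0.113.60 - - [14/Aug/2024:09:24:01 +0000] "GET /api/users HTTP/1.1" 502 150 upstream=10.0.1.22:8080 upstream_status=502 upstream_rt=0.001 request_rt=0.002
EOF
# $9 is $status and $11 is the upstream=host:port field in this format:
awk '$9 == 502 { split($11, kv, "="); n[kv[2]]++ }
     END { for (u in n) print n[u], u }' /tmp/access_sample.log
# → 2 10.0.1.22:8080
```

In this sample every 502 came from 10.0.1.22:8080 — exactly the bad-node signature described above.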
A 502 Bad Gateway is always solvable. The error log gives you the errno, the errno tells you the failure class, and from there it's methodical elimination. Read before you restart, isolate before you guess, and fix the actual cause rather than the symptom.
