InfraRunBook

    How Nginx Improves Website Performance and Speed

    Nginx
    Published: Apr 1, 2026
    Updated: Apr 1, 2026

    Learn how Nginx's event-driven architecture, proxy caching, gzip compression, HTTP/2, and upstream keepalive reduce latency and scale web infrastructure to handle high concurrent load.


    Nginx has earned its place as one of the most widely deployed web servers and reverse proxies in production infrastructure. Originally written by Igor Sysoev to solve the C10K problem — handling ten thousand concurrent connections on a single server — Nginx achieves high performance through a fundamentally different architecture than thread-per-connection servers. For infrastructure teams managing sites under real traffic load, understanding how Nginx delivers performance gains is not optional; it is foundational knowledge.

    This article walks through the concrete mechanisms Nginx uses to improve website performance and speed: its event-driven core and worker tuning, gzip compression, proxy caching, connection keep-alive, HTTP/2, load balancing, SSL session reuse, and buffer optimization. Each section includes annotated configuration snippets drawn from a production-style setup on sw-infrarunbook-01 serving solvethenetwork.com.


    Event-Driven Architecture and Worker Process Tuning

    The foundation of Nginx's performance is its event-driven, asynchronous, non-blocking model. Rather than spawning a thread or process per connection — as Apache's prefork MPM does — Nginx uses a small, fixed number of worker processes. Each worker handles thousands of connections through the operating system's event notification interface: epoll on Linux, kqueue on BSD. The practical implication is that Nginx consumes far less memory under load and avoids the context-switching overhead that cripples thread-based servers at scale.

    Tuning the worker count and connection limits is the first lever infrastructure engineers pull:

    # /etc/nginx/nginx.conf — worker tuning on sw-infrarunbook-01
    
    worker_processes auto;          # one worker per logical CPU core
    worker_rlimit_nofile 65535;     # raise OS file descriptor limit per worker
    
    events {
        worker_connections 4096;    # max simultaneous connections per worker
        use epoll;                  # explicit epoll on Linux
        multi_accept on;            # accept all pending connections in one pass
    }

    With worker_processes auto, Nginx reads the CPU count at startup and spawns one worker per core. On a 4-core host, the total concurrency ceiling is 4 x 4096 = 16,384 simultaneous connections before any queuing occurs. Note that when Nginx proxies, each client connection also holds an upstream connection, so the effective client-facing ceiling is roughly half that figure. Combined with multi_accept on, each worker drains its accept queue in a single system call pass, reducing accept latency under burst traffic.
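    As a capacity-planning sanity check, the ceiling can be computed directly. A minimal Python sketch using the example values from this config (4 cores, 4096 connections per worker):

```python
# Back-of-envelope concurrency ceiling for the worker tuning above.
# Values mirror the example config; adjust for your own host.
workers = 4                  # worker_processes auto on a 4-core host
worker_connections = 4096    # events { worker_connections 4096; }

raw_ceiling = workers * worker_connections

# When proxying, each client connection also holds an upstream
# connection, so a proxied request consumes two connection slots.
proxied_ceiling = raw_ceiling // 2

print(raw_ceiling, proxied_ceiling)  # 16384 8192
```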


    Gzip Compression to Reduce Transfer Size

    Network bandwidth remains a bottleneck for many users, particularly on mobile connections. Gzip compression reduces the byte count of text-based responses — HTML, CSS, JavaScript, JSON, XML — by 60 to 80 percent in typical cases. Nginx compresses responses on the fly before sending them to the client, or can serve pre-compressed .gz files using the gzip_static module.

    # Gzip configuration — nginx.conf http context
    
    http {
        gzip              on;
        gzip_vary         on;            # add Vary: Accept-Encoding header
        gzip_proxied      any;           # compress responses for all proxy requests
        gzip_comp_level   5;             # balance CPU usage vs compression ratio
        gzip_min_length   256;           # skip compressing responses under 256 bytes
        gzip_types
            text/plain
            text/css
            text/javascript
            application/javascript
            application/json
            application/xml
            image/svg+xml;
        # note: WOFF2 fonts are omitted — the format is already
        # compressed internally, so gzipping it wastes CPU for no gain
    }

    gzip_comp_level 5 is a widely adopted production value. Levels above 6 provide diminishing compression gains while consuming noticeably more CPU. The gzip_vary on directive is critical when Nginx sits behind a CDN or Varnish cache — it signals that the cached object differs based on whether the client sent Accept-Encoding: gzip, preventing clients from receiving gzip-encoded content they cannot decode.
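    The level tradeoff is easy to observe offline with Python's zlib module, which implements the same DEFLATE algorithm gzip uses. The payload here is a fabricated repetitive HTML fragment:

```python
import zlib

# Compare DEFLATE compression levels on a repetitive HTML-like payload.
payload = b"<div class='row'><span>item</span></div>\n" * 500

for level in (1, 5, 9):
    compressed = zlib.compress(payload, level)
    saved = 100 * (1 - len(compressed) / len(payload))
    print(f"level {level}: {len(compressed):>6} bytes ({saved:.1f}% smaller)")
```

    On text like this, the size gap between level 5 and level 9 is typically a few percent while the CPU cost rises sharply, which is the basis for the level 5 recommendation.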


    Sendfile, TCP_NOPUSH, and TCP_NODELAY

    For static file delivery, Nginx can bypass user-space memory entirely using the sendfile() system call. Without sendfile, serving a file means a read() that copies data from the kernel page cache into user-space, followed by a write() that copies it back into the kernel socket buffer: two unnecessary memory copies. The sendfile system call performs the transfer entirely within the kernel, reducing CPU time and memory bus pressure.

    http {
        sendfile        on;
        tcp_nopush      on;    # batch response headers with sendfile data
        tcp_nodelay     on;    # disable Nagle algorithm for keepalive connections
        keepalive_timeout    65;
        keepalive_requests   1000;
    }

    tcp_nopush works in conjunction with sendfile to batch the HTTP response header and the start of the file into a single TCP segment, reducing round trips. tcp_nodelay disables the Nagle algorithm on keep-alive connections so that small responses — API calls, health checks — are not artificially delayed waiting to accumulate enough data to fill a full TCP segment.


    Proxy Caching for Upstream Offload

    When Nginx acts as a reverse proxy in front of application servers, proxy caching is one of the highest-leverage performance tools available. Cached responses are served from disk or memory without touching the upstream at all, dramatically reducing backend load and Time to First Byte (TTFB). A single cache hit removes an entire upstream round trip.

    # /etc/nginx/nginx.conf — cache zone definition
    
    http {
        proxy_cache_path /var/cache/nginx/solvethenetwork
            levels=1:2
            keys_zone=app_cache:64m
            max_size=4g
            inactive=60m
            use_temp_path=off;
    
        server {
            listen 443 ssl http2;
            server_name solvethenetwork.com;
    
            location /api/public/ {
                proxy_pass              http://10.10.10.20:8080;
                proxy_cache             app_cache;
                proxy_cache_valid       200 10m;
                proxy_cache_valid       404 1m;
                proxy_cache_use_stale   error timeout updating http_500 http_502 http_503;
                proxy_cache_lock        on;
                add_header              X-Cache-Status $upstream_cache_status;
            }
        }
    }

    The proxy_cache_use_stale directive instructs Nginx to serve a stale cached copy if the upstream returns an error or times out. This is essential for resilience — users see slightly stale data rather than a 502 error during upstream degradation events. proxy_cache_lock on prevents cache stampedes: only one request revalidates an expired entry while others wait and then receive the freshly cached response, protecting the backend from a sudden burst of simultaneous cache-miss requests.
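    The stampede prevention behind proxy_cache_lock amounts to a single-flight lock per cache key. The Python model below is illustrative only (Nginx's implementation lives in the C proxy module); fetch_upstream and its 50 ms delay are fabricated stand-ins for a slow backend:

```python
import threading
import time

# Single-flight sketch of proxy_cache_lock: when an entry is missing,
# only one thread fetches from upstream; the rest wait for its result.
cache = {}
locks = {}
registry_lock = threading.Lock()
upstream_calls = 0

def fetch_upstream(key):
    global upstream_calls
    upstream_calls += 1
    time.sleep(0.05)             # simulated slow backend
    return f"fresh:{key}"

def get(key):
    if key in cache:
        return cache[key]        # cache hit: upstream untouched
    with registry_lock:
        lock = locks.setdefault(key, threading.Lock())
    with lock:                   # only one refresher per key
        if key not in cache:     # re-check after acquiring the lock
            cache[key] = fetch_upstream(key)
    return cache[key]

threads = [threading.Thread(target=get, args=("/api/public/x",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(upstream_calls)  # 1 — the backend saw a single request for 20 clients
```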


    HTTP Keep-Alive and Upstream Keepalive

    Each TCP connection involves a three-way handshake, and with TLS, an additional handshake costing multiple round trips. HTTP keep-alive reuses a single connection for multiple requests, eliminating this overhead entirely for subsequent requests from the same client or to the same upstream server.

    http {
        keepalive_timeout  65;
        keepalive_requests 1000;
    
        upstream app_servers {
            server 10.10.10.20:8080;
            server 10.10.10.21:8080;
            keepalive 64;        # idle keepalive connections per worker in pool
        }
    
        server {
            location / {
                proxy_pass         http://app_servers;
                proxy_http_version 1.1;
                proxy_set_header   Connection "";
            }
        }
    }

    The keepalive 64 directive in the upstream block maintains a pool of up to 64 idle persistent connections per worker to the backend servers. Without this, Nginx establishes a new TCP connection for every proxied request — significant overhead on high-throughput APIs. The proxy_http_version 1.1 and empty Connection header are mandatory companions: HTTP/1.0 does not support persistent connections, and clearing the Connection header prevents the client's connection disposition from being forwarded to the upstream.


    HTTP/2 for Multiplexed Connections

    HTTP/2 is the single most impactful protocol-level improvement for browser-facing traffic. It multiplexes multiple requests over a single TCP connection, eliminates head-of-line blocking at the HTTP layer, and uses HPACK header compression to reduce redundant header bytes. (Server push, once promoted as an HTTP/2 feature, has since been deprecated by major browsers and removed from recent Nginx releases.) Enabling HTTP/2 requires only adding http2 to the listen directive; note that on Nginx 1.25.1 and later, the standalone directive http2 on; replaces the now-deprecated listen parameter:

    server {
        listen 443 ssl http2;
        server_name solvethenetwork.com;
    
        ssl_certificate      /etc/ssl/solvethenetwork/fullchain.pem;
        ssl_certificate_key  /etc/ssl/solvethenetwork/privkey.pem;
        ssl_protocols        TLSv1.2 TLSv1.3;
        ssl_ciphers          ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
        ssl_prefer_server_ciphers off;
        ssl_session_cache    shared:SSL:10m;
        ssl_session_timeout  1d;
        ssl_session_tickets  off;
    }

    ssl_session_cache shared:SSL:10m stores TLS session parameters so returning clients can resume without a full handshake. On solvethenetwork.com, this means returning visitors skip the 1 to 2 RTT TLS negotiation overhead entirely. The cache is shared across all worker processes using 10 MB of shared memory, which holds approximately 40,000 sessions. ssl_session_tickets off is the recommended setting for deployments prioritizing forward secrecy — session ticket key rotation requires careful orchestration to maintain security properties.
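    The 40,000-session figure follows from the rule of thumb in the Nginx documentation of roughly 4,000 sessions per megabyte of shared memory:

```python
# Capacity estimate for ssl_session_cache shared:SSL:10m.
# Nginx documents ~4,000 sessions per 1 MB of shared cache memory.
zone_mb = 10
sessions_per_mb = 4000

capacity = zone_mb * sessions_per_mb
bytes_per_session = (1024 * 1024) // sessions_per_mb  # ~262 bytes each

print(capacity, bytes_per_session)  # 40000 262
```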


    Load Balancing Algorithms and Upstream Health Checks

    Nginx's upstream module provides several load balancing algorithms relevant to performance. The default round-robin distributes requests evenly. For application servers with varying response times, least_conn routes each new request to the backend with the fewest active connections, preventing queue buildup behind a temporarily slow server.

    upstream app_servers {
        least_conn;
    
        server 10.10.10.20:8080 weight=2 max_fails=3 fail_timeout=30s;
        server 10.10.10.21:8080 weight=1 max_fails=3 fail_timeout=30s;
        server 10.10.10.22:8080 backup;
    
        keepalive 64;
    }

    The backup flag marks 10.10.10.22 as a standby server: it only receives traffic when the primary servers are unavailable. Weight values direct proportionally more traffic to the higher-capacity node. Passive health checking is built-in: if three requests to a server fail within the 30-second fail_timeout window (not necessarily consecutively), Nginx marks it unavailable and stops routing to it until the window expires, after which a single live request probes whether it has recovered.
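    The selection logic least_conn applies, including the backup exclusion, can be approximated in a few lines. This is a simplified model, not Nginx's actual implementation, and the active-connection counts are fabricated for illustration:

```python
# Toy least_conn selection over the upstream pool above. The backup
# host is excluded while any primary is available.
servers = [
    {"addr": "10.10.10.20:8080", "weight": 2, "active": 3, "up": True,  "backup": False},
    {"addr": "10.10.10.21:8080", "weight": 1, "active": 1, "up": True,  "backup": False},
    {"addr": "10.10.10.22:8080", "weight": 1, "active": 0, "up": True,  "backup": True},
]

def pick(pool):
    primaries = [s for s in pool if s["up"] and not s["backup"]]
    candidates = primaries or [s for s in pool if s["up"]]  # fall back to backup
    # Weighted least_conn: lowest active-connections-per-weight wins.
    return min(candidates, key=lambda s: s["active"] / s["weight"])

print(pick(servers)["addr"])  # 10.10.10.21:8080 (1/1 beats 3/2; backup excluded)
```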


    Rate Limiting to Protect Performance Under Load

    A server saturated by scrapers, credential stuffers, or a sudden traffic spike is not a performant server. Nginx's ngx_http_limit_req_module implements leaky-bucket rate limiting to cap request rates per client IP, protecting backend resources and ensuring fair capacity distribution across legitimate users.

    http {
        limit_req_zone $binary_remote_addr zone=api_limit:10m   rate=30r/s;
        limit_req_zone $binary_remote_addr zone=login_limit:10m rate=5r/m;
    
        server {
            location /api/ {
                limit_req  zone=api_limit burst=50 nodelay;
                limit_req_status 429;
                proxy_pass http://app_servers;
            }
    
            location /auth/login {
                limit_req  zone=login_limit burst=3 nodelay;
                limit_req_status 429;
                proxy_pass http://app_servers;
            }
        }
    }

    The burst parameter allows short traffic spikes above the steady-state rate before requests are rejected with 429. nodelay processes burst requests immediately rather than queuing them at the rate limit interval, which reduces perceived latency for legitimate burst traffic while still enforcing the overall rate. Using $binary_remote_addr instead of $remote_addr reduces per-entry memory from up to 15 bytes to 4 bytes for IPv4, making the shared memory zone more efficient under large client populations.
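    The interplay of rate, burst, and nodelay can be modeled with a short simulation. This is a simplified sketch of the leaky-bucket accounting Nginx documents, not the actual implementation:

```python
rate = 30          # requests/second, as in zone=api_limit rate=30r/s
burst = 50

def simulate(arrivals):
    """Decide allow/reject for a sorted list of arrival times in seconds."""
    excess, last, decisions = 0.0, None, []
    for t in arrivals:
        if last is not None:
            excess = max(0.0, excess - (t - last) * rate)  # bucket leaks at `rate`
        if excess > burst:                  # beyond rate + burst: reject (429)
            decisions.append("rejected")
        else:
            decisions.append("allowed")     # nodelay: admitted immediately
            excess += 1
        last = t
    return decisions

# A spike of 100 requests arriving in the same instant:
result = simulate([0.0] * 100)
print(result.count("allowed"), result.count("rejected"))  # 51 49
```

    The first request plus the full burst allowance pass immediately; the rest are rejected until the bucket drains at 30 requests per second.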


    Open File Cache for Static Asset Delivery

    Every static file request normally requires multiple system calls: open(), fstat(), and close(). Under high concurrency serving thousands of static assets such as images, fonts, and CSS files, this syscall overhead accumulates. The open_file_cache directive instructs Nginx to cache file descriptors, metadata, and directory lookup results in memory:

    http {
        open_file_cache          max=10000 inactive=30s;
        open_file_cache_valid    60s;
        open_file_cache_min_uses 2;
        open_file_cache_errors   on;
    }

    File descriptors and metadata for up to 10,000 files are cached. Entries are evicted after 30 seconds of inactivity. A file must be accessed at least twice before entering the cache, preventing one-off large file requests from displacing frequently accessed assets. open_file_cache_errors on caches negative lookups as well — preventing redundant filesystem calls for missing files that are repeatedly probed by bots scanning for common CMS paths.
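    The min_uses admission rule can be illustrated with a toy model; the paths are fabricated, and the real cache additionally tracks descriptors, mtimes, and lookup errors:

```python
from collections import Counter

# Toy model of open_file_cache_min_uses 2: a path enters the cache
# only on its second access within the tracking window.
MIN_USES = 2
access_counts = Counter()
cached = set()

def record_access(path):
    access_counts[path] += 1
    if access_counts[path] >= MIN_USES:
        cached.add(path)
    return path in cached

record_access("/assets/logo.svg")          # first hit: tracked, not cached
record_access("/one-off/backup.tar")       # a single access never enters
hit = record_access("/assets/logo.svg")    # second hit: now cached
print(hit, sorted(cached))  # True ['/assets/logo.svg']
```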


    Buffer Tuning for Proxy Throughput

    Nginx buffers upstream responses before forwarding them to clients. If buffers are too small, Nginx overflows to temporary disk files, introducing disk I/O latency into the response path. Properly sized buffers keep the entire transaction in memory and allow Nginx to release the upstream connection sooner:

    location /api/ {
        proxy_pass               http://app_servers;
        proxy_buffering          on;
        proxy_buffer_size        16k;
        proxy_buffers            8 32k;
        proxy_busy_buffers_size  64k;
        proxy_temp_path          /var/cache/nginx/temp;
        proxy_max_temp_file_size 0;    # disable temp file overflow for API responses
    }

    proxy_buffer_size 16k covers the upstream response header. The pool of 8 x 32k buffers handles the body. Setting proxy_max_temp_file_size 0 forces all buffering to remain in memory for this location, appropriate when API responses are known to be compact. For large media downloads, leave temp files enabled and place the temp path on fast NVMe storage to minimize the penalty when the buffer overflows.


    Performance-Aware Access Logging

    Performance tuning requires observability. Nginx's flexible log format captures upstream response time, cache status, and connection timing data that feeds directly into monitoring dashboards and latency alerting:

    http {
        log_format performance
            '$remote_addr - $remote_user [$time_local] '
            '"$request" $status $body_bytes_sent '
            'rt=$request_time '
            'uct=$upstream_connect_time '
            'uht=$upstream_header_time '
            'urt=$upstream_response_time '
            'cs=$upstream_cache_status';
    
        access_log /var/log/nginx/solvethenetwork-access.log
            performance buffer=64k flush=5s;
    }

    The buffer=64k flush=5s parameters batch log writes rather than flushing after every request, reducing disk I/O at high request rates. The $upstream_response_time field is the most actionable metric for identifying slow backend responses that inflate overall TTFB. Feeding this log into a tool like Loki or Elasticsearch lets the operations team correlate latency spikes with deployments, traffic surges, or database query regressions in near real time.
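    A line in this log format is straightforward to parse for dashboards or ad-hoc analysis. The sample line below is fabricated to match the format defined above:

```python
import re

# Extract the key=value timing fields from one 'performance' log line.
line = ('203.0.113.7 - - [01/Apr/2026:12:00:00 +0000] '
        '"GET /api/public/items HTTP/2.0" 200 5120 '
        'rt=0.087 uct=0.002 uht=0.080 urt=0.081 cs=MISS')

fields = dict(re.findall(r'(\w+)=(\S+)', line))
request_time = float(fields["rt"])     # total time Nginx spent on the request
upstream_time = float(fields["urt"])   # time the backend took to respond
print(fields["cs"], request_time, upstream_time)  # MISS 0.087 0.081
```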


    Frequently Asked Questions

    Why is Nginx faster than Apache for handling many concurrent connections?

    Nginx uses an event-driven, non-blocking architecture where a small number of worker processes each manage thousands of connections using the OS event loop (epoll on Linux). Apache's prefork model spawns a separate process or thread per connection, which consumes far more memory and CPU for context switching. Under high concurrency, Nginx's resource usage stays nearly flat while Apache's grows linearly.

    Does enabling gzip compression on Nginx significantly increase CPU usage?

    At compression level 5 or below, the CPU overhead is minimal for most workloads. Text assets like HTML, CSS, and JavaScript compress quickly, and the bandwidth saved typically outweighs the CPU cost. For very high-throughput servers (tens of thousands of requests per second), consider pre-compressing static files with gzip_static so Nginx serves the .gz files directly without any runtime compression.

    How is Nginx proxy caching different from using a CDN?

    Nginx proxy caching operates at your origin infrastructure level, storing upstream responses on the server's local disk or memory. A CDN caches at globally distributed edge nodes closer to end users. Both reduce upstream load and improve TTFB, but a CDN also reduces geographic latency. Nginx proxy cache is valuable for protecting your origin from traffic spikes and for caching responses that cannot be served by a CDN due to authentication or dynamic content requirements.

    What is the recommended value for worker_processes in production?

    The recommended value is 'auto', which sets one worker per logical CPU core. This maximizes CPU utilization without over-subscribing. On dedicated Nginx hosts, this is almost always the correct setting. If Nginx shares the host with other CPU-intensive services, you may reduce this to leave headroom for those processes.

    How does HTTP/2 improve performance over HTTP/1.1 in Nginx?

    HTTP/2 multiplexes multiple requests and responses over a single TCP connection simultaneously, eliminating the per-request connection overhead and the head-of-line blocking present in HTTP/1.1. It also uses HPACK header compression to reduce the byte overhead of repeated headers (such as cookies and user-agent strings) across requests. For pages loading many assets, this can cut page load time significantly.

    What is the difference between keepalive_timeout and the keepalive directive in an upstream block?

    keepalive_timeout controls how long Nginx keeps an idle connection open to a downstream client (browser). The keepalive directive in an upstream block controls the size of the idle connection pool Nginx maintains to backend servers. Both eliminate repeated TCP handshake overhead, but they operate at opposite ends of the proxy — client-facing and server-facing respectively.

    How can I verify that gzip compression is working on my Nginx server?

    Send a request with the Accept-Encoding: gzip header and inspect the response. Using curl, run: curl -sI -H 'Accept-Encoding: gzip' https://solvethenetwork.com and look for 'Content-Encoding: gzip' in the response headers. If the HEAD response omits the header, fetch the body instead: curl -s -H 'Accept-Encoding: gzip' -o /dev/null -D - https://solvethenetwork.com. To confirm the size reduction, compare the transferred bytes with and without the header; note that on-the-fly compressed responses are typically chunked, so a Content-Length header may be absent.

    What does proxy_cache_use_stale do and when should I enable it?

    proxy_cache_use_stale tells Nginx to serve a cached (potentially outdated) response when the upstream is unavailable, returns an error, is updating the cache entry, or times out. It is recommended for any production site where uptime matters more than absolute data freshness. It prevents users from seeing 502/503 errors during brief upstream outages or deployments, at the cost of occasionally serving content that is slightly stale.

    Does open_file_cache work for dynamically generated content?

    No. open_file_cache only caches file system metadata and open file descriptors for files served directly by Nginx from disk — static files like HTML, images, CSS, and JavaScript. Proxied responses handled by the upstream pass through the proxy cache system instead. open_file_cache is most impactful on servers serving large numbers of small static assets.

    Should I use least_conn or round-robin for upstream load balancing?

    Use least_conn when your backend requests have variable processing times — for example, a mix of fast and slow API endpoints hitting the same pool. least_conn prevents a slow request from clogging one backend while others sit idle. Use round-robin (the default) when requests are roughly uniform in cost and all backend servers have equivalent capacity. For session-persistent workloads, consider ip_hash instead.
