InfraRunBook

    Envoy Rate Limiting Not Working

    Envoy
    Published: Apr 19, 2026
    Updated: Apr 19, 2026

    Envoy rate limiting can silently fail for several non-obvious reasons. This guide walks through every common root cause—from a misconfigured filter chain to descriptor mismatches—with real commands and fixes.


    Symptoms

    You've configured Envoy rate limiting and requests are sailing through without any throttling. No 429s. No x-ratelimit-remaining headers dropping. The service behind your proxy is getting hammered just as if the rate limit config never existed. Or maybe you're seeing the opposite: every request returns a 429 regardless of volume, which usually points to a different class of misconfiguration.

    In my experience, Envoy rate limiting is one of those features that fails silently more often than it fails loudly. There's no single log line that says "your rate limit is broken." Instead, you piece it together from admin endpoints, gRPC trace logs, and descriptor comparisons. This guide walks through every meaningful failure mode I've seen in production, with real commands and outputs at each step.


    Root Cause 1: Rate Limit Service Unreachable

    This is the most common cause and also the one most likely to be invisible. When Envoy can't reach the upstream rate limit service (typically the lyft/ratelimit gRPC service), the default behavior is to allow all traffic. This is governed by the failure_mode_deny flag, which defaults to false, so connectivity failures silently become an open door.

    Why it happens: The rate limit service cluster is misconfigured in Envoy's static or dynamic config. The service itself is down. A network policy or firewall rule is blocking the gRPC port (default 8081). DNS isn't resolving. Or the cluster name in the rate_limit_service block doesn't match any defined cluster.

    How to identify it: Start at the Envoy admin interface. Hit the stats endpoint and look for ratelimit-related counters:

    curl -s http://192.168.10.25:9901/stats | grep ratelimit
    
    cluster.ratelimit_cluster.upstream_cx_connect_fail: 47
    cluster.ratelimit_cluster.upstream_cx_connect_timeout: 0
    cluster.ratelimit_cluster.upstream_rq_timeout: 0
    http.ingress_http.ratelimit.error: 47
    http.ingress_http.ratelimit.failure_mode_allowed: 47
    http.ingress_http.ratelimit.ok: 0
    http.ingress_http.ratelimit.over_limit: 0

    The smoking gun here is ratelimit.failure_mode_allowed climbing in lockstep with upstream_cx_connect_fail. That tells you Envoy tried to contact the rate limit service, failed, and then let the request through anyway. Now check the cluster health:

    curl -s http://192.168.10.25:9901/clusters | grep ratelimit
    
    ratelimit_cluster::192.168.10.30:8081::cx_active::0
    ratelimit_cluster::192.168.10.30:8081::cx_connect_fail::47
    ratelimit_cluster::192.168.10.30:8081::health_flags::failed_active_hc

    Zero active connections and a failed health check confirm the service is unreachable. You can also verify network-level connectivity from the Envoy host:

    nc -zv 192.168.10.30 8081
    nc: connect to 192.168.10.30 port 8081 (tcp) failed: Connection refused
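    If nc isn't installed on the Envoy host, the same reachability probe takes a few lines of Python. A minimal sketch:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

    Calling tcp_reachable("192.168.10.30", 8081) mirrors the nc -zv check above; False means the connection was refused, timed out, or the name didn't resolve.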

    How to fix it: First, verify the rate limit service is actually running and listening:

    ssh infrarunbook-admin@sw-infrarunbook-01
    systemctl status ratelimit
    
    ss -tlnp | grep 8081
    LISTEN 0 128 0.0.0.0:8081 0.0.0.0:* users:(("ratelimit",pid=3842,fd=7))

    If the service is up, check your Envoy cluster configuration. The cluster name in rate_limit_service must exactly match a cluster defined elsewhere:

    rate_limit_service:
      grpc_service:
        envoy_grpc:
          cluster_name: ratelimit_cluster   # must match exactly
      transport_api_version: V3
    
    # Somewhere in your clusters section:
    clusters:
      - name: ratelimit_cluster            # this name must match
        type: STRICT_DNS
        connect_timeout: 1s
        http2_protocol_options: {}         # required — gRPC runs over HTTP/2
        load_assignment:
          cluster_name: ratelimit_cluster
          endpoints:
            - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 192.168.10.30
                      port_value: 8081

    Notice the http2_protocol_options: {} line. Omitting it means Envoy tries to speak HTTP/1.1 to the gRPC rate limit service, which will fail or produce garbage. Also consider setting failure_mode_deny: true in non-critical paths during development so failures become visible rather than silent:

    http_filters:
      - name: envoy.filters.http.ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
          domain: solvethenetwork_api
          failure_mode_deny: true   # fail closed during testing
          rate_limit_service:
            grpc_service:
              envoy_grpc:
                cluster_name: ratelimit_cluster
            transport_api_version: V3

    Root Cause 2: Descriptor Not Matching

    This one trips up even experienced Envoy operators. Your Envoy config generates rate limit descriptors. Your rate limit service has a config file with rate limit rules. If those two don't match exactly — down to the key names and values — requests pass through without being counted.

    Why it happens: Descriptors in Envoy are built from actions defined on a virtual host or route. Those actions produce key-value pairs that get sent to the rate limit service. The rate limit service then looks up those exact key-value pairs in its config. If your Envoy action generates {"remote_address": "192.168.1.44"} but your ratelimit config has a rule for {"header_match": "x-api-key"}, no rule fires. There's no error. Requests just aren't counted.

    How to identify it: Enable debug logging on Envoy to see the exact descriptors being generated and sent:

    curl -s -X POST "http://192.168.10.25:9901/logging?filter=debug"
    curl -s -X POST "http://192.168.10.25:9901/logging?http=debug"

    Then send a test request and watch the logs:

    journalctl -u envoy -f | grep -i ratelimit
    
    [debug][http] [source/common/http/filter_manager.cc:867] [C4][S12345] request headers complete (end_stream=false)
    [debug][ratelimit] Descriptor entry key: remote_address value: 192.168.1.44
    [debug][ratelimit] Descriptor entry key: destination_cluster value: api_backend
    [debug][ratelimit] grpc request to ratelimit service: domain=solvethenetwork_api descriptors=[{entries:[{key:remote_address value:192.168.1.44},{key:destination_cluster value:api_backend}]}]

    Now compare that against your rate limit service config. Here's an example of a broken mismatch:

    # What Envoy sends:
    domain: solvethenetwork_api
    descriptors:
      - entries:
        - key: remote_address
          value: 192.168.1.44
    
    # What the rate limit service expects:
    domain: solvethenetwork_api
    descriptors:
      - key: client_ip          # <-- different key name!
        rate_limit:
          requests_per_unit: 100
          unit: MINUTE

    The keys don't match: remote_address vs client_ip. No rule fires.

    How to fix it: Align the descriptor keys. In Envoy's route config, the action type controls what key name is used. remote_address actions always produce a remote_address key. request_headers actions produce the descriptor key you specify. Make sure your ratelimit service config uses the exact same key:

    # Envoy virtual host config:
    virtual_hosts:
      - name: api_vhost
        domains: ["api.solvethenetwork.com"]
        rate_limits:
          - actions:
            - remote_address: {}   # produces key: remote_address
    
    # Matching ratelimit service config (config.yaml):
    domain: solvethenetwork_api
    descriptors:
      - key: remote_address       # matches what Envoy sends
        rate_limit:
          requests_per_unit: 100
          unit: MINUTE
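    Before deploying, the two sides can also be diffed mechanically. A sketch of that comparison (top-level keys only; nested descriptors and value matching are omitted, and the dict shapes simply mirror the debug log and config.yaml above):

```python
def unmatched_descriptor_keys(sent_descriptors, ratelimit_config):
    """Return descriptor keys Envoy sends that no top-level rule in the
    ratelimit service config covers.

    sent_descriptors: [{"entries": [{"key": ..., "value": ...}]}] as seen
    in Envoy's debug log. ratelimit_config: the parsed config.yaml.
    """
    configured = {rule["key"] for rule in ratelimit_config.get("descriptors", [])}
    sent = {entry["key"] for desc in sent_descriptors for entry in desc["entries"]}
    return sorted(sent - configured)

# The broken example above: Envoy sends remote_address, the config expects client_ip.
sent = [{"entries": [{"key": "remote_address", "value": "192.168.1.44"}]}]
config = {"domain": "solvethenetwork_api",
          "descriptors": [{"key": "client_ip",
                           "rate_limit": {"requests_per_unit": 100, "unit": "MINUTE"}}]}
print(unmatched_descriptor_keys(sent, config))  # ['remote_address']
```

    An empty list means every key Envoy sends has at least a top-level rule; anything else is a key that will silently never be counted.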

    Also watch for domain mismatches. The domain field in your Envoy ratelimit filter config must match the domain key in your rate limit service config file. A single character difference means no rules match.


    Root Cause 3: Global vs Local Rate Limit Confusion

    Envoy has two fundamentally different rate limiting mechanisms and they work nothing alike. Global rate limiting talks to an external gRPC service and enforces cluster-wide limits. Local rate limiting is done entirely in-process, per Envoy instance, using a token bucket algorithm. I've seen operators configure one while expecting the behavior of the other, and the results are always confusing.

    Why it happens: Someone configures a local rate limit thinking it will apply across all Envoy replicas, or they configure global rate limiting but accidentally target the wrong filter type in their config. The filter names are different, the config structure is different, and they don't share any state with each other.

    The global rate limit filter is envoy.filters.http.ratelimit. The local rate limit filter is envoy.filters.http.local_ratelimit.

    How to identify it: Check which filter is actually active in your chain:

    curl -s http://192.168.10.25:9901/config_dump | \
      python3 -c "import sys,json; d=json.load(sys.stdin); \
      [print(hf['name']) for cfg in d['configs'] \
      for lc in cfg.get('dynamic_listeners',[]) \
      for fc in lc.get('active_state',{}).get('listener',{}).get('filter_chains',[]) \
      for f in fc.get('filters',[]) \
      for hf in f.get('typed_config',{}).get('http_filters',[])]"

    Or more pragmatically, just grep the config dump:

    curl -s http://192.168.10.25:9901/config_dump | grep -A2 'ratelimit'
    
    "name": "envoy.filters.http.local_ratelimit",
    "typed_config": {
      "@type": "type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit"

    If you expected global rate limiting, finding local_ratelimit explains everything. Local rate limits have no awareness of other Envoy instances. With three replicas each allowing 100 req/min, your effective limit is 300 req/min, not 100.
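    The multiplication effect is easy to demonstrate with the token bucket model local rate limiting uses. A toy sketch (not Envoy's implementation; refill is omitted by freezing time inside a single fill interval):

```python
class TokenBucket:
    """Toy per-instance token bucket, frozen at a single fill interval (no refill)."""

    def __init__(self, max_tokens: int):
        self.tokens = max_tokens  # each instance starts with a full, private bucket

    def allow(self) -> bool:
        """Admit the request if a token remains, otherwise reject it."""
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False

# Three Envoy replicas, each with its own 100-token bucket and no shared state.
replicas = [TokenBucket(100) for _ in range(3)]

# 600 requests round-robined across the replicas within one fill interval.
allowed = sum(replicas[i % 3].allow() for i in range(600))
print(allowed)  # 300: each replica admits its own 100, not a shared 100
```

    A global rate limit replaces the three private buckets with one shared counter in the external service, which is exactly why it needs the gRPC round trip.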

    Check the stats too — local and global rate limits emit different metric namespaces:

    curl -s http://192.168.10.25:9901/stats | grep ratelimit
    
    # Global rate limit stats look like:
    http.ingress_http.ratelimit.ok: 452
    http.ingress_http.ratelimit.over_limit: 0
    
    # Local rate limit stats look like:
    http.ingress_http.local_rate_limit.enabled: 452
    http.ingress_http.local_rate_limit.enforced: 452
    http.ingress_http.local_rate_limit.rate_limited: 0

    How to fix it: Decide which model you actually need. If you need cluster-wide enforcement — for example, limiting a downstream client to 1000 req/min regardless of how many Envoy pods are running — you need global rate limiting with an external ratelimit service. If per-instance limits are acceptable (common for CPU protection, not API quota enforcement), local rate limiting is simpler and has no external dependency.

    If you need to migrate from local to global, here's the difference in filter config:

    # Local rate limit (per-instance, no external service)
    http_filters:
      - name: envoy.filters.http.local_ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          stat_prefix: local_rate_limiter
          token_bucket:
            max_tokens: 100
            tokens_per_fill: 100
            fill_interval: 60s
          filter_enabled:
            default_value:
              numerator: 100
              denominator: HUNDRED
          filter_enforced:
            default_value:
              numerator: 100
              denominator: HUNDRED
    
    # Global rate limit (cluster-wide, requires external service)
    http_filters:
      - name: envoy.filters.http.ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
          domain: solvethenetwork_api
          stage: 0
          failure_mode_deny: false
          rate_limit_service:
            grpc_service:
              envoy_grpc:
                cluster_name: ratelimit_cluster
            transport_api_version: V3

    Root Cause 4: Header Not Being Sent to Rate Limit Service

    When you configure rate limits based on request headers — an API key, a user ID, a tenant header — Envoy needs to actually receive that header and then forward it as part of the descriptor to the rate limit service. If the header is stripped upstream, never set by the client, or the descriptor action references the wrong header name, no rate limit applies.

    Why it happens: A few scenarios cause this. The client isn't sending the header at all (common during development when API keys aren't yet enforced at the client). An upstream proxy or load balancer is stripping the header before it reaches Envoy. The header name in the Envoy request_headers action uses a different capitalization or name than what's actually sent. Or there's a skip_if_absent setting causing the descriptor entry to be omitted when the header is missing.

    How to identify it: First, verify what headers are actually arriving at Envoy. You can do this by temporarily routing to a debug backend or using Envoy's access log with header logging enabled:

    access_log:
      - name: envoy.access_loggers.file
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
          path: /var/log/envoy/access.log
          log_format:
            text_format: "[%START_TIME%] %REQ(:METHOD)% %REQ(X-API-KEY)% %REQ(X-USER-ID)% %RESPONSE_CODE%\n"

    tail -f /var/log/envoy/access.log
    
    [2026-04-20T14:22:01.443Z] GET - - 200
    [2026-04-20T14:22:02.101Z] GET - - 200
    [2026-04-20T14:22:02.887Z] GET - - 200

    Those dashes where the API key should be confirm the header isn't arriving. Now check the descriptor action in your Envoy config:

    rate_limits:
      - actions:
        - request_headers:
            header_name: x-api-key
            descriptor_key: api_key
            skip_if_absent: true   # <-- when header is missing, descriptor entry is skipped

    With skip_if_absent: true, if x-api-key isn't present, the entire descriptor entry is omitted. Your ratelimit rule requires that key. No entry means no match means no rate limit.

    You can also watch the debug logs for descriptor generation with missing headers:

    [debug][ratelimit] Descriptor entry with header x-api-key: header not present, skipping entry
    [debug][ratelimit] grpc request to ratelimit service: domain=solvethenetwork_api descriptors=[{entries:[]}]
    [debug][ratelimit] Response: OK (no descriptor matched)
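    The two skip_if_absent modes can be modeled roughly like this. This is an approximation for reasoning about the behavior, not Envoy's actual implementation: with the default skip_if_absent: false a missing header suppresses the whole descriptor, while true drops only that entry.

```python
def build_descriptor(headers: dict, actions: list):
    """Approximate descriptor generation for request_headers actions.

    Returns the list of entries to send, or None when the descriptor is
    suppressed entirely (missing header with skip_if_absent=false).
    """
    entries = []
    for action in actions:
        value = headers.get(action["header_name"])
        if value is None:
            if action.get("skip_if_absent", False):
                continue          # drop just this entry, keep the descriptor
            return None           # default: missing header kills the descriptor
        entries.append({"key": action["descriptor_key"], "value": value})
    return entries
```

    With the config above (x-api-key, skip_if_absent: true) and a request lacking the header, the result is an empty entry list, which is exactly the descriptors=[{entries:[]}] seen in the debug log: nothing for the ratelimit service to match.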

    How to fix it: The fix depends on intent. If you want to rate limit requests that lack the header (anonymous traffic), change your ratelimit config to handle the missing case explicitly, or use a different action type like remote_address as a fallback. If you want to block requests with no API key outright, handle that in a different filter before the rate limit filter runs.

    If the header is being stripped by an upstream proxy, you need to investigate that proxy's config. For nginx in front of Envoy, check that proxy_pass_request_headers is on (the default) and that no proxy_set_header directive is clearing or overriding the header.

    Also verify header name casing. HTTP/2 requires lowercase header names, and Envoy normalizes to lowercase. If your action references X-API-Key instead of x-api-key, it won't match:

    # Wrong — will never match in HTTP/2 normalized form
    request_headers:
      header_name: X-API-Key
      descriptor_key: api_key
    
    # Correct
    request_headers:
      header_name: x-api-key
      descriptor_key: api_key

    Root Cause 5: Filter Not in Chain

    The most embarrassingly simple cause: the rate limit filter simply isn't in the HTTP filter chain. The config was written but applied to the wrong listener, or it was added to a filter chain that doesn't handle the traffic you're testing with. No filter in the chain means no rate limiting, no errors, nothing.

    Why it happens: Envoy configs can have multiple listeners, multiple filter chains per listener, and multiple route configurations. It's easy to add the ratelimit filter to a management listener instead of the ingress listener, or to add it to the static bootstrap config when traffic is served by an xDS-managed listener. In Kubernetes with Istio or similar control planes, the generated Envoy config may not include your custom filter at all if you didn't inject it via the right extension mechanism.

    How to identify it: Dump the live config and search for the rate limit filter:

    curl -s http://192.168.10.25:9901/config_dump > /tmp/envoy_config_dump.json
    
    # Check which listeners exist
    jq '.configs[] | select(."@type" | contains("ListenersConfigDump")) | 
      .dynamic_listeners[].name' /tmp/envoy_config_dump.json
    
    "ingress_listener_8080"
    "egress_listener_8443"
    "prometheus_listener_9090"
    
    # Find which listeners have the ratelimit filter
    jq '.configs[] | select(."@type" | contains("ListenersConfigDump")) |
      .dynamic_listeners[] | .name as $ln |
      .. | strings | select(contains("ratelimit")) | {listener: $ln, filter: .}' \
      /tmp/envoy_config_dump.json

    If that last command returns nothing, the ratelimit filter isn't in any active listener. If it returns only prometheus_listener_9090, you added it to the wrong listener.

    Another quick check using the stats endpoint — if the filter were active and processing requests, you'd see ratelimit stats incrementing:

    curl -s http://192.168.10.25:9901/stats | grep 'ratelimit\|local_rate_limit'
    # Empty output = filter is not loaded or not processing any requests
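    The same check can be scripted as a deploy gate. A sketch that parses the /stats text output (name: value lines) and keeps only the rate-limit counters; if the resulting dict is empty, the filter never loaded:

```python
def ratelimit_counters(stats_text: str) -> dict:
    """Extract rate-limit related counters from an Envoy /stats dump."""
    counters = {}
    for line in stats_text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep and ("ratelimit" in name or "local_rate_limit" in name):
            try:
                counters[name] = int(value)
            except ValueError:
                pass  # skip histogram and other non-integer lines
    return counters
```

    Feed it the body of http://192.168.10.25:9901/stats and fail the rollout when it returns an empty dict.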

    How to fix it: Locate the correct listener in your config and ensure the ratelimit filter appears in its http_filters list, before the envoy.filters.http.router filter. The router must always be last:

    listeners:
      - name: ingress_listener_8080
        address:
          socket_address:
            address: 0.0.0.0
            port_value: 8080
        filter_chains:
          - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                http_filters:
                  - name: envoy.filters.http.ratelimit    # must be here
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
                      domain: solvethenetwork_api
                      rate_limit_service:
                        grpc_service:
                          envoy_grpc:
                            cluster_name: ratelimit_cluster
                        transport_api_version: V3
                  - name: envoy.filters.http.router      # router must be last
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

    After making changes, validate the config before reloading:

    envoy --mode validate -c /etc/envoy/envoy.yaml
    configuration '/etc/envoy/envoy.yaml' OK

    Root Cause 6: Stage Mismatch

    Envoy's global rate limit filter supports a stage parameter (0–10). Rate limit actions on routes also have a stage value. A rate limit filter with stage 0 only processes rate limit actions also configured for stage 0. If your filter is stage 0 and your route actions are stage 1, no rate limit runs. This is a feature meant for applying different rate limits at different points in request processing — but it's a footgun when stages drift out of sync.

    How to identify it: Compare the stage value in your filter config against the stage in your route's rate_limits actions:

    # Filter config — check the stage value:
    http_filters:
      - name: envoy.filters.http.ratelimit
        typed_config:
          stage: 0    # this filter handles stage 0 actions
    
    # Route config — check all rate_limits entries:
    routes:
      - match:
          prefix: "/api"
        route:
          cluster: api_backend
        rate_limits:
          - stage: 1    # <-- mismatch! filter is stage 0, action is stage 1
            actions:
              - remote_address: {}

    How to fix it: Align the stage values. If you're not intentionally using multi-stage rate limiting, set all stage values to 0 or omit the field entirely (it defaults to 0).
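    Once you've pulled the stage values out of both configs (via config_dump, jq, or your config repo), checking for drift is trivial. A sketch, mirroring the default-to-0 behavior when the field is omitted:

```python
def stage_mismatches(filter_stage, route_stages):
    """Return route rate_limit stages the filter will never evaluate.

    None means the stage field was omitted, which defaults to 0 on both sides.
    """
    effective_filter = filter_stage if filter_stage is not None else 0
    return [s for s in route_stages
            if (s if s is not None else 0) != effective_filter]
```

    stage_mismatches(0, [1]) flags the mismatch from the example above; stage_mismatches(None, [0, None]) returns an empty list because everything defaults to stage 0.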


    Root Cause 7: Rate Limit Service Domain Mismatch

    The ratelimit service groups rules by domain. Envoy sends a domain identifier with every gRPC request. If the domain in Envoy's filter config doesn't match any domain defined in the ratelimit service's config files, no rules are evaluated. The ratelimit service returns OK for all requests, and traffic flows freely.

    How to identify it: Check the ratelimit service logs directly:

    ssh infrarunbook-admin@sw-infrarunbook-01
    journalctl -u ratelimit -n 50
    
    time="2026-04-20T14:30:01Z" level=debug msg="starting rate limit lookup" domain=solvethenetwork_api
    time="2026-04-20T14:30:01Z" level=warning msg="unknown domain" domain=solvethenetwork_api
    time="2026-04-20T14:30:01Z" level=debug msg="returning OK, domain not found"

    That warning is unambiguous. Now check what domains the ratelimit service actually knows about:

    ls /etc/ratelimit/config/
    api_limits.yaml  admin_limits.yaml
    
    head -1 /etc/ratelimit/config/api_limits.yaml
    domain: solvethenetwork_v2    # Envoy is sending solvethenetwork_api
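    Rather than eyeballing file by file, you can collect every domain the service has on disk and compare against what Envoy sends. A sketch, assuming the one-domain-per-YAML-file layout shown above:

```python
from pathlib import Path

def configured_domains(config_dir: str) -> set:
    """Collect every `domain:` value from the ratelimit service's YAML configs."""
    domains = set()
    for path in Path(config_dir).glob("*.yaml"):
        for line in path.read_text().splitlines():
            stripped = line.strip()
            if stripped.startswith("domain:"):
                # Take the value, dropping any trailing inline comment.
                value = stripped.split(":", 1)[1].split("#", 1)[0].strip()
                domains.add(value)
    return domains
```

    Here, "solvethenetwork_api" in configured_domains("/etc/ratelimit/config") would return False, confirming the mismatch before you touch anything.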

    How to fix it: Update the domain in either the ratelimit service config or the Envoy filter config so they match. After updating the ratelimit service config, reload it — the ratelimit service typically watches for config file changes, but a restart is safer:

    systemctl restart ratelimit
    curl -s http://192.168.10.30:8080/json | python3 -m json.tool
    # Should show your new domain and rules

    Prevention

    Most of these failures share a common theme: things break silently. Rate limiting is a control-plane feature where the absence of enforcement looks exactly like a healthy system from the outside. Prevention is almost entirely about making failures visible before they become incidents.

    Set failure_mode_deny: true in non-production environments. Yes, it will cause outages during testing — that's the point. You want to find connectivity failures during your integration tests, not when a client abuse spike hits production. In production, keep it at false if availability matters more than rate limit enforcement, but monitor the ratelimit.failure_mode_allowed counter and alert when it rises.

    Add ratelimit stats to your dashboard from day one. The key counters are ratelimit.ok, ratelimit.over_limit, ratelimit.error, and ratelimit.failure_mode_allowed. If over_limit is always zero and ok is climbing, either your limits are too high or your config is broken. Both are worth investigating.

    Write a synthetic test that deliberately exceeds your configured rate limit and verify you get 429s. Run this test in CI against a real ratelimit service instance. If the test starts passing when it should be getting throttled, your rate limit is broken — and your test will catch it before production does.
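    A minimal version of that synthetic test, using only the standard library. The URL and request count are placeholders for your environment; the pass/fail logic is split out so it can be unit tested without a live proxy:

```python
import urllib.error
import urllib.request

def observed_status_codes(url: str, n: int) -> list:
    """Fire n sequential GETs and record every HTTP status, including 429s."""
    codes = []
    for _ in range(n):
        try:
            with urllib.request.urlopen(url) as resp:
                codes.append(resp.status)
        except urllib.error.HTTPError as exc:
            codes.append(exc.code)  # urllib raises on 4xx/5xx responses
    return codes

def throttling_works(codes: list) -> bool:
    """The limiter is healthy only if at least one request was rejected with 429."""
    return 429 in codes
```

    In CI: send, say, 150 requests against a route limited to 100/min and fail the build when throttling_works(observed_status_codes(url, 150)) is False.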

    Keep domain names and descriptor keys in a shared constants file or configuration management system. When the domain name lives in two places (Envoy filter config and ratelimit service config), they will eventually drift. Treat them like API contracts.

    When deploying changes to rate limit rules, use the ratelimit service's /json debug endpoint to verify the loaded config matches your intent before routing traffic through it. One minute of verification saves hours of debugging.

    Frequently Asked Questions

    Why does Envoy allow all traffic when the rate limit service is down?

    By default, Envoy's rate limit filter sets failure_mode_deny to false, which means connectivity failures to the rate limit service result in requests being allowed through. This is a deliberate availability-over-enforcement default. You can change it to failure_mode_deny: true to fail closed, though this means any ratelimit service outage will block traffic.

    How do I verify what descriptors Envoy is sending to the rate limit service?

    Enable debug logging via the admin API: curl -X POST "http://192.168.10.25:9901/logging?filter=debug". Then watch the Envoy logs for lines showing descriptor entry key/value pairs and the full gRPC request being sent. Compare these against your ratelimit service config to identify mismatches.

    What is the difference between Envoy global and local rate limiting?

    Global rate limiting sends descriptor information to an external gRPC service (like lyft/ratelimit) that maintains shared counters across all Envoy instances. Local rate limiting uses a per-instance token bucket with no external service and no shared state. If you run multiple Envoy replicas and need a cluster-wide limit, you must use global rate limiting.

    Why does my rate limit based on a request header never trigger?

    Three common reasons: the header isn't being sent by the client, the header is being stripped by an upstream proxy before reaching Envoy, or the header name in your descriptor action doesn't match what's actually in the request (check for capitalization — HTTP/2 normalizes headers to lowercase). Use Envoy's access log with header logging to verify what headers are arriving.

    How do I check if the rate limit filter is actually loaded in the active listener?

    Dump the live config with curl -s http://192.168.10.25:9901/config_dump and search for 'ratelimit' in the output. If the filter is not in the http_filters list of your active listener's HttpConnectionManager, it won't process any requests regardless of what's in your static config files.
