InfraRunBook

    Envoy Rate Limiting Not Working

    Envoy
    Published: Apr 19, 2026
    Updated: Apr 19, 2026

    Envoy rate limiting can silently fail for several non-obvious reasons. This guide walks through every common root cause—from a misconfigured filter chain to descriptor mismatches—with real commands and fixes.


    Symptoms

    You've configured Envoy rate limiting and requests are sailing through without any throttling. No 429s. No x-ratelimit-remaining headers dropping. The service behind your proxy is getting hammered just as if the rate limit config never existed. Or maybe you're seeing the opposite: every request returns a 429 regardless of volume, which usually points to a different class of misconfiguration.

    In my experience, Envoy rate limiting is one of those features that fails silently more often than it fails loudly. There's no single log line that says "your rate limit is broken." Instead, you piece it together from admin endpoints, gRPC trace logs, and descriptor comparisons. This guide walks through every meaningful failure mode I've seen in production, with real commands and outputs at each step.


    Root Cause 1: Rate Limit Service Unreachable

    This is the most common cause and also the one most likely to be invisible. When Envoy can't reach the upstream rate limit service (typically the lyft/ratelimit gRPC service), the default behavior is to allow all traffic. This is governed by the failure_mode_deny flag, which defaults to false, so connectivity failures silently become an open door.

    Why it happens: The rate limit service cluster is misconfigured in Envoy's static or dynamic config. The service itself is down. A network policy or firewall rule is blocking the gRPC port (default 8081). DNS isn't resolving. Or the cluster name in the rate_limit_service block doesn't match any defined cluster.

    How to identify it: Start at the Envoy admin interface. Hit the stats endpoint and look for ratelimit-related counters:

    curl -s http://192.168.10.25:9901/stats | grep ratelimit
    
    cluster.ratelimit_cluster.upstream_cx_connect_fail: 47
    cluster.ratelimit_cluster.upstream_cx_connect_timeout: 0
    cluster.ratelimit_cluster.upstream_rq_timeout: 0
    http.ingress_http.ratelimit.error: 47
    http.ingress_http.ratelimit.failure_mode_allowed: 47
    http.ingress_http.ratelimit.ok: 0
    http.ingress_http.ratelimit.over_limit: 0

    The smoking gun here is ratelimit.failure_mode_allowed climbing in lockstep with upstream_cx_connect_fail. That tells you Envoy tried to contact the rate limit service, failed, and then let the request through anyway. Now check the cluster health:

    curl -s http://192.168.10.25:9901/clusters | grep ratelimit
    
    ratelimit_cluster::192.168.10.30:8081::cx_active::0
    ratelimit_cluster::192.168.10.30:8081::cx_connect_fail::47
    ratelimit_cluster::192.168.10.30:8081::health_flags::failed_active_hc

    Zero active connections and a failed health check confirm the service is unreachable. You can also verify network-level connectivity from the Envoy host:

    nc -zv 192.168.10.30 8081
    nc: connect to 192.168.10.30 port 8081 (tcp) failed: Connection refused
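    If nc isn't installed on the Envoy host, the same reachability probe takes a few lines of Python. A minimal sketch:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

    Calling tcp_reachable("192.168.10.30", 8081) mirrors the nc -zv check above; False means the connection was refused, timed out, or the name didn't resolve.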

    How to fix it: First, verify the rate limit service is actually running and listening:

    ssh infrarunbook-admin@sw-infrarunbook-01
    systemctl status ratelimit
    
    ss -tlnp | grep 8081
    LISTEN 0 128 0.0.0.0:8081 0.0.0.0:* users:(("ratelimit",pid=3842,fd=7))

    If the service is up, check your Envoy cluster configuration. The cluster name in rate_limit_service must exactly match a cluster defined elsewhere:

    rate_limit_service:
      grpc_service:
        envoy_grpc:
          cluster_name: ratelimit_cluster   # must match exactly
      transport_api_version: V3
    
    # Somewhere in your clusters section:
    clusters:
      - name: ratelimit_cluster            # this name must match
        type: STRICT_DNS
        connect_timeout: 1s
        http2_protocol_options: {}         # required — gRPC runs over HTTP/2
        load_assignment:
          cluster_name: ratelimit_cluster
          endpoints:
            - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 192.168.10.30
                      port_value: 8081

    Notice the http2_protocol_options: {} line. Omitting it means Envoy tries to speak HTTP/1.1 to the gRPC rate limit service, which will fail or produce garbage. Also consider setting failure_mode_deny: true in non-critical paths during development so failures become visible rather than silent:

    http_filters:
      - name: envoy.filters.http.ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
          domain: solvethenetwork_api
          failure_mode_deny: true   # fail closed during testing
          rate_limit_service:
            grpc_service:
              envoy_grpc:
                cluster_name: ratelimit_cluster
            transport_api_version: V3

    Root Cause 2: Descriptor Not Matching

    This one trips up even experienced Envoy operators. Your Envoy config generates rate limit descriptors. Your rate limit service has a config file with rate limit rules. If those two don't match exactly — down to the key names and values — requests pass through without being counted.

    Why it happens: Descriptors in Envoy are built from actions defined on a virtual host or route. Those actions produce key-value pairs that get sent to the rate limit service. The rate limit service then looks up those exact key-value pairs in its config. If your Envoy action generates {"remote_address": "192.168.1.44"} but your ratelimit config has a rule for {"header_match": "x-api-key"}, no rule fires. There's no error. Requests just aren't counted.

    How to identify it: Enable debug logging on Envoy to see the exact descriptors being generated and sent:

    curl -s -X POST "http://192.168.10.25:9901/logging?filter=debug"
    curl -s -X POST "http://192.168.10.25:9901/logging?http=debug"

    Then send a test request and watch the logs:

    journalctl -u envoy -f | grep -i ratelimit
    
    [debug][http] [source/common/http/filter_manager.cc:867] [C4][S12345] request headers complete (end_stream=false)
    [debug][ratelimit] Descriptor entry key: remote_address value: 192.168.1.44
    [debug][ratelimit] Descriptor entry key: destination_cluster value: api_backend
    [debug][ratelimit] grpc request to ratelimit service: domain=solvethenetwork_api descriptors=[{entries:[{key:remote_address value:192.168.1.44},{key:destination_cluster value:api_backend}]}]

    Now compare that against your rate limit service config. Here's an example of a broken mismatch:

    # What Envoy sends:
    domain: solvethenetwork_api
    descriptors:
      - entries:
        - key: remote_address
          value: 192.168.1.44
    
    # What the rate limit service expects:
    domain: solvethenetwork_api
    descriptors:
      - key: client_ip          # <-- different key name!
        rate_limit:
          requests_per_unit: 100
          unit: MINUTE

    The keys don't match: remote_address vs client_ip. No rule fires.

    How to fix it: Align the descriptor keys. In Envoy's route config, the action type controls what key name is used. remote_address actions always produce a remote_address key. request_headers actions produce the descriptor key you specify. Make sure your ratelimit service config uses the exact same key:

    # Envoy virtual host config:
    virtual_hosts:
      - name: api_vhost
        domains: ["api.solvethenetwork.com"]
        rate_limits:
          - actions:
            - remote_address: {}   # produces key: remote_address
    
    # Matching ratelimit service config (config.yaml):
    domain: solvethenetwork_api
    descriptors:
      - key: remote_address       # matches what Envoy sends
        rate_limit:
          requests_per_unit: 100
          unit: MINUTE
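    Before deploying, the two sides can also be diffed mechanically. A sketch of that comparison (top-level keys only; nested descriptors and value matching are omitted, and the dict shapes simply mirror the debug log and config.yaml above):

```python
def unmatched_descriptor_keys(sent_descriptors, ratelimit_config):
    """Return descriptor keys Envoy sends that no top-level rule in the
    ratelimit service config covers.

    sent_descriptors: [{"entries": [{"key": ..., "value": ...}]}] as seen
    in Envoy's debug log. ratelimit_config: the parsed config.yaml.
    """
    configured = {rule["key"] for rule in ratelimit_config.get("descriptors", [])}
    sent = {entry["key"] for desc in sent_descriptors for entry in desc["entries"]}
    return sorted(sent - configured)

# The broken example above: Envoy sends remote_address, the config expects client_ip.
sent = [{"entries": [{"key": "remote_address", "value": "192.168.1.44"}]}]
config = {"domain": "solvethenetwork_api",
          "descriptors": [{"key": "client_ip",
                           "rate_limit": {"requests_per_unit": 100, "unit": "MINUTE"}}]}
print(unmatched_descriptor_keys(sent, config))  # ['remote_address']
```

    An empty list means every key Envoy sends has at least a top-level rule; anything else is a key that will silently never be counted.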

    Also watch for domain mismatches. The domain field in your Envoy ratelimit filter config must match the domain key in your rate limit service config file. A single character difference means no rules match.


    Root Cause 3: Global vs Local Rate Limit Confusion

    Envoy has two fundamentally different rate limiting mechanisms and they work nothing alike. Global rate limiting talks to an external gRPC service and enforces cluster-wide limits. Local rate limiting is done entirely in-process, per Envoy instance, using a token bucket algorithm. I've seen operators configure one while expecting the behavior of the other, and the results are always confusing.

    Why it happens: Someone configures a local rate limit thinking it will apply across all Envoy replicas, or they configure global rate limiting but accidentally target the wrong filter type in their config. The filter names are different, the config structure is different, and they don't share any state with each other.

    The global rate limit filter is envoy.filters.http.ratelimit. The local rate limit filter is envoy.filters.http.local_ratelimit.

    How to identify it: Check which filter is actually active in your chain:

    curl -s http://192.168.10.25:9901/config_dump | \
      python3 -c "import sys,json; d=json.load(sys.stdin); \
      [print(hf['name']) for cfg in d['configs'] \
      for lc in cfg.get('dynamic_listeners',[]) \
      for fc in lc.get('active_state',{}).get('listener',{}).get('filter_chains',[]) \
      for f in fc.get('filters',[]) \
      for hf in f.get('typed_config',{}).get('http_filters',[])]"

    Or more pragmatically, just grep the config dump:

    curl -s http://192.168.10.25:9901/config_dump | grep -A2 'ratelimit'
    
    "name": "envoy.filters.http.local_ratelimit",
    "typed_config": {
      "@type": "type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit"

    If you expected global rate limiting, finding local_ratelimit explains everything. Local rate limits have no awareness of other Envoy instances. With three replicas each allowing 100 req/min, your effective limit is 300 req/min, not 100.
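    The multiplication effect is easy to demonstrate with the token bucket model local rate limiting uses. A toy sketch (not Envoy's implementation; refill is omitted by freezing time inside a single fill interval):

```python
class TokenBucket:
    """Toy per-instance token bucket, frozen at a single fill interval (no refill)."""

    def __init__(self, max_tokens: int):
        self.tokens = max_tokens  # each instance starts with a full, private bucket

    def allow(self) -> bool:
        """Admit the request if a token remains, otherwise reject it."""
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False

# Three Envoy replicas, each with its own 100-token bucket and no shared state.
replicas = [TokenBucket(100) for _ in range(3)]

# 600 requests round-robined across the replicas within one fill interval.
allowed = sum(replicas[i % 3].allow() for i in range(600))
print(allowed)  # 300: each replica admits its own 100, not a shared 100
```

    A global rate limit replaces the three private buckets with one shared counter in the external service, which is exactly why it needs the gRPC round trip.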

    Check the stats too — local and global rate limits emit different metric namespaces:

    curl -s http://192.168.10.25:9901/stats | grep ratelimit
    
    # Global rate limit stats look like:
    http.ingress_http.ratelimit.ok: 452
    http.ingress_http.ratelimit.over_limit: 0
    
    # Local rate limit stats look like:
    http.ingress_http.local_rate_limit.enabled: 452
    http.ingress_http.local_rate_limit.enforced: 452
    http.ingress_http.local_rate_limit.rate_limited: 0

    How to fix it: Decide which model you actually need. If you need cluster-wide enforcement — for example, limiting a downstream client to 1000 req/min regardless of how many Envoy pods are running — you need global rate limiting with an external ratelimit service. If per-instance limits are acceptable (common for CPU protection, not API quota enforcement), local rate limiting is simpler and has no external dependency.

    If you need to migrate from local to global, here's the difference in filter config:

    # Local rate limit (per-instance, no external service)
    http_filters:
      - name: envoy.filters.http.local_ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          stat_prefix: local_rate_limiter
          token_bucket:
            max_tokens: 100
            tokens_per_fill: 100
            fill_interval: 60s
          filter_enabled:
            default_value:
              numerator: 100
              denominator: HUNDRED
          filter_enforced:
            default_value:
              numerator: 100
              denominator: HUNDRED
    
    # Global rate limit (cluster-wide, requires external service)
    http_filters:
      - name: envoy.filters.http.ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
          domain: solvethenetwork_api
          stage: 0
          failure_mode_deny: false
          rate_limit_service:
            grpc_service:
              envoy_grpc:
                cluster_name: ratelimit_cluster
            transport_api_version: V3

    Root Cause 4: Header Not Being Sent to Rate Limit Service

    When you configure rate limits based on request headers — an API key, a user ID, a tenant header — Envoy needs to actually receive that header and then forward it as part of the descriptor to the rate limit service. If the header is stripped upstream, never set by the client, or the descriptor action references the wrong header name, no rate limit applies.

    Why it happens: A few scenarios cause this. The client isn't sending the header at all (common during development when API keys aren't yet enforced at the client). An upstream proxy or load balancer is stripping the header before it reaches Envoy. The header name in the Envoy request_headers action uses a different capitalization or name than what's actually sent. Or there's a skip_if_absent setting causing the descriptor entry to be omitted when the header is missing.

    How to identify it: First, verify what headers are actually arriving at Envoy. You can do this by temporarily routing to a debug backend or using Envoy's access log with header logging enabled:

    access_log:
      - name: envoy.access_loggers.file
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
          path: /var/log/envoy/access.log
          log_format:
            text_format: "[%START_TIME%] %REQ(:METHOD)% %REQ(X-API-KEY)% %REQ(X-USER-ID)% %RESPONSE_CODE%\n"

    tail -f /var/log/envoy/access.log
    
    [2026-04-20T14:22:01.443Z] GET - - 200
    [2026-04-20T14:22:02.101Z] GET - - 200
    [2026-04-20T14:22:02.887Z] GET - - 200

    Those dashes where the API key should be confirm the header isn't arriving. Now check the descriptor action in your Envoy config:

    rate_limits:
      - actions:
        - request_headers:
            header_name: x-api-key
            descriptor_key: api_key
            skip_if_absent: true   # <-- when header is missing, descriptor entry is skipped

    With skip_if_absent: true, if x-api-key isn't present, the entire descriptor entry is omitted. Your ratelimit rule requires that key. No entry means no match means no rate limit.

    You can also watch the debug logs for descriptor generation with missing headers:

    [debug][ratelimit] Descriptor entry with header x-api-key: header not present, skipping entry
    [debug][ratelimit] grpc request to ratelimit service: domain=solvethenetwork_api descriptors=[{entries:[]}]
    [debug][ratelimit] Response: OK (no descriptor matched)
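    The two skip_if_absent modes can be modeled roughly like this. This is an approximation for reasoning about the behavior, not Envoy's actual implementation: with the default skip_if_absent: false a missing header suppresses the whole descriptor, while true drops only that entry.

```python
def build_descriptor(headers: dict, actions: list):
    """Approximate descriptor generation for request_headers actions.

    Returns the list of entries to send, or None when the descriptor is
    suppressed entirely (missing header with skip_if_absent=false).
    """
    entries = []
    for action in actions:
        value = headers.get(action["header_name"])
        if value is None:
            if action.get("skip_if_absent", False):
                continue          # drop just this entry, keep the descriptor
            return None           # default: missing header kills the descriptor
        entries.append({"key": action["descriptor_key"], "value": value})
    return entries
```

    With the config above (x-api-key, skip_if_absent: true) and a request lacking the header, the result is an empty entry list, which is exactly the descriptors=[{entries:[]}] seen in the debug log: nothing for the ratelimit service to match.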

    How to fix it: The fix depends on intent. If you want to rate limit requests that lack the header (anonymous traffic), change your ratelimit config to handle the missing case explicitly, or use a different action type like remote_address as a fallback. If you want to block requests with no API key outright, handle that in a different filter before the rate limit filter runs.

    If the header is being stripped by an upstream proxy, you need to investigate that proxy's config. For nginx in front of Envoy, check that proxy_pass_request_headers is on (the default) and that no proxy_set_header directive is clearing or overriding the header.

    Also verify header name casing. HTTP/2 requires lowercase header names, and Envoy normalizes to lowercase. If your action references X-API-Key instead of x-api-key, it won't match:

    # Wrong — will never match in HTTP/2 normalized form
    request_headers:
      header_name: X-API-Key
      descriptor_key: api_key
    
    # Correct
    request_headers:
      header_name: x-api-key
      descriptor_key: api_key

    Root Cause 5: Filter Not in Chain

    The most embarrassingly simple cause: the rate limit filter simply isn't in the HTTP filter chain. The config was written but applied to the wrong listener, or it was added to a filter chain that doesn't handle the traffic you're testing with. No filter in the chain means no rate limiting, no errors, nothing.

    Why it happens: Envoy configs can have multiple listeners, multiple filter chains per listener, and multiple route configurations. It's easy to add the ratelimit filter to a management listener instead of the ingress listener, or to add it to the static bootstrap config when traffic is served by an xDS-managed listener. In Kubernetes with Istio or similar control planes, the generated Envoy config may not include your custom filter at all if you didn't inject it via the right extension mechanism.

    How to identify it: Dump the live config and search for the rate limit filter:

    curl -s http://192.168.10.25:9901/config_dump > /tmp/envoy_config_dump.json
    
    # Check which listeners exist
    jq '.configs[] | select(."@type" | contains("ListenersConfigDump")) | 
      .dynamic_listeners[].name' /tmp/envoy_config_dump.json
    
    "ingress_listener_8080"
    "egress_listener_8443"
    "prometheus_listener_9090"
    
    # Find which listeners have the ratelimit filter
    jq '.configs[] | select(."@type" | contains("ListenersConfigDump")) |
      .dynamic_listeners[] | .name as $ln |
      .. | strings | select(contains("ratelimit")) | {listener: $ln, filter: .}' \
      /tmp/envoy_config_dump.json

    If that last command returns nothing, the ratelimit filter isn't in any active listener. If it returns only prometheus_listener_9090, you added it to the wrong listener.

    Another quick check using the stats endpoint — if the filter were active and processing requests, you'd see ratelimit stats incrementing:

    curl -s http://192.168.10.25:9901/stats | grep 'ratelimit\|local_rate_limit'
    # Empty output = filter is not loaded or not processing any requests
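    The same check can be scripted as a deploy gate. A sketch that parses the /stats text output (name: value lines) and keeps only the rate-limit counters; if the resulting dict is empty, the filter never loaded:

```python
def ratelimit_counters(stats_text: str) -> dict:
    """Extract rate-limit related counters from an Envoy /stats dump."""
    counters = {}
    for line in stats_text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep and ("ratelimit" in name or "local_rate_limit" in name):
            try:
                counters[name] = int(value)
            except ValueError:
                pass  # skip histogram and other non-integer lines
    return counters
```

    Feed it the body of http://192.168.10.25:9901/stats and fail the rollout when it returns an empty dict.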

    How to fix it: Locate the correct listener in your config and ensure the ratelimit filter appears in its http_filters list, before the envoy.filters.http.router filter. The router must always be last:

    listeners:
      - name: ingress_listener_8080
        address:
          socket_address:
            address: 0.0.0.0
            port_value: 8080
        filter_chains:
          - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                http_filters:
                  - name: envoy.filters.http.ratelimit    # must be here
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
                      domain: solvethenetwork_api
                      rate_limit_service:
                        grpc_service:
                          envoy_grpc:
                            cluster_name: ratelimit_cluster
                        transport_api_version: V3
                  - name: envoy.filters.http.router      # router must be last
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

    After making changes, validate the config before reloading:

    envoy --mode validate -c /etc/envoy/envoy.yaml
    configuration '/etc/envoy/envoy.yaml' OK

    Root Cause 6: Stage Mismatch

    Envoy's global rate limit filter supports a stage parameter (0–10). Rate limit actions on routes also have a stage value. A rate limit filter with stage 0 only processes rate limit actions also configured for stage 0. If your filter is stage 0 and your route actions are stage 1, no rate limit runs. This is a feature meant for applying different rate limits at different points in request processing — but it's a footgun when stages drift out of sync.

    How to identify it: Compare the stage value in your filter config against the stage in your route's rate_limits actions:

    # Filter config — check the stage value:
    http_filters:
      - name: envoy.filters.http.ratelimit
        typed_config:
          stage: 0    # this filter handles stage 0 actions
    
    # Route config — check all rate_limits entries:
    routes:
      - match:
          prefix: "/api"
        route:
          cluster: api_backend
        rate_limits:
          - stage: 1    # <-- mismatch! filter is stage 0, action is stage 1
            actions:
              - remote_address: {}

    How to fix it: Align the stage values. If you're not intentionally using multi-stage rate limiting, set all stage values to 0 or omit the field entirely (it defaults to 0).
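    Once you've pulled the stage values out of both configs (via config_dump, jq, or your config repo), checking for drift is trivial. A sketch, mirroring the default-to-0 behavior when the field is omitted:

```python
def stage_mismatches(filter_stage, route_stages):
    """Return route rate_limit stages the filter will never evaluate.

    None means the stage field was omitted, which defaults to 0 on both sides.
    """
    effective_filter = filter_stage if filter_stage is not None else 0
    return [s for s in route_stages
            if (s if s is not None else 0) != effective_filter]
```

    stage_mismatches(0, [1]) flags the mismatch from the example above; stage_mismatches(None, [0, None]) returns an empty list because everything defaults to stage 0.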


    Root Cause 7: Rate Limit Service Domain Mismatch

    The ratelimit service groups rules by domain. Envoy sends a domain identifier with every gRPC request. If the domain in Envoy's filter config doesn't match any domain defined in the ratelimit service's config files, no rules are evaluated. The ratelimit service returns OK for all requests, and traffic flows freely.

    How to identify it: Check the ratelimit service logs directly:

    ssh infrarunbook-admin@sw-infrarunbook-01
    journalctl -u ratelimit -n 50
    
    time="2026-04-20T14:30:01Z" level=debug msg="starting rate limit lookup" domain=solvethenetwork_api
    time="2026-04-20T14:30:01Z" level=warning msg="unknown domain" domain=solvethenetwork_api
    time="2026-04-20T14:30:01Z" level=debug msg="returning OK, domain not found"

    That warning is unambiguous. Now check what domains the ratelimit service actually knows about:

    ls /etc/ratelimit/config/
    api_limits.yaml  admin_limits.yaml
    
    head -1 /etc/ratelimit/config/api_limits.yaml
    domain: solvethenetwork_v2    # Envoy is sending solvethenetwork_api
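    Rather than eyeballing file by file, you can collect every domain the service has on disk and compare against what Envoy sends. A sketch, assuming the one-domain-per-YAML-file layout shown above:

```python
from pathlib import Path

def configured_domains(config_dir: str) -> set:
    """Collect every `domain:` value from the ratelimit service's YAML configs."""
    domains = set()
    for path in Path(config_dir).glob("*.yaml"):
        for line in path.read_text().splitlines():
            stripped = line.strip()
            if stripped.startswith("domain:"):
                # Take the value, dropping any trailing inline comment.
                value = stripped.split(":", 1)[1].split("#", 1)[0].strip()
                domains.add(value)
    return domains
```

    Here, "solvethenetwork_api" in configured_domains("/etc/ratelimit/config") would return False, confirming the mismatch before you touch anything.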

    How to fix it: Update the domain in either the ratelimit service config or the Envoy filter config so they match. After updating the ratelimit service config, reload it — the ratelimit service typically watches for config file changes, but a restart is safer:

    systemctl restart ratelimit
    curl -s http://192.168.10.30:8080/json | python3 -m json.tool
    # Should show your new domain and rules

    Prevention

    Most of these failures share a common theme: things break silently. Rate limiting is a control-plane feature where the absence of enforcement looks exactly like a healthy system from the outside. Prevention is almost entirely about making failures visible before they become incidents.

    Set failure_mode_deny: true in non-production environments. Yes, it will cause outages during testing — that's the point. You want to find connectivity failures during your integration tests, not when a client abuse spike hits production. In production, keep it at false if availability matters more than rate limit enforcement, but monitor the ratelimit.failure_mode_allowed counter and alert when it rises.

    Add ratelimit stats to your dashboard from day one. The key counters are ratelimit.ok, ratelimit.over_limit, ratelimit.error, and ratelimit.failure_mode_allowed. If over_limit is always zero and ok is climbing, either your limits are too high or your config is broken. Both are worth investigating.

    Write a synthetic test that deliberately exceeds your configured rate limit and verify you get 429s. Run this test in CI against a real ratelimit service instance. If the test starts passing when it should be getting throttled, your rate limit is broken — and your test will catch it before production does.
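    A minimal version of that synthetic test, using only the standard library. The URL and request count are placeholders for your environment; the pass/fail logic is split out so it can be unit tested without a live proxy:

```python
import urllib.error
import urllib.request

def observed_status_codes(url: str, n: int) -> list:
    """Fire n sequential GETs and record every HTTP status, including 429s."""
    codes = []
    for _ in range(n):
        try:
            with urllib.request.urlopen(url) as resp:
                codes.append(resp.status)
        except urllib.error.HTTPError as exc:
            codes.append(exc.code)  # urllib raises on 4xx/5xx responses
    return codes

def throttling_works(codes: list) -> bool:
    """The limiter is healthy only if at least one request was rejected with 429."""
    return 429 in codes
```

    In CI: send, say, 150 requests against a route limited to 100/min and fail the build when throttling_works(observed_status_codes(url, 150)) is False.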

    Keep domain names and descriptor keys in a shared constants file or configuration management system. When the domain name lives in two places (Envoy filter config and ratelimit service config), they will eventually drift. Treat them like API contracts.

    When deploying changes to rate limit rules, use the ratelimit service's /json debug endpoint to verify the loaded config matches your intent before routing traffic through it. One minute of verification saves hours of debugging.

    Frequently Asked Questions

    Why does Envoy allow all traffic when the rate limit service is down?

    By default, Envoy's rate limit filter sets failure_mode_deny to false, which means connectivity failures to the rate limit service result in requests being allowed through. This is a deliberate availability-over-enforcement default. You can change it to failure_mode_deny: true to fail closed, though this means any ratelimit service outage will block traffic.

    How do I verify what descriptors Envoy is sending to the rate limit service?

    Enable debug logging via the admin API: curl -X POST "http://192.168.10.25:9901/logging?filter=debug". Then watch the Envoy logs for lines showing descriptor entry key/value pairs and the full gRPC request being sent. Compare these against your ratelimit service config to identify mismatches.

    What is the difference between Envoy global and local rate limiting?

    Global rate limiting sends descriptor information to an external gRPC service (like lyft/ratelimit) that maintains shared counters across all Envoy instances. Local rate limiting uses a per-instance token bucket with no external service and no shared state. If you run multiple Envoy replicas and need a cluster-wide limit, you must use global rate limiting.

    Why does my rate limit based on a request header never trigger?

    Three common reasons: the header isn't being sent by the client, the header is being stripped by an upstream proxy before reaching Envoy, or the header name in your descriptor action doesn't match what's actually in the request (check for capitalization — HTTP/2 normalizes headers to lowercase). Use Envoy's access log with header logging to verify what headers are arriving.

    How do I check if the rate limit filter is actually loaded in the active listener?

    Dump the live config with curl -s http://192.168.10.25:9901/config_dump and search for 'ratelimit' in the output. If the filter is not in the http_filters list of your active listener's HttpConnectionManager, it won't process any requests regardless of what's in your static config files.
