Symptoms
You've configured Envoy rate limiting and requests are sailing through without any throttling. No 429s. No x-ratelimit-remaining headers dropping. The service behind your proxy is getting hammered just as if the rate limit config never existed. Or maybe you're seeing the opposite: every request returns a 429 regardless of volume, which usually points to a different class of misconfiguration.
In my experience, Envoy rate limiting is one of those features that fails silently more often than it fails loudly. There's no single log line that says "your rate limit is broken." Instead, you piece it together from admin endpoints, gRPC trace logs, and descriptor comparisons. This guide walks through every meaningful failure mode I've seen in production, with real commands and outputs at each step.
Root Cause 1: Rate Limit Service Unreachable
This is the most common cause and also the one most likely to be invisible. When Envoy can't reach the upstream rate limit service (typically the lyft/ratelimit gRPC service), the default behavior is to allow all traffic. This is governed by the failure_mode_deny flag, which defaults to false. So connectivity failures silently become an open door.
Why it happens: The rate limit service cluster is misconfigured in Envoy's static or dynamic config. The service itself is down. A network policy or firewall rule is blocking the gRPC port (default 8081). DNS isn't resolving. Or the cluster name in the rate_limit_service block doesn't match any defined cluster.
How to identify it: Start at the Envoy admin interface. Hit the stats endpoint and look for ratelimit-related counters:
curl -s http://192.168.10.25:9901/stats | grep ratelimit
cluster.ratelimit_cluster.upstream_cx_connect_fail: 47
cluster.ratelimit_cluster.upstream_cx_connect_timeout: 0
cluster.ratelimit_cluster.upstream_rq_timeout: 0
http.ingress_http.ratelimit.error: 47
http.ingress_http.ratelimit.failure_mode_allowed: 47
http.ingress_http.ratelimit.ok: 0
http.ingress_http.ratelimit.over_limit: 0
The smoking gun here is ratelimit.failure_mode_allowed climbing in lockstep with upstream_cx_connect_fail. That tells you Envoy tried to contact the rate limit service, failed, and then let the request through anyway. Now check the cluster health:
curl -s http://192.168.10.25:9901/clusters | grep ratelimit
ratelimit_cluster::192.168.10.30:8081::cx_active::0
ratelimit_cluster::192.168.10.30:8081::cx_connect_fail::47
ratelimit_cluster::192.168.10.30:8081::health_flags::failed_active_hc
Zero active connections and a failed health check confirm the service is unreachable. You can also verify network-level connectivity from the Envoy host:
nc -zv 192.168.10.30 8081
nc: connect to 192.168.10.30 port 8081 (tcp) failed: Connection refused
How to fix it: First, verify the rate limit service is actually running and listening:
ssh infrarunbook-admin@sw-infrarunbook-01
systemctl status ratelimit
ss -tlnp | grep 8081
LISTEN 0 128 0.0.0.0:8081 0.0.0.0:* users:(("ratelimit",pid=3842,fd=7))
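If you want to catch this fail-open state automatically rather than eyeballing counters, the check is easy to script. Here's a minimal sketch that parses the admin /stats output shown above; the stat_prefix ("ingress_http") is taken from the examples in this guide and may differ in your config:

```python
# Sketch: parse Envoy admin /stats output and flag silent fail-open.
# Assumes the stat_prefix "ingress_http" from the examples above.

def parse_stats(text: str) -> dict:
    """Parse 'name: value' lines as emitted by the admin /stats endpoint."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if value.strip().isdigit():
            stats[name.strip()] = int(value.strip())
    return stats

def failing_open(stats: dict, prefix: str = "http.ingress_http") -> bool:
    """True if requests were allowed only because the RLS was unreachable."""
    return stats.get(f"{prefix}.ratelimit.failure_mode_allowed", 0) > 0

sample = """\
cluster.ratelimit_cluster.upstream_cx_connect_fail: 47
http.ingress_http.ratelimit.error: 47
http.ingress_http.ratelimit.failure_mode_allowed: 47
http.ingress_http.ratelimit.ok: 0
"""
print(failing_open(parse_stats(sample)))  # True
```

Wire something like this into a monitoring check and the "invisible" failure mode becomes a page instead of a surprise.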
If the service is up, check your Envoy cluster configuration. The cluster name in rate_limit_service must exactly match a cluster defined elsewhere:
rate_limit_service:
  grpc_service:
    envoy_grpc:
      cluster_name: ratelimit_cluster   # must match exactly
  transport_api_version: V3

# Somewhere in your clusters section:
clusters:
- name: ratelimit_cluster               # this name must match
  type: STRICT_DNS
  connect_timeout: 1s
  http2_protocol_options: {}            # required — gRPC runs over HTTP/2
  load_assignment:
    cluster_name: ratelimit_cluster
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 192.168.10.30
              port_value: 8081
Notice the http2_protocol_options: {} line. Omitting it means Envoy tries to speak HTTP/1.1 to the gRPC rate limit service, which will fail or produce garbage. Also consider setting failure_mode_deny: true in non-critical paths during development so failures become visible rather than silent:
http_filters:
- name: envoy.filters.http.ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
    domain: solvethenetwork_api
    failure_mode_deny: true   # fail closed during testing
    rate_limit_service:
      grpc_service:
        envoy_grpc:
          cluster_name: ratelimit_cluster
      transport_api_version: V3
Root Cause 2: Descriptor Not Matching
This one trips up even experienced Envoy operators. Your Envoy config generates rate limit descriptors. Your rate limit service has a config file with rate limit rules. If those two don't match exactly — down to the key names and values — requests pass through without being counted.
Why it happens: Descriptors in Envoy are built from actions defined on a virtual host or route. Those actions produce key-value pairs that get sent to the rate limit service. The rate limit service then looks up those exact key-value pairs in its config. If your Envoy action generates {"remote_address": "192.168.1.44"} but your ratelimit config has a rule for {"header_match": "x-api-key"}, no rule fires. There's no error. Requests just aren't counted.
How to identify it: Enable debug logging on Envoy to see the exact descriptors being generated and sent:
curl -s -X POST "http://192.168.10.25:9901/logging?filter=debug"
curl -s -X POST "http://192.168.10.25:9901/logging?http=debug"
Then send a test request and watch the logs:
journalctl -u envoy -f | grep -i ratelimit
[debug][http] [source/common/http/filter_manager.cc:867] [C4][S12345] request headers complete (end_stream=false)
[debug][ratelimit] Descriptor entry key: remote_address value: 192.168.1.44
[debug][ratelimit] Descriptor entry key: destination_cluster value: api_backend
[debug][ratelimit] grpc request to ratelimit service: domain=solvethenetwork_api descriptors=[{entries:[{key:remote_address value:192.168.1.44},{key:destination_cluster value:api_backend}]}]
Now compare that against your rate limit service config. Here's an example of a broken mismatch:
# What Envoy sends:
domain: solvethenetwork_api
descriptors:
- entries:
  - key: remote_address
    value: 192.168.1.44

# What the rate limit service expects:
domain: solvethenetwork_api
descriptors:
- key: client_ip            # <-- different key name!
  rate_limit:
    requests_per_unit: 100
    unit: MINUTE
The keys don't match: remote_address vs client_ip. No rule fires.
How to fix it: Align the descriptor keys. In Envoy's route config, the action type controls what key name is used. remote_address actions always produce a remote_address key. request_headers actions produce the descriptor key you specify. Make sure your ratelimit service config uses the exact same key:
# Envoy virtual host config:
virtual_hosts:
- name: api_vhost
  domains: ["api.solvethenetwork.com"]
  rate_limits:
  - actions:
    - remote_address: {}    # produces key: remote_address

# Matching ratelimit service config (config.yaml):
domain: solvethenetwork_api
descriptors:
- key: remote_address       # matches what Envoy sends
  rate_limit:
    requests_per_unit: 100
    unit: MINUTE
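Since the Envoy config and the ratelimit service config usually live in different repos, it's worth scripting a consistency check. This is a simplified sketch (it handles only the two action types discussed above, and ignores nested descriptors, which real configs often use):

```python
# Sketch: compare the descriptor keys Envoy's actions will generate against
# the top-level keys the ratelimit service config defines. Simplified:
# handles only remote_address and request_headers actions, no nesting.

def envoy_descriptor_keys(actions: list) -> set:
    """Keys produced by a route's rate_limits actions."""
    keys = set()
    for action in actions:
        if "remote_address" in action:
            keys.add("remote_address")    # fixed key for this action type
        elif "request_headers" in action:
            keys.add(action["request_headers"]["descriptor_key"])
    return keys

def ratelimit_config_keys(descriptors: list) -> set:
    """Top-level keys the ratelimit service has rules for."""
    return {d["key"] for d in descriptors}

envoy_actions = [{"remote_address": {}}]
rls_descriptors = [{"key": "client_ip",
                    "rate_limit": {"requests_per_unit": 100, "unit": "MINUTE"}}]

unmatched = envoy_descriptor_keys(envoy_actions) - ratelimit_config_keys(rls_descriptors)
print(unmatched)  # {'remote_address'}, a key that will never match any rule
```

Run a check like this in CI against both configs and the silent mismatch becomes a failed build.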
Also watch for domain mismatches. The domain field in your Envoy ratelimit filter config must match the domain key in your rate limit service config file. A single character difference means no rules match.
Root Cause 3: Global vs Local Rate Limit Confusion
Envoy has two fundamentally different rate limiting mechanisms and they work nothing alike. Global rate limiting talks to an external gRPC service and enforces cluster-wide limits. Local rate limiting is done entirely in-process, per Envoy instance, using a token bucket algorithm. I've seen operators configure one while expecting the behavior of the other, and the results are always confusing.
Why it happens: Someone configures a local rate limit thinking it will apply across all Envoy replicas, or they configure global rate limiting but accidentally target the wrong filter type in their config. The filter names are different, the config structure is different, and they don't share any state with each other.
The global rate limit filter is:
envoy.filters.http.ratelimit
The local rate limit filter is:
envoy.filters.http.local_ratelimit
How to identify it: Check which filter is actually active in your chain:
curl -s http://192.168.10.25:9901/config_dump | \
python3 -c "import sys,json; d=json.load(sys.stdin); \
[print(f['name']) for cfg in d['configs'] \
for lc in cfg.get('dynamic_listeners',[]) \
for fc in lc.get('active_state',{}).get('listener',{}).get('filter_chains',[]) \
for f in fc.get('filters',[]) \
for hf in f.get('typed_config',{}).get('http_filters',[])]"
Or more pragmatically, just grep the config dump:
curl -s http://192.168.10.25:9901/config_dump | grep -A2 'ratelimit'
"name": "envoy.filters.http.local_ratelimit",
"typed_config": {
"@type": "type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit"
If you expected global rate limiting, finding local_ratelimit explains everything. Local rate limits have no awareness of other Envoy instances. With three replicas each allowing 100 req/min, your effective limit is 300 req/min, not 100.
Check the stats too — local and global rate limits emit different metric namespaces:
curl -s http://192.168.10.25:9901/stats | grep ratelimit
# Global rate limit stats look like:
http.ingress_http.ratelimit.ok: 452
http.ingress_http.ratelimit.over_limit: 0
# Local rate limit stats look like:
http.ingress_http.local_rate_limit.enabled: 452
http.ingress_http.local_rate_limit.enforced: 452
http.ingress_http.local_rate_limit.rate_limited: 0
How to fix it: Decide which model you actually need. If you need cluster-wide enforcement — for example, limiting a downstream client to 1000 req/min regardless of how many Envoy pods are running — you need global rate limiting with an external ratelimit service. If per-instance limits are acceptable (common for CPU protection, not API quota enforcement), local rate limiting is simpler and has no external dependency.
If you need to migrate from local to global, here's the difference in filter config:
# Local rate limit (per-instance, no external service)
http_filters:
- name: envoy.filters.http.local_ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
    stat_prefix: local_rate_limiter
    token_bucket:
      max_tokens: 100
      tokens_per_fill: 100
      fill_interval: 60s
    filter_enabled:
      default_value:
        numerator: 100
        denominator: HUNDRED
    filter_enforced:
      default_value:
        numerator: 100
        denominator: HUNDRED

# Global rate limit (cluster-wide, requires external service)
http_filters:
- name: envoy.filters.http.ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
    domain: solvethenetwork_api
    stage: 0
    failure_mode_deny: false
    rate_limit_service:
      grpc_service:
        envoy_grpc:
          cluster_name: ratelimit_cluster
      transport_api_version: V3
Root Cause 4: Header Not Being Sent to Rate Limit Service
When you configure rate limits based on request headers — an API key, a user ID, a tenant header — Envoy needs to actually receive that header and then forward it as part of the descriptor to the rate limit service. If the header is stripped upstream, never set by the client, or the descriptor action references the wrong header name, no rate limit applies.
Why it happens: A few scenarios cause this. The client isn't sending the header at all (common during development when API keys aren't yet enforced at the client). An upstream proxy or load balancer is stripping the header before it reaches Envoy. The header name in the Envoy request_headers action uses a different capitalization or name than what's actually sent. Or there's a skip_if_absent setting causing the descriptor entry to be omitted when the header is missing.
How to identify it: First, verify what headers are actually arriving at Envoy. You can do this by temporarily routing to a debug backend or using Envoy's access log with header logging enabled:
access_log:
- name: envoy.access_loggers.file
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
    path: /var/log/envoy/access.log
    log_format:
      text_format: "[%START_TIME%] %REQ(:METHOD)% %REQ(X-API-KEY)% %REQ(X-USER-ID)% %RESPONSE_CODE%\n"
tail -f /var/log/envoy/access.log
[2026-04-20T14:22:01.443Z] GET - - 200
[2026-04-20T14:22:02.101Z] GET - - 200
[2026-04-20T14:22:02.887Z] GET - - 200
Those dashes where the API key should be confirm the header isn't arriving. Now check the descriptor action in your Envoy config:
rate_limits:
- actions:
  - request_headers:
      header_name: x-api-key
      descriptor_key: api_key
      skip_if_absent: true   # <-- when header is missing, descriptor entry is skipped
With skip_if_absent: true, if x-api-key isn't present, the entire descriptor entry is omitted. Your ratelimit rule requires that key. No entry means no match means no rate limit.
You can also watch the debug logs for descriptor generation with missing headers:
[debug][ratelimit] Descriptor entry with header x-api-key: header not present, skipping entry
[debug][ratelimit] grpc request to ratelimit service: domain=solvethenetwork_api descriptors=[{entries:[]}]
[debug][ratelimit] Response: OK (no descriptor matched)
How to fix it: The fix depends on intent. If you want to rate limit requests that lack the header (anonymous traffic), change your ratelimit config to handle the missing case explicitly, or use a different action type like remote_address as a fallback. If you want to block requests with no API key outright, handle that in a different filter before the rate limit filter runs.
If the header is being stripped by an upstream proxy, you need to investigate that proxy's config. For nginx in front of Envoy, check that proxy_pass_request_headers on is set and that no explicit proxy_set_header blocks are overriding the header.
Also verify header name casing. HTTP/2 requires lowercase header names, and Envoy normalizes to lowercase. If your action references X-API-Key instead of x-api-key, it won't match:
# Wrong — will never match in HTTP/2 normalized form
request_headers:
  header_name: X-API-Key
  descriptor_key: api_key

# Correct
request_headers:
  header_name: x-api-key
  descriptor_key: api_key
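A tiny sketch of why the casing matters: header names arrive lowercased on the wire, so any case-sensitive comparison against a mixed-case name silently fails. The dict lookup here stands in for Envoy's header matching; it is an illustration, not Envoy's code:

```python
# Sketch: HTTP/2 lowercases header names, so a case-sensitive match
# against "X-API-Key" never fires. Normalizing first fixes it.

received = {"x-api-key": "abc123", "x-user-id": "u42"}   # as Envoy sees them

def lookup(headers: dict, name: str):
    return headers.get(name)              # case-sensitive: the bug

def lookup_normalized(headers: dict, name: str):
    return headers.get(name.lower())      # normalize first: the fix

print(lookup(received, "X-API-Key"))             # None, never matches
print(lookup_normalized(received, "X-API-Key"))  # abc123
```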
Root Cause 5: Filter Not in Chain
The most embarrassingly simple cause: the rate limit filter simply isn't in the HTTP filter chain. The config was written but applied to the wrong listener, or it was added to a filter chain that doesn't handle the traffic you're testing with. No filter in the chain means no rate limiting, no errors, nothing.
Why it happens: Envoy configs can have multiple listeners, multiple filter chains per listener, and multiple route configurations. It's easy to add the ratelimit filter to a management listener instead of the ingress listener, or to add it to the static bootstrap config when traffic is served by an xDS-managed listener. In Kubernetes with Istio or similar control planes, the generated Envoy config may not include your custom filter at all if you didn't inject it via the right extension mechanism.
How to identify it: Dump the live config and search for the rate limit filter:
curl -s http://192.168.10.25:9901/config_dump > /tmp/envoy_config_dump.json
# Check which listeners exist
jq '.configs[] | select(."@type" | contains("ListenersConfigDump")) |
.dynamic_listeners[].name' /tmp/envoy_config_dump.json
"ingress_listener_8080"
"egress_listener_8443"
"prometheus_listener_9090"
# Find which listeners have the ratelimit filter
jq '.configs[] | select(."@type" | contains("ListenersConfigDump")) |
.dynamic_listeners[] | .name as $ln |
.. | strings | select(contains("ratelimit")) | {listener: $ln, filter: .}' \
/tmp/envoy_config_dump.json
If that last command returns nothing, the ratelimit filter isn't in any active listener. If it returns only
prometheus_listener_9090, you added it to the wrong listener.
Another quick check using the stats endpoint — if the filter were active and processing requests, you'd see ratelimit stats incrementing:
curl -s http://192.168.10.25:9901/stats | grep 'ratelimit\|local_rate_limit'
# Empty output = filter is not loaded or not processing any requests
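If you'd rather not fight jq syntax, the same sweep can be scripted. This sketch walks a config_dump-shaped dict recursively; the structure here is a trimmed stand-in for a real dump, which has more nesting and more config sections:

```python
# Sketch: recursively search a config_dump-shaped structure and report
# which listeners mention a rate limit filter anywhere in their subtree.

def mentions(node, needle: str) -> bool:
    """True if any string anywhere under node contains needle."""
    if isinstance(node, str):
        return needle in node
    if isinstance(node, dict):
        return any(mentions(v, needle) for v in node.values())
    if isinstance(node, list):
        return any(mentions(v, needle) for v in node)
    return False

def listeners_with_filter(dump: dict, needle: str = "ratelimit") -> list:
    out = []
    for cfg in dump.get("configs", []):
        for lst in cfg.get("dynamic_listeners", []):
            if mentions(lst, needle):
                out.append(lst.get("name"))
    return out

# Trimmed stand-in for json.load(open("/tmp/envoy_config_dump.json"))
dump = {"configs": [{"dynamic_listeners": [
    {"name": "ingress_listener_8080",
     "active_state": {"listener": {"filter_chains": [
         {"filters": [{"name": "envoy.filters.http.router"}]}]}}},
    {"name": "prometheus_listener_9090",
     "active_state": {"listener": {"filter_chains": [
         {"filters": [{"name": "envoy.filters.http.local_ratelimit"}]}]}}},
]}]}

print(listeners_with_filter(dump))  # ['prometheus_listener_9090'], the wrong listener
```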
How to fix it: Locate the correct listener in your config and ensure the ratelimit filter appears in its http_filters list, before the envoy.filters.http.router filter. The router must always be last:
listeners:
- name: ingress_listener_8080
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8080
  filter_chains:
  - filters:
    - name: envoy.filters.network.http_connection_manager
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        stat_prefix: ingress_http
        http_filters:
        - name: envoy.filters.http.ratelimit   # must be here
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
            domain: solvethenetwork_api
            rate_limit_service:
              grpc_service:
                envoy_grpc:
                  cluster_name: ratelimit_cluster
              transport_api_version: V3
        - name: envoy.filters.http.router      # router must be last
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
After making changes, validate the config before reloading:
envoy --mode validate -c /etc/envoy/envoy.yaml
configuration '/etc/envoy/envoy.yaml' OK
Root Cause 6: Stage Mismatch
Envoy's global rate limit filter supports a stage parameter (0–10). Rate limit actions on routes also have a stage value. A rate limit filter with stage 0 only processes rate limit actions also configured for stage 0. If your filter is stage 0 and your route actions are stage 1, no rate limit runs. This is a feature meant for applying different rate limits at different points in request processing — but it's a footgun when stages drift out of sync.
How to identify it: Compare the stage value in your filter config against the stage in your route's rate_limits actions:
# Filter config — check the stage value:
http_filters:
- name: envoy.filters.http.ratelimit
  typed_config:
    stage: 0   # this filter handles stage 0 actions

# Route config — check all rate_limits entries:
routes:
- match:
    prefix: "/api"
  route:
    cluster: api_backend
    rate_limits:
    - stage: 1   # <-- mismatch! filter is stage 0, action is stage 1
      actions:
      - remote_address: {}
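Auditing every route by hand gets tedious; the comparison can be scripted against a parsed route config. A sketch, assuming the field layout shown above (both stage values default to 0 when omitted, which matches Envoy's behavior):

```python
# Sketch: flag route rate_limits entries whose stage doesn't match the
# filter's stage, i.e. actions the filter will silently ignore.

def stage_mismatches(filter_stage: int, routes: list) -> list:
    """Return (route_prefix, stage) pairs the filter will never process."""
    bad = []
    for r in routes:
        for rl in r.get("route", {}).get("rate_limits", []):
            stage = rl.get("stage", 0)    # unset stage defaults to 0
            if stage != filter_stage:
                bad.append((r["match"]["prefix"], stage))
    return bad

routes = [{"match": {"prefix": "/api"},
           "route": {"cluster": "api_backend",
                     "rate_limits": [{"stage": 1,
                                      "actions": [{"remote_address": {}}]}]}}]

print(stage_mismatches(0, routes))  # [('/api', 1)]: filter stage 0, action stage 1
```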
How to fix it: Align the stage values. If you're not intentionally using multi-stage rate limiting, set all stage values to 0 or omit the field entirely (it defaults to 0).
Root Cause 7: Rate Limit Service Domain Mismatch
The ratelimit service groups rules by domain. Envoy sends a domain identifier with every gRPC request. If the domain in Envoy's filter config doesn't match any domain defined in the ratelimit service's config files, no rules are evaluated. The ratelimit service returns OK for all requests, and traffic flows freely.
How to identify it: Check the ratelimit service logs directly:
ssh infrarunbook-admin@sw-infrarunbook-01
journalctl -u ratelimit -n 50
time="2026-04-20T14:30:01Z" level=debug msg="starting rate limit lookup" domain=solvethenetwork_api
time="2026-04-20T14:30:01Z" level=warning msg="unknown domain" domain=solvethenetwork_api
time="2026-04-20T14:30:01Z" level=debug msg="returning OK, domain not found"
That warning is unambiguous. Now check what domains the ratelimit service actually knows about:
ls /etc/ratelimit/config/
api_limits.yaml admin_limits.yaml
head -1 /etc/ratelimit/config/api_limits.yaml
domain: solvethenetwork_v2 # Envoy is sending solvethenetwork_api
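This comparison is also scriptable. A sketch, using inline stand-ins for the files under /etc/ratelimit/config/ (the line-based domain parsing is a simplification; a real check should parse the YAML properly):

```python
# Sketch: compare the domain Envoy sends against the domains the ratelimit
# service has loaded. The config strings stand in for the real files.

envoy_domain = "solvethenetwork_api"   # from the Envoy filter config

rls_configs = {
    "api_limits.yaml":   "domain: solvethenetwork_v2\ndescriptors:\n- key: remote_address\n",
    "admin_limits.yaml": "domain: admin\ndescriptors:\n- key: remote_address\n",
}

def loaded_domains(configs: dict) -> set:
    """Collect every top-level domain declared across the config files."""
    domains = set()
    for text in configs.values():
        for line in text.splitlines():
            if line.startswith("domain:"):
                domains.add(line.split(":", 1)[1].strip())
    return domains

known = loaded_domains(rls_configs)
print(envoy_domain in known)  # False: every request will get a bare OK back
```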
How to fix it: Update the domain in either the ratelimit service config or the Envoy filter config so they match. After updating the ratelimit service config, reload it — the ratelimit service typically watches for config file changes, but a restart is safer:
systemctl restart ratelimit
curl -s http://192.168.10.30:8080/json | python3 -m json.tool
# Should show your new domain and rules
Prevention
Most of these failures share a common theme: things break silently. Rate limiting is a control-plane feature where the absence of enforcement looks exactly like a healthy system from the outside. Prevention is almost entirely about making failures visible before they become incidents.
Set failure_mode_deny: true in non-production environments. Yes, it will cause outages during testing — that's the point. You want to find connectivity failures during your integration tests, not when a client abuse spike hits production. In production, keep it at false if availability matters more than rate limit enforcement, but monitor the ratelimit.failure_mode_allowed counter and alert when it rises.
Add ratelimit stats to your dashboard from day one. The key counters are ratelimit.ok, ratelimit.over_limit, ratelimit.error, and ratelimit.failure_mode_allowed. If over_limit is always zero and ok is climbing, either your limits are too high or your config is broken. Both are worth investigating.
Write a synthetic test that deliberately exceeds your configured rate limit and verify you get 429s. Run this test in CI against a real ratelimit service instance. If the test starts passing when it should be getting throttled, your rate limit is broken — and your test will catch it before production does.
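Such a synthetic test can be sketched as below, with the verdict logic separated from the request loop so the decision is testable without a live proxy. The URL and burst size are placeholders, not values from this guide:

```python
# Sketch: burst past the configured limit and require at least one 429.
# The URL and burst size are placeholders; run against a real Envoy +
# ratelimit pair in CI, never against production.

import urllib.error
import urllib.request

def burst(url: str, n: int) -> list:
    """Send n requests sequentially, returning all status codes."""
    codes = []
    for _ in range(n):
        try:
            with urllib.request.urlopen(url) as resp:
                codes.append(resp.status)
        except urllib.error.HTTPError as e:
            codes.append(e.code)          # 429 surfaces as an HTTPError
    return codes

def rate_limit_working(codes: list) -> bool:
    """The limit is enforced iff the burst produced at least one 429."""
    return 429 in codes

# In CI: assert rate_limit_working(burst("http://127.0.0.1:8080/api", 150))
print(rate_limit_working([200] * 100 + [429] * 50))  # True
print(rate_limit_working([200] * 150))               # False, limit is broken
```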
Keep domain names and descriptor keys in a shared constants file or configuration management system. When the domain name lives in two places (Envoy filter config and ratelimit service config), they will eventually drift. Treat them like API contracts.
When deploying changes to rate limit rules, use the ratelimit service's /json debug endpoint to verify the loaded config matches your intent before routing traffic through it. One minute of verification saves hours of debugging.
