BGP Route Reflectors and Confederations Explained

The Problem Nobody Talks About Until It's Too Late

Every BGP engineer hits the same wall eventually. You've got a growing internal network, you need full reachability, and you know the rule: iBGP peers don't re-advertise routes to other iBGP peers. That rule exists for good reason — it prevents routing loops inside your AS. But it comes with a painful side effect. To achieve full reachability, every BGP router in your AS must peer directly with every other BGP router. That's a full mesh.

The math is brutal. With n routers, you need n*(n-1)/2 sessions. Ten routers? 45 sessions. Fifty routers? 1,225 sessions. A hundred routers? Nearly five thousand sessions — each one consuming memory, CPU, and configuration complexity. I've seen engineers try to manage full meshes manually at 30 or 40 nodes, and it always ends the same way: missed peerings, stale configs, and routing black holes that take hours to diagnose.
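
To make the growth concrete, here is a small Python sketch of the session count. The function names are mine, chosen for illustration; the formula is the n*(n-1)/2 from the paragraph above.

```python
def full_mesh_sessions(n: int) -> int:
    """iBGP sessions needed to fully mesh n routers: n*(n-1)/2."""
    return n * (n - 1) // 2


def sessions_added_by_next_router(n: int) -> int:
    """New sessions required when one more router joins an existing mesh of n."""
    return full_mesh_sessions(n + 1) - full_mesh_sessions(n)


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(f"{n} routers -> {full_mesh_sessions(n)} sessions")
    # Growing a 100-router mesh by one router means touching all 100 existing configs.
    print(sessions_added_by_next_router(100))
```

The second function is the part that hurts operationally: every new router means a configuration change on every router already in the mesh.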

The IETF recognized this problem decades ago, and the answers came in two flavors: Route Reflectors (RFC 4456) and BGP Confederations (RFC 5065). Both solve the scalability problem, but they take fundamentally different approaches. Understanding when to use each — and the edge cases where each one bites you — is what separates engineers who just configure BGP from engineers who actually understand it.

Route Reflectors: Breaking the Full Mesh

A Route Reflector (RR) is just a BGP router that's been given permission to break the split-horizon rule. Normally, a router that learns a route via iBGP won't pass it to another iBGP peer. An RR relaxes this constraint for its clients. When a client sends a route to the RR, the RR reflects it — back to other clients and to any non-client iBGP peers. When a non-client sends a route to the RR, the RR forwards it to all clients.

The terminology matters here. In an RR topology, you have three types of routers: the route reflector itself, RR clients (routers that peer with the RR in client mode), and non-clients (regular iBGP peers that peer with the RR but aren't configured as clients). The RR and its clients form a cluster, identified by a cluster ID — typically the RR's router-ID unless you configure it explicitly.

The reflection rules are straightforward once you internalize them. A route received from an eBGP peer gets sent to all clients and all non-clients. A route received from a non-client iBGP peer gets sent to clients only. A route received from a client gets sent to all non-client iBGP peers and to the clients. That last rule catches people off guard sometimes: RFC 4456 says to reflect to the client peers without carving out the sender, so some implementations send the reflected route back to the originating client as well. The client won't accept it because it recognizes its own router-ID in the ORIGINATOR_ID attribute, so the extra advertisement is harmless.
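
The three rules fit in a small lookup table. This is a sketch of the rules as described above, not any vendor's implementation; the PeerType names and the ibgp_targets function are made up for illustration, and advertisement toward eBGP peers is deliberately left out.

```python
from enum import Enum


class PeerType(Enum):
    EBGP = "ebgp"
    CLIENT = "client"
    NON_CLIENT = "non-client"


# Where the RR learned the route -> which classes of iBGP peer it advertises to.
REFLECTION_RULES = {
    PeerType.EBGP: frozenset({PeerType.CLIENT, PeerType.NON_CLIENT}),
    PeerType.NON_CLIENT: frozenset({PeerType.CLIENT}),
    PeerType.CLIENT: frozenset({PeerType.CLIENT, PeerType.NON_CLIENT}),
}


def ibgp_targets(learned_from: PeerType) -> frozenset:
    """Return the iBGP peer classes that receive a route learned from `learned_from`."""
    return REFLECTION_RULES[learned_from]


if __name__ == "__main__":
    for source in PeerType:
        targets = sorted(t.value for t in ibgp_targets(source))
        print(f"learned from {source.value}: advertise to {targets}")
```

The non-client row is the one to remember: non-client to non-client is the only iBGP-to-iBGP direction an RR does not reflect, which is why non-clients still need a full mesh among themselves.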

Loop Prevention: ORIGINATOR_ID and CLUSTER_LIST

Since RRs are allowed to re-advertise iBGP routes, you need a mechanism to prevent loops. BGP uses two attributes for this, and both are worth understanding deeply before you build anything non-trivial.

ORIGINATOR_ID is set by the first RR that reflects a route. Its value is the router-ID of the router that originally injected the route into iBGP. If a router receives a route update and sees its own router-ID in ORIGINATOR_ID, it silently discards it. Simple and effective.

CLUSTER_LIST is a sequence of cluster IDs that the route has passed through. Each RR prepends its own cluster ID to this list when it reflects a route. If an RR receives a route with its own cluster ID already present in CLUSTER_LIST, it discards the route. This is the mechanism that prevents loops in hierarchical RR designs where one RR feeds routes to another.
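
Both checks are easy to model. The sketch below is a simplified Python model, assuming router-IDs and cluster IDs are plain strings; the Route class and the reflect and accepts functions are hypothetical names for illustration, not part of any BGP implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    originator_id: Optional[str] = None      # unset until the first reflection
    cluster_list: Tuple[str, ...] = ()       # most recent cluster first


def reflect(route: Route, cluster_id: str, learned_from_router_id: str) -> Route:
    """Attribute changes an RR makes when it reflects a route."""
    # Only the first RR sets ORIGINATOR_ID; later RRs leave it alone.
    originator = route.originator_id or learned_from_router_id
    # Every RR prepends its own cluster ID.
    return replace(route, originator_id=originator,
                   cluster_list=(cluster_id,) + route.cluster_list)


def accepts(route: Route, my_router_id: str, my_cluster_id: Optional[str] = None) -> bool:
    """Loop checks on receipt. Pass my_cluster_id only when the receiver is an RR."""
    if route.originator_id == my_router_id:
        return False                         # our own route came back
    if my_cluster_id is not None and my_cluster_id in route.cluster_list:
        return False                         # already passed through our cluster
    return True


if __name__ == "__main__":
    r = reflect(Route("203.0.113.0/24"), cluster_id="10.0.0.1",
                learned_from_router_id="10.0.1.1")
    print(r)
    print(accepts(r, my_router_id="10.0.1.1"))   # the originating client
    print(accepts(r, my_router_id="10.0.1.2"))   # a different client
```

Note that the cluster check only applies on routers acting as RRs; an ordinary client relies on ORIGINATOR_ID alone.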

Here's what a basic RR configuration looks like:

router bgp 65001
 bgp router-id 10.0.0.1
 bgp cluster-id 10.0.0.1
 !
 neighbor 10.0.1.1 remote-as 65001
 neighbor 10.0.1.1 route-reflector-client
 !
 neighbor 10.0.1.2 remote-as 65001
 neighbor 10.0.1.2 route-reflector-client
 !
 neighbor 10.0.2.1 remote-as 65001
 ! (non-client - no route-reflector-client keyword)

On the client side, there's nothing special to configure. The client just peers with the RR like any other iBGP neighbor. The RR does all the work. This is by design — it means you can deploy RRs without touching client configurations, which matters a lot in large networks where client routers might be managed by different teams or locked behind change-freeze policies.

Hierarchical Route Reflectors

In large networks, a single RR becomes a bottleneck and a single point of failure. The common answer is to run two or more RRs per cluster — they peer with each other as non-clients, and each independently reflects routes to the same set of clients. This gives you redundancy without changing how clients behave.
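
As a rough sizing sketch, assuming a single cluster where every client peers with every RR, the RRs are fully meshed with each other, and there are no other non-client peers, the session count looks like this. The function name is illustrative.

```python
def rr_cluster_sessions(clients: int, reflectors: int) -> int:
    """iBGP sessions in one cluster: each client peers with each RR, RRs mesh together."""
    rr_mesh = reflectors * (reflectors - 1) // 2
    return clients * reflectors + rr_mesh


if __name__ == "__main__":
    # 100 routers in total: 2 act as RRs, the other 98 are clients.
    print(rr_cluster_sessions(clients=98, reflectors=2))  # 197
```

Compare that with the 4,950 sessions a full mesh of the same 100 routers would need.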

For very large autonomous systems — large-scale ISPs or massive data center fabrics — you'll see hierarchical RR designs. Tier-2 RRs peer with edge routers and reflect routes up to Tier-1 RRs. The Tier-1 RRs reflect between Tier-2 RRs and to core infrastructure. Each tier forms its own cluster, identified by its own cluster ID. The CLUSTER_LIST attribute does the loop-prevention work across tiers automatically.

In my experience, the hardest part of hierarchical RR design isn't the BGP configuration itself — it's getting the cluster IDs right and keeping clear documentation of which router belongs to which cluster. I've seen networks where someone manually assigned cluster IDs without tracking them, and six months later nobody could explain why certain routes were being silently dropped. Always document your cluster topology as rigorously as your physical topology.

BGP Confederations: Splitting the AS

Where route reflectors solve the full mesh problem by changing advertisement rules, BGP Confederations solve it by splitting your AS into smaller pieces. The idea is this: your AS has a single public-facing AS number — the confederation identifier. Internally, you divide the network into sub-ASes, each with its own private AS number. Routers within a sub-AS run iBGP normally. Between sub-ASes, routers run a modified form of eBGP called confederation eBGP.

The routing behavior between sub-ASes is mostly eBGP: routes are re-advertised freely, and the split-horizon rule doesn't apply. But confederation eBGP differs from regular eBGP in a few critical ways. NEXT_HOP is not changed at confederation boundaries by default. MED is preserved across sub-AS boundaries, whereas across a regular eBGP boundary a MED learned from one neighboring AS is not passed on to the next. LOCAL_PREF is also preserved, which makes sense since you're still operating a single administrative domain regardless of how many internal sub-ASes you've carved out.

Externally, none of this internal structure is visible. When a route leaves your confederation and goes to an external peer, the confederation sub-AS numbers are stripped entirely. The external world sees your confederation AS number in the AS_PATH — nothing else. Your internal topology stays private.

Confederation-Specific Path Attributes

Inside a confederation, BGP uses two special AS_PATH segment types: AS_CONFED_SEQUENCE and AS_CONFED_SET. They behave like the ordinary AS_SEQUENCE and AS_SET segments, but they carry sub-AS numbers rather than public AS numbers. When a route crosses sub-AS boundaries internally, the sub-AS numbers accumulate in these segments. When the route exits the confederation toward an external peer, the confederation segments are stripped completely, so external peers never see the internal sub-AS topology.

Loop prevention within a confederation works exactly like regular BGP: if a router sees its own sub-AS number in AS_CONFED_SEQUENCE, it rejects the route.
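
Here is a minimal Python sketch of that bookkeeping, treating the confederation entries as typed segments inside AS_PATH, which is how RFC 5065 defines them. The function names and the tuple-based path representation are assumptions made for illustration; the AS numbers match the configuration example in this article.

```python
from typing import List, Tuple

Segment = Tuple[str, Tuple[int, ...]]   # (segment type, AS numbers)
AsPath = List[Segment]

CONFED_TYPES = {"AS_CONFED_SEQUENCE", "AS_CONFED_SET"}


def to_confed_peer(path: AsPath, my_sub_as: int) -> AsPath:
    """Advertise to a peer in another sub-AS: record our sub-AS in a confed segment."""
    if path and path[0][0] == "AS_CONFED_SEQUENCE":
        kind, asns = path[0]
        return [(kind, (my_sub_as,) + asns)] + path[1:]
    return [("AS_CONFED_SEQUENCE", (my_sub_as,))] + path


def to_external_peer(path: AsPath, confed_id: int) -> AsPath:
    """Advertise outside the confederation: drop confed segments, prepend the public AS."""
    public = [seg for seg in path if seg[0] not in CONFED_TYPES]
    if public and public[0][0] == "AS_SEQUENCE":
        kind, asns = public[0]
        return [(kind, (confed_id,) + asns)] + public[1:]
    return [("AS_SEQUENCE", (confed_id,))] + public


def has_confed_loop(path: AsPath, my_sub_as: int) -> bool:
    """True if our own sub-AS already appears in a confederation segment."""
    return any(my_sub_as in asns for kind, asns in path if kind in CONFED_TYPES)


if __name__ == "__main__":
    learned = [("AS_SEQUENCE", (12345,))]        # from the external peer in AS 12345
    in_64513 = to_confed_peer(learned, 64512)    # sub-AS 64512 advertises to 64513
    print(in_64513)
    print(has_confed_loop(in_64513, 64512))      # would 64512 take it back?
    print(to_external_peer(in_64513, 65000))     # what an external peer sees
```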

Confederation Configuration Example

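Here's a minimal confederation setup. The confederation identifier (the AS number external peers see) is 65000. Note that 65000 is itself a private AS number, used here as a stand-in for your real public ASN. Two sub-ASes: 64512 and 64513, using private AS numbers from the range defined in RFC 6996.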

! sw-infrarunbook-01 in sub-AS 64512
router bgp 64512
 bgp confederation identifier 65000
 bgp confederation peers 64513
 bgp router-id 10.1.0.1
 !
 ! iBGP peer within the same sub-AS
 neighbor 10.1.0.2 remote-as 64512
 !
 ! Confederation eBGP peer in sub-AS 64513
 neighbor 10.2.0.1 remote-as 64513
 !
 ! External eBGP peer - sees AS 65000 only
 neighbor 192.168.100.1 remote-as 12345

! Router in sub-AS 64513
router bgp 64513
 bgp confederation identifier 65000
 bgp confederation peers 64512
 bgp router-id 10.2.0.1
 !
 neighbor 10.2.0.2 remote-as 64513
 !
 neighbor 10.1.0.1 remote-as 64512

Notice the bgp confederation peers statement. This tells the router which remote AS numbers are part of the same confederation. Without it, the router treats those sessions as regular external eBGP. On some platforms the session then fails to establish at all, because the router presents the confederation identifier instead of its sub-AS number and the peer rejects the AS mismatch. Where the session does come up, NEXT_HOP gets changed and LOCAL_PREF doesn't propagate, and you end up with broken reachability that's genuinely confusing to diagnose because routes appear to be exchanged. The symptom is usually one-way traffic or wrong NEXT_HOP values showing up in the RIB.

Why It Matters: What Actually Breaks in Production

The technical mechanics are interesting, but the real question is always: what breaks in production if you get this wrong?

The most common operational problem with route reflectors is suboptimal path selection. An RR only forwards its best path. It doesn't forward all paths to its clients — the clients have no visibility into paths the RR didn't select. If the RR's best path isn't the best path from a client's perspective (because the client has a different IGP cost to the exit point), the client uses a suboptimal route. In topologies where IGP costs differ significantly across the network, this becomes a genuine performance issue, not just a theoretical concern.

The fix is BGP Add-Path (RFC 7911), which allows an RR to advertise multiple paths to clients so they can run their own best-path selection. It adds configuration complexity — you need to enable it on both the RR and each client, and you need to think carefully about how many paths to advertise — but in asymmetric topologies, it's often essential.
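
A toy model shows the effect. This sketch assumes two exit points whose paths tie on every earlier BGP tie-breaker, so the decision falls through to IGP cost; the exit names, the costs, and both function names are invented for illustration.

```python
from typing import Dict, List


def best_exit(igp_cost: Dict[str, int], exits: List[str]) -> str:
    """Lowest IGP cost to the exit wins; ties break on name for determinism."""
    return min(exits, key=lambda e: (igp_cost[e], e))


def client_choice(rr_cost: Dict[str, int], client_cost: Dict[str, int],
                  exits: List[str], add_path: bool) -> str:
    """Exit the client ends up using, with or without Add-Path on the RR session."""
    # Without Add-Path the client only ever hears about the RR's own best path.
    advertised = list(exits) if add_path else [best_exit(rr_cost, exits)]
    return best_exit(client_cost, advertised)


if __name__ == "__main__":
    exits = ["exit-a", "exit-b"]
    rr_cost = {"exit-a": 10, "exit-b": 50}      # the RR sits close to exit-a
    client_cost = {"exit-a": 60, "exit-b": 5}   # the client sits close to exit-b
    print(client_choice(rr_cost, client_cost, exits, add_path=False))  # exit-a
    print(client_choice(rr_cost, client_cost, exits, add_path=True))   # exit-b
```

Without Add-Path the client pays an IGP cost of 60 to reach an exit it never got to compare against the one sitting at cost 5.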

With confederations, the most common mistake is forgetting to update the bgp confederation peers statement when you add a new sub-AS. You need to update that command on every router that will form confederation eBGP sessions with the new sub-AS. Miss one, and that router treats the new sub-AS as a regular external neighbor. NEXT_HOP gets rewritten, LOCAL_PREF doesn't flow, and routes stop working in one direction while appearing completely fine in the other. It's a subtle asymmetric failure that can waste hours.
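
This kind of omission is easy to catch offline if you keep an inventory of your BGP configuration. The sketch below assumes a hypothetical inventory format (a dict per router with its sub-AS, its declared confederation peers, and the remote-as values of its neighbors); adapt it to however you actually store that data.

```python
from typing import Dict, List, Set, Tuple


def missing_confed_peers(routers: Dict[str, dict],
                         sub_ases: Set[int]) -> List[Tuple[str, int]]:
    """Find neighbors in another sub-AS that are not declared as confederation peers."""
    problems = []
    for name in sorted(routers):
        cfg = routers[name]
        for remote_as in sorted(set(cfg["neighbor_as"])):
            if remote_as == cfg["sub_as"]:
                continue  # plain iBGP inside the same sub-AS
            if remote_as in sub_ases and remote_as not in cfg["confed_peers"]:
                problems.append((name, remote_as))
    return problems


if __name__ == "__main__":
    sub_ases = {64512, 64513, 64514}   # 64514 was added recently
    routers = {
        "r1": {"sub_as": 64512, "confed_peers": {64513},
               "neighbor_as": [64512, 64513, 64514, 12345]},
        "r2": {"sub_as": 64513, "confed_peers": {64512},
               "neighbor_as": [64513, 64512]},
    }
    print(missing_confed_peers(routers, sub_ases))  # [('r1', 64514)]
```

Neighbors outside the confederation, such as AS 12345 here, are ignored on purpose: they are supposed to be treated as external.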

Troubleshooting is also different between the two. With RRs, you're typically asking: why isn't this route being reflected? The usual suspects are ORIGINATOR_ID matching (the route looped back to its originator), CLUSTER_LIST rejection (misconfigured cluster IDs in hierarchical setups), or the RR simply not selecting the route as best. With confederations, you're more often asking: why is this attribute value wrong on the far side of the sub-AS boundary? — which takes you straight back to checking confederation peer declarations and NEXT_HOP propagation behavior.

! Useful verification commands

! Check RR is establishing sessions and see peer types
show bgp ipv4 unicast summary

! Inspect a specific prefix - look for ORIGINATOR_ID and CLUSTER_LIST
show bgp ipv4 unicast 10.0.0.0/8

! See exactly what an RR is advertising to a client
show bgp ipv4 unicast neighbors 10.0.1.2 advertised-routes

! In a confederation, verify confederation path attributes on a route
show bgp ipv4 unicast 172.16.0.0/12 detail

! Confirm which neighbors are confederation peers vs external
show bgp ipv4 unicast neighbors 10.2.0.1 | include BGP state|Confederation

Real-World Design Patterns

Data Center BGP Fabrics

Modern data center designs, particularly those following the Clos spine-leaf model, often run BGP in the underlay. You'll see route reflectors deployed on spine switches, with leaf switches as clients. Each spine is an RR, and the spines peer with each other as non-clients. This gives you a clean, scalable design where each leaf needs only one BGP session per spine (two in a two-spine fabric), regardless of how many leaves are in the fabric.

In this model, Add-Path is almost always enabled so leaves can receive multiple paths and do ECMP properly. The whole fabric typically runs inside a single private AS, often using 32-bit private AS numbers from the 4200000000–4294967294 range defined in RFC 6996. The RR cluster ID is usually the spine's router-ID, and redundancy comes from running two spines as co-located RRs for the same cluster of leaf clients.

Large ISP Core Networks

ISP core networks with hundreds of BGP-speaking routers typically use hierarchical RR designs. Route reflectors sit in major PoPs or regional hubs. Regional RRs peer with local edge and peering routers as clients. A small set of top-level RRs peer with all regional RRs and with each other as non-clients. This keeps the session count manageable and concentrates BGP control-plane load on a small number of well-resourced route reflectors.

Confederations are less common in modern greenfield ISP designs, but you do still see them in large networks built in the early 2000s when confederation support was more mature than RR support on certain vendor platforms. They also show up when an operator wants genuine policy separation between parts of their network — for example, customer routes in one sub-AS and peering routes in another, with different communities and filtering applied at the sub-AS boundary. That's a legitimate use case that RRs don't handle as cleanly.

Using Both Together

There's nothing stopping you from combining RRs and confederations. Within each sub-AS in a confederation, you might run a full mesh if the sub-AS is small, or deploy RRs if it's large enough to warrant it. This combination is common in large-scale carrier deployments: confederations provide administrative and policy separation, RRs provide scalability within each subdivision. The two mechanisms operate at different levels and don't interfere with each other.

Common Misconceptions

The route reflector is in the data path. It isn't. The RR handles the BGP control plane — it reflects routing information. Once a client learns a route and installs it in its FIB, traffic flows directly between endpoints through the actual network topology. The RR doesn't see that traffic. Confusing control plane and data plane here leads to misguided designs where engineers try to optimize RR placement for throughput instead of for BGP session reachability and convergence time.

Route reflectors change the best path. They don't. An RR runs the normal BGP path selection algorithm, picks its own best path, and reflects that path to clients. What changes is visibility: the RR's best path becomes the clients' only view unless you deploy Add-Path. Without it, your clients are completely dependent on the RR's local topology perspective, which may not match a client's own IGP costs to exit points.

Confederations provide network isolation. Not really. Externally, it's still one AS. You can apply routing policy at sub-AS boundaries, but there's no hard isolation. A misconfigured filter or a leaked route inside the confederation affects the entire confederation. If you need genuine isolation — separate routing domains exchanging routes under controlled policy — you want VRFs, not confederations.

Confederations are obsolete. This one comes up in discussions with engineers who've only worked in modern data center environments. Confederations are still actively used in large ISP and carrier networks, they're current in the RFCs, and every major platform supports them. They're not the default choice for new greenfield designs, but calling them obsolete ignores a significant fraction of production networks running them successfully today.

You need to configure RR client behavior on the client router itself. You don't. RR client designation is entirely a configuration on the RR. The client sees a normal iBGP session and has no knowledge that it's a client. This is a deliberate design choice that makes RR deployment non-disruptive to existing client configurations.

Choosing Between Them

In practice, route reflectors are the default choice for most new designs. They're simpler to configure, easier to troubleshoot, and uniformly well-supported across all major BGP implementations. Add-Path handles the suboptimal path issue, and hierarchical designs scale to very large networks. If you're building from scratch, start with RRs.

Confederations make sense when you have a genuine need for policy differentiation between parts of your AS — the kind that communities and route-maps on an RR don't cleanly express — or when you're inheriting a network that already uses them. Migrating away from an established confederation topology is a significant undertaking that rarely justifies the disruption unless there's a specific operational problem driving the change.

Get your cluster design right, document your cluster IDs explicitly, plan for RR redundancy from day one, and evaluate Add-Path if you're operating asymmetric topologies. Those four things will save you from the most common failure modes and give you a BGP control plane that the next engineer to inherit this network will actually be able to reason about.
