The Architecture at a Glance
BGP EVPN with VXLAN on a leaf-spine fabric is the modern standard for scalable data center networking. If you're still running STP-heavy Layer 2 domains stretched across every switch in your stack, you're fighting a losing battle as your environment grows. BGP EVPN VXLAN solves that problem cleanly — it gives you Layer 2 extension across a fully routed underlay without the operational misery of legacy approaches like OTV or VPLS.
The architecture divides into two distinct planes. The underlay is a pure Layer 3 routed fabric, typically built with eBGP between spines and leaves. The overlay is where BGP EVPN runs, advertising MAC and IP reachability so each VTEP (VXLAN Tunnel Endpoint) knows where remote hosts live. VXLAN handles the actual encapsulation in the data plane — wrapping Layer 2 frames in UDP so they traverse the routed underlay as if it were just a transport pipe.
Arista EOS handles this architecture well. The platform has had mature, first-class EVPN support for years, and the configuration model is clean once you internalize the separation between underlay and overlay. Let's build it from the ground up.
Building the Underlay — Getting the Basics Right
The underlay is your foundation, and getting it wrong means you'll chase ghost VXLAN issues for hours before discovering the real culprit was a flapping BGP session. In a standard leaf-spine deployment, each leaf peers eBGP with both spines. Spines don't peer with each other — traffic flows leaf-spine-leaf, full stop. This keeps the topology simple and gives you deterministic ECMP behavior without the complexity of multi-tier designs.
ASN assignment matters more than people realize. A common approach is assigning each spine a shared ASN — say 65000 — and giving each leaf a unique ASN starting at 65001. Some teams use fully unique ASNs everywhere. In my experience, the shared spine ASN with unique leaf ASNs works cleanly in most deployments, but you need to be deliberate about it if you ever plan to connect two fabrics. BGP's AS-path loop prevention will bite you if you're not careful with ASN reuse across domains.
Loopback interfaces carry the VTEP addresses. Each leaf gets a loopback that serves as its VXLAN source interface, and those loopbacks must be reachable from every other leaf — that's the entire job of the underlay. Advertise them into the underlay BGP process and set maximum-paths appropriately on both spines and leaves to take full advantage of ECMP.
! sw-infrarunbook-01 — Leaf-01 underlay configuration
! ASN 65001, peering with Spine-01 and Spine-02 (both ASN 65000)
router bgp 65001
router-id 10.0.1.1
maximum-paths 4 ecmp 4
neighbor SPINES peer group
neighbor SPINES remote-as 65000
neighbor SPINES send-community extended
neighbor SPINES maximum-routes 12000
neighbor 10.1.0.0 peer-group SPINES
neighbor 10.1.0.2 peer-group SPINES
!
address-family ipv4
neighbor SPINES activate
network 10.0.1.1/32
That send-community extended line is non-negotiable. EVPN relies heavily on BGP extended communities, and if you omit it, your overlay sessions will establish but carry nothing useful. I've seen this exact omission waste an afternoon of troubleshooting — the session shows as Established, the routes appear to be exchanged, but the EVPN table stays empty. Always include it on every BGP peer group that participates in EVPN.
VXLAN Data Plane — The Encapsulation Side
VXLAN is fundamentally UDP encapsulation with a 24-bit VNI header. When a frame arrives at a leaf from a locally attached server, the leaf looks up the destination MAC, finds which remote VTEP owns it, wraps the original Ethernet frame in a VXLAN header (UDP port 4789 by default), adds an outer IP header using its local loopback as source and the remote VTEP loopback as destination, and ships it across the underlay. The receiving VTEP strips the encapsulation and delivers the original frame locally. To the end hosts, none of this is visible.
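For the curious, the 8-byte VXLAN header itself is simple enough to build by hand: a flags byte with the I bit set, 24 reserved bits, the 24-bit VNI, and a final reserved byte (RFC 7348). A toy Python illustration, not anything a switch actually runs:

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags, 24 reserved bits,
    24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, low 24 bits reserved (zero).
    # Word 2: VNI in the top 24 bits, low byte reserved (zero).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def parse_vni(header: bytes) -> int:
    """Extract the 24-bit VNI from a VXLAN header."""
    _, word2 = struct.unpack("!II", header)
    return word2 >> 8

hdr = vxlan_header(10010)
print(len(hdr), hex(hdr[0]), parse_vni(hdr))  # 8 0x8 10010
```

In the real data path this header sits between the outer UDP header (destination port 4789) and the original Ethernet frame.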
The VNI is your Layer 2 segment identifier — think VLAN ID, but drawn from a 24-bit space of roughly 16 million values instead of 4094. Each VLAN maps to a Layer 2 VNI (L2VNI). When inter-VNI routing is involved, you also need Layer 3 VNIs (L3VNI) tied to a VRF. The distinction between L2VNI and L3VNI trips up a lot of people who are new to EVPN — keep them in separate, non-overlapping ranges so the purpose is obvious from the number alone.
! VXLAN interface configuration on sw-infrarunbook-01
interface Vxlan1
vxlan source-interface Loopback0
vxlan udp-port 4789
vxlan vlan 10 vni 10010
vxlan vlan 20 vni 10020
vxlan vlan 30 vni 10030
vxlan vrf TENANT-A vni 50001
vxlan vrf TENANT-B vni 50002
vxlan arp proxy
VNI consistency across all leaves is mandatory and entirely your responsibility. VLAN 10 on leaf-01 must map to VNI 10010, and VLAN 10 on leaf-02 must also map to VNI 10010. There's no protocol mechanism that detects mismatches. Two leaves can have inconsistent VNI mappings, maintain healthy BGP sessions, and silently fail to forward traffic between hosts that should share a segment. Make VNI consistency validation a hard requirement in your provisioning workflow.
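Since nothing in the protocol checks this for you, your provisioning pipeline can diff the VLAN-to-VNI maps it renders for each leaf. A minimal sketch, where the leaf names and the shape of the mapping data are hypothetical (in practice you'd extract them from your templates or from the devices via eAPI):

```python
def find_vni_mismatches(fabric: dict) -> list:
    """Given {leaf_name: {vlan: vni}}, report any VLAN that maps to
    different VNIs on different leaves. Returns a list of
    (vlan, {vni: [leaves]}) tuples for every inconsistent VLAN."""
    seen = {}  # vlan -> {vni: [leaves that use it]}
    for leaf, mappings in fabric.items():
        for vlan, vni in mappings.items():
            seen.setdefault(vlan, {}).setdefault(vni, []).append(leaf)
    return [(vlan, vnis) for vlan, vnis in sorted(seen.items()) if len(vnis) > 1]

fabric = {
    "leaf-01": {10: 10010, 20: 10020},
    "leaf-02": {10: 10010, 20: 10021},  # VLAN 20 is mismatched
}
for vlan, vnis in find_vni_mismatches(fabric):
    print(f"VLAN {vlan} inconsistent: {vnis}")
```

Running a check like this as a CI gate on rendered configurations catches the mismatch before it ever reaches a switch.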
One more thing on the data plane before we move on: MTU. VXLAN adds 50 bytes of overhead to every untagged frame — outer Ethernet, outer IP, UDP, and the VXLAN header. If your underlay interfaces are running at the default 1500-byte MTU, you'll silently drop or fragment large frames. Set your underlay interface IP MTU to at least 1550 (1554 if inner frames carry an 802.1Q tag), and ideally 9214 if your hardware and cabling support jumbo frames end-to-end. This is one of those things that works fine in a lab with small test packets and only surfaces under real workload.
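The arithmetic behind that recommendation is easy to sanity-check. A quick sketch, assuming the underlay interface MTU counts IP bytes only (as on routed ports, where the outer Ethernet header is not charged against the MTU):

```python
# Per-frame VXLAN overhead components (bytes), per RFC 7348
INNER_ETH = 14   # inner Ethernet header (add 4 for an 802.1Q tag)
OUTER_IP = 20
OUTER_UDP = 8
VXLAN_HDR = 8
# The 14-byte outer Ethernet header doesn't count against the
# interface's IP MTU, so it's omitted from the sum below.

def required_underlay_ip_mtu(host_mtu: int = 1500, inner_tagged: bool = False) -> int:
    """Smallest underlay IP MTU that carries a full-size host packet
    without loss. host_mtu is the host's IP MTU (inner IP packet size)."""
    inner_frame = host_mtu + INNER_ETH + (4 if inner_tagged else 0)
    return inner_frame + OUTER_IP + OUTER_UDP + VXLAN_HDR

print(required_underlay_ip_mtu())                   # 1550
print(required_underlay_ip_mtu(inner_tagged=True))  # 1554
print(required_underlay_ip_mtu(host_mtu=9000))      # 9050
```

The same math explains why 9214-byte underlay interfaces comfortably carry 9000-byte host jumbo frames.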
BGP EVPN Control Plane — Where the Magic Happens
BGP EVPN is the control plane that gives every VTEP a complete picture of MAC and IP reachability across the fabric — without flooding. It uses BGP's multi-protocol extensions to carry this information in the L2VPN EVPN address family. If you already understand BGP, EVPN isn't a new protocol to learn; it's a new set of NLRI types being carried by a protocol you already know.
The overlay BGP sessions run between loopbacks, on top of the underlay. In most leaf-spine deployments you'll use eBGP for the overlay as well. Note that route reflection is strictly an iBGP concept: with an eBGP overlay the spines simply re-advertise EVPN routes between leaves, which eBGP does natively, so no full mesh is needed. What the spines must do is pass those routes with the BGP next hop unchanged, so the advertising leaf's VTEP address survives the extra hop. Each leaf establishes an EVPN session to both spines. Adding a new leaf only requires configuration on the new leaf itself — if you use bgp listen range on the spines, the spine configuration doesn't change at all.
! BGP EVPN overlay on sw-infrarunbook-01 (Leaf-01)
! Sessions go to spine loopbacks, not directly connected IPs
router bgp 65001
neighbor EVPN-SPINES peer group
neighbor EVPN-SPINES remote-as 65000
neighbor EVPN-SPINES update-source Loopback0
neighbor EVPN-SPINES ebgp-multihop 3
neighbor EVPN-SPINES send-community extended
neighbor 10.0.0.1 peer-group EVPN-SPINES
neighbor 10.0.0.2 peer-group EVPN-SPINES
!
address-family evpn
neighbor EVPN-SPINES activate
!
address-family ipv4
no neighbor EVPN-SPINES activate
! BGP EVPN overlay on Spine-01 — with eBGP, routes are re-advertised natively
! Dynamic listen range allows leaves to join without spine config changes;
! the peer filter accepts the range of unique leaf ASNs
peer-filter LEAF-ASNS
match as-range 65001-65099 result accept
!
router bgp 65000
bgp listen range 10.0.1.0/24 peer-group EVPN-LEAVES peer-filter LEAF-ASNS
neighbor EVPN-LEAVES peer group
neighbor EVPN-LEAVES update-source Loopback0
neighbor EVPN-LEAVES ebgp-multihop 3
neighbor EVPN-LEAVES next-hop-unchanged
neighbor EVPN-LEAVES send-community extended
!
address-family evpn
neighbor EVPN-LEAVES activate
The ebgp-multihop 3 on the leaf is required because the EVPN sessions source from loopbacks that aren't directly connected — the session travels through the physical eBGP hop first. Without it, the session won't establish. Also note that the EVPN peer group is explicitly deactivated in the IPv4 address family. You want those sessions carrying only EVPN NLRI, not mixing in IPv4 unicast routes from the underlay.
EVPN Route Types You Actually Need to Know
EVPN defines more route types than you'll use day to day — Types 1 through 4 come from RFC 7432, and Type 5 was added later by RFC 9136 — but in a standard leaf-spine fabric you'll work with three of them regularly. Type 2 (MAC/IP Advertisement) is the workhorse. When a host comes online, the local leaf learns its MAC via normal data plane learning and its IP via ARP snooping, then generates a Type 2 route and advertises it into the overlay. Every other VTEP in the fabric now knows exactly where that host's MAC and IP live without ever flooding a single frame.
Type 3 (Inclusive Multicast Ethernet Tag) handles BUM traffic — Broadcast, Unknown unicast, and Multicast. A Type 3 route effectively says "I'm a VTEP participating in this VNI, include me in your replication list." Most deployments use ingress replication: the sending VTEP makes a unicast copy of BUM frames for each remote VTEP that has advertised a Type 3 route for that VNI. It's not the most bandwidth-efficient approach at very large scale, but it's operationally simple and works well for the vast majority of enterprise data center deployments.
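Ingress replication is simple enough to express directly: the replication list for a VNI is just every remote VTEP that has advertised a Type 3 route for it. A toy model, with the route list shape invented purely for illustration:

```python
def replication_list(vni: int, type3_routes: list, local_vtep: str) -> list:
    """Given Type 3 (IMET) advertisements as (vtep_ip, vni) pairs,
    return the remote VTEPs that should each receive a unicast copy
    of a BUM frame originating locally in this VNI."""
    return sorted({vtep for vtep, v in type3_routes if v == vni and vtep != local_vtep})

# Type 3 routes as seen by leaf-01 (VTEP 10.0.1.1), including its own
routes = [
    ("10.0.1.2", 10010),
    ("10.0.1.3", 10010),
    ("10.0.1.3", 10020),
    ("10.0.1.1", 10010),
]
print(replication_list(10010, routes, local_vtep="10.0.1.1"))  # ['10.0.1.2', '10.0.1.3']
print(replication_list(10020, routes, local_vtep="10.0.1.1"))  # ['10.0.1.3']
```

Note the bandwidth implication: one broadcast frame becomes N unicast copies, one per entry in this list, which is why ingress replication gets expensive at very large VTEP counts.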
Type 5 (IP Prefix Route) is used for advertising external prefixes into the EVPN fabric. When a border leaf connects to a WAN edge or external firewall, it imports those prefixes and re-advertises them as Type 5 routes across the overlay. Every leaf in the fabric can then reach external destinations by sending traffic to the border leaf's VTEP. This is the integration point between your EVPN fabric and the rest of the world.
Symmetric IRB — The Right Way to Route Between VNIs
Integrated Routing and Bridging (IRB) is how inter-VLAN routing works within the EVPN fabric. There are two models. Asymmetric IRB has the ingress leaf do all the routing — it needs every VLAN configured locally even if no hosts in that VLAN are attached to it. That approach doesn't scale. Symmetric IRB is what you should deploy.
In symmetric IRB, the ingress leaf routes the packet from the source VLAN's SVI into the Layer 3 VNI associated with the tenant VRF. The packet traverses the underlay encapsulated with the L3VNI. The egress leaf receives it, looks up the destination IP in the VRF routing table, and bridges it out the appropriate local VLAN to the destination host. Both ingress and egress leaves perform a routing lookup — hence symmetric. Each leaf only needs the VLANs it actually serves, plus the L3VNI for the VRF.
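The two lookups can be modeled as longest-prefix matches against the same tenant VRF as seen from each side. The table contents and result tuples below are purely illustrative, not EOS data structures:

```python
import ipaddress

def lookup(vrf_table: dict, dst: str):
    """Longest-prefix-match lookup in a VRF table ({prefix: result})."""
    addr = ipaddress.ip_address(dst)
    matches = [p for p in vrf_table if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)
    return vrf_table[best]

# TENANT-A from the ingress leaf's view: the remote subnet is reached by
# routing into the L3VNI toward the egress VTEP.
ingress_vrf = {"10.10.20.0/24": ("encap-l3vni", 50001, "10.0.1.4")}
# The same VRF on the egress leaf: the subnet is local, so the packet is
# bridged out the local VLAN after the second routing lookup.
egress_vrf = {"10.10.20.0/24": ("bridge-vlan", 20)}

dst = "10.10.20.50"
print(lookup(ingress_vrf, dst))  # routed into L3VNI 50001 toward 10.0.1.4
print(lookup(egress_vrf, dst))   # second lookup, then bridged out VLAN 20
```

The key property the model captures: the ingress leaf never needs VLAN 20 configured, only a VRF route via the L3VNI, and the egress leaf makes its own independent routing decision.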
! VRF and SVI configuration for TENANT-A on sw-infrarunbook-01
vrf instance TENANT-A
!
ip routing vrf TENANT-A
!
interface Vlan10
vrf TENANT-A
ip address virtual 10.10.10.1/24
!
interface Vlan20
vrf TENANT-A
ip address virtual 10.10.20.1/24
!
router bgp 65001
vrf TENANT-A
rd 10.0.1.1:50001
route-target import evpn 65000:50001
route-target export evpn 65000:50001
redistribute connected
The ip address virtual keyword is specific to Arista EOS and implements the distributed anycast gateway model. Every leaf uses the same virtual MAC (configured fabric-wide with ip virtual-router mac-address) and the same gateway IP for a given SVI. A host in VLAN 10 always uses 10.10.10.1 as its default gateway, regardless of which leaf it's physically attached to. When that host migrates — live VM migration, for example — it doesn't need to re-ARP because the gateway MAC is identical on the new leaf. This is one of those features that sounds like a small detail but matters enormously in practice.
The route-target values tie the VRF together across all leaves. Every leaf importing and exporting the same route-target for TENANT-A will share routes within that VRF. Hosts in TENANT-A on leaf-01 can reach hosts in TENANT-A on leaf-04 because both leaves export their connected routes with the same route-target, and both import routes carrying that same community. TENANT-B stays isolated because its route-targets don't overlap.
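The import/export mechanics reduce to community matching, which a few lines of Python can illustrate. The route and route-target representations here are deliberately simplified and not any real BGP library's API:

```python
def import_routes(advertised: list, import_rt: str) -> list:
    """EVPN route import: a VRF accepts only routes tagged with a
    route-target it is configured to import."""
    return [r for r in advertised if import_rt in r["rts"]]

# Routes exported by leaf-01, each carrying its VRF's export RT
advertised = [
    {"prefix": "10.10.10.0/24", "rts": ["65000:50001"]},  # TENANT-A
    {"prefix": "10.30.30.0/24", "rts": ["65000:50002"]},  # TENANT-B
]

# TENANT-A on leaf-04 imports 65000:50001 and never sees TENANT-B's routes
tenant_a = import_routes(advertised, "65000:50001")
print([r["prefix"] for r in tenant_a])  # ['10.10.10.0/24']
```

Tenant isolation falls out of this filter: as long as the route-target values don't overlap, the VRFs cannot leak routes into each other.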
ARP Suppression and Why It Saves You at Scale
Without ARP suppression, every ARP request floods across the VXLAN fabric to all VTEPs participating in the VNI. At a few hundred hosts that's merely annoying. At tens of thousands of hosts it becomes a genuine performance problem and a source of instability. ARP suppression lets the local VTEP answer ARP requests on behalf of remote hosts by consulting the MAC/IP table that BGP EVPN has already built — no flooding required.
When a host ARPs for 10.10.10.50, instead of replicating that request to every VTEP in the VNI, the local leaf checks its EVPN-derived ARP table. If it has a Type 2 route for that IP with the associated MAC, it generates a proxy ARP reply locally. The ARP request never enters a VXLAN tunnel. BUM traffic volume drops significantly as host counts grow, and you avoid the burst storms that can occur when large numbers of hosts ARP simultaneously after a network event.
! ARP suppression on Arista EOS — single line, global across all VNIs
interface Vxlan1
vxlan arp proxy
! Verify ARP suppression table is populated
show vxlan address-table
show arp vrf TENANT-A
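The decision the leaf makes for each ARP request can be sketched as a lookup against the EVPN-derived table. This is a toy model of the behavior, not EOS internals, and the table contents are invented:

```python
def handle_arp_request(target_ip: str, evpn_arp_table: dict):
    """If EVPN has already advertised a Type 2 MAC/IP route for the
    target, answer locally; otherwise fall back to flooding the
    request across the VNI's replication list."""
    mac = evpn_arp_table.get(target_ip)
    if mac:
        return ("proxy-reply", mac)  # answered at the local leaf, no tunnel traffic
    return ("flood", None)           # unknown host: replicate to remote VTEPs

# IP-to-MAC bindings learned from Type 2 routes
table = {"10.10.10.50": "00:1c:73:aa:bb:cc"}
print(handle_arp_request("10.10.10.50", table))  # ('proxy-reply', '00:1c:73:aa:bb:cc')
print(handle_arp_request("10.10.10.99", table))  # ('flood', None)
```

The flood branch still exists because suppression only works for hosts the control plane already knows about; silent hosts that have never sent traffic won't have a Type 2 route yet.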
Complete Reference Configuration
Here's a full leaf configuration that ties everything together — two tenants, symmetric IRB, ARP suppression, and both underlay and overlay BGP in a single coherent block. This is close to what you'd deploy in a real environment at solvethenetwork.com.
! Full configuration: sw-infrarunbook-01
! Role: Leaf-01 | ASN 65001 | Loopback: 10.0.1.1/32
hostname sw-infrarunbook-01
!
ip routing
!
vrf instance TENANT-A
vrf instance TENANT-B
!
ip routing vrf TENANT-A
ip routing vrf TENANT-B
!
! Shared anycast gateway MAC, required by "ip address virtual" (example value)
ip virtual-router mac-address 00:1c:73:00:00:99
!
interface Loopback0
ip address 10.0.1.1/32
!
interface Ethernet1
description Uplink to Spine-01
mtu 9214
no switchport
ip address 10.1.0.1/31
!
interface Ethernet2
description Uplink to Spine-02
mtu 9214
no switchport
ip address 10.1.0.3/31
!
interface Ethernet3
description Server — VLAN 10 access
switchport access vlan 10
!
interface Vlan10
vrf TENANT-A
ip address virtual 10.10.10.1/24
!
interface Vlan20
vrf TENANT-A
ip address virtual 10.10.20.1/24
!
interface Vlan30
vrf TENANT-B
ip address virtual 10.30.30.1/24
!
interface Vxlan1
vxlan source-interface Loopback0
vxlan udp-port 4789
vxlan vlan 10 vni 10010
vxlan vlan 20 vni 10020
vxlan vlan 30 vni 10030
vxlan vrf TENANT-A vni 50001
vxlan vrf TENANT-B vni 50002
vxlan arp proxy
!
router bgp 65001
router-id 10.0.1.1
maximum-paths 4 ecmp 4
!
neighbor SPINES peer group
neighbor SPINES remote-as 65000
neighbor SPINES send-community extended
neighbor SPINES maximum-routes 12000
neighbor 10.1.0.0 peer-group SPINES
neighbor 10.1.0.2 peer-group SPINES
!
neighbor EVPN-SPINES peer group
neighbor EVPN-SPINES remote-as 65000
neighbor EVPN-SPINES update-source Loopback0
neighbor EVPN-SPINES ebgp-multihop 3
neighbor EVPN-SPINES send-community extended
neighbor 10.0.0.1 peer-group EVPN-SPINES
neighbor 10.0.0.2 peer-group EVPN-SPINES
!
address-family evpn
neighbor EVPN-SPINES activate
!
address-family ipv4
neighbor SPINES activate
no neighbor EVPN-SPINES activate
network 10.0.1.1/32
!
! Per-VLAN MAC-VRFs — without these the leaf originates no Type 2 or
! Type 3 routes for the mapped VNIs
vlan 10
rd 10.0.1.1:10010
route-target both 10010:10010
redistribute learned
vlan 20
rd 10.0.1.1:10020
route-target both 10020:10020
redistribute learned
vlan 30
rd 10.0.1.1:10030
route-target both 10030:10030
redistribute learned
!
vrf TENANT-A
rd 10.0.1.1:50001
route-target import evpn 65000:50001
route-target export evpn 65000:50001
redistribute connected
!
vrf TENANT-B
rd 10.0.1.1:50002
route-target import evpn 65000:50002
route-target export evpn 65000:50002
redistribute connected
Once this is deployed, verify the fabric is functioning end to end:
! Confirm EVPN overlay sessions are established
show bgp evpn summary
! Inspect MAC/IP routes (Type 2) — should show remote host MACs and IPs
show bgp evpn route-type mac-ip
! Inspect multicast routes (Type 3) — should show all remote VTEPs per VNI
show bgp evpn route-type imet
! Verify active VXLAN tunnels to remote VTEPs
show vxlan vtep
! Check VNI state and VLAN mappings
show vxlan vni
! Confirm ARP suppression table is populated with remote host entries
show vxlan address-table
Common Misconceptions That Will Burn You
The most persistent misconception I encounter is that EVPN is a complicated protocol. It's not. EVPN is BGP carrying MAC and IP information in addition to IP prefixes. If you understand how BGP advertises routes and how route-targets control import/export policy, you already understand the core of EVPN. The perceived complexity usually comes from the configuration model, not the protocol logic itself. Approach it incrementally — get the underlay working, then light up the overlay, then verify Type 2 routes before worrying about symmetric IRB.
The second misconception is assuming VNI consistency is enforced somewhere. It isn't. Your provisioning system — whether CloudVision, Ansible, Terraform, or a configuration template — must enforce consistent VLAN-to-VNI mappings across all leaves. Two leaves can have mismatched VNI assignments, maintain perfectly healthy BGP sessions with full route tables, and silently fail to forward traffic between hosts in the same logical segment. The control plane looks healthy. The data plane is broken. This is one of the harder failure modes to diagnose because the usual commands all return positive results.
Third: neglecting the underlay. In my experience, a meaningful fraction of EVPN troubleshooting sessions turn out to be underlay problems in disguise. An unstable BGP session, an MTU mismatch on a spine uplink, or suboptimal ECMP hashing causing hot paths — all of these manifest as intermittent overlay connectivity. Always rule out underlay issues first. Run show bgp summary, check for any sessions that aren't Established, verify MTU consistency, and confirm ECMP is actually hashing across all available paths before diving into EVPN specifics.
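If you pull session state off the box programmatically, for example as JSON via Arista's eAPI, the "anything not Established" check is trivial to automate. A sketch with a simplified input shape; real eAPI output nests this deeper and field names vary by EOS version:

```python
def unhealthy_bgp_peers(summary: dict) -> list:
    """Flag underlay BGP sessions that aren't Established.
    `summary` maps peer IP -> session state string, a flattened
    stand-in for eAPI's `show ip bgp summary` output."""
    return sorted(ip for ip, state in summary.items() if state != "Established")

peers = {"10.1.0.0": "Established", "10.1.0.2": "Idle"}
print(unhealthy_bgp_peers(peers))  # ['10.1.0.2']
```

Wired into your monitoring, a check like this turns "rule out the underlay first" from a manual ritual into an alert that fires before anyone opens an EVPN ticket.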
Fourth: conflating the anycast gateway virtual MAC with the physical interface MAC. The virtual MAC used by ip address virtual is the same across all leaves in the fabric — that's the whole point. But during migrations from non-EVPN environments, hosts may have cached ARP entries pointing to a previous physical gateway MAC. Those entries will cause blackholing until they expire or are cleared. If you're migrating hosts into a new EVPN fabric, account for ARP cache lifetime in your maintenance window planning.
Operational Considerations
BFD (Bidirectional Forwarding Detection) belongs on your underlay BGP sessions. Default BGP hold timers are measured in seconds and are too slow for data center environments where convergence requirements are tight. Enable BFD on the SPINES peer group and tune the intervals to what your hardware platform supports. Arista switches generally handle 300ms BFD intervals without issue, giving you sub-second failure detection on underlay links rather than waiting for the BGP hold timer to expire.
For monitoring, Arista's native support for gNMI and OpenConfig means you can stream fabric telemetry directly into your observability stack without relying on SNMP polling. VTEP tunnel state, VNI statistics, EVPN route counts, ARP suppression table sizes — all of it is available via streaming telemetry. This matters operationally because EVPN fabrics can exhibit asymmetric failure modes where some paths work and others don't, and polling-based monitoring often misses transient issues that telemetry streaming catches in real time.
CloudVision Portal pairs naturally with this architecture if you want centralized lifecycle management. It handles zero-touch provisioning of new leaves, enforces configuration consistency, and provides network-wide telemetry aggregation. That said, everything covered in this article is fully achievable without CVP. It's an operational force multiplier, not a protocol requirement. If you're running a smaller fabric or want to understand the underlying mechanics before introducing a management platform, you can build and operate a solid BGP EVPN VXLAN fabric with standard EOS CLI alone.
BGP EVPN VXLAN on Arista EOS is mature, well-documented at the protocol level, and operationally tractable once you've internalized the separation between underlay and overlay, L2VNI and L3VNI, control plane and data plane. Build the underlay correctly, validate VNI assignments rigorously, enable ARP suppression from day one, and the fabric will scale cleanly as you add racks without requiring architectural changes.
