InfraRunBook

    Arista BGP EVPN VXLAN Leaf-Spine Architecture

    Arista
    Published: Apr 8, 2026
    Updated: Apr 8, 2026

    A senior engineer's guide to building a BGP EVPN VXLAN leaf-spine fabric on Arista EOS, covering underlay design, overlay control plane, symmetric IRB, ARP suppression, and production-ready configuration.


    The Architecture at a Glance

    BGP EVPN with VXLAN on a leaf-spine fabric is the modern standard for scalable data center networking. If you're still running STP-heavy Layer 2 domains stretched across every switch in your stack, you're fighting a losing battle as your environment grows. BGP EVPN VXLAN solves that problem cleanly — it gives you Layer 2 extension across a fully routed underlay without the operational misery of legacy approaches like OTV or VPLS.

    The architecture divides into two distinct planes. The underlay is a pure Layer 3 routed fabric, typically built with eBGP between spines and leaves. The overlay is where BGP EVPN runs, advertising MAC and IP reachability so each VTEP (VXLAN Tunnel Endpoint) knows where remote hosts live. VXLAN handles the actual encapsulation in the data plane — wrapping Layer 2 frames in UDP so they traverse the routed underlay as if it were just a transport pipe.

    Arista EOS handles this architecture well. The platform has had mature, first-class EVPN support for years, and the configuration model is clean once you internalize the separation between underlay and overlay. Let's build it from the ground up.

    Building the Underlay — Getting the Basics Right

    The underlay is your foundation, and getting it wrong means you'll chase ghost VXLAN issues for hours before discovering the real culprit was a flapping BGP session. In a standard leaf-spine deployment, each leaf peers eBGP with both spines. Spines don't peer with each other — traffic flows leaf-spine-leaf, full stop. This keeps the topology simple and gives you deterministic ECMP behavior without the complexity of multi-tier designs.

    ASN assignment matters more than people realize. A common approach is assigning each spine a shared ASN — say 65000 — and giving each leaf a unique ASN starting at 65001. Some teams use fully unique ASNs everywhere. In my experience, the shared spine ASN with unique leaf ASNs works cleanly in most deployments, but you need to be deliberate about it if you ever plan to connect two fabrics. BGP's AS-path loop prevention will bite you if you're not careful with ASN reuse across domains.
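    To make the convention concrete, here's a small Python sketch that generates the ASN and loopback plan described above. The 10.0.0.x spine and 10.0.1.x leaf loopback ranges match the examples later in this article, but the function and its naming are illustrative, not an Arista tool.

```python
# Sketch: shared spine ASN, unique leaf ASNs starting at 65001.
# Loopback ranges are assumptions chosen to match this article's examples.

def fabric_plan(num_spines: int, num_leaves: int) -> dict:
    spine_asn = 65000  # shared across all spines
    plan = {}
    for s in range(1, num_spines + 1):
        plan[f"spine-{s:02d}"] = {"asn": spine_asn, "loopback": f"10.0.0.{s}/32"}
    for l in range(1, num_leaves + 1):
        plan[f"leaf-{l:02d}"] = {"asn": 65000 + l, "loopback": f"10.0.1.{l}/32"}
    return plan

plan = fabric_plan(num_spines=2, num_leaves=4)
assert plan["leaf-01"]["asn"] == 65001                      # unique per leaf
assert plan["spine-01"]["asn"] == plan["spine-02"]["asn"]   # shared spine ASN
```

    A generator like this is also a natural place to reserve ASN ranges per fabric, which is exactly the discipline that avoids AS-path loop-prevention surprises when two fabrics are later interconnected.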

    Loopback interfaces carry the VTEP addresses. Each leaf gets a loopback that serves as its VXLAN source interface, and those loopbacks must be reachable from every other leaf — that's the entire job of the underlay. Advertise them into the underlay BGP process and set maximum-paths appropriately on both spines and leaves to take full advantage of ECMP.

    ! sw-infrarunbook-01 — Leaf-01 underlay configuration
    ! ASN 65001, peering with Spine-01 and Spine-02 (both ASN 65000)
    
    router bgp 65001
       router-id 10.0.1.1
       maximum-paths 4 ecmp 4
       neighbor SPINES peer group
       neighbor SPINES remote-as 65000
       neighbor SPINES send-community extended
       neighbor SPINES maximum-routes 12000
       neighbor 10.1.0.0 peer-group SPINES
       neighbor 10.1.0.2 peer-group SPINES
       !
       address-family ipv4
          neighbor SPINES activate
          network 10.0.1.1/32

    That send-community extended line is non-negotiable. EVPN relies heavily on BGP extended communities, and if you omit it, your overlay sessions will establish but carry nothing useful. I've seen this exact omission waste an afternoon of troubleshooting — the session shows as Established, the routes appear to be exchanged, but the EVPN table stays empty. Always include it on every BGP peer group that participates in EVPN.

    VXLAN Data Plane — The Encapsulation Side

    VXLAN is fundamentally UDP encapsulation with a 24-bit VNI header. When a frame arrives at a leaf from a locally attached server, the leaf looks up the destination MAC, finds which remote VTEP owns it, wraps the original Ethernet frame in a VXLAN header (UDP port 4789 by default), adds an outer IP header using its local loopback as source and the remote VTEP loopback as destination, and ships it across the underlay. The receiving VTEP strips the encapsulation and delivers the original frame locally. To the end hosts, none of this is visible.
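    The encapsulation itself is simple enough to sketch in a few lines of Python. This builds and parses only the 8-byte VXLAN header (flags byte 0x08 marking the VNI as valid, then the 24-bit VNI); the outer Ethernet, IP, and UDP headers and the inner frame are omitted for brevity.

```python
import struct

VXLAN_UDP_PORT = 4789  # default destination port

def vxlan_header(vni: int) -> bytes:
    # 8-byte VXLAN header: flags byte 0x08 (VNI-valid), 24 reserved bits,
    # then the 24-bit VNI followed by 8 reserved bits.
    assert 0 <= vni < 2**24, "VNI is a 24-bit field"
    return struct.pack("!II", 0x08 << 24, vni << 8)

def parse_vni(header: bytes) -> int:
    flags_word, vni_word = struct.unpack("!II", header)
    assert flags_word >> 24 == 0x08, "VNI-valid flag not set"
    return vni_word >> 8

hdr = vxlan_header(10010)
assert len(hdr) == 8
assert parse_vni(hdr) == 10010
```

    The 24-bit VNI field is where the roughly 16 million segment IDs come from, versus the 12-bit VLAN ID's 4094.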

    The VNI is your Layer 2 segment identifier — think VLAN ID, but drawn from a 24-bit space of roughly 16 million values. Each VLAN maps to a Layer 2 VNI (L2VNI). When inter-VNI routing is involved, you also need Layer 3 VNIs (L3VNI) tied to a VRF. The distinction between L2VNI and L3VNI trips up a lot of people who are new to EVPN — keep them in separate, non-overlapping ranges so the purpose is obvious from the number alone.

    ! VXLAN interface configuration on sw-infrarunbook-01
    
    interface Vxlan1
       vxlan source-interface Loopback0
       vxlan udp-port 4789
       vxlan vlan 10 vni 10010
       vxlan vlan 20 vni 10020
       vxlan vlan 30 vni 10030
       vxlan vrf TENANT-A vni 50001
       vxlan vrf TENANT-B vni 50002
       vxlan arp proxy

    VNI consistency across all leaves is mandatory and entirely your responsibility. VLAN 10 on leaf-01 must map to VNI 10010, and VLAN 10 on leaf-02 must also map to VNI 10010. There's no protocol mechanism that detects mismatches. Two leaves can have inconsistent VNI mappings, maintain healthy BGP sessions, and silently fail to forward traffic between hosts that should share a segment. Make VNI consistency validation a hard requirement in your provisioning workflow.
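    Since no protocol checks this for you, a provisioning-time validator is worth its weight. A minimal sketch, assuming you can extract each leaf's VLAN-to-VNI map from your source of truth or parsed configs (the function name and data shapes are illustrative):

```python
# Flag any leaf whose VLAN-to-VNI mapping disagrees with the rest of the fabric.

def vni_mismatches(leaf_maps: dict[str, dict[int, int]]) -> list[str]:
    expected: dict[int, int] = {}  # first-seen mapping per VLAN
    problems = []
    for leaf, vlan_to_vni in sorted(leaf_maps.items()):
        for vlan, vni in sorted(vlan_to_vni.items()):
            if vlan in expected and expected[vlan] != vni:
                problems.append(
                    f"{leaf}: VLAN {vlan} maps to VNI {vni}, expected {expected[vlan]}"
                )
            expected.setdefault(vlan, vni)
    return problems

leaves = {
    "leaf-01": {10: 10010, 20: 10020},
    "leaf-02": {10: 10010, 20: 10099},  # mismatch: should be 10020
}
assert vni_mismatches(leaves) == ["leaf-02: VLAN 20 maps to VNI 10099, expected 10020"]
```

    Run a check like this as a gate in the provisioning pipeline, not as an occasional audit — the failure mode it catches is exactly the one that looks healthy in every BGP show command.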

    One more thing on the data plane before we move on: MTU. VXLAN adds 50 bytes of overhead to every frame (outer Ethernet, outer IP, UDP, and the VXLAN header). If your underlay interfaces are running at the default 1500-byte MTU, you'll silently drop or fragment large frames. Set your underlay interface MTU to at least 1550, and ideally 9214 if your hardware and cabling support jumbo frames end-to-end. This is one of those things that works fine in a lab with small test packets and only surfaces under real workload.
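    The arithmetic is worth writing down once, since the 50-byte overhead figure and the minimum underlay MTU are easy to mix up. A quick check, assuming untagged inner frames and IPv4 in the underlay:

```python
# VXLAN encapsulation overhead, byte by byte (no 802.1Q tags assumed).
OUTER_ETH = 14   # outer Ethernet header
OUTER_IP = 20    # outer IPv4 header
OUTER_UDP = 8
VXLAN = 8
OVERHEAD = OUTER_ETH + OUTER_IP + OUTER_UDP + VXLAN
assert OVERHEAD == 50

# A 1500-byte inner IP payload arrives as a 1514-byte Ethernet frame.
# Encapsulated, the outer IP packet is 1514 + 8 + 8 + 20 = 1550 bytes,
# so the underlay IP MTU must be at least 1550.
INNER_FRAME = 1500 + 14
min_underlay_ip_mtu = INNER_FRAME + VXLAN + OUTER_UDP + OUTER_IP
assert min_underlay_ip_mtu == 1550
```

    Add 4 bytes per 802.1Q tag if tagged frames are carried inside the tunnel, which is one more reason to simply run jumbo frames in the underlay.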

    BGP EVPN Control Plane — Where the Magic Happens

    BGP EVPN is the control plane that gives every VTEP a complete picture of MAC and IP reachability across the fabric — without flooding. It uses BGP's multi-protocol extensions to carry this information in the L2VPN EVPN address family. If you already understand BGP, EVPN isn't a new protocol to learn; it's a new set of NLRI types being carried by a protocol you already know.

    The overlay BGP sessions run between loopbacks, on top of the underlay. In most leaf-spine deployments you'll use eBGP for the overlay as well, with the spines relaying EVPN routes between leaves. Strictly speaking the spines aren't route reflectors (that's an iBGP mechanism); they're ordinary eBGP speakers that re-advertise EVPN routes, and they must be configured to leave the next hop unchanged so that leaves build VXLAN tunnels to each other rather than to the spines. Each leaf establishes an EVPN session to both spines, and the spines pass EVPN routes between leaves without needing a full mesh. Adding a new leaf only requires configuration on the new leaf itself — if you use bgp listen range on the spines, the spine configuration doesn't change at all.

    ! BGP EVPN overlay on sw-infrarunbook-01 (Leaf-01)
    ! Sessions go to spine loopbacks, not directly connected IPs
    
    router bgp 65001
       neighbor EVPN-SPINES peer group
       neighbor EVPN-SPINES remote-as 65000
       neighbor EVPN-SPINES update-source Loopback0
       neighbor EVPN-SPINES ebgp-multihop 3
       neighbor EVPN-SPINES send-community extended
       neighbor 10.0.0.1 peer-group EVPN-SPINES
       neighbor 10.0.0.2 peer-group EVPN-SPINES
       !
       address-family evpn
          neighbor EVPN-SPINES activate
       !
       address-family ipv4
          no neighbor EVPN-SPINES activate
    ! BGP EVPN overlay on Spine-01 — EVPN route server role
    ! Dynamic listen range allows leaves to join without spine config changes
    ! The peer filter accepts the unique leaf ASNs (65001 and up)
    
    peer-filter LEAF-ASNS
       10 match as-range 65001-65099 result accept
    !
    router bgp 65000
       bgp listen range 10.0.1.0/24 peer-group EVPN-LEAVES peer-filter LEAF-ASNS
       neighbor EVPN-LEAVES peer group
       neighbor EVPN-LEAVES update-source Loopback0
       neighbor EVPN-LEAVES ebgp-multihop 3
       neighbor EVPN-LEAVES next-hop-unchanged
       neighbor EVPN-LEAVES send-community extended
       !
       address-family evpn
          neighbor EVPN-LEAVES activate

    The ebgp-multihop 3 on the leaf is required because the EVPN sessions source from loopbacks that aren't directly connected — the session travels through the physical eBGP hop first. Without it, the session won't establish. Also note that the EVPN peer group is explicitly deactivated in the IPv4 address family. You want those sessions carrying only EVPN NLRI, not mixing in IPv4 unicast routes from the underlay.

    EVPN Route Types You Actually Need to Know

    EVPN defines several route types (RFC 7432 specifies Types 1 through 4; RFC 9136 adds Type 5), but in a standard leaf-spine fabric you'll work with three of them regularly. Type 2 (MAC/IP Advertisement) is the workhorse. When a host comes online, the local leaf learns its MAC via normal data plane learning and its IP via ARP snooping, then generates a Type 2 route and advertises it to the spines. Every other VTEP in the fabric now knows exactly where that host's MAC and IP live without ever flooding a single frame.

    Type 3 (Inclusive Multicast Ethernet Tag) handles BUM traffic — Broadcast, Unknown unicast, and Multicast. A Type 3 route effectively says "I'm a VTEP participating in this VNI, include me in your replication list." Most deployments use ingress replication: the sending VTEP makes a unicast copy of BUM frames for each remote VTEP that has advertised a Type 3 route for that VNI. It's not the most bandwidth-efficient approach at very large scale, but it's operationally simple and works well for the vast majority of enterprise data center deployments.
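    The ingress replication behavior can be sketched in a few lines of Python: one unicast copy per remote VTEP that advertised a Type 3 route for the VNI. The Type 3 "table" here is an illustrative stand-in for the BGP EVPN table, not a real API.

```python
# Sketch of ingress replication for BUM traffic: the sending VTEP makes a
# unicast VXLAN copy for each remote VTEP in the VNI's Type 3 (IMET) list.

def replicate_bum(frame: bytes, vni: int, local_vtep: str,
                  type3_routes: dict[int, set[str]]) -> list[tuple[str, bytes]]:
    copies = []
    for vtep in sorted(type3_routes.get(vni, set())):
        if vtep != local_vtep:  # never replicate back to ourselves
            copies.append((vtep, frame))  # one unicast copy per remote VTEP
    return copies

imet = {10010: {"10.0.1.1", "10.0.1.2", "10.0.1.3"}}
copies = replicate_bum(b"\xff" * 64, 10010, local_vtep="10.0.1.1", type3_routes=imet)
assert [vtep for vtep, _ in copies] == ["10.0.1.2", "10.0.1.3"]
```

    The linear copy count per BUM frame is exactly why ingress replication trades bandwidth efficiency for operational simplicity at large VTEP counts.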

    Type 5 (IP Prefix Route) is used for advertising external prefixes into the EVPN fabric. When a border leaf connects to a WAN edge or external firewall, it imports those prefixes and re-advertises them as Type 5 routes across the overlay. Every leaf in the fabric can then reach external destinations by sending traffic to the border leaf's VTEP. This is the integration point between your EVPN fabric and the rest of the world.

    Symmetric IRB — The Right Way to Route Between VNIs

    Integrated Routing and Bridging (IRB) is how inter-VLAN routing works within the EVPN fabric. There are two models. Asymmetric IRB has the ingress leaf do all the routing — it needs every VLAN configured locally even if no hosts in that VLAN are attached to it. That approach doesn't scale. Symmetric IRB is what you should deploy.

    In symmetric IRB, the ingress leaf routes the packet from the source VLAN's SVI into the Layer 3 VNI associated with the tenant VRF. The packet traverses the underlay encapsulated with the L3VNI. The egress leaf receives it, looks up the destination IP in the VRF routing table, and bridges it out the appropriate local VLAN to the destination host. Both ingress and egress leaves perform a routing lookup — hence symmetric. Each leaf only needs the VLANs it actually serves, plus the L3VNI for the VRF.

    ! VRF and SVI configuration for TENANT-A on sw-infrarunbook-01
    
    vrf instance TENANT-A
    !
    ip routing vrf TENANT-A
    !
    interface Vlan10
       vrf TENANT-A
       ip address virtual 10.10.10.1/24
    !
    interface Vlan20
       vrf TENANT-A
       ip address virtual 10.10.20.1/24
    !
    router bgp 65001
       vrf TENANT-A
          rd 10.0.1.1:50001
          route-target import evpn 65000:50001
          route-target export evpn 65000:50001
          redistribute connected

    The ip address virtual keyword is specific to Arista EOS and implements the distributed anycast gateway model. Every leaf uses the same virtual MAC and the same gateway IP for a given SVI. A host in VLAN 10 always uses 10.10.10.1 as its default gateway, regardless of which leaf it's physically attached to. When that host migrates — live VM migration, for example — it doesn't need to re-ARP because the gateway MAC is identical on the new leaf. This is one of those features that sounds like a small detail but matters enormously in practice.

    The route-target values tie the VRF together across all leaves. Every leaf importing and exporting the same route-target for TENANT-A will share routes within that VRF. Hosts in TENANT-A on leaf-01 can reach hosts in TENANT-A on leaf-04 because both leaves export their connected routes with the same route-target, and both import routes carrying that same community. TENANT-B stays isolated because its route-targets don't overlap.
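    The import/export behavior reduces to a simple membership test, sketched here in Python with illustrative route structures (real EVPN routes carry route-targets as BGP extended communities, not Python sets):

```python
# Sketch of route-target import policy: a VRF accepts only routes carrying
# its import route-target, which is what keeps tenants isolated.

def import_routes(advertised: list[dict], import_rt: str) -> list[str]:
    return [r["prefix"] for r in advertised if import_rt in r["route_targets"]]

fabric_routes = [
    {"prefix": "10.10.10.0/24", "route_targets": {"65000:50001"}},  # TENANT-A, leaf-01
    {"prefix": "10.10.20.0/24", "route_targets": {"65000:50001"}},  # TENANT-A, leaf-04
    {"prefix": "10.30.30.0/24", "route_targets": {"65000:50002"}},  # TENANT-B
]
assert import_routes(fabric_routes, "65000:50001") == ["10.10.10.0/24", "10.10.20.0/24"]
assert import_routes(fabric_routes, "65000:50002") == ["10.30.30.0/24"]
```

    Deliberate route-target leaking between tenants (shared services, for example) is just an extra import statement in this model — which is also why RT typos are a classic cause of accidental cross-tenant reachability.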

    ARP Suppression and Why It Saves You at Scale

    Without ARP suppression, every ARP request floods across the VXLAN fabric to all VTEPs participating in the VNI. At a few hundred hosts that's merely annoying. At tens of thousands of hosts it becomes a genuine performance problem and a source of instability. ARP suppression lets the local VTEP answer ARP requests on behalf of remote hosts by consulting the MAC/IP table that BGP EVPN has already built — no flooding required.

    When a host ARPs for 10.10.10.50, instead of replicating that request to every VTEP in the VNI, the local leaf checks its EVPN-derived ARP table. If it has a Type 2 route for that IP with the associated MAC, it generates a proxy ARP reply locally. The ARP request never enters a VXLAN tunnel. BUM traffic volume drops significantly as host counts grow, and you avoid the burst storms that can occur when large numbers of hosts ARP simultaneously after a network event.
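    The decision the leaf makes can be sketched in a few lines of Python, with a plain dictionary standing in for the EVPN-derived ARP table built from Type 2 routes:

```python
# Sketch of the ARP suppression decision: answer locally from the
# EVPN-derived table when possible, otherwise fall back to BUM flooding.

def handle_arp_request(target_ip: str, evpn_arp_table: dict[str, str]) -> str:
    mac = evpn_arp_table.get(target_ip)
    if mac is not None:
        return f"proxy-reply {mac}"   # answered locally, no VXLAN flood
    return "flood"                    # unknown host: replicate to remote VTEPs

# Table populated from EVPN Type 2 routes (illustrative entries)
table = {"10.10.10.50": "00:1c:73:aa:bb:cc"}
assert handle_arp_request("10.10.10.50", table) == "proxy-reply 00:1c:73:aa:bb:cc"
assert handle_arp_request("10.10.10.99", table) == "flood"
```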

    ! ARP suppression on Arista EOS — single line, global across all VNIs
    
    interface Vxlan1
       vxlan arp proxy
    
    ! Verify ARP suppression table is populated
    show vxlan address-table
    show arp vrf TENANT-A

    Complete Reference Configuration

    Here's a full leaf configuration that ties everything together — two tenants, symmetric IRB, ARP suppression, and both underlay and overlay BGP in a single coherent block. This is close to what you'd deploy in a real environment.

    ! Full configuration: sw-infrarunbook-01
    ! Role: Leaf-01 | ASN 65001 | Loopback: 10.0.1.1/32
    
    hostname sw-infrarunbook-01
    !
    ip routing
    !
    vlan 10,20,30
    !
    ! Shared anycast gateway MAC for ip address virtual (site-specific value)
    ip virtual-router mac-address 00:1c:73:00:00:01
    !
    vrf instance TENANT-A
    vrf instance TENANT-B
    !
    ip routing vrf TENANT-A
    ip routing vrf TENANT-B
    !
    interface Loopback0
       ip address 10.0.1.1/32
    !
    interface Ethernet1
       description Uplink to Spine-01
       mtu 9214
       no switchport
       ip address 10.1.0.1/31
    !
    interface Ethernet2
       description Uplink to Spine-02
       mtu 9214
       no switchport
       ip address 10.1.0.3/31
    !
    interface Ethernet3
       description Server — VLAN 10 access
       switchport access vlan 10
    !
    interface Vlan10
       vrf TENANT-A
       ip address virtual 10.10.10.1/24
    !
    interface Vlan20
       vrf TENANT-A
       ip address virtual 10.10.20.1/24
    !
    interface Vlan30
       vrf TENANT-B
       ip address virtual 10.30.30.1/24
    !
    interface Vxlan1
       vxlan source-interface Loopback0
       vxlan udp-port 4789
       vxlan vlan 10 vni 10010
       vxlan vlan 20 vni 10020
       vxlan vlan 30 vni 10030
       vxlan vrf TENANT-A vni 50001
       vxlan vrf TENANT-B vni 50002
       vxlan arp proxy
    !
    router bgp 65001
       router-id 10.0.1.1
       maximum-paths 4 ecmp 4
       !
       neighbor SPINES peer group
       neighbor SPINES remote-as 65000
       neighbor SPINES send-community extended
       neighbor SPINES maximum-routes 12000
       neighbor 10.1.0.0 peer-group SPINES
       neighbor 10.1.0.2 peer-group SPINES
       !
       neighbor EVPN-SPINES peer group
       neighbor EVPN-SPINES remote-as 65000
       neighbor EVPN-SPINES update-source Loopback0
       neighbor EVPN-SPINES ebgp-multihop 3
       neighbor EVPN-SPINES send-community extended
       neighbor 10.0.0.1 peer-group EVPN-SPINES
       neighbor 10.0.0.2 peer-group EVPN-SPINES
       !
       ! MAC-VRF config per VLAN — without redistribute learned,
       ! no Type 2 or Type 3 routes are generated for these segments
       vlan 10
          rd 10.0.1.1:10010
          route-target both 10010:10010
          redistribute learned
       !
       vlan 20
          rd 10.0.1.1:10020
          route-target both 10020:10020
          redistribute learned
       !
       vlan 30
          rd 10.0.1.1:10030
          route-target both 10030:10030
          redistribute learned
       !
       address-family evpn
          neighbor EVPN-SPINES activate
       !
       address-family ipv4
          neighbor SPINES activate
          no neighbor EVPN-SPINES activate
          network 10.0.1.1/32
       !
       vrf TENANT-A
          rd 10.0.1.1:50001
          route-target import evpn 65000:50001
          route-target export evpn 65000:50001
          redistribute connected
       !
       vrf TENANT-B
          rd 10.0.1.1:50002
          route-target import evpn 65000:50002
          route-target export evpn 65000:50002
          redistribute connected

    Once this is deployed, verify the fabric is functioning end to end:

    ! Confirm EVPN overlay sessions are established
    show bgp evpn summary
    
    ! Inspect MAC/IP routes (Type 2) — should show remote host MACs and IPs
    show bgp evpn route-type mac-ip
    
    ! Inspect multicast routes (Type 3) — should show all remote VTEPs per VNI
    show bgp evpn route-type imet
    
    ! Verify active VXLAN tunnels to remote VTEPs
    show vxlan vtep
    
    ! Check VNI state, VLAN mappings, and configuration consistency
    show interfaces vxlan 1
    show vxlan config-sanity
    
    ! Confirm ARP suppression table is populated with remote host entries
    show vxlan address-table

    Common Misconceptions That Will Burn You

    The most persistent misconception I encounter is that EVPN is a complicated protocol. It's not. EVPN is BGP carrying MAC and IP information in addition to IP prefixes. If you understand how BGP advertises routes and how route-targets control import/export policy, you already understand the core of EVPN. The perceived complexity usually comes from the configuration model, not the protocol logic itself. Approach it incrementally — get the underlay working, then light up the overlay, then verify Type 2 routes before worrying about symmetric IRB.

    The second misconception is assuming VNI consistency is enforced somewhere. It isn't. Your provisioning system — whether CloudVision, Ansible, Terraform, or a configuration template — must enforce consistent VLAN-to-VNI mappings across all leaves. Two leaves can have mismatched VNI assignments, maintain perfectly healthy BGP sessions with full route tables, and silently fail to forward traffic between hosts in the same logical segment. The control plane looks healthy. The data plane is broken. This is one of the harder failure modes to diagnose because the usual commands all return positive results.

    Third: neglecting the underlay. In my experience, a meaningful fraction of EVPN troubleshooting sessions turn out to be underlay problems in disguise. An unstable BGP session, an MTU mismatch on a spine uplink, or suboptimal ECMP hashing causing hot paths — all of these manifest as intermittent overlay connectivity. Always rule out underlay issues first. Run show bgp summary, check for any sessions that aren't Established, verify MTU consistency, and confirm ECMP is actually hashing across all available paths before diving into EVPN specifics.

    Fourth: conflating the anycast gateway virtual MAC with the physical interface MAC. The virtual MAC used by ip address virtual is the same across all leaves in the fabric — that's the whole point. But during migrations from non-EVPN environments, hosts may have cached ARP entries pointing to a previous physical gateway MAC. Those entries will cause blackholing until they expire or are cleared. If you're migrating hosts into a new EVPN fabric, account for ARP cache lifetime in your maintenance window planning.

    Operational Considerations

    BFD (Bidirectional Forwarding Detection) belongs on your underlay BGP sessions. Default BGP hold timers are measured in seconds and are too slow for data center environments where convergence requirements are tight. Enable BFD on the SPINES peer group and tune the intervals to what your hardware platform supports. Arista switches generally handle 300ms BFD intervals without issue, giving you sub-second failure detection on underlay links rather than waiting for the BGP hold timer to expire.

    For monitoring, Arista's native support for gNMI and OpenConfig means you can stream fabric telemetry directly into your observability stack without relying on SNMP polling. VTEP tunnel state, VNI statistics, EVPN route counts, ARP suppression table sizes — all of it is available via streaming telemetry. This matters operationally because EVPN fabrics can exhibit asymmetric failure modes where some paths work and others don't, and polling-based monitoring often misses transient issues that telemetry streaming catches in real time.

    CloudVision Portal pairs naturally with this architecture if you want centralized lifecycle management. It handles zero-touch provisioning of new leaves, enforces configuration consistency, and provides network-wide telemetry aggregation. That said, everything covered in this article is fully achievable without CVP. It's an operational force multiplier, not a protocol requirement. If you're running a smaller fabric or want to understand the underlying mechanics before introducing a management platform, you can build and operate a solid BGP EVPN VXLAN fabric with standard EOS CLI alone.

    BGP EVPN VXLAN on Arista EOS is mature, well-documented at the protocol level, and operationally tractable once you've internalized the separation between underlay and overlay, L2VNI and L3VNI, control plane and data plane. Build the underlay correctly, validate VNI assignments rigorously, enable ARP suppression from day one, and the fabric will scale cleanly as you add racks without requiring architectural changes.

    Frequently Asked Questions

    What is the difference between L2VNI and L3VNI in an Arista EVPN VXLAN fabric?

    An L2VNI (Layer 2 VNI) maps to a specific VLAN and represents a bridging domain — it's how hosts in the same segment find each other across the fabric. An L3VNI (Layer 3 VNI) maps to a VRF and is used during symmetric IRB to carry routed packets between different VLANs across VXLAN tunnels. Every tenant VRF needs its own L3VNI, and it must be consistent across all leaves participating in that VRF.

    Why does Arista use 'ip address virtual' instead of a standard IP address on SVIs?

    The 'ip address virtual' keyword implements the distributed anycast gateway model. Every leaf in the fabric uses the same virtual MAC and IP for a given SVI gateway. This means a host always uses the same gateway MAC regardless of which leaf it's connected to, and migrating hosts don't need to re-ARP after moving between leaves. A standard IP address would be unique per switch and would break this model.

    Do the spines need to be VXLAN-capable in a BGP EVPN leaf-spine fabric?

    No. In the standard model, spines operate purely in the underlay — they route IP traffic and relay EVPN routes between the leaves, but they don't terminate VXLAN tunnels. All VTEP functionality lives on the leaves. This means you can use spine hardware that doesn't support VXLAN encapsulation, which is often more cost-effective at scale since spines only need high-speed IP forwarding and EVPN route exchange capability.

    What causes ARP suppression to fail in an Arista VXLAN EVPN deployment?

    ARP suppression depends on BGP EVPN Type 2 routes (MAC/IP advertisements) being present in the local VTEP's table. If Type 2 routes aren't being generated — due to missing 'send-community extended' on BGP peers, incorrect route-target configuration, or EVPN sessions that aren't activated in the EVPN address family — the ARP suppression table stays empty and the switch falls back to flooding ARP requests across the fabric.

    How do you verify that EVPN route exchange is working correctly on Arista EOS?

    Start with 'show bgp evpn summary' to confirm sessions are established. Then use 'show bgp evpn route-type mac-ip' to verify Type 2 routes from remote VTEPs are present, and 'show bgp evpn route-type imet' to confirm Type 3 routes are being received. Finally, 'show vxlan vtep' shows active tunnel endpoints, and 'show vxlan address-table' confirms the ARP suppression table is populated with remote host MAC/IP pairs.
