    Arista ZTP Provisioning Failing

    Arista
    Published: Apr 19, 2026
    Updated: Apr 19, 2026

    Diagnose and fix Arista ZTP provisioning failures with step-by-step coverage of DHCP option 67, unreachable scripts, syntax errors, interface state, and EOS version mismatches.

    Symptoms

    You rack a brand-new Arista switch, connect the management port, power it on, and nothing happens. The switch boots into the EOS prompt with a factory-default configuration. Or maybe you see ZTP activity on the console but it never finishes — the switch polls DHCP, grabs an IP, then prints the same timeout messages on a loop. In other cases the switch downloads the script but immediately reboots, cycling through provisioning attempts that never land.

    Common symptoms include:

    • Console shows ZTP: Requesting DHCP lease on Management1 repeatedly with no forward progress
    • ZTP completes DHCP but then logs ZTP: Failed to download provisioning script
    • The script downloads cleanly but EOS reports ZTP: Script execution failed
    • The switch is reachable via SSH but sitting on a factory-default config instead of the intended provisioned state
    • CloudVision shows the device stuck in "Provisioning" state indefinitely or as unclaimed
    • Logs show ZTP is disabled even though you expected it to fire on first boot

    ZTP failures are frustrating because they're quiet in the wrong places. The switch doesn't scream — it just doesn't do what you expected. Let's go through every likely cause methodically.


    Root Cause 1: DHCP Option 67 Not Set

    Why It Happens

    Arista's ZTP process starts by sending a DHCP request on the management interface at first boot. The critical piece of that exchange is DHCP option 67 — the bootfile-name option. EOS uses option 67 to learn where to fetch the provisioning script. Without it, the switch gets an IP address but has absolutely no idea where to go next. It just stops. I've seen teams spend hours looking at the switch when the real problem is sitting in the DHCP server config the whole time.

    How to Identify It

    On the switch console, DHCP will succeed — the switch gets an IP — but the ZTP log immediately goes quiet or begins timing out on script retrieval. Check ZTP status directly:

    sw-infrarunbook-01# show zerotouch
    ZTP Status: Active
    ZTP Mode: Normal
    Last Action: DHCP lease obtained (192.168.10.50)
    Provisioning URL: None
    Script Download: Not attempted

    The Provisioning URL: None line is your smoking gun. The switch got its IP but received no option 67. On the DHCP server side — assuming ISC DHCP — verify the scope configuration:

    infrarunbook-admin@dhcp-srv:~$ cat /etc/dhcp/dhcpd.conf
    
    subnet 192.168.10.0 netmask 255.255.255.0 {
      range 192.168.10.50 192.168.10.100;
      option routers 192.168.10.1;
      option domain-name-servers 192.168.10.5;
      # option bootfile-name is missing entirely
    }

    You can also capture the DHCP exchange on the server or a span port. Look at the DHCPOFFER packet — option 67 will simply be absent from the response.
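
    If you manage more than a handful of scopes, eyeballing dhcpd.conf gets error-prone. The check above can be sketched as a small script. This is a rough regex-based sketch, not a real ISC DHCP parser: it assumes each subnet block contains no nested braces, and it ignores any bootfile-name set at global scope. The function name is made up for illustration.

```python
import re

# Matches a subnet declaration and its body. This simple pattern assumes the
# subnet block contains no nested braces (no pool {} or host {} inside it).
SUBNET_RE = re.compile(r"subnet\s+(\S+)\s+netmask\s+(\S+)\s*\{([^{}]*)\}")

# Matches an active (not commented-out) bootfile-name statement on its own line.
BOOTFILE_RE = re.compile(
    r'^\s*option\s+bootfile-name\s+"([^"]+)"\s*;', re.MULTILINE
)


def subnets_missing_bootfile(conf_text):
    """Return the subnets in conf_text that have no active bootfile-name option."""
    missing = []
    for network, netmask, body in SUBNET_RE.findall(conf_text):
        if not BOOTFILE_RE.search(body):
            missing.append(f"{network}/{netmask}")
    return missing
```

    Feed it the contents of /etc/dhcp/dhcpd.conf; an empty list means every subnet it could parse carries option 67.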

    How to Fix It

    Add option 67 to your DHCP scope pointing at your ZTP script. For HTTP delivery:

    subnet 192.168.10.0 netmask 255.255.255.0 {
      range 192.168.10.50 192.168.10.100;
      option routers 192.168.10.1;
      option domain-name-servers 192.168.10.5;
      option bootfile-name "http://192.168.10.10/ztp/provision.py";
    }

    After updating the config, restart the DHCP daemon and force a ZTP retry on the switch:

    infrarunbook-admin@dhcp-srv:~$ sudo systemctl restart isc-dhcp-server
    
    sw-infrarunbook-01# zerotouch cancel
    sw-infrarunbook-01# zerotouch run

    HTTP is generally more reliable than TFTP for script delivery. TFTP runs over UDP, defaults to 512-byte blocks, and reports failures far less clearly than an HTTP status code does, so unless your environment mandates it, use HTTP. If you do use TFTP, the URL format is tftp://192.168.10.10/ztp/provision.py; the protocol prefix matters.
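
    Since the protocol prefix matters, it can be worth sanity-checking the option 67 value before it goes into the DHCP config. The sketch below assumes your deployment always uses full URLs with one of the schemes discussed here; if you rely on a bare filename combined with a separate server option, this check is too strict. The function name and the allowed-scheme set are illustrative choices.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "tftp"}  # schemes discussed in this article


def check_bootfile_url(value):
    """Return a list of problems with an option 67 value; empty if it looks sane."""
    problems = []
    parts = urlsplit(value)
    if parts.scheme not in ALLOWED_SCHEMES:
        problems.append(f"missing or unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("no server host in URL")
    if not parts.path or parts.path.endswith("/"):
        problems.append("URL does not name a script file")
    return problems
```

    An empty list for tftp://192.168.10.10/ztp/provision.py and a non-empty one for 192.168.10.10/ztp/provision.py is the behavior you want here.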


    Root Cause 2: ZTP Script Not Reachable

    Why It Happens

    The switch got option 67, knows where the script lives, but can't actually fetch it. This is one of the most common failure modes in real deployments. The URL looks correct in DHCP config, but something in the path — a firewall ACL, a wrong IP on the web server, a typo in the file path, an HTTP server not listening — breaks the download. In my experience, it almost always comes down to either a firewall rule that wasn't updated when the provisioning server moved, or a fat-finger in the file path.

    How to Identify It

    The ZTP log will show a download failure with an HTTP error code or a connection timeout:

    sw-infrarunbook-01# show zerotouch
    ZTP Status: Active
    ZTP Mode: Normal
    Last Action: Script download failed
    Provisioning URL: http://192.168.10.10/ztp/provision.py
    Error: HTTP 404 Not Found
    
    sw-infrarunbook-01# show zerotouch log
    Apr 19 10:14:22 ZTP: Attempting to retrieve http://192.168.10.10/ztp/provision.py
    Apr 19 10:14:23 ZTP: curl error: HTTP response code 404
    Apr 19 10:14:23 ZTP: Retrying in 30 seconds...

    From the switch, you can test connectivity to the provisioning server manually through the bash shell:

    sw-infrarunbook-01# bash curl -v http://192.168.10.10/ztp/provision.py
    * Trying 192.168.10.10...
    * Connected to 192.168.10.10 (192.168.10.10) port 80 (#0)
    > GET /ztp/provision.py HTTP/1.1
    > Host: 192.168.10.10
    > User-Agent: curl/7.68.0
    >
    < HTTP/1.1 404 Not Found
    < Content-Type: text/html

    Then check the web server document root and confirm the file actually exists at the expected path:

    infrarunbook-admin@ztp-srv:~$ ls -la /var/www/html/ztp/
    total 0
    drwxr-xr-x 2 www-data www-data  40 Apr 19 09:00 .
    drwxr-xr-x 4 www-data www-data 100 Apr 19 08:50 ..
    -rw-r--r-- 1 www-data www-data 2156 Apr 19 09:00 provisioning.py
    # File is named provisioning.py, but option 67 points to provision.py
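
    A near-miss filename like this is easy to catch mechanically. Here is a minimal sketch that compares the file named in option 67 with what is actually in the document root and suggests close matches. The function name is hypothetical, and the 0.6 similarity cutoff is just difflib's default, not a tuned value.

```python
import difflib
import posixpath
from urllib.parse import urlsplit


def suggest_script_name(option67_url, served_files):
    """Compare the file named in the URL with the files actually present.

    Returns (exists, suggestions); suggestions lists similarly named files.
    """
    wanted = posixpath.basename(urlsplit(option67_url).path)
    if wanted in served_files:
        return True, []
    close = difflib.get_close_matches(wanted, served_files, n=3, cutoff=0.6)
    return False, close
```

    On the ZTP server you would pass something like os.listdir("/var/www/html/ztp") as served_files.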

    How to Fix It

    Correct the filename mismatch — either rename the file or update the URL in DHCP option 67. Then verify the web server is listening and the file is readable:

    infrarunbook-admin@ztp-srv:~$ mv /var/www/html/ztp/provisioning.py /var/www/html/ztp/provision.py
    infrarunbook-admin@ztp-srv:~$ chmod 644 /var/www/html/ztp/provision.py
    infrarunbook-admin@ztp-srv:~$ curl -I http://localhost/ztp/provision.py
    HTTP/1.1 200 OK
    Content-Type: text/plain
    Content-Length: 2156

    If the issue is a firewall, make sure TCP port 80 (or 443 for HTTPS) is permitted from the management network to the ZTP server. Also confirm your management VRF routing is correct — if switches use a separate management VRF, your routes and firewall rules must account for that VRF context specifically.
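
    When you test from a host on the management network, the useful distinction is whether the server answered at all. The sketch below uses only the Python standard library and separates "server reachable but returned an error" (bad path, permissions) from "no HTTP response" (firewall, routing, web server down). The function names are illustrative, and the 5-second timeout is an arbitrary assumption.

```python
import urllib.error
import urllib.request


def describe_failure(exc):
    """Turn a fetch exception into a short hint about where to look."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, so the network path is fine; check the file
        # path (404), permissions (403), or the web server itself (5xx).
        return f"server reachable, HTTP {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response: refused connection, no route, or a connect timeout.
        return f"no HTTP response: {exc.reason}"
    return f"transfer failed: {exc}"


def probe_script_url(url, timeout=5.0):
    """Fetch url and return (ok, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}, {len(response.read())} bytes"
    except OSError as exc:  # HTTPError, URLError and socket timeouts are OSErrors
        return False, describe_failure(exc)
```

    Run probe_script_url("http://192.168.10.10/ztp/provision.py") from a machine in the same subnet as the switches, so the request crosses the same firewall rules.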


    Root Cause 3: Script Syntax Error

    Why It Happens

    The script downloads successfully, EOS tries to execute it, and it crashes immediately. ZTP scripts on Arista are typically Python, and a syntax error causes the script to exit with a non-zero return code, which EOS interprets as a provisioning failure. This is particularly painful because the switch handled everything correctly up to execution — the failure is entirely in the script content. I've seen this happen repeatedly after last-minute edits to provisioning scripts, especially when someone edits directly on the server without testing first.

    How to Identify It

    The ZTP log will surface the Python traceback:

    sw-infrarunbook-01# show zerotouch log
    Apr 19 10:22:15 ZTP: Script downloaded successfully (2156 bytes)
    Apr 19 10:22:15 ZTP: Executing provisioning script...
    Apr 19 10:22:16 ZTP: Script execution failed (exit code 1)
    Apr 19 10:22:16 ZTP: Script output:
      File "/tmp/provision.py", line 47
        startup_config = """
                         ^
    SyntaxError: EOF while scanning triple-quoted string literal

    You can reproduce and verify this on the provisioning server before ever involving the switch:

    infrarunbook-admin@ztp-srv:~$ python3 -m py_compile /var/www/html/ztp/provision.py
      File "/var/www/html/ztp/provision.py", line 47
        startup_config = """
                         ^
    SyntaxError: EOF while scanning triple-quoted string literal

    How to Fix It

    Always lint ZTP scripts before deploying them. One-liner, no excuses:

    infrarunbook-admin@ztp-srv:~$ python3 -m py_compile /var/www/html/ztp/provision.py && echo "Syntax OK"
    Syntax OK
    infrarunbook-admin@ztp-srv:~$ echo $?
    0

    In the example above, the triple-quoted string literal was never closed. Here's the before and after:

    # Before (broken) -- missing closing triple-quote:
    startup_config = """
    hostname sw-infrarunbook-01
    interface Management1
       ip address 192.168.10.50/24
    
    # After (fixed):
    startup_config = """
    hostname sw-infrarunbook-01
    interface Management1
       ip address 192.168.10.50/24
    """

    Beyond syntax, validate the script's logic too. A syntactically valid script can still fail — for example, if it writes to /mnt/flash/startup-config but the path resolution differs in the EOS version running on that device, or if it makes an eAPI call using a socket path that hasn't been enabled. Test scripts against a known-good device in a lab before rolling them to production. This isn't optional; it's the difference between a smooth rollout and a midnight firefight.
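
    If you want the syntax check inside a deployment script rather than on the command line, Python's built-in compile() does the same job as py_compile without writing a .pyc file. A minimal sketch follows; the function name is made up. Keep in mind it checks against the Python interpreter on the server, which may not match the one on the switch.

```python
def find_syntax_error(source, filename="<ztp-script>"):
    """Return None if source compiles, else a one-line description of the error."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None
```

    Read each script's text from the provisioning directory, pass it through find_syntax_error, and refuse to publish if anything other than None comes back.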


    Root Cause 4: Interface Not in ZTP Mode

    Why It Happens

    ZTP only runs automatically on a switch that has no startup configuration and hasn't had ZTP explicitly disabled. If someone previously connected to the switch and ran zerotouch cancel — maybe while troubleshooting an earlier failure — the switch won't run ZTP on the next boot. It'll sit at the default prompt waiting for manual intervention. This catches people off guard because the switch looks normal: it booted, it's responsive. But it has no config and ZTP isn't running.

    There's also a subtler version. The switch is in ZTP mode, but it's trying to run ZTP on the wrong interface. On multi-management-port platforms or when using out-of-band management, ZTP might bind to an interface that isn't physically connected or doesn't have DHCP service on that subnet.

    How to Identify It

    Check ZTP status directly:

    sw-infrarunbook-01# show zerotouch
    ZTP Status: Disabled
    ZTP Mode: N/A
    Reason: ZTP was cancelled by user

    Or check whether the ZTP disable marker file exists on flash:

    sw-infrarunbook-01# bash ls /mnt/flash/.ztp-disabled
    /mnt/flash/.ztp-disabled

    If that file is present, ZTP will not run regardless of configuration. For the wrong-interface issue, check which interface ZTP is actually using and compare against what's physically connected:

    sw-infrarunbook-01# show zerotouch
    ZTP Status: Active
    ZTP Mode: Normal
    Interface: Management0
    DHCP Status: No offer received
    
    sw-infrarunbook-01# show interfaces Management1
    Management1 is up, line protocol is up (connected)
      Hardware is DEC21140
      Internet address is unassigned
    
    # ZTP is polling Management0, but Management1 is the connected port

    How to Fix It

    To re-enable ZTP after it's been cancelled, simply run:

    sw-infrarunbook-01# zerotouch run
    ZTP: Starting ZTP process on Management1

    If the disable marker file exists and ZTP refuses to start, remove it manually and reload:

    sw-infrarunbook-01# bash rm /mnt/flash/.ztp-disabled
    sw-infrarunbook-01# reload

    For the wrong-interface issue, most Arista platforms default ZTP to Management1. Some DCS-7280 variants use Management0 as the primary out-of-band port. If your environment uses a non-default management port, verify against the platform datasheet and make sure your DHCP service is running on the correct network segment for that port.
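
    If you collect console output from many switches, the checks in this section can be roughed out as a small triage helper. This sketch assumes the "Key: Value" layout of the show zerotouch output as printed in this article; real output may differ by EOS release, so treat the field names as assumptions. Both function names are hypothetical.

```python
def parse_zerotouch(output):
    """Parse 'Key: Value' lines into a dict (layout assumed from this article)."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def triage(fields, connected_interface="Management1"):
    """Map parsed fields to the most likely cause described in this article."""
    if fields.get("ZTP Status") == "Disabled":
        return "ZTP disabled: " + fields.get("Reason", "no reason given")
    interface = fields.get("Interface")
    if interface and interface != connected_interface:
        return f"ZTP is polling {interface}, but {connected_interface} is cabled"
    if fields.get("Provisioning URL") == "None":
        return "no provisioning URL: check DHCP option 67"
    return "no obvious problem in status output"
```

    Pass connected_interface explicitly on platforms where the cabled out-of-band port is not Management1.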


    Root Cause 5: EOS Version Mismatch

    Why It Happens

    This one is subtle and takes longer to pin down. ZTP scripts often use eAPI calls, specific CLI commands, or configuration syntax that changed between EOS releases. If the switch boots with an older EOS image than what the script was written against, the script may fail when it calls an eAPI method that doesn't exist yet, or tries to apply configuration syntax that isn't recognized in that version. The reverse also happens — if the script assumes legacy behavior that was removed in a newer EOS release, you'll see failures on newer hardware.

    In my experience, this manifests most visibly when a ZTP script is written and tested against a specific EOS train, then deployed against hardware that shipped from the factory with a different version — sometimes months older. The script hits a command that wasn't added until 4.27.x, the switch is running 4.25.x, and the provisioning fails with a cryptic eAPI error that doesn't obviously point to a version problem.

    How to Identify It

    First, check what version of EOS is running on the switch:

    sw-infrarunbook-01# show version
    Arista DCS-7050CX3-32S
    Hardware version: 11.00
    Serial number: JPE12345678
    System MAC address: 00:1c:73:ab:cd:ef
    
    Software image version: 4.26.2F
    Architecture: x86_64
    Internal build version: 4.26.2F-12345678.4262F
    Internal build ID: abc12345-1234-5678-abcd-abc123456789
    
    Uptime: 0 weeks, 0 days, 0 hours and 3 minutes
    Total memory: 8167932 kB
    Free memory: 5921032 kB

    Then look at what commands your provisioning script invokes. If the script calls a CLI command or eAPI endpoint that was introduced in 4.27.x and the switch is running 4.26.2F, it'll fail. The ZTP log will show the specific error:

    sw-infrarunbook-01# show zerotouch log
    Apr 19 10:35:44 ZTP: Script downloaded successfully (3204 bytes)
    Apr 19 10:35:44 ZTP: Executing provisioning script...
    Apr 19 10:35:46 ZTP: Script execution failed (exit code 1)
    Apr 19 10:35:46 ZTP: Script output:
    Traceback (most recent call last):
      File "/tmp/provision.py", line 112, in <module>
        client.runCmds(1, ['show platform sand counters'])
      File "/usr/lib/python3/dist-packages/jsonrpclib/jsonrpc.py", line 288, in __call__
        return self.__send(self.__name, args)
    jsonrpclib.jsonrpc.AppError: CLI command 1 of 1 'show platform sand counters'
    failed: invalid command

    That invalid command error against a command you know is valid on newer EOS is the tell. Cross-reference the command against the EOS release notes to confirm when it was introduced.

    How to Fix It

    There are two approaches and you often need both. First, write your ZTP script to check the running EOS version at the top and branch logic accordingly. Second, if the script is supposed to upgrade EOS as part of provisioning, make sure that upgrade step completes and the script re-evaluates after reboot before running version-specific commands.

    Here's a practical version-check pattern for your ZTP script:

    import jsonrpclib
    import sys
    
    client = jsonrpclib.Server('unix:/var/run/command-api.sock')
    result = client.runCmds(1, ['show version'])
    eos_version = result[0]['version']
    
    REQUIRED_VERSION = '4.28.3M'
    
    if eos_version != REQUIRED_VERSION:
        print(f'EOS version mismatch: running {eos_version}, need {REQUIRED_VERSION}')
        client.runCmds(1, [
            'copy http://192.168.10.10/eos/EOS-4.28.3M.swi flash:EOS-4.28.3M.swi',
        ])
        client.runCmds(1, [
            'install source flash:EOS-4.28.3M.swi now'
        ])
        # Switch will reboot; ZTP re-runs automatically on next boot
        sys.exit(0)
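
    The pattern above uses a strict equality test, so it re-images any switch that is not on exactly the required release, including newer ones. If what you actually need is "at least this release", compare versions numerically rather than as strings. A minimal sketch, assuming the usual major.minor.patch prefix and ignoring the trailing letter and any further components; the helper names are made up:

```python
import re

# Leading "major.minor.patch" of an EOS version string such as "4.26.2F".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def eos_version_tuple(version):
    """Turn an EOS version string such as '4.26.2F' into (4, 26, 2)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised EOS version: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(running, minimum):
    """True if the running release is at least the minimum release."""
    return eos_version_tuple(running) >= eos_version_tuple(minimum)
```

    Comparing tuples of integers avoids the classic string-comparison trap where "4.9.1" sorts after "4.10.0".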

    Also verify image-to-hardware compatibility before deploying. Arista publishes release notes that list supported platforms for each image. The filename conventions matter:

    infrarunbook-admin@ztp-srv:~$ ls /var/www/html/eos/
    EOS-4.28.3M.swi          # Universal image, broad hardware support
    EOS64-4.28.3M.swi        # 64-bit platforms only
    EOS-4.28.3M-INT.swi      # International regulatory variant

    Pushing EOS64-4.28.3M.swi to a platform that requires the universal image will result in a boot failure. Don't assume the filename is interchangeable — verify it against the target platform's hardware documentation before staging.


    Root Cause 6: TFTP Block Size and Transfer Timeouts

    Why It Happens

    Some environments use TFTP instead of HTTP for ZTP script delivery. TFTP's default block size of 512 bytes causes real problems with larger provisioning scripts — transfers are slow and prone to timeout on congested or lossy management networks. Worse, some TFTP server implementations don't support block size negotiation (RFC 2348), so the switch and server end up stuck: transfers start but never complete, and the switch keeps retrying.

    How to Identify It

    sw-infrarunbook-01# show zerotouch log
    Apr 19 11:02:10 ZTP: Attempting to retrieve tftp://192.168.10.10/ztp/provision.py
    Apr 19 11:02:40 ZTP: TFTP transfer timed out after 30 seconds
    Apr 19 11:02:40 ZTP: Retrying in 60 seconds...

    On the server, check whether block size negotiation is configured:

    infrarunbook-admin@ztp-srv:~$ grep -i blocksize /etc/default/tftpd-hpa
    # Empty result -- no --blocksize option configured

    How to Fix It

    The cleanest fix is switching from TFTP to HTTP. Update DHCP option 67 to use an HTTP URL. HTTP handles larger files reliably, gives you proper error codes, and is far easier to debug. If you must keep TFTP, enable block size negotiation and increase the block size to reduce transfer overhead:

    infrarunbook-admin@ztp-srv:~$ cat /etc/default/tftpd-hpa
    TFTP_USERNAME="tftp"
    TFTP_DIRECTORY="/var/lib/tftpboot"
    TFTP_ADDRESS="0.0.0.0:69"
    TFTP_OPTIONS="--secure --blocksize 1468"
    
    infrarunbook-admin@ztp-srv:~$ sudo systemctl restart tftpd-hpa

    The 1468-byte block size is the standard 1500-byte Ethernet MTU minus the IPv4 (20 bytes), UDP (8 bytes), and TFTP data (4 bytes) headers, which keeps each data packet from fragmenting across a typical management network.
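
    The arithmetic behind 1468, and the effect on the number of round trips, can be checked with a short sketch. It counts the 4-byte TFTP data header alongside the IPv4 and UDP headers, assumes IPv4 without options, and uses the 3204-byte script size from the earlier log output as an example. The function names are illustrative.

```python
ETHERNET_MTU = 1500   # bytes of IP packet a standard Ethernet frame carries
IPV4_HEADER = 20      # bytes, assuming no IP options
UDP_HEADER = 8        # bytes
TFTP_DATA_HEADER = 4  # bytes: 2-byte opcode plus 2-byte block number


def max_unfragmented_blksize(mtu=ETHERNET_MTU):
    """Largest TFTP block size that still fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def data_packets(file_size, blksize):
    """Number of DATA packets needed; each one waits for an ACK before the next."""
    # A transfer ends with a block shorter than blksize, so a file that is an
    # exact multiple of blksize needs one extra, empty block.
    return file_size // blksize + 1
```

    For a 3204-byte script that is 7 lock-step round trips at the 512-byte default versus 3 at 1468, which is why the larger block size helps on lossy links.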


    Root Cause 7: Management VRF Routing Not Configured

    Why It Happens

    On Arista platforms with a dedicated management VRF — which is the default on most production deployments — ZTP runs in the management VRF context. If the DHCP server or ZTP provisioning server isn't reachable from within that VRF, everything fails silently. This is common when the management network was recently reconfigured, when the provisioning server is on a different subnet without a proper route in the management VRF, or when someone applied a partial startup config that modified VRF routing before ZTP finished.

    How to Identify It

    sw-infrarunbook-01# bash ping -I Management1 192.168.10.10
    PING 192.168.10.10 (192.168.10.10) from Management1: 56 data bytes
    ping: sendmsg: Network is unreachable
    
    sw-infrarunbook-01# show ip route vrf MGMT
    VRF: MGMT
    Gateway of last resort is not set
    
    # No routes -- switch can't reach anything outside its directly connected subnet

    How to Fix It

    Add a default route to the management VRF. Either set it statically during recovery or ensure DHCP is correctly sending option 3 (default gateway) in the offer:

    sw-infrarunbook-01(config)# vrf instance MGMT
    sw-infrarunbook-01(config-vrf-MGMT)# exit
    sw-infrarunbook-01(config)# ip route vrf MGMT 0.0.0.0/0 192.168.10.1
    
    sw-infrarunbook-01# zerotouch run

    Long-term, ensure your DHCP scope includes option 3 alongside option 67 so that fresh switches always receive a default gateway as part of their initial lease.
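
    Whether a fresh switch needs that default route at all depends on where the provisioning server sits relative to the management subnet. A minimal sketch using Python's ipaddress module can confirm this before you rack anything. The function names are hypothetical, and the addresses in the usage note other than those from this article are invented examples.

```python
import ipaddress


def needs_default_route(mgmt_interface, server_ip):
    """True if server_ip is outside the management interface's own subnet."""
    iface = ipaddress.ip_interface(mgmt_interface)
    return ipaddress.ip_address(server_ip) not in iface.network


def gateway_is_usable(mgmt_interface, gateway_ip):
    """True if the gateway is on-link, so the switch can actually reach it."""
    iface = ipaddress.ip_interface(mgmt_interface)
    gateway = ipaddress.ip_address(gateway_ip)
    return gateway in iface.network and gateway != iface.ip
```

    With the addressing used here, needs_default_route("192.168.10.50/24", "192.168.10.10") is False, so the missing route only bites once the server moves to another subnet such as a hypothetical 10.20.0.10.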


    Prevention

    Most ZTP failures are repeatable and preventable. The pattern is almost always the same: someone changes one thing in isolation — a DHCP server, a script file, a web server config — and doesn't test the end-to-end flow before racking hardware. Build a workflow that validates each component before the switch ever arrives.

    Test your DHCP scope before you rack the switch. Use a test machine on the same management VLAN to request a lease and confirm option 67 is present in the response:

    infrarunbook-admin@test-host:~$ sudo dhclient -v eth0 2>&1 | grep -i "dhcpoffer\|filename\|bootfile"
    DHCPOFFER of 192.168.10.51 from 192.168.10.5
    option bootfile-name: http://192.168.10.10/ztp/provision.py

    Always lint ZTP scripts before deploying. This belongs in your CI pipeline or at minimum in your deployment checklist. A script that hasn't been through python3 -m py_compile has no business being on the provisioning server.

    Version-gate your scripts. Every ZTP script should check the running EOS version at the top and fail fast with an informative message if there's a mismatch — rather than crashing halfway through configuration in ways that are hard to debug from a console.

    Monitor your HTTP server availability continuously. A dead web server silently blocks every new switch deployment. Add a simple health check to your monitoring platform that requests the actual ZTP script URL and alerts on any non-200 response or connection failure; a 404 on the script blocks provisioning just as surely as a 5xx. You don't want to discover the provisioning server is down when you're standing next to a rack at 11pm.

    Document every ZTP cancellation. If someone runs zerotouch cancel on a switch during initial troubleshooting, that needs to go into your IPAM or CMDB immediately. A switch sitting in cancelled-ZTP state with no startup config is a trap for the next engineer who touches it. A one-line note saves a lot of confusion.

    Use CloudVision ZTP where your licensing allows it. CVP's ZTP workflow gives you real-time visibility into provisioning state, error categorization, and retry history through a proper dashboard rather than raw console logs. The time-to-diagnosis on a failed provisioning event drops significantly when you can see the full device state in one place rather than hunting through per-switch log output.

    Validate image-to-hardware compatibility before staging images. Cross-reference the EOS image filename against Arista's hardware compatibility matrix before pointing your provisioning scripts at it. A two-minute check prevents a full re-imaging cycle when ZTP installs an incompatible image and the switch fails to boot cleanly.

    Frequently Asked Questions

    How do I check if ZTP is active on an Arista switch?

    Run 'show zerotouch' from the EOS CLI. This displays ZTP status (Active or Disabled), the provisioning URL received via DHCP option 67, the last action taken, and any error encountered. For detailed event history, use 'show zerotouch log'.

    Why does my Arista switch keep rebooting during ZTP?

    A reboot loop during ZTP usually means the provisioning script is issuing a reload command after applying configuration, but ZTP is re-triggering because the startup-config wasn't written before the reload. Ensure your ZTP script explicitly writes the startup config to /mnt/flash/startup-config before reloading, or uses the 'copy running-config startup-config' eAPI call.
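
    One way to make sure the config is really on flash before the reload is to write it to a temporary file, flush it to storage, and only then move it into place. This is a sketch under the assumption that your script writes the file directly rather than going through eAPI; the function name and the .tmp suffix are illustrative.

```python
import os


def write_startup_config(config_text, path="/mnt/flash/startup-config"):
    """Write config_text to path and flush it to storage before returning."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        handle.write(config_text)
        handle.flush()
        os.fsync(handle.fileno())  # push the data to flash, not just the cache
    os.replace(tmp_path, path)     # swap into place only once fully written
```

    Call it before the script triggers any reload, so the next boot finds a startup-config and ZTP does not start again.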

    What DHCP options are required for Arista ZTP to work?

    At minimum, DHCP option 67 (bootfile-name) is required, pointing to the provisioning script URL. Option 3 (default gateway) is also essential if the provisioning server is not on the same subnet as the management interface. Without a default gateway, the switch can't reach the script server even after receiving the URL.

    Can I use HTTPS instead of HTTP for ZTP script delivery on Arista?

    Yes, Arista EOS supports HTTPS for ZTP script retrieval. Set option 67 to an https:// URL. Be aware that the switch will validate the server certificate against its trusted CA bundle. If you're using a private CA, you'll need to pre-install the CA certificate on the switch, which typically requires an initial manual step or a two-stage provisioning approach.

    How do I re-run ZTP on an Arista switch after it was cancelled?

    Run 'zerotouch run' from the EOS CLI. If ZTP was disabled by the presence of the .ztp-disabled marker file on flash, remove it first with 'bash rm /mnt/flash/.ztp-disabled', then reload the switch. Note that ZTP will only run if no startup-config is present — if a startup-config exists, you'll need to remove it as well before ZTP will trigger.
