Arista ZTP Provisioning Failing

Symptoms

You rack a brand-new Arista switch, connect the management port, power it on, and nothing happens. The switch boots into the EOS prompt with a factory-default configuration. Or maybe you see ZTP activity on the console but it never finishes — the switch polls DHCP, grabs an IP, then prints the same timeout messages on a loop. In other cases the switch downloads the script but immediately reboots, cycling through provisioning attempts that never land.

Common symptoms include:

Console shows ZTP: Requesting DHCP lease on Management1 repeatedly with no forward progress
ZTP completes DHCP but then logs ZTP: Failed to download provisioning script
The script downloads cleanly but EOS reports ZTP: Script execution failed
The switch is reachable via SSH but sitting on a factory-default config instead of the intended provisioned state
CloudVision shows the device stuck in "Provisioning" state indefinitely or as unclaimed
Logs show ZTP is disabled even though you expected it to fire on first boot

ZTP failures are frustrating because they're quiet in the wrong places. The switch doesn't scream — it just doesn't do what you expected. Let's go through every likely cause methodically.

Root Cause 1: DHCP Option 67 Not Set

Why It Happens

Arista's ZTP process starts by sending a DHCP request on the management interface at first boot. The critical piece of that exchange is DHCP option 67 — the bootfile-name option. EOS uses option 67 to learn where to fetch the provisioning script. Without it, the switch gets an IP address but has absolutely no idea where to go next. It just stops. I've seen teams spend hours looking at the switch when the real problem is sitting in the DHCP server config the whole time.

How to Identify It

On the switch console, DHCP will succeed — the switch gets an IP — but the ZTP log immediately goes quiet or begins timing out on script retrieval. Check ZTP status directly:

sw-infrarunbook-01# show zerotouch
ZTP Status: Active
ZTP Mode: Normal
Last Action: DHCP lease obtained (192.168.10.50)
Provisioning URL: None
Script Download: Not attempted

The Provisioning URL: None line is your smoking gun. The switch got its IP but received no option 67. On the DHCP server side — assuming ISC DHCP — verify the scope configuration:

infrarunbook-admin@dhcp-srv:~$ cat /etc/dhcp/dhcpd.conf

subnet 192.168.10.0 netmask 255.255.255.0 {
  range 192.168.10.50 192.168.10.100;
  option routers 192.168.10.1;
  option domain-name-servers 192.168.10.5;
  # option bootfile-name is missing entirely
}

You can also capture the DHCP exchange on the server or a span port. Look at the DHCPOFFER packet — option 67 will simply be absent from the response.
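If you have a raw capture of the offer, the check can be scripted. The sketch below is one way to walk the DHCP options field and pull out option 67. The function names are illustrative, it assumes you have already extracted the BOOTP/DHCP payload (the UDP payload) from the capture, and it ignores option overload (option 52) for brevity.

```python
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the options field (RFC 2131)
BOOTP_FIXED_LEN = 236               # fixed BOOTP header bytes before the cookie
OPT_PAD, OPT_END, OPT_BOOTFILE = 0, 255, 67


def parse_dhcp_options(options: bytes) -> dict:
    """Parse the options field (the bytes after the magic cookie) into {code: value}."""
    parsed = {}
    i = 0
    while i < len(options):
        code = options[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:  # single-byte padding, no length field follows
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option, stop parsing
        length = options[i + 1]
        parsed[code] = options[i + 2 : i + 2 + length]
        i += 2 + length
    return parsed


def bootfile_from_offer(bootp_payload: bytes):
    """Return the option 67 string from a BOOTP/DHCP payload, or None if absent."""
    cookie = bootp_payload[BOOTP_FIXED_LEN : BOOTP_FIXED_LEN + 4]
    if cookie != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload: magic cookie missing")
    options = parse_dhcp_options(bootp_payload[BOOTP_FIXED_LEN + 4 :])
    value = options.get(OPT_BOOTFILE)
    return value.decode("ascii", errors="replace") if value is not None else None
```

Calling bootfile_from_offer(payload) on an offer that carries no option 67 returns None, which would match the Provisioning URL: None symptom shown earlier.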

How to Fix It

Add option 67 to your DHCP scope pointing at your ZTP script. For HTTP delivery:

subnet 192.168.10.0 netmask 255.255.255.0 {
  range 192.168.10.50 192.168.10.100;
  option routers 192.168.10.1;
  option domain-name-servers 192.168.10.5;
  option bootfile-name "http://192.168.10.10/ztp/provision.py";
}

After updating the config, restart the DHCP daemon and force a ZTP retry on the switch:

infrarunbook-admin@dhcp-srv:~$ sudo systemctl restart isc-dhcp-server

sw-infrarunbook-01# zerotouch cancel
sw-infrarunbook-01# zerotouch run

HTTP is generally more reliable than TFTP for script delivery. TFTP uses a small default block size and waits for an acknowledgement after every block, so transfers are slow and fragile on lossy links. Unless your environment mandates TFTP, use HTTP. If you do use TFTP, the URL format is tftp://192.168.10.10/ztp/provision.py; the protocol prefix matters.

Root Cause 2: ZTP Script Not Reachable

Why It Happens

The switch got option 67, knows where the script lives, but can't actually fetch it. This is one of the most common failure modes in real deployments. The URL looks correct in DHCP config, but something in the path — a firewall ACL, a wrong IP on the web server, a typo in the file path, an HTTP server not listening — breaks the download. In my experience, it almost always comes down to either a firewall rule that wasn't updated when the provisioning server moved, or a fat-finger in the file path.

How to Identify It

The ZTP log will show a download failure with an HTTP error code or a connection timeout:

sw-infrarunbook-01# show zerotouch
ZTP Status: Active
ZTP Mode: Normal
Last Action: Script download failed
Provisioning URL: http://192.168.10.10/ztp/provision.py
Error: HTTP 404 Not Found

sw-infrarunbook-01# show zerotouch log
Apr 19 10:14:22 ZTP: Attempting to retrieve http://192.168.10.10/ztp/provision.py
Apr 19 10:14:23 ZTP: curl error: HTTP response code 404
Apr 19 10:14:23 ZTP: Retrying in 30 seconds...

From the switch, you can test connectivity to the provisioning server manually through the bash shell:

sw-infrarunbook-01# bash curl -v http://192.168.10.10/ztp/provision.py
* Trying 192.168.10.10...
* Connected to 192.168.10.10 (192.168.10.10) port 80 (#0)
> GET /ztp/provision.py HTTP/1.1
> Host: 192.168.10.10
> User-Agent: curl/7.68.0
>
< HTTP/1.1 404 Not Found
< Content-Type: text/html

Then check the web server document root and confirm the file actually exists at the expected path:

infrarunbook-admin@ztp-srv:~$ ls -la /var/www/html/ztp/
total 0
drwxr-xr-x 2 www-data www-data  40 Apr 19 09:00 .
drwxr-xr-x 4 www-data www-data 100 Apr 19 08:50 ..
-rw-r--r-- 1 www-data www-data 2156 Apr 19 09:00 provisioning.py
# File is named provisioning.py, but option 67 points to provision.py

How to Fix It

Correct the filename mismatch — either rename the file or update the URL in DHCP option 67. Then verify the web server is listening and the file is readable:

infrarunbook-admin@ztp-srv:~$ sudo mv /var/www/html/ztp/provisioning.py /var/www/html/ztp/provision.py
infrarunbook-admin@ztp-srv:~$ sudo chmod 644 /var/www/html/ztp/provision.py
infrarunbook-admin@ztp-srv:~$ curl -I http://localhost/ztp/provision.py
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 2156

If the issue is a firewall, make sure TCP port 80 (or 443 for HTTPS) is permitted from the management network to the ZTP server. Also confirm your management VRF routing is correct — if switches use a separate management VRF, your routes and firewall rules must account for that VRF context specifically.
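To tell a path problem apart from a network problem without logging in to the switch, a small probe run from a host on the management network can help. This is a minimal sketch using only the Python standard library; check_script_url is a hypothetical helper name, and the five-second timeout is an arbitrary choice.

```python
import urllib.error
import urllib.request


def check_script_url(url: str, timeout: float = 5.0) -> tuple:
    """Fetch the URL and return (ok, detail) describing what happened."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            return True, f"HTTP {resp.status}, {len(body)} bytes"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status: wrong path, permissions, or vhost.
        return False, f"HTTP {exc.code} {exc.reason}"
    except urllib.error.URLError as exc:
        # No HTTP response at all: routing, firewall, DNS, or nothing listening.
        return False, f"connection failed: {exc.reason}"
    except OSError as exc:
        # Covers timeouts and resets that occur while reading the body.
        return False, f"connection failed: {exc}"
```

An HTTP status in the result means the server was reached and the path or permissions are wrong; a connection failure points at routing, firewalling, or a server that is not listening. Run it from the same subnet the switches use, for example check_script_url("http://192.168.10.10/ztp/provision.py"), or the result may not reflect what the switch sees.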

Root Cause 3: Script Syntax Error

Why It Happens

The script downloads successfully, EOS tries to execute it, and it crashes immediately. ZTP scripts on Arista are typically Python, and a syntax error causes the script to exit with a non-zero return code, which EOS interprets as a provisioning failure. This is particularly painful because the switch handled everything correctly up to execution — the failure is entirely in the script content. I've seen this happen repeatedly after last-minute edits to provisioning scripts, especially when someone edits directly on the server without testing first.

How to Identify It

The ZTP log will surface the Python traceback:

sw-infrarunbook-01# show zerotouch log
Apr 19 10:22:15 ZTP: Script downloaded successfully (2156 bytes)
Apr 19 10:22:15 ZTP: Executing provisioning script...
Apr 19 10:22:16 ZTP: Script execution failed (exit code 1)
Apr 19 10:22:16 ZTP: Script output:
  File "/tmp/provision.py", line 47
    startup_config = """
                     ^
SyntaxError: EOF while scanning triple-quoted string literal

You can reproduce and verify this on the provisioning server before ever involving the switch. The exact wording depends on the interpreter: Python 3.10 and later report the same problem as an unterminated triple-quoted string literal.

infrarunbook-admin@ztp-srv:~$ python3 -m py_compile /var/www/html/ztp/provision.py
  File "/var/www/html/ztp/provision.py", line 47
    startup_config = """
                     ^
SyntaxError: EOF while scanning triple-quoted string literal

How to Fix It

Always lint ZTP scripts before deploying them. One-liner, no excuses:

infrarunbook-admin@ztp-srv:~$ python3 -m py_compile /var/www/html/ztp/provision.py && echo "Syntax OK"
Syntax OK
infrarunbook-admin@ztp-srv:~$ echo $?
0

In the example above, the triple-quoted string literal was never closed. Here's the before and after:

# Before (broken) -- missing closing triple-quote:
startup_config = """
hostname sw-infrarunbook-01
interface Management1
   ip address 192.168.10.50/24

# After (fixed):
startup_config = """
hostname sw-infrarunbook-01
interface Management1
   ip address 192.168.10.50/24
"""

Beyond syntax, validate the script's logic too. A syntactically valid script can still fail — for example, if it writes to /mnt/flash/startup-config but the path resolution differs in the EOS version running on that device, or if it makes an eAPI call using a socket path that hasn't been enabled. Test scripts against a known-good device in a lab before rolling them to production. This isn't optional; it's the difference between a smooth rollout and a midnight firefight.

Root Cause 4: Interface Not in ZTP Mode

Why It Happens

ZTP only runs automatically on a switch that has no startup configuration and hasn't had ZTP explicitly disabled. If someone previously connected to the switch and ran zerotouch cancel — maybe while troubleshooting an earlier failure — the switch won't run ZTP on the next boot. It'll sit at the default prompt waiting for manual intervention. This catches people off guard because the switch looks normal: it booted, it's responsive. But it has no config and ZTP isn't running.

There's also a subtler version. The switch is in ZTP mode, but it's trying to run ZTP on the wrong interface. On multi-management-port platforms or when using out-of-band management, ZTP might bind to an interface that isn't physically connected or doesn't have DHCP service on that subnet.

How to Identify It

Check ZTP status directly:

sw-infrarunbook-01# show zerotouch
ZTP Status: Disabled
ZTP Mode: N/A
Reason: ZTP was cancelled by user

Or check whether the ZTP disable marker file exists on flash:

sw-infrarunbook-01# bash ls /mnt/flash/.ztp-disabled
/mnt/flash/.ztp-disabled

If that file is present, ZTP will not run regardless of configuration. For the wrong-interface issue, check which interface ZTP is actually using and compare against what's physically connected:

sw-infrarunbook-01# show zerotouch
ZTP Status: Active
ZTP Mode: Normal
Interface: Management0
DHCP Status: No offer received

sw-infrarunbook-01# show interfaces Management1
Management1 is up, line protocol is up (connected)
  Hardware is DEC21140
  Internet address is unassigned

# ZTP is polling Management0, but Management1 is the connected port

How to Fix It

To re-enable ZTP after it's been cancelled, simply run:

sw-infrarunbook-01# zerotouch run
ZTP: Starting ZTP process on Management1

If the disable marker file exists and ZTP refuses to start, remove it manually and reload:

sw-infrarunbook-01# bash rm /mnt/flash/.ztp-disabled
sw-infrarunbook-01# reload

For the wrong-interface issue, most Arista platforms default ZTP to Management1. Some DCS-7280 variants use Management0 as the primary out-of-band port. If your environment uses a non-default management port, verify against the platform datasheet and make sure your DHCP service is running on the correct network segment for that port.

Root Cause 5: EOS Version Mismatch

Why It Happens

This one is subtle and takes longer to pin down. ZTP scripts often use eAPI calls, specific CLI commands, or configuration syntax that changed between EOS releases. If the switch boots with an older EOS image than what the script was written against, the script may fail when it calls an eAPI method that doesn't exist yet, or tries to apply configuration syntax that isn't recognized in that version. The reverse also happens — if the script assumes legacy behavior that was removed in a newer EOS release, you'll see failures on newer hardware.

In my experience, this manifests most visibly when a ZTP script is written and tested against a specific EOS train, then deployed against hardware that shipped from the factory with a different version — sometimes months older. The script hits a command that wasn't added until 4.27.x, the switch is running 4.25.x, and the provisioning fails with a cryptic eAPI error that doesn't obviously point to a version problem.

How to Identify It

First, check what version of EOS is running on the switch:

sw-infrarunbook-01# show version
Arista DCS-7050CX3-32S
Hardware version: 11.00
Serial number: JPE12345678
System MAC address: 00:1c:73:ab:cd:ef

Software image version: 4.26.2F
Architecture: x86_64
Internal build version: 4.26.2F-12345678.4262F
Internal build ID: abc12345-1234-5678-abcd-abc123456789

Uptime: 0 weeks, 0 days, 0 hours and 3 minutes
Total memory: 8167932 kB
Free memory: 5921032 kB

Then look at what commands your provisioning script invokes. If the script calls a CLI command or eAPI endpoint that was introduced in 4.27.x and the switch is running 4.26.2F, it'll fail. The ZTP log will show the specific error:

sw-infrarunbook-01# show zerotouch log
Apr 19 10:35:44 ZTP: Script downloaded successfully (3204 bytes)
Apr 19 10:35:44 ZTP: Executing provisioning script...
Apr 19 10:35:46 ZTP: Script execution failed (exit code 1)
Apr 19 10:35:46 ZTP: Script output:
Traceback (most recent call last):
  File "/tmp/provision.py", line 112, in <module>
    client.runCmds(1, ['enable', 'show platform sand counters'])
  File "/usr/lib/python3/dist-packages/jsonrpclib/jsonrpc.py", line 288, in __call__
    return self.__send(self.__name, args)
jsonrpclib.jsonrpc.AppError: CLI command 2 of 2 'show platform sand counters' failed: invalid command

That invalid command error against a command you know is valid on newer EOS is the tell. Cross-reference the command against the EOS release notes to confirm when it was introduced.

How to Fix It

There are two approaches and you often need both. First, write your ZTP script to check the running EOS version at the top and branch logic accordingly. Second, if the script is supposed to upgrade EOS as part of provisioning, make sure that upgrade step completes and the script re-evaluates after reboot before running version-specific commands.

Here's a practical version-check pattern for your ZTP script:

import sys

import jsonrpclib

REQUIRED_VERSION = '4.28.3M'
IMAGE = f'EOS-{REQUIRED_VERSION}.swi'

# Talk to eAPI over the local Unix domain socket
client = jsonrpclib.Server('unix:/var/run/command-api.sock')

# 'show version' returns a dict; its 'version' key holds a string such as '4.26.2F'
result = client.runCmds(1, ['show version'])
eos_version = result[0]['version']

if eos_version != REQUIRED_VERSION:
    print(f'EOS version mismatch: running {eos_version}, need {REQUIRED_VERSION}')
    # copy and install are privileged commands, so each request starts with 'enable'
    client.runCmds(1, [
        'enable',
        f'copy http://192.168.10.10/eos/{IMAGE} flash:{IMAGE}',
    ])
    client.runCmds(1, [
        'enable',
        f'install source flash:{IMAGE} now',
    ])
    # Switch will reboot; ZTP re-runs automatically on next boot
    sys.exit(0)

Also verify image-to-hardware compatibility before deploying. Arista publishes release notes that list supported platforms for each image. The filename conventions matter:

infrarunbook-admin@ztp-srv:~$ ls /var/www/html/eos/
EOS-4.28.3M.swi          # Universal image, broad hardware support
EOS64-4.28.3M.swi        # 64-bit platforms only
EOS-4.28.3M-INT.swi      # International regulatory variant

Pushing EOS64-4.28.3M.swi to a platform that requires the universal image will result in a boot failure. Don't assume the filename is interchangeable — verify it against the target platform's hardware documentation before staging.

Root Cause 6: TFTP Block Size and Transfer Timeouts

Why It Happens

Some environments use TFTP instead of HTTP for ZTP script delivery. TFTP's default block size of 512 bytes causes real problems with larger provisioning scripts — transfers are slow and prone to timeout on congested or lossy management networks. Worse, some TFTP server implementations don't support block size negotiation (RFC 2348), so the switch and server end up stuck: transfers start but never complete, and the switch keeps retrying.

How to Identify It

sw-infrarunbook-01# show zerotouch log
Apr 19 11:02:10 ZTP: Attempting to retrieve tftp://192.168.10.10/ztp/provision.py
Apr 19 11:02:40 ZTP: TFTP transfer timed out after 30 seconds
Apr 19 11:02:40 ZTP: Retrying in 60 seconds...

On the server, check whether block size negotiation is configured:

infrarunbook-admin@ztp-srv:~$ grep -i blocksize /etc/default/tftpd-hpa
# Empty result -- no --blocksize option configured

How to Fix It

The cleanest fix is switching from TFTP to HTTP. Update DHCP option 67 to use an HTTP URL. HTTP handles larger files reliably, gives you proper error codes, and is far easier to debug. If you must keep TFTP, enable block size negotiation and increase the block size to reduce transfer overhead:

infrarunbook-admin@ztp-srv:~$ cat /etc/default/tftpd-hpa
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/var/lib/tftpboot"
TFTP_ADDRESS="0.0.0.0:69"
TFTP_OPTIONS="--secure --blocksize 1468"

infrarunbook-admin@ztp-srv:~$ sudo systemctl restart tftpd-hpa

The 1468-byte block size is the standard 1500-byte Ethernet MTU minus the IPv4 header (20 bytes), the UDP header (8 bytes), and the TFTP data header (4 bytes). That keeps each block inside a single unfragmented packet on a typical management network.
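The arithmetic is easy to sanity-check for other MTUs. A small sketch, assuming IPv4 with no IP options; the function names are illustrative:

```python
IPV4_HEADER = 20       # bytes, assuming no IP options
UDP_HEADER = 8         # bytes
TFTP_DATA_HEADER = 4   # 2-byte opcode + 2-byte block number


def max_tftp_blksize(mtu: int = 1500) -> int:
    """Largest TFTP block size that still fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def blocks_needed(file_size: int, blksize: int) -> int:
    """Number of DATA packets in a transfer.

    TFTP ends a transfer with a block shorter than blksize, so a file whose
    size is an exact multiple of blksize needs one extra, empty block.
    """
    return file_size // blksize + 1
```

With the defaults, max_tftp_blksize() gives 1468, and the 2156-byte script from the earlier examples drops from five DATA packets at 512 bytes to two at 1468. Each packet is a separate acknowledgement round trip, so fewer blocks means fewer chances to hit a timeout.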

Root Cause 7: Management VRF Routing Not Configured

Why It Happens

On Arista platforms with a dedicated management VRF — which is the default on most production deployments — ZTP runs in the management VRF context. If the DHCP server or ZTP provisioning server isn't reachable from within that VRF, everything fails silently. This is common when the management network was recently reconfigured, when the provisioning server is on a different subnet without a proper route in the management VRF, or when someone applied a partial startup config that modified VRF routing before ZTP finished.

How to Identify It

sw-infrarunbook-01# bash ping -I Management1 192.168.10.10
PING 192.168.10.10 (192.168.10.10) from Management1: 56 data bytes
ping: sendmsg: Network is unreachable

sw-infrarunbook-01# show ip route vrf MGMT
VRF: MGMT
Gateway of last resort is not set

# No routes -- switch can't reach anything outside its directly connected subnet

How to Fix It

Add a default route to the management VRF. Either set it statically during recovery or ensure DHCP is correctly sending option 3 (default gateway) in the offer:

sw-infrarunbook-01(config)# vrf instance MGMT
sw-infrarunbook-01(config-vrf-MGMT)# exit
sw-infrarunbook-01(config)# ip route vrf MGMT 0.0.0.0/0 192.168.10.1

sw-infrarunbook-01# zerotouch run

Long-term, ensure your DHCP scope includes option 3 alongside option 67 so that fresh switches always receive a default gateway as part of their initial lease.

Prevention

Most ZTP failures are repeatable and preventable. The pattern is almost always the same: someone changes one thing in isolation — a DHCP server, a script file, a web server config — and doesn't test the end-to-end flow before racking hardware. Build a workflow that validates each component before the switch ever arrives.

Test your DHCP scope before you rack the switch. Use a test machine on the same management VLAN to request a lease and confirm option 67 is present in the response:

infrarunbook-admin@test-host:~$ sudo dhclient -v eth0 2>&1 | grep -i "filename\|bootfile"
DHCPOFFER of 192.168.10.51 from 192.168.10.5
option bootfile-name: http://192.168.10.10/ztp/provision.py

Always lint ZTP scripts before deploying. This belongs in your CI pipeline or at minimum in your deployment checklist. A script that hasn't been through python3 -m py_compile has no business being on the provisioning server.
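For a CI job, the same check can be done in-process. A minimal sketch, assuming the script is plain Python 3; lint_ztp_script is a hypothetical helper, and it only catches syntax errors, not logic errors or EOS-specific problems. It also uses whatever Python version the CI runner has, which may differ from the interpreter on the switch.

```python
def lint_ztp_script(source: str, filename: str = "provision.py"):
    """Return None if the source compiles, else a one-line error description."""
    try:
        # Parse and compile only; nothing in the script is executed.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


def lint_file(path: str):
    """Read a script from disk and lint it; returns None when the syntax is clean."""
    with open(path, encoding="utf-8") as handle:
        return lint_ztp_script(handle.read(), filename=path)
```

A pipeline step could call lint_file on every script in the provisioning directory and fail the build when any call returns a message.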

Version-gate your scripts. Every ZTP script should check the running EOS version at the top and fail fast with an informative message if there's a mismatch — rather than crashing halfway through configuration in ways that are hard to debug from a console.
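Where a minimum version is enough rather than an exact match, comparing versions numerically avoids the trap of 4.9 sorting after 4.10 as text. The sketch below assumes version strings shaped like 4.26.2F (dotted numbers followed by an optional letter suffix); the helper names are illustrative, and the minimum shown in the usage note is only an example.

```python
import re


def parse_eos_version(version: str) -> tuple:
    """Turn a version such as '4.26.2F' or '4.27.2.1F' into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized EOS version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(running: str, minimum: str) -> bool:
    """True when the running version is at or above the minimum (suffix ignored)."""
    return parse_eos_version(running) >= parse_eos_version(minimum)
```

A script could call meets_minimum(eos_version, '4.27.0F') near the top and exit with a clear message when it returns False, before any version-specific command runs.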

Monitor your HTTP server availability continuously. A dead web server silently blocks every new switch deployment. Add a simple health check to your monitoring platform that alerts on HTTP 5xx responses or connection failures to the ZTP server URL. You don't want to discover the provisioning server is down when you're standing next to a rack at 11pm.

Document every ZTP cancellation. If someone runs zerotouch cancel on a switch during initial troubleshooting, that needs to go into your IPAM or CMDB immediately. A switch sitting in cancelled-ZTP state with no startup config is a trap for the next engineer who touches it. A one-line note saves a lot of confusion.

Use CloudVision ZTP where your licensing allows it. CVP's ZTP workflow gives you real-time visibility into provisioning state, error categorization, and retry history through a proper dashboard rather than raw console logs. The time-to-diagnosis on a failed provisioning event drops significantly when you can see the full device state in one place rather than hunting through per-switch log output.

Validate image-to-hardware compatibility before staging images. Cross-reference the EOS image filename against Arista's hardware compatibility matrix before pointing your provisioning scripts at it. A two-minute check prevents a full re-imaging cycle when ZTP installs an incompatible image and the switch fails to boot cleanly.
