Symptoms
SD-WAN looks correct on paper — rules configured, health checks defined, multiple WAN interfaces present — and yet traffic routes over the wrong link. Maybe your VoIP traffic is taking the high-latency cable circuit instead of the fiber line you designated for real-time applications. Maybe all sessions are hammering a single interface while the second link sits completely idle. Or connectivity drops intermittently and self-corrects every few seconds for no obvious reason.
Other symptoms that come up in the field: the FortiGate GUI shows all interfaces as healthy and all rules active, but diagnose sys sdwan service reveals sessions matching no rule and falling through to the default. Or an interface that is physically up and passing real traffic is being excluded from SD-WAN consideration entirely. These failures tend to look like ghost problems — everything appears configured correctly, but routing behavior does not match expectations.
In most cases, SD-WAN misrouting comes down to one of seven root causes. Let's work through each one with real CLI output so you can identify exactly which one you're hitting.
Root Cause 1: Performance SLA Failing
Why It Happens
The FortiGate evaluates interface quality using a Performance SLA configuration attached to each health check. You define thresholds for latency, jitter, and packet loss. When an interface breaches any of those thresholds, it gets marked as SLA-failed and SD-WAN rules that prefer SLA-passing interfaces route traffic away from it. If all members simultaneously fail the SLA, traffic either falls to the fallback interface or stops matching SD-WAN rules altogether.
The mistake I see most often is configuring SLA thresholds based on ideal circuit specifications rather than real-world measured values. A cable broadband link that averages 30ms during off-peak hours can easily spike to 80ms during evening congestion. If your SLA latency threshold is set at 40ms, you'll see SLA flaps that kick traffic to the wrong interface every evening — and users will file tickets right around 5pm like clockwork.
How to Identify It
Run this from the CLI:
diagnose sys sdwan health-check

A failing interface produces output like this:
Health Check(hc_isp1):
Seq(1): interface: wan1
state(alive|sla_fail): 0x2
log_avg_rtt: 68.41 msec
avg_rtt: 72.03 msec
avg_jitter: 5.87 msec
avg_loss: 0.00%
sla_map=0x0
sla 1: latency=50, jitter=5, loss=1
pass(latency): 0

The pass(latency): 0 line tells you the interface isn't meeting its configured latency SLA. sla_map=0x0 confirms no SLA targets are being met. The state(alive|sla_fail) value tells you the interface is physically alive but treated as degraded by the SD-WAN daemon.
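If you need to run this check across many devices, the fields above are easy to script against. A minimal sketch, assuming the sample text mirrors the output shown above (field layout can vary by FortiOS version, so treat the regexes as illustrative):

```python
import re

# Sample text modeled on the diagnose output above; real output varies by version.
SAMPLE = """\
Health Check(hc_isp1):
  Seq(1): interface: wan1
    state(alive|sla_fail): 0x2
    avg_rtt: 72.03 msec
    sla_map=0x0
"""

def parse_health(text):
    """Pull out the fields this article keys on: interface name,
    whether the SLA is failing, measured RTT, and the SLA bitmap."""
    return {
        "interface": re.search(r"interface: (\S+)", text).group(1),
        "sla_fail": "sla_fail" in text,
        "avg_rtt_ms": float(re.search(r"avg_rtt: ([\d.]+)", text).group(1)),
        # sla_map=0x0 means no SLA target is currently being met
        "sla_map": int(re.search(r"sla_map=(\w+)", text).group(1), 16),
    }

status = parse_health(SAMPLE)
print(status["interface"], status["sla_fail"], status["sla_map"])
# → wan1 True 0
```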
How to Fix It
Measure actual circuit latency over several days, capture the 95th-percentile value during peak hours, and set your threshold 20–30% above that figure. Don't anchor to the average — averages lie. You want a threshold that triggers only on genuinely bad behavior, not normal congestion variance.
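The sizing rule above is worth making mechanical. A short sketch with illustrative latency samples (the measured values are invented for demonstration; only the method — 95th percentile plus 20–30% headroom — comes from the text):

```python
import statistics

def sla_latency_threshold(samples_ms, headroom=1.25):
    """Derive an SLA latency threshold from measured samples:
    take the 95th-percentile value, then add ~25% headroom so
    normal congestion variance doesn't flap the SLA."""
    # statistics.quantiles with n=100 returns 99 cut points; index 94 is p95
    p95 = statistics.quantiles(samples_ms, n=100)[94]
    return round(p95 * headroom)

# Illustrative week of samples from a cable link: ~30 ms off-peak,
# spiking toward 80 ms during evening congestion.
samples = [30, 31, 29, 33, 35, 40, 55, 62, 71, 78, 80,
           32, 30, 34, 36, 45, 50, 66, 74, 79]
print(sla_latency_threshold(samples))   # → 100
```

For this data the p95 lands near 80 ms, so the derived threshold is about 100 ms — well clear of the evening spikes that a 40 ms threshold would flag every night.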
config system sdwan
config health-check
edit "hc_isp1"
set server "8.8.8.8"
set interval 1000
set failtime 5
set recoverytime 5
config sla
edit 1
set latency-threshold 100
set jitter-threshold 15
set packetloss-threshold 3
next
end
next
end
end

After saving, watch the SLA state update. The sla_map field changes from 0x0 to 0x1 once the interface starts meeting the revised threshold. Give it a full failtime + recoverytime cycle before declaring success.
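The wait is easy to quantify: a state change is only declared after failtime (or recoverytime) consecutive probe results, so each transition takes roughly that count times the probe interval. A quick sketch using the values from the config above:

```python
# Values from the health-check config above: probes every 1000 ms,
# five consecutive results needed to change state in either direction.
interval_ms, failtime, recoverytime = 1000, 5, 5

to_fail = failtime * interval_ms / 1000         # seconds until marked failed
to_recover = recoverytime * interval_ms / 1000  # seconds until marked recovered

print(f"marked failed after ~{to_fail:.0f} s, recovered after ~{to_recover:.0f} s")
# → marked failed after ~5 s, recovered after ~5 s
```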
Root Cause 2: Rule Priority Wrong
Why It Happens
SD-WAN service rules are evaluated in sequence number order — lowest number first, first match wins. If a broad catch-all rule has a lower sequence number than your specific application rule, traffic matches the catch-all and the specific rule never gets evaluated. This is one of those issues that catches engineers off guard because the GUI sometimes presents rules in a way that looks ordered correctly, while the underlying sequence numbers in the CLI configuration tell a different story entirely.
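The evaluation order described above can be sketched in a few lines. This is a toy model, not FortiOS internals — but it shows exactly why a low-numbered catch-all shadows every more specific rule behind it:

```python
import ipaddress

# Rules as (sequence, name, destination) tuples, mirroring the broken
# example later in this section: the catch-all has the lower sequence number.
rules = [
    (1, "General_Browsing", "0.0.0.0/0"),     # dst "all"
    (2, "VoIP_Priority",    "10.50.1.0/24"),
]

def match_rule(dst_ip, rules):
    """Walk rules in ascending sequence order; first match wins."""
    for seq, name, dst_net in sorted(rules):
        if ipaddress.ip_address(dst_ip) in ipaddress.ip_network(dst_net):
            return name   # session commits here; later rules never evaluate
    return "implicit default"

print(match_rule("10.50.1.20", rules))   # → General_Browsing (VoIP shadowed)

# Swap the sequence numbers and the specific rule fires first:
fixed = [
    (1, "VoIP_Priority",    "10.50.1.0/24"),
    (2, "General_Browsing", "0.0.0.0/0"),
]
print(match_rule("10.50.1.20", fixed))   # → VoIP_Priority
```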
How to Identify It
Check the actual sequence numbers from the CLI:
config system sdwan
config service
show
end
end

If you see output like this, you have a problem:
config service
edit 1
set name "General_Browsing"
set mode load-balance
set dst "all"
next
edit 2
set name "VoIP_Priority"
set mode priority
set dst "10.50.1.0/24"
next
end

Rule 1 matches destination all — which means VoIP traffic to 10.50.1.0/24 hits rule 1 first and commits there. Rule 2 never evaluates for that traffic. To see which rules are actually matching active sessions right now, run:

diagnose sys sdwan service

This shows active session-to-rule mappings. If VoIP sessions appear under rule 1 instead of rule 2, you've confirmed the ordering issue.
How to Fix It
Specific rules must come before broad rules. Delete and recreate in the correct order, or renumber so the specific rule has the lower sequence number:
config system sdwan
config service
edit 1
set name "VoIP_Priority"
set mode priority
set dst "10.50.1.0/24"
next
edit 2
set name "General_Browsing"
set mode load-balance
set dst "all"
next
end
end

The ordering principle is consistent: most specific destination before broadest destination; highest-priority application before general web traffic; explicit protocol or port before catch-all. After reordering, rerun diagnose sys sdwan service and confirm sessions are now matching the expected rule numbers.
Root Cause 3: Health Check Probe Failing
Why It Happens
A health check probe can fail for reasons that have nothing to do with the quality of the WAN circuit itself. The probe server blocks ICMP. The probe destination isn't reachable from that specific interface due to routing asymmetry. A DNS name used as the probe target doesn't resolve through that link. The probe interval is too aggressive and the upstream server rate-limits it. When a probe fails consistently, FortiGate marks the interface as dead and removes it from SD-WAN routing consideration entirely — even if the interface can pass real user traffic without any issues at all.
In my experience, this is the most silently destructive failure mode in the list. A perfectly healthy 1Gbps circuit gets excluded from SD-WAN because someone pointed the health check at an internal server that doesn't respond to ping from the WAN side.
How to Identify It
Check detailed health check status:
diagnose sys sdwan health-check status

A dead interface looks like this:
Health Check(hc_isp2):
Seq(2): interface: wan2
state(dead): 0x0
log_avg_rtt: 0.00 msec
avg_loss: 100.00%

100% packet loss on the probe. Now isolate whether it's the probe target or the WAN circuit itself by testing manually from the FortiGate using the interface's WAN IP as the source:
execute ping-options source 172.16.2.1
execute ping 8.8.8.8

If this succeeds but the health check still shows dead, the probe server is the problem — not the circuit. The interface is fine; the target you're probing isn't responding to the FortiGate's health check packets.
How to Fix It
Change the probe target to something reliably reachable from that WAN interface. If ICMP is blocked end-to-end, switch to HTTP probing against an internal server or a known reliable external endpoint:
config system sdwan
config health-check
edit "hc_isp2"
set protocol http
set server "172.16.100.10"
set http-get "/"
set http-match "200"
set interval 2000
set failtime 3
set recoverytime 3
next
end
end

Always test the probe target manually before enabling any health check. One minute of manual testing from the CLI prevents hours of troubleshooting later. A health check pointed at an unreachable server will mark a healthy interface as dead within seconds of the failtime threshold being crossed, with no warning in the GUI.
Root Cause 4: Interface Member Not Added
Why It Happens
SD-WAN only controls traffic over interfaces that are explicitly added as members of the SD-WAN zone. A physical interface that is up, has an IP assigned, and has a static default route associated with it can still forward traffic — but it does so completely outside SD-WAN control. No health checking, no SLA enforcement, no intelligent routing decisions. This happens most often when a new WAN circuit gets provisioned and someone correctly configures the interface IP and static default route, but forgets the SD-WAN member entry. The circuit carries traffic fine. It's just not under SD-WAN management in any way.
It also shows up during migrations from traditional multi-WAN configurations where not all WAN interfaces get moved into the SD-WAN zone during the transition.
How to Identify It
List current SD-WAN members from the CLI:
config system sdwan
config members
show
end
end

Output showing only two members when you have three physical WAN interfaces confirms the problem:
config members
edit 1
set interface "wan1"
set gateway 10.0.0.1
next
edit 2
set interface "wan2"
set gateway 172.16.2.1
next
end

Cross-reference this against physical interfaces:

get system interface

If wan3 appears here with an IP but isn't in the members list, it's operating outside SD-WAN. Confirm with:

diagnose sys sdwan member

Only interfaces the SD-WAN daemon is actively tracking appear in this output. Anything missing from this list is invisible to SD-WAN logic — it won't appear in health check output, SLA evaluation, or rule matching.
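The cross-reference itself is just a set difference, which is worth scripting if you manage many devices. Interface names below mirror this article's example and stand in for whatever your two command outputs show:

```python
# Cross-reference the two outputs above: any interface that has an IP but is
# absent from the SD-WAN member list is forwarding outside SD-WAN control.
physical_wans = {"wan1", "wan2", "wan3"}   # from: get system interface
sdwan_members = {"wan1", "wan2"}           # from: diagnose sys sdwan member

unmanaged = sorted(physical_wans - sdwan_members)
print(unmanaged)   # → ['wan3'] — up and passing traffic, invisible to SD-WAN
```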
How to Fix It
Add the missing interface as an SD-WAN member with its correct gateway:
config system sdwan
config members
edit 3
set interface "wan3"
set gateway 192.168.3.1
set priority 1
set weight 1
next
end
end

After adding, run diagnose sys sdwan member to confirm it appears in the daemon's active list:
Member(1): interface: wan1, gateway: 10.0.0.1
weight: 1, priority: 1, volume(Mbps): tx:4.12 rx:12.33
Member(2): interface: wan2, gateway: 172.16.2.1
weight: 1, priority: 1, volume(Mbps): tx:2.08 rx:6.21
Member(3): interface: wan3, gateway: 192.168.3.1
weight: 1, priority: 1, volume(Mbps): tx:0.00 rx:0.00

One thing to check immediately after: firewall policies that previously referenced the physical interface by name need to reference the SD-WAN zone object instead. Policies that still point to the physical interface won't match traffic being routed through the SD-WAN zone.
Root Cause 5: Bandwidth Measurement Wrong
Why It Happens
When SD-WAN is configured in volume or spillover mode, the FortiGate uses measured bandwidth figures to decide how to distribute sessions across links. If those measurements are inaccurate — stale after a reboot, inflated after a burst event, or simply wrong because of the probe method — the load-balancing algorithm assigns sessions incorrectly. A 10Mbps link gets treated like a 100Mbps link, gets overloaded, and the 100Mbps link sits underutilized while users complain about performance.
ICMP probes are excellent for measuring latency and packet loss. They're essentially useless for measuring bandwidth. A 64-byte ICMP echo doesn't generate anywhere near enough traffic to test throughput. Any bandwidth figure derived purely from ICMP probes is at best an educated guess. After a FortiGate reboot, bandwidth measurements start at near-zero and take time to climb toward realistic values, during which SD-WAN makes systematically bad routing decisions.
How to Identify It
Check what bandwidth figures the FortiGate is actually working with:
diagnose sys sdwan health-check status

Look specifically at the upload_bw and download_bw values reported per interface:
Health Check(hc_isp1):
Seq(1): interface: wan1
upload_bw: 512 Kbps
download_bw: 1024 Kbps

If wan1 is a 100Mbps circuit and the FortiGate reports only 1024 Kbps of download bandwidth, the measurement is wrong and any volume-based load-balancing decisions built on that number will be badly skewed. Compare against the real-time throughput shown in diagnose sys sdwan member, which reflects actual session volume and is typically more representative than health-check-derived bandwidth estimates.
How to Fix It
Switch to HTTP-based health check probes, which transfer more data through the link and give the FortiGate a better basis for bandwidth estimation than ICMP ever can:
config system sdwan
config health-check
edit "hc_isp1"
set protocol http
set server "172.16.100.10"
set http-get "/"
set interval 1000
next
end
end

If bandwidth measurement is chronically inaccurate or the measurement cycle isn't fast enough for your load-balancing needs, switch the SD-WAN rule from volume to session mode. Session-based balancing distributes connections rather than bytes, removing the dependency on accurate bandwidth measurement entirely:
config system sdwan
config service
edit 1
set mode session
next
end
end

Alternatively, configure spillover thresholds with manually defined values that reflect actual circuit capacity rather than relying on measured figures:
config system sdwan
config members
edit 1
set ingress-spillover-threshold 90000
set egress-spillover-threshold 90000
next
end
end

Root Cause 6: SD-WAN Rule Not Matching Traffic
Why It Happens
A rule in the correct position with the right priority won't fire if the match criteria don't align with the actual session being evaluated. Source address objects that don't cover the real client subnet, destination objects that miss the server's IP range, DSCP match values that don't appear on actual traffic, or application signatures that haven't fired yet because the session is too new — any one of these mismatches means the rule is effectively invisible to that session and traffic falls through to a catch-all or the default route.
How to Identify It
Enable flow debug for the client generating the problem traffic. Substitute 10.10.10.50 with the actual client IP you're troubleshooting:
diagnose debug reset
diagnose debug flow filter addr 10.10.10.50
diagnose debug flow show function-name enable
diagnose debug flow show iprope enable
diagnose debug enable
diagnose debug flow trace start 50

In the output stream, look for SD-WAN rule evaluation messages:
id=65531 trace_id=3 func=iprope_fwd_check line=817 msg="check SD-WAN rule"
id=65531 trace_id=3 func=iprope_fwd_check line=830 msg="no SD-WAN rule matched, use default"

That second line is your confirmation. The session evaluated every rule in sequence and matched nothing. After capturing the output you need, clean up so you don't flood the console:
diagnose debug reset
diagnose debug disable

Review the rule's source address object and compare it against the actual client IP shown in the flow debug output. The mismatch is usually obvious once you look.
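That comparison is worth a two-line script when subnets get tight. The subnets below are illustrative, not taken from a real rule; the client IP is the one used in the flow debug example above:

```python
import ipaddress

def rule_covers(client_ip, rule_subnet):
    """Does the rule's source/destination address object cover this IP?"""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(rule_subnet)

# A /28 stops at 10.10.10.15, so the client at .50 falls outside the object:
print(rule_covers("10.10.10.50", "10.10.10.0/28"))   # → False
print(rule_covers("10.10.10.50", "10.10.10.0/24"))   # → True
```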
How to Fix It
Update the address object to correctly cover the client subnet, or create a new address object that matches the actual traffic you're trying to control. If you're using application-based SD-WAN rules and hitting early-session mismatches — where the application signature hasn't been identified in the first few packets — add a destination address or port-based filter as a fallback to catch sessions before the app ID fires.
Root Cause 7: Static Route Overriding SD-WAN
Why It Happens
SD-WAN layers on top of the routing table — it doesn't replace it. If a static route exists with a lower administrative distance than the SD-WAN zone's implicit route for the same destination, the static route wins and traffic never enters SD-WAN evaluation. Sessions follow the static route's outbound interface directly, bypassing all health checks, SLA enforcement, and rule logic completely. This shows up frequently in environments migrated from traditional multi-WAN configurations where static routes were the traffic-steering mechanism before SD-WAN was introduced — and someone didn't fully clean up the old config.
How to Identify It
get router info routing-table all

Look for static routes (marked S) covering the same destinations as your SD-WAN rules, with a low administrative distance value:
S* 0.0.0.0/0 [1/0] via 10.0.0.1, wan1
[10/0] via 172.16.2.1, wan2A static default with AD=1 pointing directly to wan1 means every session exits via wan1 regardless of what your SD-WAN rules say. SD-WAN only applies to traffic routed through the virtual-wan-link or SD-WAN zone interface, not to sessions committed directly to a physical interface by a static route.
How to Fix It
Remove the conflicting static route. Ensure each SD-WAN member has a gateway configured so the FortiGate can build its own implicit routes through the SD-WAN zone:
config router static
delete 1
end

config system sdwan
config members
edit 1
set gateway 10.0.0.1
next
edit 2
set gateway 172.16.2.1
next
end
end

After removing the static route, verify the routing table shows the SD-WAN zone as the exit path for default traffic:

get router info routing-table all

The SD-WAN implicit route should now appear and sessions will enter SD-WAN evaluation as expected. If traffic stops passing entirely after removing the static route, it usually means a member gateway wasn't configured — the SD-WAN daemon can't build its implicit route without a gateway to point to.
Prevention
Set SLA thresholds based on measured circuit behavior, not marketing specs. Capture 95th-percentile latency and loss figures over a representative week and set thresholds 20–30% above that baseline — high enough to survive normal variance without triggering constant SLA flaps, but low enough to catch genuine circuit degradation before users notice it.
Test every health check probe target manually before enabling it. Running execute ping-options source <wan-ip> followed by execute ping <probe-target> takes under a minute and immediately catches unreachable targets or ICMP-blocking servers before they silently exclude a healthy interface from SD-WAN.
After adding any new WAN circuit, make diagnose sys sdwan member your first post-config check. If the interface isn't in that output, it isn't under SD-WAN control — full stop. Review SD-WAN rule order after every rule addition or modification. The sequence numbers matter and the GUI ordering can be misleading about what the FortiOS daemon actually processes first.
Build a short post-change validation routine that covers the three most common failure modes: run diagnose sys sdwan health-check to confirm all interfaces are alive and SLA-passing, run diagnose sys sdwan service to confirm active sessions are matching the expected rules, and run get router info routing-table all to confirm no static routes are overriding SD-WAN decisions. These three commands take under a minute and catch the majority of misconfigurations before users ever notice them.
For multi-site deployments managed from sw-infrarunbook-01 via FortiManager, use SD-WAN policy templates to enforce consistent health check and SLA settings across every site. Config drift between sites makes troubleshooting dramatically harder — what works at headquarters may behave differently at a branch if someone hand-edited the SD-WAN configuration locally and deviated from the baseline template.
