Symptoms
When an F5 BIG-IP virtual server stops responding, users and monitoring systems surface a predictable set of indicators. Recognizing these before you log in to the BIG-IP dramatically narrows your search space.
- Clients receive Connection timed out or Connection refused when reaching the VIP address
- Browser returns HTTP 503 Service Unavailable with no F5 error page injected
- curl -Iv https://10.10.50.100 hangs indefinitely or resets immediately with a TCP RST
- Monitoring probes (Nagios, Zabbix, Datadog) alert that the virtual server health check has failed
- Application servers log upstream connection failures — no new requests arriving from the load balancer
- SSL handshake never completes for HTTPS virtual servers; the TLS client hello receives no server hello
- Traffic capture on the client side shows SYN packets transmitted but no SYN-ACK returned
- BIG-IP statistics counters for the virtual server show zero increments on current connections despite active client attempts
Root Cause 1: VIP Not Enabled
Why It Happens
The most frequently missed cause of a non-responding virtual server is that the VIP itself is administratively disabled. This occurs after maintenance windows where engineers forget to re-enable objects before closing the change ticket, after a configuration push that inadvertently sets the virtual server state to disabled, or following automated deployment scripts that toggle virtual server state without a corresponding re-enable step. The BIG-IP data plane completely ignores a disabled virtual server — no SYN is acknowledged, no connection is established, and no log entry is generated for dropped attempts.
How to Identify It
SSH to the BIG-IP as infrarunbook-admin and check the virtual server state with TMSH:
tmsh show ltm virtual vs_solvethenetwork_443
A disabled virtual server produces output similar to:
Ltm::Virtual Server: vs_solvethenetwork_443
Availability : offline
State : disabled
Reason : Virtual server is disabled
Current Sessions : 0
Total Sessions : 847291
To scan all virtual servers for any that are disabled:
tmsh list ltm virtual | grep -E "ltm virtual|disabled|destination"
ltm virtual vs_solvethenetwork_443 {
disabled
destination 10.10.50.100:443
ltm virtual vs_solvethenetwork_80 {
destination 10.10.50.100:80
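The grep one-liner works interactively; for automation it is easier to reuse a small parser. An illustrative sketch (the find_disabled helper and the embedded sample are ours, not part of TMOS — in practice pipe the live tmsh command into the function):

```shell
# Illustrative parser: print the name of every virtual server whose
# configuration carries the "disabled" flag. The here-string mirrors a
# saved `tmsh list ltm virtual` capture.
sample_config='ltm virtual vs_solvethenetwork_443 {
    disabled
    destination 10.10.50.100:443
}
ltm virtual vs_solvethenetwork_80 {
    destination 10.10.50.100:80
}'

find_disabled() {
    # Remember the current VS name; report it when a bare "disabled" line appears.
    awk '/^ltm virtual/ { vs = $3 } /^[[:space:]]*disabled[[:space:]]*$/ { print vs }'
}

printf '%s\n' "$sample_config" | find_disabled   # -> vs_solvethenetwork_443
```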
How to Fix It
Re-enable the virtual server from TMSH and persist the configuration:
tmsh modify ltm virtual vs_solvethenetwork_443 enabled
tmsh save sys config
Confirm the state change took effect:
tmsh show ltm virtual vs_solvethenetwork_443 field-fmt | grep -E "availability-state|enabled-state|status-reason"
status.availability-state available
status.enabled-state enabled
status.status-reason The virtual server is available
Root Cause 2: No Pool Members Up
Why It Happens
Even when the virtual server is enabled, it will refuse or drop new connections if every member of its default pool is marked down by the health monitor. Pool members go offline when the configured monitor fails its checks against the backend. Common triggers include application server restarts, a failed code deployment returning unexpected HTTP status codes, network ACL changes blocking the monitor source IP from reaching the pool member port, or health monitor timeouts configured too aggressively for the application's actual response time. With zero healthy members, BIG-IP has nowhere to forward new connections. Depending on the action-on-service-down setting, it will either reset the client connection or simply drop the SYN.
How to Identify It
Display pool and member availability:
tmsh show ltm pool pool_solvethenetwork_443 members
Ltm::Pool: pool_solvethenetwork_443
Availability : offline
State : enabled
Reason : The children pool member(s) are down
Ltm::Pool Member: 10.10.20.10:8443
Availability : offline
State : enabled
Reason : Pool member has been marked down by a monitor
Ltm::Pool Member: 10.10.20.11:8443
Availability : offline
State : enabled
Reason : Pool member has been marked down by a monitor
Ltm::Pool Member: 10.10.20.12:8443
Availability : offline
State : enabled
Reason : Pool member has been marked down by a monitor
Check what monitor is assigned and inspect its configuration:
tmsh list ltm pool pool_solvethenetwork_443 | grep monitor
tmsh list ltm monitor https mon_solvethenetwork_https
ltm monitor https mon_solvethenetwork_https {
defaults-from https
interval 5
recv "HTTP/1.1 200"
send "GET /health HTTP/1.1\r\nHost: solvethenetwork.com\r\nConnection: close\r\n\r\n"
timeout 16
}
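The example monitor follows the common F5 timing rule of thumb, timeout = 3 × interval + 1, which lets a member miss three consecutive probes before being marked down. A quick sketch of that arithmetic using the values shown above (the rule is a convention, not enforced by TMOS):

```shell
# Timing rule of thumb: timeout = 3 * interval + 1 tolerates three
# consecutive missed probes before the member is marked down.
interval=5
timeout=16
recommended=$(( 3 * interval + 1 ))
echo "recommended minimum timeout: ${recommended}s"   # -> 16s
if [ "$timeout" -ge "$recommended" ]; then
    echo "timing ok: member survives up to three missed probes"
else
    echo "timing too tight: raise timeout to at least ${recommended}s"
fi
```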
How to Fix It
Verify the backend application is actually reachable from the BIG-IP self-IP on the server VLAN:
ping -c 4 10.10.20.10
curl -kv --interface 10.10.10.1 https://10.10.20.10:8443/health
If the application is genuinely down, restore the application service on all backend servers. Once the application responds with the expected content, the BIG-IP monitor will automatically mark members available within one monitor interval. You can force an immediate re-evaluation by bouncing the member state:
tmsh modify ltm pool pool_solvethenetwork_443 members modify { 10.10.20.10:8443 { state user-down } }
tmsh modify ltm pool pool_solvethenetwork_443 members modify { 10.10.20.10:8443 { state user-up } }
If the application is healthy but the monitor is misconfigured (wrong recv string, wrong URI, wrong port), correct the monitor and save:
tmsh modify ltm monitor https mon_solvethenetwork_https recv "200 OK"
tmsh save sys config
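The recv string behaves as a match against the backend's response: if it never matches, the member stays down even though the application answers. A hypothetical simulation of that matching logic (the monitor_match helper and sample response are illustrative; real HTTPS monitors treat recv as a regular expression):

```shell
# Simulated recv match: the member goes "up" only when the recv string
# matches somewhere in the response. This backend answers HTTP/1.0, so
# recv "HTTP/1.1 200" fails while the corrected "200 OK" matches.
response='HTTP/1.0 200 OK
Content-Type: text/plain

healthy'

monitor_match() {   # $1 = recv string; response is read on stdin
    grep -q "$1" && echo up || echo down
}

printf '%s\n' "$response" | monitor_match "HTTP/1.1 200"   # -> down
printf '%s\n' "$response" | monitor_match "200 OK"         # -> up
```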
Root Cause 3: Routing Issue to VIP
Why It Happens
The virtual server is enabled and pool members are healthy, yet clients still cannot reach the VIP. The culprit is frequently a routing problem: the upstream router lacks a route to the VIP subnet, route redistribution has failed to inject the VIP into the routing domain, or a recent network change has invalidated the path. This is especially common when the VIP address lives on a different subnet than the BIG-IP management interface, when the F5 participates in dynamic routing (OSPF or BGP) and has lost its adjacency, or after a BIG-IP failover where the floating IP has moved but upstream ARP or routing tables have not refreshed.
How to Identify It
From a host on the client network (172.16.5.0/24), trace the path to the VIP:
traceroute 10.10.50.100
traceroute to 10.10.50.100 (10.10.50.100), 30 hops max, 60 byte packets
1 172.16.5.1 0.4 ms 0.3 ms 0.4 ms
2 * * *
3 * * *
4 * * *
The trace dying after the first hop means the distribution-layer router has no route to 10.10.50.100. On sw-infrarunbook-01, inspect the routing table:
show ip route 10.10.50.100
% Network not in table
On the BIG-IP, confirm the default gateway and routing table are intact:
tmsh show net route
tmsh list sys management-route
Net::Routes
Name Dest Gateway Type Interface
------------------------------------------------------
default 0.0.0.0/0 10.10.50.254 static external
10.10.20.0/24 10.10.10.1 -- connected internal
How to Fix It
Add a static host route on sw-infrarunbook-01 pointing to the F5 external self-IP as the next hop:
ip route 10.10.50.100 255.255.255.255 10.10.50.1
If the VIP subnet is a /24 rather than a host route, use:
ip route 10.10.50.0 255.255.255.0 10.10.50.1
For BGP or OSPF environments, verify the F5's dynamic routing adjacencies from the ZebOS routing shell (dynamic routing state is not exposed through plain tmsh show commands on most TMOS versions):
tmsh run util imish -e "show ip bgp summary"
tmsh run util imish -e "show ip ospf neighbor"
Confirm end-to-end reachability after the fix:
traceroute 10.10.50.100
curl -Iv http://10.10.50.100/
traceroute to 10.10.50.100, 30 hops max
1 172.16.5.1 0.4 ms
2 10.10.50.1 0.6 ms
3 10.10.50.100 0.9 ms
Root Cause 4: Profile Misconfiguration
Why It Happens
BIG-IP virtual servers rely on profiles — HTTP, SSL/TLS, TCP, UDP, OneConnect, and others — to define exactly how traffic is processed at each layer. A misconfigured profile can silently break traffic without surfacing an obvious error on the virtual server availability indicator. The most common failure modes are: an SSL client profile referencing a certificate that has expired or been deleted from the certificate store; a cipher suite mismatch between the client profile and what connecting clients support; an HTTP profile with broken header insertion that causes backends to reject the request; a TCP profile with idle timeouts shorter than application keepalive intervals; or a FastL4 profile inadvertently replacing a Full Proxy profile, stripping application-layer visibility the backend depends on.
How to Identify It
List the profiles attached to the virtual server:
tmsh list ltm virtual vs_solvethenetwork_443 profiles
ltm virtual vs_solvethenetwork_443 {
profiles {
clientssl_solvethenetwork {
context clientside
}
http { }
tcp { }
}
}
Inspect the SSL profile, paying close attention to the referenced certificate and key:
tmsh list ltm profile client-ssl clientssl_solvethenetwork
ltm profile client-ssl clientssl_solvethenetwork {
cert solvethenetwork.com.crt
key solvethenetwork.com.key
chain solvethenetwork-chain.crt
ciphers DEFAULT:!RC4:!EXPORT
}
Check certificate expiry:
tmsh list sys file ssl-cert solvethenetwork.com.crt | grep expiration
expiration-date 1680000000
expiration-string Mar 28, 2023
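The expiration-date field is a Unix epoch timestamp. A small sketch to render it human-readable and flag approaching expiry (GNU date assumed, as on the BIG-IP Linux shell; the 30-day threshold is a convention, not a BIG-IP default):

```shell
# Convert the epoch expiry and warn when fewer than 30 days remain.
expiry_epoch=1680000000
echo "expires: $(date -u -d @"$expiry_epoch" +'%b %e, %Y')"   # -> Mar 28, 2023
days_left=$(( (expiry_epoch - $(date +%s)) / 86400 ))
if [ "$days_left" -lt 30 ]; then
    echo "WARNING: certificate expires in ${days_left} day(s) - renew now"
fi
```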
An expired certificate causes TLS handshake failures. Examine /var/log/ltm for SSL error messages:
tail -100 /var/log/ltm | grep -iE "ssl|handshake|profile|err"
Apr 6 08:12:04 sw-infrarunbook-01 err tmm[14432]: 01260009:3: Connection error: ssl_hs_rxhello:97: unsupported version (70)
Apr 6 08:12:05 sw-infrarunbook-01 err tmm[14432]: 01260009:3: Connection error: ssl_hs_rxhello:97: peer does not support any known cipher suite
How to Fix It
Import the renewed certificate and key, then save:
tmsh install sys crypto cert solvethenetwork.com.crt from-local-file /var/tmp/solvethenetwork_2025.crt
tmsh install sys crypto key solvethenetwork.com.key from-local-file /var/tmp/solvethenetwork_2025.key
tmsh install sys crypto cert solvethenetwork-chain.crt from-local-file /var/tmp/solvethenetwork_chain_2025.crt
tmsh save sys config
For a cipher mismatch, update the SSL profile cipher string to include supported suites:
tmsh modify ltm profile client-ssl clientssl_solvethenetwork ciphers "ECDHE+AESGCM:ECDHE+AES:!RC4:!EXPORT:!aNULL"
tmsh save sys config
Verify the new certificate is active and not expired:
tmsh list sys file ssl-cert solvethenetwork.com.crt | grep expiration
expiration-date 1775000000
expiration-string Mar 31, 2026
Root Cause 5: Self-IP Conflict
Why It Happens
A Self-IP conflict occurs when another device on the network has been assigned the same IP address as the F5 BIG-IP's self-IP or floating self-IP. Because the self-IP resides on the same VLAN and subnet as the virtual server, a conflicting ARP entry on the upstream switch can redirect traffic destined for the VIP to the wrong device. The rogue device almost certainly does not know how to process the traffic, so all connections silently fail. This is especially dangerous in environments where IP Address Management (IPAM) is loosely enforced, where a new server has been provisioned without cross-checking existing allocations, or after a DR failover brings up a standby environment with the same IP assignments as production.
How to Identify It
From the BIG-IP, use arping to detect duplicate addresses on the external VLAN:
arping -I external 10.10.50.1 -c 5
ARPING 10.10.50.1 from 10.10.50.1 external
Unicast reply from 10.10.50.1 [00:50:56:AB:11:22] 1.012ms
Unicast reply from 10.10.50.1 [00:50:56:CD:33:44] 1.148ms
Unicast reply from 10.10.50.1 [00:50:56:AB:11:22] 0.994ms
Unicast reply from 10.10.50.1 [00:50:56:CD:33:44] 1.201ms
Sent 5 probes (5 broadcast(s))
Received 10 response(s)
Two distinct MAC addresses replying to a single IP is definitive proof of an address conflict. Verify on sw-infrarunbook-01:
show arp | include 10.10.50.1
Internet 10.10.50.1 0 0050.56ab.1122 ARPA Vlan50
Internet 10.10.50.1 0 0050.56cd.3344 ARPA Vlan50
Identify the offending device by tracing the MAC to a switch port:
show mac address-table | include 0050.56cd.3344
50 0050.56cd.3344 DYNAMIC Gi0/12
How to Fix It
Identify the device on Gi0/12 via your CMDB or LLDP neighbors, then reassign it to a non-conflicting IP address. After the change, clear the ARP cache on the upstream switch and verify resolution:
clear arp-cache interface Vlan50
clear ip arp 10.10.50.1
On the BIG-IP, gratuitously announce the correct ownership of the self-IP:
arping -A -I external -c 3 10.10.50.1
Confirm only a single MAC now responds:
arping -I external 10.10.50.1 -c 5
ARPING 10.10.50.1 from 10.10.50.1 external
Unicast reply from 10.10.50.1 [00:50:56:AB:11:22] 0.987ms
Unicast reply from 10.10.50.1 [00:50:56:AB:11:22] 0.943ms
Sent 5 probes (5 broadcast(s))
Received 5 response(s)
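The unique-MAC count can be scripted for repeatable before-and-after checks. A sketch that flags a conflict when more than one responder appears (the sample capture mirrors the conflicting output above; pipe live arping output through the same filter in practice):

```shell
# Count distinct MACs answering for one IP; more than one is a conflict.
arping_output='Unicast reply from 10.10.50.1 [00:50:56:AB:11:22] 1.012ms
Unicast reply from 10.10.50.1 [00:50:56:CD:33:44] 1.148ms
Unicast reply from 10.10.50.1 [00:50:56:AB:11:22] 0.994ms
Unicast reply from 10.10.50.1 [00:50:56:CD:33:44] 1.201ms'

macs=$(printf '%s\n' "$arping_output" | grep -o '\[[0-9A-Fa-f:]*\]' | sort -u | wc -l)
if [ "$macs" -gt 1 ]; then
    echo "CONFLICT: $macs devices answering for this IP"
else
    echo "ok: single responder"
fi
```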
Root Cause 6: iRule Dropping or Rejecting Traffic
Why It Happens
iRules are Tcl-based scripts that execute inline during traffic processing. An iRule containing an overly broad drop or reject statement, a logic error in a conditional branch, or an access control list that inadvertently matches legitimate source IPs will silently discard all matching connections. Engineers often attach iRules for logging or geo-blocking purposes without fully testing every branch. A single missing else clause can result in all traffic outside the explicitly permitted condition being dropped.
How to Identify It
tmsh list ltm virtual vs_solvethenetwork_443 rules
ltm virtual vs_solvethenetwork_443 {
rules {
irule_geo_block
}
}
tmsh list ltm rule irule_geo_block
ltm rule irule_geo_block {
when CLIENT_ACCEPTED {
if { [IP::addr [IP::client_addr] equals 172.16.5.0/24] } {
log local0. "Dropping client [IP::client_addr]"
drop
}
}
}
The subnet 172.16.5.0/24 is the primary client network — every connection is being dropped. Check LTM logs for the iRule drop messages:
grep "irule_geo_block" /var/log/ltm | tail -20
Apr 6 09:01:12 sw-infrarunbook-01 notice tmm: Rule irule_geo_block: Dropping client 172.16.5.44
Apr 6 09:01:13 sw-infrarunbook-01 notice tmm: Rule irule_geo_block: Dropping client 172.16.5.45
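Before detaching the rule, it helps to quantify the blast radius. A sketch that tallies drop log lines per client IP (the embedded sample mirrors the /var/log/ltm entries above; grep the live log in practice):

```shell
# Tally iRule drops per client IP, most-affected first.
log='Apr  6 09:01:12 sw-infrarunbook-01 notice tmm: Rule irule_geo_block: Dropping client 172.16.5.44
Apr  6 09:01:13 sw-infrarunbook-01 notice tmm: Rule irule_geo_block: Dropping client 172.16.5.45
Apr  6 09:01:14 sw-infrarunbook-01 notice tmm: Rule irule_geo_block: Dropping client 172.16.5.44'

printf '%s\n' "$log" | awk '/Dropping client/ { count[$NF]++ }
    END { for (ip in count) print count[ip], ip }' | sort -rn
# first line -> 2 172.16.5.44
```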
How to Fix It
Detach the iRule while you fix the logic:
tmsh modify ltm virtual vs_solvethenetwork_443 rules none
tmsh save sys config
Correct the iRule to invert the logic (block everything except the allowed range, or block only the genuinely unwanted ranges), then re-attach:
tmsh modify ltm virtual vs_solvethenetwork_443 rules { irule_geo_block }
tmsh save sys config
Root Cause 7: Connection Limit Exceeded
Why It Happens
BIG-IP allows administrators to cap the maximum concurrent connections on a virtual server, a pool, or individual pool members. When this ceiling is reached, new connections are refused even though the virtual server shows as available and all pool members are healthy. Connection limits are commonly set conservatively during initial deployment and never revisited as traffic grows organically. A sudden traffic spike — a marketing campaign, a batch job, or an application connection leak — can exhaust the limit in seconds.
How to Identify It
tmsh show ltm virtual vs_solvethenetwork_443 | grep -iE "conn|limit"
Current Connections 10000
Maximum Connections 10000
Total Connections 14823912
Connection Limit 10000
When Current Connections equals Connection Limit, the VIP is saturated. Also check whether individual pool members have their own limits:
tmsh show ltm pool pool_solvethenetwork_443 members detail | grep -E "Current|Limit|Conn"
How to Fix It
Raise or remove the limit on the virtual server (0 means unlimited):
tmsh modify ltm virtual vs_solvethenetwork_443 connection-limit 0
tmsh save sys config
Investigate the source of excess connections — whether a legitimate traffic spike, an application-level connection leak, or a DDoS — before permanently removing the ceiling. If the application is leaking connections, identifying and patching the leak is the correct long-term fix.
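The sizing arithmetic is simple enough to sketch. Here the 80% alert threshold and 50% headroom buffer are operational conventions used in this runbook, not BIG-IP defaults:

```shell
# Utilization and headroom math using the counters shown above.
limit=10000
current=10000
observed_peak=10000
pct=$(( current * 100 / limit ))
recommended_limit=$(( observed_peak * 3 / 2 ))   # peak + 50% headroom
echo "utilization: ${pct}%"                      # -> 100%
[ "$pct" -ge 80 ] && echo "ALERT: above 80% of connection limit"
echo "recommended new limit: ${recommended_limit}"   # -> 15000
```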
Root Cause 8: SNAT Misconfiguration Causing Asymmetric Routing
Why It Happens
If a virtual server uses SNAT automap or a SNAT pool, the F5 rewrites the source IP of the client connection before forwarding to the pool member. If the translated source address is not routable back to the BIG-IP — because the SNAT pool IP is not on a directly connected segment, or the backend server's default gateway points elsewhere — the backend's response packets will be delivered to the wrong next hop. The client's TCP handshake will never complete because the SYN-ACK never returns to the BIG-IP.
How to Identify It
tmsh list ltm virtual vs_solvethenetwork_443 source-address-translation
ltm virtual vs_solvethenetwork_443 {
source-address-translation {
pool snatpool_solvethenetwork_external
type snat
}
}
tmsh list ltm snatpool snatpool_solvethenetwork_external
ltm snatpool snatpool_solvethenetwork_external {
members {
10.10.30.50
}
}
If 10.10.30.50 is not reachable from the server VLAN (10.10.20.0/24), return traffic will be mis-routed. Verify reachability from a pool member:
ping -c 4 10.10.30.50 # run from pool member 10.10.20.10
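Whether the SNAT address sits on the server VLAN is a plain subnet-membership test. A self-contained sketch of that math (the ip_to_int and in_subnet helpers are illustrative; tools like ipcalc do the same job):

```shell
# Subnet-membership check with shell integer math on dotted quads.
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}
in_subnet() {   # in_subnet <ip> <network> <prefixlen>
    local ip net mask
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ] && echo inside || echo outside
}

in_subnet 10.10.30.50 10.10.20.0 24   # -> outside: replies will bypass the BIG-IP
in_subnet 10.10.20.5  10.10.20.0 24   # -> inside: safe SNAT source
```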
How to Fix It
Update the SNAT pool to use a self-IP address that resides on the same VLAN as the pool members, or switch to automap if the internal self-IP is already on that VLAN:
tmsh modify ltm virtual vs_solvethenetwork_443 source-address-translation { type automap }
tmsh save sys config
Prevention
Preventing virtual server outages on F5 BIG-IP requires a combination of operational discipline, monitoring depth, and change control hygiene.
- State verification after maintenance: After every maintenance window, run tmsh show ltm virtual and confirm all virtual servers show enabled and available before closing the change ticket. Automate this check in your post-change validation runbook.
- Certificate lifecycle management: Alert at least 30 days before any SSL certificate expires. Use the BIG-IP's built-in expiry reporting (tmsh list sys file ssl-cert | grep expiration) and feed it into your monitoring platform. Automate renewal with ACME or internal PKI tooling wherever possible.
- IPAM enforcement for Self-IPs: Register all F5 self-IPs, floating IPs, and VIP addresses in your IPAM system with a permanent reservation marked as infrastructure — do not reassign. Require IPAM allocation approval before any new server receives an IP on a shared subnet.
- Health monitor tuning: Align monitor intervals and timeouts with measured application response times under load. Use application-layer monitors (HTTP/HTTPS with a real health endpoint) rather than ICMP. Set timeout to at least 3x the interval.
- iRule peer review: Require peer review for every iRule change. Stage all iRule modifications on a non-production virtual server first. Use tmsh show ltm rule statistics after deployment to confirm events fire as expected.
- Routing validation after failover: After every planned or unplanned HA failover event, immediately run a synthetic transaction from the client network and verify the traceroute path lands on the correct BIG-IP unit.
- Connection limit headroom: Review connection limits quarterly against observed peak counts. Maintain at least a 50% headroom buffer above the historical peak. Set alerts at 80% utilization so you act before the limit is hit.
- Centralized log alerting: Stream /var/log/ltm to a SIEM. Create alerts for pool member down, SSL handshake failure, connection limit reached, and virtual server disabled events. Early detection cuts mean time to repair dramatically.
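The post-maintenance state check in the first bullet can be scripted as a pass/fail gate. A hedged sketch that filters a simplified stand-in for tmsh show ltm virtual field-fmt output (the here-string simulates two virtual servers; feed the live command instead):

```shell
# Post-change gate: fail if any VS is not both available and enabled.
show_output='ltm virtual vs_solvethenetwork_443
    status.availability-state available
    status.enabled-state enabled
ltm virtual vs_solvethenetwork_80
    status.availability-state offline
    status.enabled-state disabled'

bad=$(printf '%s\n' "$show_output" | awk '
    /^ltm virtual/                            { vs = $3 }
    /availability-state/ && $2 != "available" { print vs; next }
    /enabled-state/      && $2 != "enabled"   { print vs }' | sort -u)

if [ -n "$bad" ]; then
    echo "post-change check FAILED for: $bad"
    # exit 1   # uncomment in a real runbook to block ticket closure
else
    echo "post-change check passed"
fi
```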
Frequently Asked Questions
Q: How do I check the health of all virtual servers on an F5 BIG-IP at once?
A: Run tmsh show ltm virtual without specifying a name. This outputs every virtual server with its availability state, enabled state, and current connection count. Filter for offline objects (including the name line above each match) with tmsh show ltm virtual | grep -B3 -A4 "offline".
Q: What is the difference between a virtual server being "disabled" versus "unavailable"?
A: Disabled means an administrator has administratively turned off the object — the data plane ignores it completely regardless of pool health. Unavailable means the virtual server is enabled but has no healthy pool members to send traffic to. The remediation steps for each are entirely different.
Q: How do I test connectivity to the VIP directly from the BIG-IP without involving external clients?
A: Source the test from the BIG-IP's external self-IP to simulate an inbound client: curl -kv --interface 10.10.50.1 https://10.10.50.100/. For packet-level inspection, run tmsh run util tcpdump -i external "host 10.10.50.100 and tcp port 443".
Q: Can a virtual server show as "available" while all pool members are down?
A: Yes. The virtual server availability indicator reflects only the VS-level state. If the virtual server has no default pool configured and relies entirely on an iRule for traffic forwarding, it can appear green while pool members are all offline. Always check pool and member state independently using
tmsh show ltm pool.
Q: What logs should I check first when an F5 virtual server stops responding?
A: Start with /var/log/ltm — this captures pool member state changes, monitor failures, SSL errors, and iRule exceptions. For system-level problems check /var/log/messages. If APM (Access Policy Manager) is in use, also review /var/log/apm. Use tail -f /var/log/ltm and trigger a test connection to observe real-time events.
Q: How do I tell whether traffic is actually arriving at the F5 for the VIP?
A: Check the virtual server statistics counters with tmsh show ltm virtual vs_solvethenetwork_443. Watch whether the client-side Bits In and Total Connections counters increment while clients report failures. If counters are static, traffic is not reaching the F5 at all — investigate routing, firewall, or ARP upstream. If counters are incrementing but connections are not completing, the problem is on the BIG-IP or backend.
Q: What is a floating self-IP and why does it matter during failover?
A: A floating self-IP is an address shared by both units in an HA pair that always lives on the active unit. Pool members must use the floating self-IP as their default gateway — not the unit-specific self-IP. If pool members point to a non-floating self-IP, after a failover their return traffic routes to the standby unit, causing asymmetric routing and dropped connections.
Q: How do I force a pool member online without restarting the application?
A: Use tmsh modify ltm pool pool_solvethenetwork_443 members modify { 10.10.20.10:8443 { state user-up } }. This overrides the monitor state temporarily. The health monitor will still run and will mark the member down again if the application continues to fail — use this only as a short-term measure while investigating the root cause.
Q: Can a hardware or licensing issue cause a virtual server to stop responding?
A: Yes. An expired BIG-IP license can disable traffic processing features. Check license status with tmsh show sys license. Also run tmsh show sys performance all-stats to confirm the system is not CPU or memory constrained — a saturated TMM will cause packet drops even when configuration is correct.
Q: Should I reboot the BIG-IP to resolve a virtual server outage?
A: Almost never. A reboot triggers failover in an HA pair, disrupts all active connections, and does not fix configuration, routing, or certificate problems. Work through the diagnostic sequence — VIP state, pool health, routing, profiles, Self-IP conflict — before considering any service restart. A targeted config reload (tmsh load sys config) is far safer than a full reboot if you suspect a configuration drift issue.
Q: How do I confirm a Self-IP conflict has been fully resolved?
A: After reassigning the conflicting device and clearing the switch ARP cache, run arping -I external 10.10.50.1 -c 10 from the BIG-IP. All replies must come from a single MAC address — the F5 interface MAC. Also verify traffic statistics on the virtual server begin incrementing normally by watching tmsh show ltm virtual vs_solvethenetwork_443 in real time.
Q: What is the safest way to test an iRule change before applying it to a production virtual server?
A: Create a staging virtual server on a non-production VIP address (e.g., 10.10.50.199) with identical pool and profile configuration. Attach the new iRule to the staging VS only. Send synthetic test traffic and verify all expected paths through the iRule logic produce the correct outcome. Only after validation, apply the iRule to production and monitor LTM logs closely for the first 15 minutes.
