Symptoms
You've configured a persistence profile on your F5 BIG-IP and traffic still bounces between pool members on every request. Users are getting logged out mid-session. Shopping carts empty themselves. Application state vanishes. You check the virtual server configuration and the profile is there — attached, named correctly, everything looks fine on paper. Yet the load balancer keeps distributing requests to different nodes like persistence doesn't exist.
Other symptoms I've seen on tickets that turn out to be persistence failures:
- Session-based applications throwing "invalid session" or "session expired" errors even though the user just authenticated
- Intermittent 500 errors from the application tier when session state lives on a specific backend node
- Cookie persistence configured but no BIG-IP cookie visible in browser developer tools
- tmsh show ltm persistence persist-records returns empty or far fewer entries than active connections
- Source IP persistence working for some clients but not others, typically NAT'd office users
- SSL persistence configured but requests still distributing round-robin under load
The tricky part is that persistence failures are silent. BIG-IP doesn't generate a syslog alert saying "persistence didn't match." You have to dig. Let's go through the causes I hit most often.
Cause 1: Cookie Not Being Inserted
Why It Happens
Cookie persistence on BIG-IP works by injecting a Set-Cookie header into the HTTP response. The cookie (default name BIGipServer<pool-name>) encodes the pool member IP and port. On subsequent requests, BIG-IP reads that cookie and routes to the same member. The whole mechanism breaks down if the cookie never makes it into the response.
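The default unencrypted cookie value is decodable, which is handy for confirming the encoded member matches the node you expect. The first field is the member's IPv4 address as a little-endian 32-bit decimal, the second is the port with its two bytes swapped. A minimal Python sketch of that decoding (the sample cookie value is hypothetical, constructed to encode 10.10.20.20:8080):

```python
def decode_bigip_cookie(value: str) -> tuple[str, int]:
    """Decode a default-format BIGipServer cookie value into (ip, port).

    First field: pool member IPv4 as a little-endian 32-bit decimal.
    Second field: port with its two bytes swapped.
    """
    ip_field, port_field, _ = value.split(".")
    ip_int = int(ip_field)
    # Extract octets least-significant byte first (little-endian)
    ip = ".".join(str((ip_int >> (8 * i)) & 0xFF) for i in range(4))
    raw_port = int(port_field)
    port = ((raw_port & 0xFF) << 8) | (raw_port >> 8)  # byte-swap
    return ip, port

# Hypothetical cookie value encoding pool member 10.10.20.20:8080
print(decode_bigip_cookie("336857610.36895.0000"))  # ('10.10.20.20', 8080)
```

This only applies to the default plaintext encoding; if cookie encryption is enabled on the profile, the value is opaque and you'd verify the mapping from persist records instead.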
The most common reason is the virtual server handling HTTPS without an HTTP profile attached — BIG-IP can't parse or modify the HTTP layer without it. Another frequent culprit is the persistence profile being set only as the fallback, not the primary profile. I've also seen this fail when the profile method is set to passive instead of insert, which means BIG-IP expects the application to set its own cookie rather than injecting one. An iRule calling persist none somewhere will also silently kill cookie insertion.
How to Identify It
First, confirm the cookie persistence profile is actually the primary profile on the virtual server:
tmsh list ltm virtual vs_solvethenetwork_443 persist
Expected output when correctly configured:
ltm virtual vs_solvethenetwork_443 {
persist {
cookie_persist {
default yes
}
}
}
If you see only a fallback entry or the block is empty, that's your problem right there. Next, verify the HTTP profile is attached — cookie insertion requires BIG-IP to parse and rewrite HTTP:
tmsh list ltm virtual vs_solvethenetwork_443 profiles
No http profile in that output means no cookie insertion, full stop. Now check the persistence profile configuration itself:
tmsh list ltm persistence cookie cookie_persist
ltm persistence cookie cookie_persist {
app-service none
cookie-name BIGipServerpool_web
expiration 0
method insert
override disabled
}
Confirm method is insert and not passive or rewrite. Finally, do a packet capture on the server-side VLAN and grep for the Set-Cookie header:
tcpdump -nni 0.0:nnn -s0 -A 'tcp port 80 and host 10.10.20.15' | grep -i set-cookie
If you see application cookies but no BIGipServer cookie, BIG-IP isn't inserting.
How to Fix It
Attach the HTTP profile if it's missing:
tmsh modify ltm virtual vs_solvethenetwork_443 profiles add { http { } }
If the method is wrong, correct the profile:
tmsh modify ltm persistence cookie cookie_persist method insert
If the profile wasn't set as primary, fix that:
tmsh modify ltm virtual vs_solvethenetwork_443 persist replace-all-with { cookie_persist { default yes } }
Cause 2: Source IP Persistence Timeout Too Short
Why It Happens
Source IP persistence maps a client IP address to a specific pool member and holds that mapping alive for a configured timeout. When the timeout expires, BIG-IP removes the record. The next connection from that client gets load balanced fresh. If your timeout is shorter than the gap between a user's requests, persistence breaks in a way that looks completely random.
The default timeout in the factory source_addr profile is 180 seconds. That sounds reasonable until you consider a user who authenticates, fills out a lengthy form, and submits five minutes later. Their persistence record expired at the three-minute mark. Their POST hits a different backend that has no session context. The application throws an error and the user blames the network team.
There's a related scenario worth diagnosing at the same time: large enterprise NAT pools where many users share a single public IP. Source IP persistence becomes useless in those environments because all those users get mapped to the same pool member, which gets hammered while others sit idle. That's not a timeout issue but the fix is the same — switch to a more granular persistence method.
How to Identify It
Check the timeout configured on the persistence profile:
tmsh list ltm persistence source-addr source_addr_persist
ltm persistence source-addr source_addr_persist {
app-service none
mask 255.255.255.255
timeout 180
}
Now check the active persistence records and observe whether they're aging out during user sessions:
tmsh show ltm persistence persist-records all-properties
Sys::Persist Records
Source Addr: 203.0.113.44 Dest: 10.10.20.15:80 Pool: pool_web Node: 10.10.20.20:8080 Age: 162
Source Addr: 203.0.113.71 Dest: 10.10.20.15:80 Pool: pool_web Node: 10.10.20.21:8080 Age: 178
If records with ages near the timeout are disappearing, and that timing correlates with session errors, the timeout is the cause. You can also confirm by checking the persist record hit counter on the virtual server — a healthy persistence setup should show a high ratio of existing records being matched versus new records being created:
tmsh show ltm virtual vs_solvethenetwork_80 stats | grep -i persist
How to Fix It
Increase the timeout to match your application's session lifetime and add generous headroom:
tmsh modify ltm persistence source-addr source_addr_persist timeout 3600
For applications with very long idle sessions you can set the timeout to 0 for indefinite persistence, but use that carefully on high-traffic systems since it can exhaust the persistence table. If large NATs are contributing to the problem, switch to cookie persistence so individual users are tracked regardless of shared source IPs.
Cause 3: Universal Persistence Expression Wrong
Why It Happens
Universal persistence lets you persist on arbitrary data extracted from the request — a session token in a URL parameter, a custom header, a value buried in a JSON body. You write a TCL expression that BIG-IP evaluates per-request, and the result becomes the persistence key. When that expression is wrong, malformed, or returns an empty string for certain requests, those requests fall through to load balancing with no persistence applied at all.
I see this fail in a few ways. The expression is syntactically valid but extracts the wrong field name — a case sensitivity mismatch on a cookie name, for instance. It works on GET requests but fails on POSTs because the body isn't being buffered. It hits requests where the expected field simply doesn't exist yet (like the initial unauthenticated request before a token is issued), and no fallback persistence profile is configured. Or — and this one is subtle — the expression returns different values for the same logical session because the data it parses varies between requests.
How to Identify It
Look at the universal persistence profile to see which iRule it references:
tmsh list ltm persistence universal universal_persist
ltm persistence universal universal_persist {
app-service none
rule /Common/persist_rule
timeout 300
}
tmsh list ltm rule /Common/persist_rule
ltm rule /Common/persist_rule {
when HTTP_REQUEST {
set persist_key [HTTP::cookie "SESSIONID"]
if { $persist_key ne "" } {
persist uie $persist_key
}
}
}
Now check whether universal persistence records are actually being created during active traffic:
tmsh show ltm persistence persist-records all-properties | grep uie
If that returns nothing during active user sessions, the expression is returning empty strings. Enable temporary iRule logging to see exactly what's being extracted:
tmsh modify ltm rule /Common/persist_rule {
when HTTP_REQUEST {
set persist_key [HTTP::cookie "SESSIONID"]
log local0. "Persist key: '$persist_key' client: [IP::client_addr]"
if { $persist_key ne "" } {
persist uie $persist_key
}
}
}
Then tail the LTM log:
tail -f /var/log/ltm | grep "Persist key"
Apr 16 09:14:22 sw-infrarunbook-01 info tmm[21834]: Rule /Common/persist_rule: Persist key: '' client: 10.10.10.55
Apr 16 09:14:25 sw-infrarunbook-01 info tmm[21834]: Rule /Common/persist_rule: Persist key: 'abc123def456' client: 10.10.10.55
An empty string on the very first request is expected — the cookie doesn't exist until after login. But if you're seeing empty strings on requests from an already-authenticated session, the cookie name in the iRule doesn't match what the client is actually sending.
How to Fix It
Do a packet capture to verify the exact cookie or header name in client requests, then correct the expression. Add a fallback persistence profile to handle requests where the key isn't present yet:
tmsh modify ltm virtual vs_solvethenetwork_443 persist replace-all-with { universal_persist { default yes } } fallback-persistence source_addr_persist
If you need POST body content for the expression, add an HTTP buffering iRule to collect the full request before your persistence rule fires. And remove the debug logging rule once you've diagnosed the problem — logging every request at volume will noticeably impact TMM performance.
Cause 4: SSL Session ID Not Matching
Why It Happens
SSL persistence uses the TLS session ID to stick a client to a pool member. The idea is clean: each TLS session gets a unique ID, BIG-IP maps it to a member, and subsequent resumptions hit the same backend. In practice, this mechanism collapses fast in modern environments.
TLS 1.3 is the primary culprit. TLS 1.3 replaced session ID resumption with session tickets — opaque blobs the server sends the client. BIG-IP's SSL persistence was designed around TLS session IDs. With TLS 1.3 clients, the session ID field is either absent or carries no meaningful value for resumption matching, and SSL persistence simply doesn't fire. This isn't a BIG-IP bug; it's a protocol change that rendered the mechanism obsolete for modern clients.
Even with TLS 1.2, there are problems. Browsers rotate sessions aggressively. Server-side session IDs expire after around 300 seconds by default. Clients using connection pooling (like HTTP/1.1 keep-alive or HTTP/2) may hold a TCP connection open across what BIG-IP sees as separate TLS sessions, creating mismatches. And if the client-SSL profile has session resumption disabled, there's nothing to persist on at all.
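You can observe resumption behavior from the client side without a full packet capture. As a sketch (the hostname is a placeholder for your virtual server), openssl s_client with -reconnect negotiates a session, then drops and re-establishes the connection several times attempting to reuse it:

```shell
# Force TLS 1.2 and test session resumption against the VIP.
# "Reused" lines mean the client resumed the session; all "New" lines
# mean there is no resumption for SSL persistence to key on.
openssl s_client -connect vip.example.com:443 -tls1_2 -reconnect \
    < /dev/null 2>/dev/null | grep -E "^(New|Reused),"
```

If every reconnect shows "New", there's no session continuity at the TLS layer, and SSL persistence has nothing to match on regardless of how the profile is configured.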
How to Identify It
Check the SSL persistence profile configuration and the associated client-SSL profile together:
tmsh list ltm persistence ssl ssl_persist
ltm persistence ssl ssl_persist {
app-service none
timeout 300
}
tmsh list ltm profile client-ssl clientssl_solvethenetwork | grep -E "options|session-ticket|tls"
options { dont-insert-empty-fragments }
session-ticket disabled
If no-tlsv1.3 is absent from the options list and session tickets are disabled, TLS 1.3 clients won't get SSL persistence and TLS 1.2 clients won't have session resumption to persist on. Capture traffic and confirm the negotiated TLS version:
tcpdump -nni 0.0:nnn -s0 -w /var/tmp/tls_cap.pcap 'tcp port 443 and host 10.10.20.15'
Open that pcap in Wireshark and filter on tls.handshake.type == 2 to inspect Server Hello messages — the negotiated version will be in the record. Also confirm whether SSL-type persist records are being created at all:
tmsh show ltm persistence persist-records all-properties | grep -i ssl
Zero SSL records during active HTTPS traffic is definitive — SSL persistence is not functioning.
How to Fix It
Honestly, SSL persistence is a legacy mechanism and for any new deployment I'd recommend against it. Cookie persistence is more reliable, easier to troubleshoot, and works regardless of TLS version. That said, if you must keep SSL persistence operational, enable session tickets on the client-SSL profile:
tmsh modify ltm profile client-ssl clientssl_solvethenetwork session-ticket enabled
For TLS 1.3 clients that need persistence, the only real fix is switching to cookie persistence. Add a fallback in the meantime so TLS 1.3 clients aren't completely unprotected:
tmsh modify ltm virtual vs_solvethenetwork_443 fallback-persistence source_addr_persist
Cause 5: OneConnect Profile Interfering
Why It Happens
OneConnect is a BIG-IP optimization feature that multiplexes multiple client-side HTTP connections over a smaller pool of server-side connections. Instead of opening a new server-side TCP connection for each client request, BIG-IP reuses idle connections from its internal pool. This is excellent for server connection scalability. It's a disaster for persistence when the two features aren't configured to cooperate.
Here's the conflict: persistence wants to send a specific client to a specific pool member on every request. OneConnect wants to hand the request to any available idle server-side connection, regardless of which member it leads to. When OneConnect is active with its default configuration, it may pick up a connection that goes to member A for a client that persistence has mapped to member B. The persistence record exists, the lookup happened correctly, BIG-IP selected the right member — and then OneConnect overrode that decision at the connection reuse layer.
In my experience, this is the most disorienting persistence failure you'll encounter. The persist records look correct in tmsh show ltm persistence persist-records. The profile is attached, the configuration looks right, the records exist and show the correct pool member. But traffic still doesn't stick. You're not missing something obvious. OneConnect is actively working against you.
How to Identify It
Check whether a OneConnect profile is attached alongside the persistence profile:
tmsh list ltm virtual vs_solvethenetwork_80 profiles
ltm virtual vs_solvethenetwork_80 {
profiles {
http { }
oneconnect { }
tcp { }
cookie_persist { }
}
}
If OneConnect is listed there alongside a persistence profile, check its source-mask setting — that's the key parameter:
tmsh list ltm profile oneconnect oneconnect
ltm profile oneconnect oneconnect {
app-service none
max-age 86400
max-reuse 1000
max-size 1000
source-mask 0.0.0.0
}
A source-mask of 0.0.0.0 means OneConnect treats all client source IPs as interchangeable when selecting a server-side connection to reuse. It has no awareness of persistence mappings. To confirm the interference, watch live connection distribution across pool members while sending requests from a single client IP:
tmsh show ltm pool pool_web members stats | grep -E "addr|serverside.cur"
If connections are spreading across members despite active persistence records pointing to one specific member, OneConnect is the cause.
How to Fix It
There are two approaches, and which one you choose depends on whether you want to keep OneConnect's performance benefits. The first option is setting the source-mask to 255.255.255.255 so OneConnect only reuses server-side connections that match the exact client source IP:
tmsh modify ltm profile oneconnect oneconnect source-mask 255.255.255.255
This scopes connection reuse per-client. A client mapped by persistence to member A will only pick up idle connections going to member A. OneConnect still works, just within persistence boundaries.
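The effect of the mask is easy to model: OneConnect derives a reuse scope by ANDing the client IP with the source-mask, and only server-side connections in the same scope are candidates for reuse. A small illustrative sketch of that keying logic (not BIG-IP code, just the arithmetic):

```python
import ipaddress

def oneconnect_reuse_key(client_ip: str, source_mask: str) -> str:
    """Model OneConnect's reuse scoping: client IP ANDed with source-mask."""
    ip = int(ipaddress.IPv4Address(client_ip))
    mask = int(ipaddress.IPv4Address(source_mask))
    return str(ipaddress.IPv4Address(ip & mask))

# Default 0.0.0.0 mask: every client collapses into one scope, so any
# idle server-side connection is fair game regardless of persistence.
print(oneconnect_reuse_key("203.0.113.44", "0.0.0.0"))          # 0.0.0.0
print(oneconnect_reuse_key("203.0.113.71", "0.0.0.0"))          # 0.0.0.0
# /32 mask: each client gets its own scope, keeping connection reuse
# inside that client's persistence mapping.
print(oneconnect_reuse_key("203.0.113.44", "255.255.255.255"))  # 203.0.113.44
```

Intermediate masks like 255.255.255.0 scope reuse per /24, which can be a middle ground when many clients sit behind the same subnet.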
The second approach — which I prefer for session-critical applications — is to remove OneConnect from the virtual server entirely:
tmsh modify ltm virtual vs_solvethenetwork_80 profiles delete { oneconnect }
Removing OneConnect will increase server-side connection counts since BIG-IP now opens a fresh connection per client request instead of reusing. Monitor your pool member connection counts after this change and tune server-side TCP keepalive settings or connection limits if needed. The persistence reliability is worth the trade-off for most session-sensitive applications.
Cause 6: Persistence Profile on the Wrong Virtual Server
Why It Happens
This one sounds embarrassingly simple but I see it regularly in environments where virtual servers were cloned from templates or built by different engineers over time. The persistence profile exists and is correctly configured — it's just attached to a virtual server that isn't handling the traffic. A common variant: a redirect VS on port 80 has the persistence profile instead of the actual SSL VS on port 443. Another: a wildcard VS catches traffic before the specific VS that has persistence configured.
How to Identify It
List all virtual servers and their persistence configurations in one shot:
tmsh list ltm virtual | grep -E "ltm virtual|persist"
Then confirm which VS is actually processing traffic by checking connection counts:
tmsh show ltm virtual stats | grep -E "ltm virtual|clientside.cur"
The VS with active clientside.cur connections is the one handling your traffic. Verify it has the persistence profile. If the traffic-handling VS is missing it, that's your fix.
How to Fix It
tmsh modify ltm virtual vs_solvethenetwork_443 persist replace-all-with { cookie_persist { default yes } }
Cause 7: iRule Overriding the Persistence Decision
Why It Happens
iRules execute in the request processing pipeline and can directly override load balancing decisions. An iRule calling pool or node directly bypasses the persistence lookup result. A call to persist none clears it explicitly. If someone added a routing iRule to the virtual server without accounting for the persistence profile, the iRule wins every time.
How to Identify It
tmsh list ltm virtual vs_solvethenetwork_443 rules
If rules are attached, inspect each one for direct pool or node selection calls:
tmsh list ltm rule /Common/routing_rule
ltm rule /Common/routing_rule {
when HTTP_REQUEST {
if { [HTTP::uri] starts_with "/api" } {
pool pool_api
}
}
}
The pool pool_api call here bypasses persistence for every /api URI. The persistence record exists and is correct — it's just being ignored.
How to Fix It
Add an explicit persist call inside the iRule alongside the pool selection so persistence is honored within the iRule-driven flow:
when HTTP_REQUEST {
if { [HTTP::uri] starts_with "/api" } {
pool pool_api
persist source_addr
}
}
Alternatively, restructure routing so iRule logic complements rather than replaces the persistence profile decision.
Prevention
Most persistence failures are configuration drift problems — something worked, then a profile was changed, a feature was added, or a virtual server was cloned without checking all its dependencies. The fix is building a short verification sequence into your change process and running it after any modification to a virtual server or its associated profiles.
After changes, run this quick sanity check:
# Verify persistence profile is attached as primary
tmsh list ltm virtual vs_solvethenetwork_443 persist
# Verify HTTP profile is present (required for cookie persistence)
tmsh list ltm virtual vs_solvethenetwork_443 profiles | grep http
# Check OneConnect source-mask if OneConnect is in use
tmsh list ltm profile oneconnect | grep source-mask
# Confirm persist records appear within 60 seconds of test traffic
tmsh show ltm persistence persist-records all-properties
For cookie persistence specifically, add a synthetic check in your monitoring stack that validates the presence of the BIGipServer cookie in HTTP responses. A missing cookie is an early warning you can catch before users start filing tickets about lost sessions.
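The validation itself is just a scan of the response's Set-Cookie headers, so it slots into any monitoring framework's HTTP check. A minimal sketch of the check logic (the sample headers are illustrative; how you fetch them depends on your monitoring stack):

```python
def has_bigip_cookie(set_cookie_headers: list[str]) -> bool:
    """Return True if any Set-Cookie header carries a BIGipServer* cookie."""
    return any(h.strip().startswith("BIGipServer") for h in set_cookie_headers)

# Example: Set-Cookie headers as a synthetic probe might collect them
headers = [
    "JSESSIONID=abc123; Path=/; HttpOnly",
    "BIGipServerpool_web=336857610.36895.0000; path=/",
]
print(has_bigip_cookie(headers))      # True  -> persistence cookie present
print(has_bigip_cookie(headers[:1]))  # False -> alert before users notice
```

Run it against the traffic-handling virtual server, not a test VS, so the check exercises the same profile chain users hit.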
Keep SSL persistence off new deployments unless you have a documented reason for it. Cookie persistence handles modern TLS correctly, is straightforward to troubleshoot, and doesn't depend on TLS session resumption behavior that clients control. When you're building universal persistence expressions, always configure a fallback persistence profile — the first unauthenticated request won't have a session key yet and you don't want that falling through to random load balancing.
Document your persistence design in the virtual server description field. It takes ten seconds and saves the next engineer from having to reverse-engineer why OneConnect has a /32 source mask or why cookie persistence is paired with a fallback source-addr profile. A one-line comment in the description is worth more than a half-hour of tmsh list archaeology six months from now.
