InfraRunBook

    F5 High CPU on BIG-IP

    F5
    Published: Apr 20, 2026
    Updated: Apr 20, 2026

    A hands-on troubleshooting guide for diagnosing high CPU on F5 BIG-IP, covering iRule overhead, SSL offload saturation, connection table bloat, memory pressure, and disabled hardware forwarding — with real CLI commands throughout.


    Symptoms

    You log into your BIG-IP and the CPU graph looks wrong. Maybe a monitoring alert woke you up at 2 AM, maybe an application team is calling about timeouts — whatever brought you here, the first thing you see is TMM CPU pegged well above comfortable levels. The box is still forwarding traffic, but it is struggling.

    Typical indicators of high CPU on a BIG-IP include TMM utilization consistently above 70–80%, connection establishment times climbing, health monitors beginning to fail and marking pool members down, and the management plane becoming sluggish. TMSH commands that normally execute instantly start hanging. The GUI spins. Syslog fills with 'TMM CPU threshold exceeded' messages. In the worst cases, you start seeing virtual servers go red as the BIG-IP can no longer keep up with its own health monitor traffic.

    Start your investigation here:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # top -b -n 1 | grep -E "(tmm|Cpu)"

    And for a proper TMM-level view:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh show sys tmm-info
    
    Sys::TMM Information
    ---------------------------------------------------------------------------
    TMM   CPU   Memory     Connections   Conn-Rate   PVA-Client  PVA-Server
    ---------------------------------------------------------------------------
    0     89%   1.2G/2.0G  42311         12400/s     0           0
    1     91%   1.1G/2.0G  41987         11800/s     0           0
    2     34%   0.8G/2.0G  18203         5200/s      0           0
    3     32%   0.8G/2.0G  17901         5100/s      0           0

    Notice two things in this output. TMM 0 and TMM 1 are saturated while TMM 2 and 3 are relatively idle — that uneven distribution is a clue about flow hashing, which I will cover later. Also notice that PVA-Client and PVA-Server are both zero across every thread. That means no flows are being hardware-offloaded, which is a significant problem on its own. Let us work through the most common root causes one by one.


    Root Cause 1: iRule Processing Overhead

    iRules are powerful. They are also one of the easiest ways to accidentally saturate your BIG-IP CPU. Every iRule event that fires runs in the TMM context — inline, blocking, consuming CPU cycles right there in the fast path. If your iRule is doing heavy string manipulation, calling external data groups on every request, using 'HTTP::payload' to inspect request bodies, or running nested conditionals across thousands of transactions per second, it adds up fast.

    In my experience, the worst offenders are iRules that were written years ago to solve a specific problem and then got copy-pasted across dozens of virtual servers. Nobody went back to profile them. A rule that costs 2 microseconds per execution is invisible at 100 requests per second and a CPU hog at 200,000 requests per second. The math is unforgiving.
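    To make that math concrete, here is a back-of-envelope sketch you can run anywhere. The cycle count, request rate, and clock speed are illustrative assumptions, not measurements from any real device:

```shell
# Estimate what fraction of one TMM core an iRule consumes.
# All three inputs are assumptions for illustration only.
mean_cycles=8843        # mean CPU cycles per execution (from rule all-stats)
execs_per_sec=50000     # request rate hitting the rule
clock_hz=2200000000     # assumed 2.2 GHz TMM core clock

awk -v c="$mean_cycles" -v r="$execs_per_sec" -v hz="$clock_hz" \
    'BEGIN { printf "%.1f%% of one core\n", 100 * c * r / hz }'
```

    With these numbers the rule eats about a fifth of a core by itself; the same rule at a mean of 142 cycles would cost roughly 0.3% of a core at the same rate, which is why profiling before attachment matters.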

    To identify which iRules are contributing to CPU load, use the built-in statistics collection. Review them across all rules at once:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh show ltm rule all-stats
    
    Ltm::Rule Event: /Common/inspect_uri:HTTP_REQUEST
      Priority           500
      Executions         14823441
      Failures           0
      Aborts             0
      CPU cycles (min)   212
      CPU cycles (mean)  8843
      CPU cycles (max)   482001
      CPU cycles (total) 131204879363
    
    Ltm::Rule Event: /Common/legacy_header_insert:HTTP_REQUEST
      Priority           500
      Executions         14820193
      Failures           0
      Aborts             0
      CPU cycles (min)   88
      CPU cycles (mean)  142
      CPU cycles (max)   1893
      CPU cycles (total) 2104467426

    Compare those two rules. The 'legacy_header_insert' rule executes in a mean of 142 cycles — fast and cheap. The 'inspect_uri' rule has a mean of 8,843 cycles and a max of 482,001. That maximum is a red flag; at least one execution caused a severe stall. With nearly 15 million executions counted, this rule is consuming enormous CPU. That is where you start.
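    When a device has dozens of rules, eyeballing the stats gets tedious. A small pipeline sketch like this ranks rule events by total cycles consumed — the awk field positions assume the output format shown above:

```shell
# Rank iRule events by total CPU cycles consumed.
# Usage: tmsh show ltm rule all-stats | rank_rules | head
rank_rules() {
  awk '/^Ltm::Rule Event:/    { rule = $3 }
       /CPU cycles \(total\)/ { print $NF, rule }' \
  | sort -rn
}
```

    The most expensive rule event lands on the first line, so you always start profiling in the right place.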

    Common optimizations: move static lookup data into data groups (hash lookup instead of linear string matching); replace regex patterns with '[string match]' prefix checks where possible, since string operations are significantly cheaper than regex compilation and execution; avoid 'HTTP::payload' unless you genuinely need to buffer and inspect the request body; and make sure your rule events fire at the correct context — 'HTTP_REQUEST' rather than 'CLIENT_ACCEPTED' — so the rule only activates when HTTP data is actually present.

    Also check what is actually attached to your high-traffic virtual servers:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh list ltm virtual /Common/vs_web_443 rules
    
    ltm virtual /Common/vs_web_443 {
        rules {
            /Common/inspect_uri
            /Common/legacy_header_insert
            /Common/old_redirect_rule
        }
    }

    Three rules on one virtual server, all firing on every HTTP request. Profile each one. If a rule is no longer serving its original purpose — or if it was a one-time fix for something that has since been resolved elsewhere — remove it. Dead iRules attached to production virtual servers are a remarkably common finding during performance audits, and the fix is literally a single 'tmsh modify' command away.
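    For example, a sketch of that detachment — note that this form rewrites the virtual server's rules list wholesale, so include every rule you intend to keep:

```
[infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh modify ltm virtual /Common/vs_web_443 rules { /Common/inspect_uri /Common/legacy_header_insert }
[infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh save sys config
```

    Re-run 'tmsh list ltm virtual /Common/vs_web_443 rules' afterward to confirm only the intended rules remain attached.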


    Root Cause 2: SSL Offload Saturation

    SSL termination is one of the primary value propositions of putting a BIG-IP in the path, but it comes with a real CPU cost. RSA key exchanges are computationally expensive. TLS 1.3 with ECDHE is cheaper than TLS 1.2 with RSA 4096, but full handshakes still cost meaningful cycles regardless of version. When your SSL TPS climbs faster than your hardware acceleration can absorb, the overflow lands directly on the TMM software CPU.

    I have seen this happen after a routine certificate renewal where someone switched from a 2048-bit RSA cert to a 4096-bit RSA cert without understanding the CPU implication — within an hour the device was struggling under what looked like normal traffic. I have also seen it happen when session resumption was accidentally disabled on the client SSL profile, forcing a full handshake on every single connection instead of resuming from the session cache.

    Check your SSL profile statistics to understand the current state:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh show ltm profile client-ssl /Common/clientssl_web
    
    Ltm::Client SSL Profile: /Common/clientssl_web
      Handshake Failures               142
      Renegotiations                   0
      Session Cache Current Entries    24871
      Session Cache Hits               18432
      Session Cache Lookups            42301
      Session Cache Overflows          3891
      Connections (TLS 1.2)            287441
      Connections (TLS 1.3)            94321
      Current Connections              1823
      Total Connections                381904
      Avg TPS                          4831

    That session cache overflow count of 3,891 is worth addressing immediately. Cache overflows mean those clients are falling back to full handshakes, each one costing substantially more CPU than a resumed session. Increase the cache size and extend the cache timeout if clients are connecting more frequently than the current timeout allows:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh modify ltm profile client-ssl /Common/clientssl_web cache-size 65536 cache-timeout 3600
    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh save sys config

    Also verify whether hardware SSL acceleration is functional on your platform. On BIG-IP hardware with dedicated crypto ASICs (the i4000, i5000, and i7000 series all have hardware crypto), a non-functional crypto module means every SSL operation is handled in software:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh show net hardware field-fmt | grep -A5 crypto
    
        crypto {
            ssl-hw {
                status enabled
            }
        }

    If 'ssl-hw' shows disabled, or the section is absent entirely, all SSL work is hitting the TMM CPU in software. On BIG-IP Virtual Edition deployments, SSL is always handled in software — the only levers available are right-sizing the vCPU allocation for your expected SSL TPS and aggressive session caching.

    Review your cipher suite configuration as well. RSA 4096 key exchanges are roughly 4–8 times more expensive than RSA 2048. If security policy permits, moving to ECDSA certificates with P-256 curves gives equivalent or better security at a dramatically lower CPU cost per handshake. This is a meaningful change on devices handling thousands of new TLS connections per second.
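    You can reproduce the relative cost on any Linux host with OpenSSL. The absolute numbers depend entirely on the CPU — treat this as illustrative — but the ratios between RSA 2048, RSA 4096, and P-256 sign operations mirror what TMM pays per full handshake when crypto runs in software:

```shell
# Benchmark sign/verify throughput for the key types discussed above.
# Absolute numbers vary by host; the ratios are what matter.
openssl speed -seconds 1 rsa2048 rsa4096 ecdsap256 2>/dev/null
```

    On most hardware you will see P-256 signing outpace RSA 4096 signing by more than an order of magnitude, which is the whole argument for ECDSA certificates at high TLS connection rates.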


    Root Cause 3: Too Many Concurrent Connections

    The connection table on a BIG-IP is not free. Every established connection — client-side and server-side — consumes memory and requires periodic processing for timer management, keepalives, and state tracking. When concurrent connections climb into the millions, the sheer overhead of managing that state starts consuming measurable CPU cycles, separate from and in addition to the actual traffic forwarding work.

    This is distinct from connection rate. You can have a moderate new connection rate and still accumulate millions of concurrent connections if timeouts are too permissive, if clients are abandoning sessions without proper TCP teardown, or if the application tier is holding connections open far longer than typical. I have seen BIG-IP devices with normal new-connection rates but 3–4 million concurrent connections simply because nobody had ever tuned the idle timeout away from the default.

    Check your current connection table state:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh show sys connection count
    
    Sys::Connections
      Connections: 2847392
    
    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh show ltm virtual /Common/vs_web_443 | grep -i connection
    
      Current Connections             847392
      Maximum Connections             1243019
      Total Connections               48293847

    If you have nearly a million connections on a single virtual server and cannot explain why, look at your TCP profile timeout settings:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh list ltm profile tcp /Common/tcp-wan-optimized | grep -E "(idle|close|fin|time-wait)"
    
        close-wait-timeout 5
        fin-wait-2-timeout 300
        idle-timeout 300
        time-wait-recycle enabled
        time-wait-timeout 2000

    An 'idle-timeout' of 300 seconds means an inactive connection holds a connection table slot for five full minutes. For most web application workloads, 60 seconds is more than sufficient. Tightening this value flushes stale connections much faster and shrinks the table the system is constantly iterating over.
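    Assuming the same profile name as above, tightening the timeout is a one-line change — roll it out to a test virtual server first, since long-polling or WebSocket traffic may legitimately need longer idle times:

```
[infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh modify ltm profile tcp /Common/tcp-wan-optimized idle-timeout 60
[infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh save sys config
```

    Watch 'tmsh show sys connection count' over the following minutes; the table should shrink noticeably as stale entries age out.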

    OneConnect is the other major lever. When enabled, BIG-IP multiplexes many client-side connections over a smaller pool of persistent server-side connections. A virtual server handling 50,000 simultaneous clients without OneConnect may have 50,000 client-side and 50,000 corresponding server-side connections in the table. With OneConnect enabled and 200 persistent server connections, you serve those same 50,000 clients while managing a fraction of the total state. The CPU reduction on HTTP workloads can be dramatic:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh modify ltm virtual /Common/vs_web_443 profiles add { /Common/oneconnect { } }
    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh save sys config

    Confirm that your backend pool members support HTTP/1.1 connection reuse and do not rely on connection-level session state before enabling this in production. Most modern application servers handle connection multiplexing correctly, but it is worth verifying before you flip the switch on a production virtual server.


    Root Cause 4: Memory Pressure

    Memory and CPU problems on a BIG-IP are tightly coupled in ways that are not always obvious. When the system runs low on available memory, TMM starts dropping connections to reclaim it. The kernel may begin swapping, which destroys I/O performance and indirectly drives CPU load higher as processes wait on swap I/O. In severe cases, the kernel starts killing processes to survive. If TMM itself gets killed and restarted, you get a brief traffic interruption followed by a CPU spike as every client that was connected tries to reconnect simultaneously.

    Memory pressure can be caused by connection table bloat (see the previous section), oversized buffer allocations in HTTP or TCP profiles, large iRule data groups loaded into TMM memory, or simply having provisioned too many BIG-IP modules for the available physical RAM on the chassis.

    Check memory state with both the BIG-IP native command and the underlying Linux view:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh show sys memory
    
    Sys::Memory (bytes)
      TMM Memory Used     3.8G
      TMM Memory Total    4.0G
      Other Memory Used   2.1G
      Other Memory Total  4.0G
      Swap Used           842M
      Swap Total          1.0G
    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # free -m
    
                  total        used        free      shared  buff/cache   available
    Mem:          16033       14982         312          48         739         803
    Swap:          1023         842         181

    TMM memory at 95% of its allocation and 842 MB of swap actively in use — this system is in trouble. Swap usage on a production BIG-IP should normally be zero. Seeing any swap activity is a warning; seeing hundreds of megabytes of swap in use means you are already past the warning stage and into damage-control territory.
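    Since the BIG-IP host shell is Linux, the same quick swap check works there as on any Linux box — a one-liner against /proc/meminfo that you can drop into a watch loop:

```shell
# Report current swap usage in MB from /proc/meminfo.
awk '/^SwapTotal:|^SwapFree:/ { v[$1] = $2 }
     END { printf "swap used: %d MB\n", (v["SwapTotal:"] - v["SwapFree:"]) / 1024 }' /proc/meminfo
```

    Any value above zero on a production BIG-IP deserves a ticket; hundreds of megabytes deserve an immediate response.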

    Short-term actions: tighten connection idle timeouts to flush stale state faster, check whether large data groups can be reduced, and verify that the 'maxrejectrate' threshold has not been inadvertently set to a value that causes the system to hold more failed connection state than necessary.

    Longer-term: review which BIG-IP modules are provisioned. Every module provisioned at 'nominal' level takes a share of system memory, whether it is actively processing traffic or not:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh list sys provision
    
    sys provision afm {
        level nominal
    }
    sys provision asm {
        level nominal
    }
    sys provision ltm {
        level nominal
    }
    sys provision apm {
        level nominal
    }

    If AFM, ASM, and APM are all provisioned at 'nominal' but you are only actively using LTM and ASM, deprovision what you do not need. Changing provisioning requires a controlled reboot — schedule it in a maintenance window. But it is often the correct long-term answer when memory is consistently constrained and you have modules sitting idle consuming resources.


    Root Cause 5: Hardware Forwarding Disabled

    This one catches people off guard, especially engineers who come from a pure software networking background. BIG-IP hardware platforms include a Packet Velocity Accelerator — essentially an ASIC or FPGA capable of forwarding established flows entirely in hardware without involving TMM CPU at all. When a flow is offloaded to the PVA, it consumes essentially zero TMM CPU for the bulk of its lifetime. TMM only handles the initial connection setup and the final teardown. The PVA does everything in between.

    If PVA hardware forwarding is disabled — deliberately as a workaround for some other issue, accidentally through a configuration change, or implicitly because an attached profile is incompatible with hardware offload — every single packet of every established flow goes through TMM software processing. On a busy device carrying multiple gigabits of sustained traffic, that is the difference between 20% TMM CPU and 95% TMM CPU. It is that significant.

    First confirm whether the PVA hardware is present and enabled at the platform level:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh show net hardware field-fmt | grep -A4 pva
    
        pva {
            status enabled
            version 9.4
        }

    Now check whether your specific virtual servers are configured to use it:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh list ltm virtual /Common/vs_web_443 pva-acceleration
    
    ltm virtual /Common/vs_web_443 {
        pva-acceleration none
    }

    There it is. The PVA is present and enabled at the hardware level, but this virtual server is explicitly configured with 'pva-acceleration none'. Every packet hits TMM in software. Confirm by checking the per-thread TMM info — the PVA counters should tell the whole story:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh show sys tmm-info
    
    Sys::TMM Information
    ---------------------------------------------------------------------------
    TMM   CPU   Memory     Connections   PVA-Client  PVA-Server
    ---------------------------------------------------------------------------
    0     89%   1.2G/2.0G  42311         0           0
    1     91%   1.1G/2.0G  41987         0           0
    2     34%   0.8G/2.0G  18203         0           0
    3     32%   0.8G/2.0G  17901         0           0

    Zero PVA offloads across all threads on a device handling 100,000+ concurrent connections. Everything is in software. Re-enable hardware forwarding:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh modify ltm virtual /Common/vs_web_443 pva-acceleration full
    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh save sys config

    Before doing this in production, understand why it was disabled in the first place. Check your change history and ticket system. PVA is sometimes set to 'none' as a workaround for a specific platform bug or a known incompatibility with an attached feature. If there is no documented reason, re-enabling it is generally safe — but verify in a maintenance window where you can watch for unintended side effects. After re-enabling, watch the 'PVA-Client' and 'PVA-Server' counters in 'tmsh show sys tmm-info'; you should see those numbers start climbing within minutes as the PVA begins offloading established flows, and you should see a corresponding drop in TMM CPU.

    A few things will always prevent PVA offload regardless of the 'pva-acceleration' setting. SSL profiles prevent hardware offload because the PVA cannot perform TLS decryption. iRules that fire on per-packet events (like 'CLIENT_DATA' or 'SERVER_DATA') prevent offload because packet-level events require TMM involvement for each packet. Some APM and ASM inspection features similarly force software processing. For virtual servers where hardware offload is structurally unavailable, optimizing the other factors — iRule efficiency, SSL session caching, connection timeouts — becomes even more critical.


    Root Cause 6: Logging Overhead

    This one flies under the radar during performance audits. High-Speed Logging, request logging profiles, and ASM verbose logging can generate enormous log volumes. A request logging profile attached to a high-traffic virtual server that logs every URL, every request header, and every response code is formatting and transmitting a complete log record for every single transaction. At 50,000 requests per second, that is a non-trivial workload — both the string formatting inside TMM and the I/O path to the logging destination.

    Look for request logging profiles attached to virtual servers:

    [infrarunbook-admin@bigip-01:Active:Standalone] ~ # tmsh list ltm virtual /Common/vs_web_443 profiles
    
    ltm virtual /Common/vs_web_443 {
        profiles {
            /Common/http { }
            /Common/clientssl_web { context clientside }
            /Common/serverssl { context serverside }
            /Common/oneconnect { }
            /Common/request-log-verbose { }
        }
    }

    That 'request-log-verbose' profile is a candidate. Check what it is logging, at what granularity, and where it is sending records. If the destination is remote syslog over TCP, you also risk blocking I/O behavior when the syslog buffer fills under load. For high-volume virtual servers, switch to UDP-based HSL, which is non-blocking; reduce log verbosity to only what operations teams actually use; or implement request sampling — logging 1–5% of transactions at random still provides representative data for analysis without the full CPU overhead of logging every request.


    Root Cause 7: TMM Thread Imbalance

    Modern BIG-IP hardware runs one TMM thread per CPU core assigned to the TMM process. Traffic is distributed across those threads using Receive Side Scaling, hashing on a tuple of source IP, destination IP, source port, and destination port. Under normal conditions the load spreads roughly evenly. In practice, this breaks down when traffic lacks entropy in the hash inputs.

    The most common trigger is a large fraction of traffic originating from a NAT gateway behind which thousands of clients share a small pool of public IP addresses. With only a handful of distinct source IPs, the RSS hash has limited entropy to work with and consistently routes many of those flows to the same one or two TMM threads. You end up with threads 0 and 1 sitting at 90% CPU while threads 2 and 3 idle along at 30% — exactly the pattern in the opening output of this article.

    There is no universal single-command fix for RSS imbalance. If you control the upstream NAT, a larger pool of source IPs provides more entropy for the hash algorithm. On some platforms you can tune the RSS hash key via 'db' variables to try to improve distribution. In other cases this is fundamentally a capacity planning problem — the effective parallelism of your BIG-IP is lower than the core count because the traffic pattern defeats the distribution mechanism, and you need to account for that when sizing the device.
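    The entropy effect is easy to demonstrate with a toy model. This sketch hashes flows onto four threads using only the source address — a deliberately simplified stand-in for the real distribution hash, with the flow count and hash constant chosen arbitrarily:

```shell
# Toy demonstration: address-only flow hashing across 4 TMM threads.
# Not the real F5 hash -- just an illustration of hash-input entropy.
simulate() {
  awk -v nips="$1" 'BEGIN {
    srand(42)
    for (f = 0; f < 100000; f++) {
      sip = int(rand() * nips)        # pick one of nips client source IPs
      t = (sip * 2654435761) % 4      # crude multiplicative hash onto 4 threads
      count[t]++
    }
    for (t = 0; t < 4; t++)
      printf "ips=%-4d thread %d: %6d flows\n", nips, t, count[t]
  }'
}
simulate 2     # NAT scenario: two public source IPs -- two threads sit idle
simulate 200   # diverse client population -- roughly even spread
```

    With two source IPs, half the threads never see a flow no matter how many flows arrive; with two hundred, the spread is close to even. That is the same mechanic behind the 89/91/34/32 pattern in the opening tmm-info output.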


    Prevention

    Most high CPU incidents on BIG-IP are preventable. The recurring pattern is that someone makes a change — attaches a new iRule, renews a cert with a larger key size, disables PVA as a quick troubleshooting step and forgets to re-enable it, deploys a verbose logging profile to all virtual servers — and the CPU impact does not become visible until traffic peaks hours or days later. Nobody was watching the CPU trend during the window when the change was made.

    The single most effective prevention habit is proper baselining. Know what normal TMM CPU utilization looks like on your device across different hours of the day and days of the week. Use SNMP polling or streaming telemetry to track per-thread TMM CPU continuously over time. When CPU starts trending up, you want to catch it at 60% and have a conversation about it — not be woken up at 2 AM when it hits 95% during peak traffic.

    For iRules, establish a review gate before attaching new rules to production virtual servers. Use 'tmsh show ltm rule all-stats' to profile rules in a staging environment first. Set a team standard that any rule with mean CPU cycles above a defined threshold requires an architecture review before production deployment. This takes ten minutes and has saved many teams from hard-to-diagnose performance regressions.
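    One way to automate that gate — a sketch that parses the all-stats output shown earlier and exits nonzero if any rule event's mean exceeds a budget. The 5,000-cycle default is an arbitrary team convention, not an F5 recommendation:

```shell
# Fail (exit 1) if any rule event's mean CPU cycles exceed a budget.
# Usage: tmsh show ltm rule all-stats | check_rule_budget 5000
check_rule_budget() {
  awk -v max="${1:-5000}" '
    /^Ltm::Rule Event:/   { rule = $3 }
    /CPU cycles \(mean\)/ { if ($NF + 0 > max) { print rule, "mean", $NF, "> budget", max; bad = 1 } }
    END { exit bad }'
}
```

    Wire this into the staging pipeline and a hot rule gets flagged before it ever touches a production virtual server.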

    For SSL, keep session cache sizing appropriate to your concurrent client count. Calculate the expected number of unique clients connecting within your cache timeout window and set 'cache-size' accordingly. Monitor session cache hit rates — target above 80% for typical web workloads — and treat falling hit rates as an early warning indicator. Review cipher configurations at least annually; older cipher lists often include RSA-based key exchange algorithms that modern clients will negotiate preferentially and that are significantly more expensive than their ECDHE equivalents.
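    As a concrete sketch of that sizing exercise — the client count is an assumed input, and rounding up to a power of two is a convenience rather than an F5 requirement:

```shell
# Derive a cache-size floor from unique clients per timeout window.
clients=60000   # assumed unique TLS clients per cache-timeout window
size=1
while [ "$size" -lt "$clients" ]; do
  size=$((size * 2))  # round up to the next power of two
done
echo "set cache-size to at least $size"
```

    For 60,000 clients per window this lands on 65,536 — the same value used in the 'tmsh modify' example earlier in this article.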

    Keep connection timeouts tuned to realistic values for your workload. Review your application's actual connection lifetime patterns and configure idle timeouts to reflect them. Enable OneConnect on HTTP virtual servers where the backend supports connection multiplexing — it is one of the highest-leverage configuration changes you can make for connection-heavy workloads, and the operational risk is low on properly functioning HTTP backends.

    On hardware platforms, monitor PVA offload rates as a first-class metric alongside CPU and memory. If you modify a profile, attach an iRule, or enable a security feature on a virtual server and the PVA client and server counters drop to zero, that change is now costing you in software processing overhead. Make the tradeoff consciously. And whenever you disable hardware forwarding as a troubleshooting step, note it in your ticket immediately and schedule re-evaluation during the next change window. Temporary workarounds have a way of becoming permanent configurations on busy teams.

    Finally, align module provisioning with actual usage. Provisioned modules you are not actively using consume memory that could otherwise serve live connections. Review provisioning annually as part of your lifecycle management process, and treat any change as a controlled reboot event that requires a maintenance window — not something to squeeze in between meetings.

    Frequently Asked Questions

    How do I check which TMM thread is causing high CPU on a BIG-IP?

    Run 'tmsh show sys tmm-info' to see per-thread CPU utilization, memory usage, and connection counts. This breaks down CPU usage by individual TMM thread, making it straightforward to identify whether load is evenly distributed or concentrated on one or two threads — which points toward either a global capacity issue or a flow distribution problem like RSS imbalance.

    Can iRules cause CPU spikes even at moderate traffic volumes?

    Yes. iRules that perform expensive operations — regex matching on request bodies, HTTP::payload collection, large linear-scan data group lookups — can saturate CPU at surprisingly low request rates. Use 'tmsh show ltm rule all-stats' to check mean and maximum CPU cycles per execution. A rule with a mean of 8,000+ cycles running at 50,000 executions per second burns roughly 400 million CPU cycles per second — a substantial fraction of a single core on its own.

    How do I confirm that PVA hardware forwarding is actually offloading flows?

    Check the PVA-Client and PVA-Server counters in 'tmsh show sys tmm-info'. On a healthy system with hardware offload working, these counters should show significant and growing flow counts during active traffic. If both read zero on a busy device, hardware forwarding is not active — either pva-acceleration is set to 'none' on the virtual server, or an attached profile such as an SSL or packet-level iRule is preventing offload.

    What TMM CPU percentage should prompt a BIG-IP investigation?

    Sustained TMM CPU above 70% on any single thread is worth investigating. Brief spikes to 80–90% during traffic bursts are less concerning than consistent utilization above 70% over multiple minutes. At 85%+ sustained, you are approaching the point where the system cannot absorb traffic spikes, which leads to connection drops, health monitor failures, and cascading pool member flapping.

    Why would PVA hardware forwarding be disabled on a BIG-IP virtual server?

    Several things can prevent PVA offload. It is sometimes explicitly disabled via 'pva-acceleration none' as a workaround for a platform-specific bug or interoperability issue. SSL profiles always prevent full PVA offload since the ASIC cannot execute TLS decryption. iRules that fire on per-packet events like CLIENT_DATA or SERVER_DATA require TMM involvement for each packet and prevent offload. Some APM and ASM inspection features have the same effect. Before re-enabling PVA on a virtual server where it was disabled, review your change history to understand the original reason.
