Symptoms
You log into a router and something feels wrong immediately. Commands take three seconds to echo back. The console is sluggish. You run
show processes cpu and see 94% for the past five minutes. Routing protocol neighbors start flapping. Your phone rings. This is high CPU on a Cisco router, and it's one of the more stressful situations in network operations because the box itself is struggling to help you investigate what's wrong with it.
Common symptoms you'll observe before and during the incident:
- CLI response time severely degraded — commands taking 2–10 seconds to execute
- SSH sessions dropping or refusing new connections entirely
- Routing protocol neighbors going down: OSPF, EIGRP, or BGP sessions resetting
- Syslog flooding with %SYS-3-CPUHOG or %SCHED-3-STARVATION messages
- SNMP traps firing for CPU threshold violations
- Increased latency and drops on transit traffic despite no interface errors
- NTP clock drift, keepalives missed, HSRP state transitions
Your first move is always to snapshot CPU utilization before you start changing anything. Get the data while the symptom is active.
sw-infrarunbook-01# show processes cpu sorted
CPU utilization for five seconds: 94%/72%; one minute: 91%; five minutes: 88%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
169 148721564 4832910 30775 41.59% 38.20% 37.10% 0 IP Input
62 52384721 9823641 5332 18.22% 15.40% 14.89% 0 OSPF Hello
45 12938471 2341987 5525 8.44% 7.22% 6.98% 0 CEF process
1 8241983 1023456 8054 3.20% 2.80% 2.75% 0 Chunk Manager
The two numbers after "five seconds" are total CPU and interrupt CPU respectively. In the example above, 72% of that 94% is interrupt-driven. That distinction is critical — it tells you whether the problem lives in the software process scheduler or deeper down at the hardware interface layer. Getting that wrong sends you troubleshooting the wrong thing entirely.
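If you collect this output programmatically, the total/interrupt split is easy to automate. A minimal Python sketch; the function and field names here are our own, not anything Cisco defines:

```python
import re

# Parse the first line of `show processes cpu` output and separate
# interrupt-driven CPU from scheduled-process CPU.
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)

def classify_cpu(line):
    m = CPU_LINE.search(line)
    if not m:
        return None
    total, interrupt, one_min, five_min = (int(g) for g in m.groups())
    process = total - interrupt  # time left for scheduled IOS processes
    return {
        "total": total, "interrupt": interrupt, "process": process,
        "one_min": one_min, "five_min": five_min,
        "dominant": "interrupt" if interrupt > process else "process",
    }

sample = "CPU utilization for five seconds: 94%/72%; one minute: 91%; five minutes: 88%"
print(classify_cpu(sample))
```

Feeding this line into a collector on every poll gives you the process-vs-interrupt answer before you even log in.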
Root Cause 1: A Single Process Consuming Excessive CPU
This is the most common scenario you'll encounter. A specific IOS process — IP Input, BGP I/O, OSPF Hello, or sometimes a management daemon like SNMP ENGINE — climbs to 40–60% utilization and stays there. In my experience, IP Input is the most frequent offender, and when it shows up at the top of the list, it usually means the router is process-switching traffic that should be CEF-switched, or it's receiving a flood of packets addressed directly to the router itself.
Why it happens: IOS runs a cooperative multitasking scheduler. Each process gets CPU time in turns. If one process has a massive backlog — BGP is processing thousands of updates, IP Input is handling a scan or amplification attack aimed at the router's own IP, or SNMP ENGINE is fielding a full routing table MIB walk every 30 seconds — it hogs the scheduler and starves everything else. The router's own routing and keepalive processes get delayed. Neighbors drop. Traffic forwarding degrades.
How to identify it: Start with
show processes cpu sorted to find the top offender, then use the history view to understand the timeline and duration.
sw-infrarunbook-01# show processes cpu history
888886666655555444443333322222111110000
321098765432109876543210987654321098765
100
90 **
80 ****
70 ****** *
60 ******** ***
50 ********* *****
40 ********** *******
CPU% per second (last 60 seconds)
For IP Input specifically, check how much traffic is addressed to the router itself rather than transiting through it:
sw-infrarunbook-01# show ip traffic
IP statistics:
Rcvd: 15234982 total, 8231456 local destination,
0 format errors, 0 checksum errors, 0 bad hop count
0 unknown protocol, 12 not a gateway
Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
Bcast: 231 received, 0 sent
Mcast: 0 received, 0 sent
Sent: 6234981 generated, 0 forwarded
Drop: 1823 encapsulation failed, 0 unresolved, 0 no adjacency
When "local destination" is a large percentage of total received, the router is spending CPU handling traffic to its own address — management traffic, scans, DDoS, or misconfigured hosts. That all goes through IP Input as process-switched traffic.
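A quick back-of-envelope check on the counters above makes "large percentage" concrete. These numbers are the ones from this article's sample output:

```python
# Fraction of received packets that terminated on the router itself,
# taken from the `show ip traffic` counters above.
rcvd_total = 15_234_982
local_destination = 8_231_456

local_share = local_destination / rcvd_total
print(f"{local_share:.0%} of received packets were punted to the router itself")
```

More than half the received traffic is local-destination here. On a transit router, anything beyond a few percent deserves investigation.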
How to fix it: For traffic flooding the router's own control plane, implement Control Plane Policing. CoPP rate-limits traffic destined to the router itself without affecting transit forwarding.
sw-infrarunbook-01(config)# ip access-list extended COPP-ICMP
sw-infrarunbook-01(config-ext-nacl)# permit icmp any any
sw-infrarunbook-01(config-ext-nacl)# exit
sw-infrarunbook-01(config)# class-map match-all COPP-ICMP-CLASS
sw-infrarunbook-01(config-cmap)# match access-group name COPP-ICMP
sw-infrarunbook-01(config-cmap)# exit
sw-infrarunbook-01(config)# policy-map COPP-POLICY
sw-infrarunbook-01(config-pmap)# class COPP-ICMP-CLASS
sw-infrarunbook-01(config-pmap-c)# police rate 1000 pps
sw-infrarunbook-01(config-pmap-c)# exit
sw-infrarunbook-01(config)# control-plane
sw-infrarunbook-01(config-cp)# service-policy input COPP-POLICY
Extend this pattern to cover SNMP, SSH, BGP, OSPF, and NTP traffic with appropriate rate limits for each. One class per protocol so that an anomaly in one doesn't choke the others.
Root Cause 2: Interrupt-Level CPU High
This one catches engineers off-guard because most troubleshooting instincts focus on software processes. Interrupt CPU is fundamentally different — it represents time the CPU spends handling hardware interrupts, primarily packet receive and transmit operations at the interface driver level. When interrupt CPU is the dominant number, the problem isn't in the IOS scheduler — it's at the hardware boundary.
Why it happens: Every packet arriving on an interface generates a hardware interrupt. The CPU must service that interrupt to move the packet from the interface buffer into the software queue. If traffic volume is high enough, or if small packets arrive at extremely high packet-per-second rates — which generate far more interrupts than large packets at equivalent bandwidth — interrupt CPU climbs. A 1 Gbps stream of 64-byte packets arrives at roughly 1.5 million packets per second, and in the worst case each one raises an interrupt. Software-based routers without ASIC-based forwarding can't keep up.
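The 1.5 million figure is easy to verify. On the wire, every minimum-size Ethernet frame also carries a preamble, a start-of-frame delimiter, and an inter-frame gap:

```python
# Sanity check: minimum-size Ethernet frames at 1 Gbps line rate.
LINE_RATE_BPS = 1_000_000_000
FRAME_BYTES = 64
OVERHEAD_BYTES = 7 + 1 + 12  # preamble + SFD + inter-frame gap

pps = LINE_RATE_BPS / ((FRAME_BYTES + OVERHEAD_BYTES) * 8)
print(f"{pps:,.0f} packets per second")  # ~1.49 million
```

Compare that against your platform's rated packet-per-second forwarding capacity, not its bandwidth rating; small-packet floods exhaust the former long before the latter.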
How to identify it: Look at the second number in the five-seconds CPU field. When it's close to the total, interrupts are the problem, not processes.
sw-infrarunbook-01# show processes cpu sorted
CPU utilization for five seconds: 89%/81%; one minute: 85%; five minutes: 82%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
62 1238471 923456 1341 4.20% 3.80% 3.75% 0 OSPF Hello
45 1093847 812345 1347 2.44% 2.22% 2.18% 0 ARP Input
1 823456 654321 1258 1.10% 0.98% 0.95% 0 Chunk Manager
89% total, 81% interrupt — the process list shows almost nothing, yet the router is grinding. That mismatch is the tell. Confirm with interface statistics:
sw-infrarunbook-01# show interfaces GigabitEthernet0/0/0
GigabitEthernet0/0/0 is up, line protocol is up
Hardware is ISR4400-4x1GE, address is 0050.56ab.1234
Internet address is 10.10.10.1/24
MTU 1500 bytes, BW 1000000 Kbit/sec
reliability 255/255, txload 248/255, rxload 251/255
Full-duplex, 1000Mb/s
input rate 987,654,321 bits/sec, 1,823,456 packets/sec
output rate 823,456,789 bits/sec, 1,456,789 packets/sec
Input queue: 245/75/1823/0 (size/max/drops/flushes)Nearly 1.8 million packets per second on input with input queue drops — that's your confirmation. The drops tell you the CPU can't drain the receive queue fast enough. Also check for interface errors that might be generating spurious interrupts from retransmits or CRC conditions:
sw-infrarunbook-01# show interfaces GigabitEthernet0/0/0 counters errors
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
Gi0/0/0 0 0 0 0 0 0
How to fix it: Short-term, identify the traffic source via show ip cache flow or NetFlow and null-route or upstream-filter the offending source. Long-term, if legitimate traffic is the cause, this router has hit a platform limitation — you need hardware with distributed forwarding via NPUs or ASICs (ASR, Catalyst 8000 series) rather than software interrupt handling. Verify CEF is fully enabled (covered next) as a first step, since process switching amplifies interrupt load.
Root Cause 3: CEF Disabled — Traffic Falling to Process Switching
This one has caused me genuine production pain more than once, and it's insidious because the symptoms look identical to a traffic flood. CEF is IOS's high-performance forwarding engine. When it's healthy, transit packets are switched using a pre-built FIB and adjacency table — no process involvement, minimal CPU. Disable it or break it, and every single transit packet goes through the IP Input process. On a busy router carrying a few hundred thousand packets per second, that means immediate CPU saturation.
Why it happens: CEF can be disabled manually with
no ip cef — sometimes left that way after troubleshooting. It also falls back to process switching for specific traffic types: unsupported GRE configurations, certain encryption modes, IP accounting enabled on an interface, adjacency failures that prevent the FIB from resolving next-hops, or interfaces with features that IOS can't accelerate. I've also seen it happen after a config restore from backup where the original config had CEF disabled for a debugging session that nobody cleaned up.
How to identify it:
sw-infrarunbook-01# show ip cef summary
IPv4 CEF is disabled
sw-infrarunbook-01# show ip interface GigabitEthernet0/0/0 | include switching
IP fast switching is disabled
IP CEF switching is disabled
IP route-cache flags are No CEF
That "No CEF" flag is the confirmation. But CEF can also be globally enabled while still failing per-prefix due to incomplete adjacencies. Check for those separately:
sw-infrarunbook-01# show ip cef summary
IPv4 CEF is enabled for distributed CEF
VRF Default
56234 prefixes (54891/1343 fwd/non-fwd)
Table id 0x0
sw-infrarunbook-01# show adjacency GigabitEthernet0/0/0 detail | include incomplete
10.10.10.254 incomplete
10.10.10.253 incomplete
Incomplete adjacencies force the router to ARP for every single packet to those next-hops. That's per-packet ARP resolution, which is pure process-switching overhead even when CEF is nominally enabled. If your gateway IPs show as incomplete, you're process-switching everything that routes through them.
How to fix it: Re-enable CEF globally and clear the incomplete adjacencies:
sw-infrarunbook-01(config)# ip cef
sw-infrarunbook-01(config)# ip cef distributed
sw-infrarunbook-01(config)# interface GigabitEthernet0/0/0
sw-infrarunbook-01(config-if)# ip route-cache cef
! Clear stuck incomplete adjacencies
sw-infrarunbook-01# clear adjacency GigabitEthernet0/0/0
! Verify CEF is now active and adjacencies resolved
sw-infrarunbook-01# show ip cef summary
IPv4 CEF is enabled for distributed CEF
VRF Default
56234 prefixes (56234/0 fwd/non-fwd)
sw-infrarunbook-01# show adjacency GigabitEthernet0/0/0 summary
Protocol Interface Address
IP GigabitEthernet0/0/0 10.10.10.254(9)
After re-enabling, watch IP Input in show processes cpu sorted. If CEF was the cause, that process should drop from 40–60% to under 5% within 30–60 seconds as forwarding shifts back to the hardware-assisted FIB path.
Root Cause 4: Routing Protocol Instability
Routing protocol flapping is a CPU force multiplier. Every OSPF neighbor drop triggers an SPF recalculation. Every BGP session reset triggers withdrawal and re-advertisement of potentially hundreds of thousands of prefixes. Each convergence event burns CPU on route computation, RIB updates, and FIB rebuilds. In a network with many prefixes or a persistently flapping link, this can sustain elevated CPU for minutes — or indefinitely if you don't fix the underlying cause.
Why it happens: OSPF neighbors drop due to missed hellos from CPU overload (yes, high CPU causes flapping which causes more CPU — a feedback loop), interface errors, MTU mismatches, authentication failures, or dead timer misconfiguration. BGP sessions reset due to hold timer expiry when the router is too busy to send keepalives, TCP resets, or peer misconfigurations. Each protocol reconvergence is CPU-intensive, and in large topologies the cost compounds.
How to identify it: Check OSPF SPF execution frequency — a healthy network should show very few SPF runs:
sw-infrarunbook-01# show ip ospf statistics
OSPF Router with ID (10.10.10.1) (Process ID 1)
Area 0:
SPF algorithm executed 847 times
Last executed 00:00:03 ago
Last full SPF: 00:00:03 ago
SPF Throttling:
Initial SPF schedule delay 50 msecs
Minimum hold time between two consecutive SPFs 200 msecs
Maximum wait time between two consecutive SPFs 5000 msecs
847 SPF runs is a serious problem — that means 847 topology change events. Check current neighbor states and look for anything not in FULL:
sw-infrarunbook-01# show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface
10.20.20.1 1 FULL/DR 00:00:35 10.10.10.2 Gi0/0/0
10.30.30.1 1 EXSTART/ - 00:00:34 10.10.10.3 Gi0/0/1
10.40.40.1 1 LOADING/ - 00:00:31 10.10.10.4 Gi0/0/2
EXSTART and LOADING neighbors are stuck in database exchange, which burns CPU continuously. For BGP, check session stability and reset counts:
sw-infrarunbook-01# show bgp summary
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.10.10.254 4 65001 823456 412398 189234 0 982 00:12:43 524288
10.10.10.253 4 65002 8234 4123 0 0 0 00:00:14 Idle (Admin)
sw-infrarunbook-01# show bgp neighbors 10.10.10.254 | include resets
Number of resets: 234
Number of resets due to: Peer closed the session: 198
An OutQ of 982 means BGP is backlogged — it can't send updates fast enough because the CPU is too busy to service the BGP I/O process. 234 session resets means that peer has been flapping. Each reset triggers a full table withdrawal and re-advertisement of 524,288 prefixes.
How to fix it: For OSPF, identify why neighbors are stuck. Check MTU consistency and authentication on both sides:
sw-infrarunbook-01# show ip ospf interface GigabitEthernet0/0/1
Internet Address 10.10.10.1/30, Area 0
Process ID 1, Router ID 10.10.10.1, Network Type POINT_TO_POINT
Timer intervals: Hello 10, Dead 40, Wait 40, Retransmit 5
Neighbor Count is 0, Adjacent neighbor count is 0
sw-infrarunbook-01# show interfaces GigabitEthernet0/0/1 | include MTU
MTU 1500 bytes
Zero neighbors on a point-to-point link with correct timers usually means an MTU mismatch. Tune SPF throttle timers to reduce the CPU cost of rapid reconvergence:
sw-infrarunbook-01(config)# router ospf 1
sw-infrarunbook-01(config-router)# timers throttle spf 50 200 5000
! For BGP: add dampening and BFD for fast, clean failure detection
sw-infrarunbook-01(config)# router bgp 65001
sw-infrarunbook-01(config-router)# bgp dampening 15 750 2000 60
sw-infrarunbook-01(config-router)# neighbor 10.10.10.254 fall-over bfdBFD lets BGP detect link failures in milliseconds rather than waiting for the hold timer to expire. That means fewer queued updates get built up before the session tears down, which means less reconvergence work when it does.
Root Cause 5: Memory Pressure Causing CPU Spikes
Memory exhaustion and high CPU are tightly coupled — the relationship isn't obvious until you've seen it a few times. When a router starts running critically low on free memory, IOS spends increasing CPU cycles on memory management: garbage collection, buffer coalescing, memory pool compaction, and handling allocation failures. The CPU gauge shows high utilization, but the root driver is actually a leak or genuine memory exhaustion. Treating the CPU without finding the memory issue means the problem comes back.
Why it happens: Memory gets consumed by a full BGP routing table, memory leaks in specific IOS features or versions, large ACLs expanded into TCAM, NetFlow caches that grow unbounded, crypto session state accumulating on VPN concentrators, or processes that allocate memory and never free it. Once free memory drops below a threshold, IOS starts aggressively compacting memory pools — a CPU-intensive operation that runs repeatedly and starves everything else.
How to identify it:
sw-infrarunbook-01# show memory summary
Head Total(b) Used(b) Free(b) Lowest(b) Largest(b)
Processor 7F1234560000 536870912 521234567 15636345 8234123 7812345
lsmpi_io 7F2345670000 67108864 66234567 874297 234567 412345
I/O 7F3456780000 134217728 87654321 46563407 12345678 23456789
15 MB of free processor memory on a 512 MB router is critical. The "Lowest" column is important — it shows the historical minimum since the last reload. At 8 MB, this router has been even lower than current. That's a platform on the edge of a crash. Now find what's holding the memory:
sw-infrarunbook-01# show processes memory sorted
Processor memory
PID TTY Allocated Freed Holding Getbufs Retbufs Process
62 0 523456789 478234567 45222222 0 0 BGP Router
186 0 234567890 189234567 45333323 0 0 BGP I/O
45 0 123456789 123450000 6789 0 0 OSPF Hello
0 0 89234567 43456789 45777778 0 0 *Dead*
sw-infrarunbook-01# show memory allocating-process totals | include Dead
45777778 *Dead*
BGP Router and BGP I/O together holding 90 MB and never releasing. The *Dead* entry with 45 MB held means a crashed process left allocated memory behind — a classic leak indicator. Check the IOS-XE version against Cisco's bug database for known BGP memory leaks on your platform.
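If your NMS already polls these pool values, a simple classifier catches this state automatically. A sketch in Python; the thresholds are illustrative choices, not Cisco recommendations:

```python
# Classify processor-memory health from `show memory summary` values (bytes).
# warn_pct / crit_pct are our own illustrative thresholds.
def memory_health(total, free, lowest, warn_pct=10, crit_pct=5):
    free_pct = 100 * free / total
    lowest_pct = 100 * lowest / total
    if free_pct < crit_pct or lowest_pct < crit_pct:
        return "critical"  # current or historical free memory under 5%
    if free_pct < warn_pct:
        return "warning"
    return "ok"

# Values from the sample router above: 512 MB total, ~15 MB free, ~8 MB lowest
print(memory_health(total=536_870_912, free=15_636_345, lowest=8_234_123))
```

Note the check on the lowest-watermark value too: a router that briefly dipped near zero is fragile even if current free memory looks acceptable.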
How to fix it: Short-term, reduce memory consumption by filtering the BGP table and compressing the NetFlow cache:
! Cap BGP prefix acceptance with a warning at 80%
sw-infrarunbook-01(config)# router bgp 65001
sw-infrarunbook-01(config-router)# neighbor 10.10.10.254 maximum-prefix 600000 80
! Check for runaway NetFlow cache
sw-infrarunbook-01# show ip cache flow | include entries
IP Flow Switching Cache, 4456448 bytes
1823456 active, 221568 inactive, 9234567 added
! Reduce cache size
sw-infrarunbook-01(config)# ip flow-cache entries 32768
! Emergency: clear specific caches for immediate relief
sw-infrarunbook-01# clear ip cache
sw-infrarunbook-01# clear ip flow stats
If you've confirmed a software leak and need to recover memory without a reload, clearing the BGP table causes it to be rebuilt from scratch, which sometimes reclaims leaked memory from the old state:
sw-infrarunbook-01# clear ip bgp 10.10.10.254
! This resets the BGP session — only do this in a maintenance window
! or if the alternative is a router reload
Long-term, upgrade to an IOS-XE release with the memory leak patched. Cisco's Software Checker will map your current version to known defects and recommend a fixed release for your platform.
Root Cause 6: SNMP Polling and Management Plane Overload
I've walked into more than one "mystery high CPU" situation where the culprit turned out to be an NMS polling the router every 30 seconds on every OID in the book. SNMP runs as a process in IOS and can consume substantial CPU when polled at high frequency, especially when walking large MIB tables.
Why it happens: Walking ipRouteTable or CISCO-BGP4-MIB on a router carrying a full Internet routing table forces IOS to serialize its internal routing data structures into SNMP response PDUs. It's expensive. Multiply by 10 monitoring systems all polling at 60-second intervals and you have sustained CPU load from SNMP ENGINE alone. Misconfigured trap destinations or SNMP community strings accessible to scanners make this worse.
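The arithmetic is worth doing explicitly. Using this article's example router (roughly 56,000 prefixes) and the polling scenario above; all numbers are illustrative:

```python
# Rough aggregate SNMP load: a get-next walk of a route table costs
# approximately one PDU per table entry, per poller, per interval.
route_entries = 56_234   # prefixes, from the `show ip cef summary` sample
nms_count = 10           # monitoring systems polling independently
poll_interval_s = 60

pdus_per_second = route_entries * nms_count / poll_interval_s
print(f"~{pdus_per_second:,.0f} get-next PDUs/sec hitting SNMP ENGINE")
```

Nearly ten thousand PDUs per second, every second, forever. That is sustained load, not a spike, which is exactly the signature SNMP ENGINE shows in the process list.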
How to identify it:
sw-infrarunbook-01# show processes cpu sorted | include SNMP
145 48234521 4823456 9997 22.34% 19.23% 18.45% 0 SNMP ENGINE
sw-infrarunbook-01# show snmp
Chassis: FOX1234ABCD
...
187234 SNMP packets input
0 Bad SNMP version errors
0 Unknown community name
187234 Number of requested variables
187234 Get-next PDUs
0 Set-request PDUs
187,234 get-next PDUs confirms an active MIB walk. SNMP ENGINE at 22% CPU is the process-level confirmation. Fix it by restricting MIB access and enforcing SNMPv3:
sw-infrarunbook-01(config)# snmp-server view INFRA-VIEW internet included
sw-infrarunbook-01(config)# snmp-server view INFRA-VIEW ipRouteDest excluded
sw-infrarunbook-01(config)# snmp-server view INFRA-VIEW ipCidrRouteDest excluded
sw-infrarunbook-01(config)# snmp-server view INFRA-VIEW bgpPathAttrTable excluded
sw-infrarunbook-01(config)# snmp-server group INFRA-OPS v3 priv read INFRA-VIEW
sw-infrarunbook-01(config)# snmp-server user infrarunbook-admin INFRA-OPS v3 auth sha AuthP@ssXR9 priv aes 128 PrivP@ssXR9
! Restrict SNMP to known NMS hosts only
sw-infrarunbook-01(config)# ip access-list standard SNMP-HOSTS
sw-infrarunbook-01(config-std-nacl)# permit 10.20.30.10
sw-infrarunbook-01(config-std-nacl)# permit 10.20.30.11
sw-infrarunbook-01(config-std-nacl)# deny any log
sw-infrarunbook-01(config-std-nacl)# exit
sw-infrarunbook-01(config)# snmp-server community READONLY ro SNMP-HOSTS
Prevention
Preventing high CPU incidents comes down to three fundamentals: baseline before you need it, protect the control plane before something attacks it, and keep the router doing what it was designed to do.
Start by establishing a CPU baseline on every router during normal operations. Run
show processes cpu history and document what normal looks like. A router idling at 15% process CPU has headroom. One idling at 40% doesn't — the next BGP reconvergence or traffic spike will push it over the edge. Know your floor before the floor moves.
Deploy CoPP on every router. It's not optional in production networks. Without it, a single misconfigured host sending ICMP floods to your router's management IP can saturate the control plane and take down routing protocol sessions for legitimate traffic. Build separate CoPP classes for ICMP, SNMP, SSH, BGP, OSPF, and NTP so that an anomaly in one protocol doesn't kill the others.
Keep CEF healthy and monitored. Build a simple EEM applet to alert on syslog messages containing "CEF disabled" so accidental disablement gets caught in minutes rather than discovered during an incident. Periodically audit
show ip cef summary across your fleet — especially after config changes, software upgrades, or restores from backup.
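EEM applet syntax varies by platform, so as an illustration, here is the same idea implemented on the syslog collector side instead. The sample message text matched below is an assumption; verify the exact string your platform logs when CEF is disabled:

```python
import re

# Collector-side sketch: flag incoming syslog lines that indicate CEF
# has been turned off. Pattern and sample messages are illustrative.
CEF_DISABLED = re.compile(r"CEF.*disabled", re.IGNORECASE)

def cef_alerts(syslog_lines):
    return [line for line in syslog_lines if CEF_DISABLED.search(line)]

lines = [
    "%SYS-5-CONFIG_I: Configured from console by admin",
    "%FIB-4-FIBDISABLE: Fatal error, slot 0: IPv4 CEF disabled",
]
print(cef_alerts(lines))
```

Wire the matched lines into whatever paging path your NMS already uses; the point is that CEF disablement becomes a page in minutes rather than a discovery mid-incident.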
Tune routing protocol timers for your topology scale. Default SPF timers and BGP hold timers were designed for small networks. In large-scale deployments, SPF throttle timers on OSPF and BGP dampening for unstable prefixes significantly reduce CPU cost during reconvergence. BFD for BGP fast failover means the session tears down before a backlog of keepalives and updates builds up.
Monitor memory trends with your NMS, not just current values. A router losing 1 MB of free processor memory per day has roughly two weeks before it becomes a problem. Trending
ciscoMemoryPoolFree over time will surface slow leaks months before they cause an outage. Alert on a threshold, not just a floor.
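The runway arithmetic from two NMS samples can be a sketch like this; the function name, the 5 MB floor, and the sample values are all illustrative:

```python
# Estimate days until free memory hits a floor, given two trend samples.
def days_until_exhaustion(free_mb_now, free_mb_then, days_between, floor_mb=5):
    leak_per_day = (free_mb_then - free_mb_now) / days_between
    if leak_per_day <= 0:
        return None  # free memory stable or growing: not leaking
    return (free_mb_now - floor_mb) / leak_per_day

# 22 MB free a week ago, 15 MB free now: ~1 MB/day of leakage
print(days_until_exhaustion(free_mb_now=15, free_mb_then=22, days_between=7))
```

Running this against trended data turns "the router will crash eventually" into "schedule the maintenance window within ten days," which is a far easier conversation with change management.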
Keep IOS-XE patched against known bugs. A significant number of high-CPU incidents in production environments are caused by documented software defects that already have fixes available. Use Cisco's Software Checker to identify recommended releases for your hardware platform and schedule maintenance upgrades before bugs become outages. The best time to patch is before the router pages you at 2 AM.
