Ruijie Community

Title: Troubleshooting for the “mstp.elf” process with very high CPU utilization

Author: admin    Time: 2018-4-9 19:23
The Spanning Tree Protocol (STP) is disabled on a core switch. However, the show cpu command output shows high CPU utilization, and the CPU Protect Policy (CPP) statistics show that a large number of Bridge Protocol Data Unit (BPDU) packets are sent to the CPU.
Switch#show cpu
=======================================
    CPU Using Rate Information
CPU utilization in five seconds: 87%
CPU utilization in one minute  : 65%
CPU utilization in five minutes: 16%
NO   5Sec   1Min  5Min   Process
   1   0.00%   0.00%  0.00% init
   2   0.00%   0.00%  0.00% kthreadd
   3   0.00%   0.00%  0.00% migration/0
   4   0.00%   0.00%  0.00% ksoftirqd/0
   5   0.00%   0.00%  0.00% migration/1
   6   0.00%   0.00%  0.00% ksoftirqd/1
  11   0.00%   0.00%  0.00% events/0
  12   0.00%   0.00%  0.00% events/1
  15   0.00%   0.00%  0.00% khelper
.............................................
1021    80%   60%    15%  mstp.elf


Author: admin    Time: 2018-4-9 19:23
Possible Causes
1) A loop occurs on the Spanning Tree Protocol (STP) network. On the STP-enabled network, some switches calculate the topology incorrectly and fail to block ports, which forms a loop.
2) Because of BPDU attacks launched by people or by other devices, the switch receives a large number of BPDU packets in a short time.
3) STP and other security functions (such as 802.1X) are enabled on some switches, but software processing errors occur on these devices. As a result, blocked ports forward EAPOL packets, causing a loop and high CPU utilization.
Troubleshooting Procedure
Step 1 Check the CPU utilization of the switch.
Step 2 Check parameters related to the task and timer processes.
Step 3 Check whether BPDU attacks are initiated by humans.
Step 4 Check the configurations of convergence switches connected to the core switch.
If the fault persists, collect fault information and contact Ruijie technical support for assistance.
Step 1: Check CPU process information to see whether any special process causes an increase in CPU utilization.
1. Run the show cpu command multiple times to display the CPU utilization of the switch. If the high CPU utilization is caused by the BPDU receiving process, save the logs of the following operations.
(Note: BPDU packets sent to an STP-disabled switch still consume its CPU resources. That is, discarding excessive BPDU packets may also increase CPU utilization.)
Switch#show cpu
=======================================
    CPU Using Rate Information
CPU utilization in five seconds: 87%
CPU utilization in one minute  : 65%
CPU utilization in five minutes: 16%
NO   5Sec   1Min  5Min   Process
   1   0.00%   0.00%  0.00% init
   2   0.00%   0.00%  0.00% kthreadd
   3   0.00%   0.00%  0.00% migration/0
   4   0.00%   0.00%  0.00% ksoftirqd/0
   5   0.00%   0.00%  0.00% migration/1
   6   0.00%   0.00%  0.00% ksoftirqd/1
  11   0.00%   0.00%  0.00% events/0
  12   0.00%   0.00%  0.00% events/1
  15   0.00%   0.00%  0.00% khelper
.............................................
1021    80%   60%    15%  mstp.elf
.............................................
Check standard:
1. Check whether the 5-second CPU utilization of the mstp.elf process is high in three consecutive show cpu command outputs. (CPU utilization above 15% is usually considered high.)
Note: The mstp.elf process handles STP-related events, for example, BPDU packet receiving and transmission, interface events, and state machine processing. Even if STP is disabled, the process may still cause high CPU utilization because it has to drop BPDU packets. You can run the show cpu-protect mboard command to check how many BPDU packets are sent to the CPU and dropped.
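The check standard above can be sketched as a small script. This is only an illustration: it assumes the show cpu outputs have already been saved as text (for example from session logs), that process rows have the five columns shown in the sample output, and the function names are hypothetical.

```python
# Threshold taken from the check standard above (15% over 5 seconds).
HIGH_CPU_PERCENT = 15.0


def five_sec_usage(show_cpu_output, process="mstp.elf"):
    """Return the 5Sec percentage for `process`, or None if it is not listed."""
    for line in show_cpu_output.splitlines():
        fields = line.split()
        # Process rows are assumed to look like: NO 5Sec 1Min 5Min Process
        if len(fields) == 5 and fields[4] == process:
            return float(fields[1].rstrip("%"))
    return None


def is_persistently_high(samples, process="mstp.elf",
                         threshold=HIGH_CPU_PERCENT, needed=3):
    """True if `process` exceeds `threshold` in `needed` consecutive samples."""
    streak = 0
    for output in samples:
        usage = five_sec_usage(output, process)
        if usage is not None and usage > threshold:
            streak += 1
        else:
            streak = 0
        if streak >= needed:
            return True
    return False


# Excerpt of the show cpu output from this post.
sample = """\
NO   5Sec   1Min  5Min   Process
   1   0.00%   0.00%  0.00% init
1021    80%   60%    15%  mstp.elf
"""

print(five_sec_usage(sample))
print(is_persistently_high([sample, sample, sample]))
```

Feeding the function three or more consecutive captures mirrors the "three consecutive times" rule; a single high reading is not treated as a fault.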
2. Run the show cpu-protect mboard command to display the numbers of received and dropped BPDU packets.
Ruijie#show cpu-protect mboard
%cpu port bandwidth: 100000(pps)
Traffic-class   Bandwidth(pps)  Rate(pps) Drop(pps)
-------------   --------------  --------- ---------
0             20000           0          0        
1             20000           0          0        
2             20000           0          0        
3             20000           0          0        
4             20000           0          0        
5             20000           0          0      
6             20000           0          0        
7             20000           0          0        
Packet Type   Traffic-class   Bandwidth(pps)  Rate(pps) Drop(pps) Total     Total Drop
-----------   -------------   --------------  --------- --------- --------- ----------
bpdu          6               128             0         0         206099    101088
arp           1               10000           0         0         0         0
   
Check criterion:
If the BPDU packet drop count is high, the switch is receiving a large number of BPDU packets, some of which are discarded by the CPP hardware and some by the software.
If the STP-related statistics in the show cpu and show cpu-protect mboard command outputs are small, go to Step 2.
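To judge whether BPDU drops are still occurring, two saved show cpu-protect mboard outputs can be compared. The sketch below assumes that Total and Total Drop are cumulative counters and that packet-type rows have the seven columns shown above; the function names and the second snapshot's numbers are made up for illustration.

```python
FIELDS = ("traffic_class", "bandwidth_pps", "rate_pps",
          "drop_pps", "total", "total_drop")


def packet_type_stats(output, packet_type="bpdu"):
    """Return the counters of one packet-type row as a dict, or None."""
    for line in output.splitlines():
        fields = line.split()
        # Packet-type rows are assumed to be: name plus six numeric columns.
        if len(fields) == 7 and fields[0] == packet_type:
            return dict(zip(FIELDS, (int(v) for v in fields[1:])))
    return None


def drops_between(before, after, packet_type="bpdu"):
    """Return how much Total Drop grew between two snapshots, or None."""
    first = packet_type_stats(before, packet_type)
    second = packet_type_stats(after, packet_type)
    if first is None or second is None:
        return None
    return second["total_drop"] - first["total_drop"]


# First snapshot: the bpdu row from the output in this post.
first_snapshot = "bpdu   6   128   0   0   206099   101088"
# Second snapshot: hypothetical values, invented for this example only.
second_snapshot = "bpdu   6   128   0   0   206899   101488"

print(packet_type_stats(first_snapshot))
print(drops_between(first_snapshot, second_snapshot))
```

A growing difference between snapshots suggests that BPDU packets are still arriving faster than the CPP bandwidth for the bpdu type allows.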
Step 2: Check whether high CPU utilization is caused by man-made BPDU attacks
A core switch receives a large number of BPDU packets for either of the following reasons:
1. A large number of BPDU packets are sent due to man-made attacks or device faults.
2. BPDU packets from a large number of access switches are transparently transmitted to the core switch.
To rule out these possibilities, identify the BPDU packet sources by mirroring-based packet capture:
Perform port mirroring on the Access ports of the core switch one by one, analyze the captured BPDU packets, and determine their source MAC addresses. It is recommended to identify the switches sending the abnormal BPDU packets based on the source MAC addresses and apply rate limiting to BPDU packets.
If these possibilities are ruled out, go to Step 3.
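Once the BPDU packets have been captured, counting them per source MAC address points to the busiest senders. The following is a minimal sketch, assuming the capture tool can export one source MAC per BPDU as text (for example, tshark with a field export of eth.src); the function names and sample addresses are hypothetical.

```python
from collections import Counter


def normalize_mac(mac):
    """Lowercase a MAC address and drop ':', '-' and '.' separators."""
    return "".join(ch for ch in mac.lower() if ch not in ":-.")


def top_bpdu_senders(src_macs, limit=5):
    """Return the `limit` most frequent source MACs as (mac, count) pairs."""
    cleaned = (normalize_mac(m.strip()) for m in src_macs if m.strip())
    return Counter(cleaned).most_common(limit)


# Hypothetical source MACs, one per captured BPDU, in mixed notations.
captured = [
    "00d0.f822.33aa",
    "00:D0:F8:22:33:AA",
    "00-d0-f8-22-33-aa",
    "00d0.f822.44bb",
    "",
]

for mac, count in top_bpdu_senders(captured):
    print(mac, count)
```

Normalizing the notation first avoids counting the same device several times when different tools print its address in different formats.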
Step 3: Check the configurations of all convergence switches connected to the core switch
1. If the possibility of man-made attacks or device faults is excluded, check the configurations of the convergence switches. If convergence switches do not filter BPDU packets, the packets are transparently transmitted to the core switch, causing high CPU utilization.
1) Log in to a convergence switch connected to the core switch, and check whether BPDU filtering is enabled on the port connecting the switch to the core switch or to a PC. If not, it is recommended to enable BPDU filtering on the port connecting the switch to the PC.
Ruijie(config-if-GigabitEthernet0/1)#spanning-tree bpdufilter enable
//Ensure that no loop occurs among the devices connected to this port.
2. Check the configurations of security functions other than STP on the switch, especially whether security-related packets are forwarded on blocked ports. If abnormal forwarding occurs, a loop may have occurred among the devices.
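When many convergence switches have to be checked for BPDU filtering, a saved configuration can be scanned offline. This sketch assumes the configuration text uses "interface ..." blocks separated by "!" lines; the function names and the sample configuration are hypothetical, and the result only lists candidate ports to review.

```python
BPDU_FILTER_LINE = "spanning-tree bpdufilter enable"


def interface_blocks(config_text):
    """Map each interface name to the list of config lines in its block."""
    blocks = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line[len("interface "):]
            blocks[current] = []
        elif line == "!":
            # A lone "!" is assumed to end the current block.
            current = None
        elif current is not None and line:
            blocks[current].append(line)
    return blocks


def interfaces_without_bpdufilter(config_text):
    """Return the names of interfaces that lack the BPDU filter command."""
    return [name for name, lines in interface_blocks(config_text).items()
            if BPDU_FILTER_LINE not in lines]


# Hypothetical excerpt of a saved configuration.
sample_config = """\
interface GigabitEthernet 0/1
 spanning-tree bpdufilter enable
!
interface GigabitEthernet 0/2
 switchport mode trunk
!
"""

print(interfaces_without_bpdufilter(sample_config))
```

Before enabling BPDU filtering on any port the script reports, confirm that no loop can form among the devices connected to that port.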





