Applicable scope:
All CM cards in high-end switches that use angled (obliquely inserted) memory module sockets, as shown in the following figure.
The models include but are not limited to the following:
M18010-CM
M18010-CM II
M18014-CM
M18014-CM II
M18007-CM II
M18007-CM II LITE
M8600E-CM
M7800E-CM
Fault symptom:
The two common fault logs are as follows:
The device restarts repeatedly, and the following exception information is displayed during boot:
Boot 1.2.2-eaf8aaa (Build time: Apr 21 2014 - 10:12:42)
DRAM: 4 GiB
Boot 1.2.2-eaf8aaa (Build time: Apr 21 2014 - 10:12:42)
DRAM: 4 GiB
The device automatically restarts and the following exception information is displayed (the ECC error is reported repeatedly):
NAND: 512 MiB
Flash: 8 MiB
SETMAC: Setmac operation was performed at 2014-06-16 21:16:11 (version: 11.0)
Press Ctrl+C to enter Boot Menu
Bootloader: Done loading app on coremask: 0xf
[ 0.000000] ERROR PBANK_LSB: 4, ROW_LSB: 2, Row bits: 16, Col bits: 10, Row mask: 0xffff, Col mask: 0x3ff
[ 0.000000] ERROR LMC0 ECC: sec_err:8 ded_err:0
[ 0.000000] LMC0 ECC: Failing dimm: 0
[ 0.000000] LMC0 ECC: Failing rank: 0
[ 0.000000] LMC0 ECC: Failing bank: 7
[ 0.000000] LMC0 ECC: Failing row: 0xff0b
[ 0.000000] LMC0 ECC: Failing column: 0x2dbe
[ 0.000000] LMC0 ECC: syndrome: 0xce
[ 0.000000] Failing Address: 0x000000010f0b6cf8, Data: 0xc00627d8c006cfec
[ 0.000000] ERROR PBANK_LSB: 4, ROW_LSB: 2, Row bits: 16, Col bits: 10, Row mask: 0xffff, Col mask: 0x3ff
[ 0.000000] ERROR LMC0 ECC: sec_err:1 ded_err:0
[ 0.000000] LMC0 ECC: Failing dimm: 0
[ 0.000000] LMC0 ECC: Failing rank: 0
[ 0.000000] LMC0 ECC: Failing bank: 5
[ 0.000000] LMC0 ECC: Failing row: 0x14
[ 0.000000] LMC0 ECC: Failing column: 0x1110
[ 0.000000] LMC0 ECC: syndrome: 0xce
[ 0.000000] Failing Address: 0x0000000000144480, Data: 0x080510000083102d
[ 9.235671] ERROR PBANK_LSB: 4, ROW_LSB: 2, Row bits: 16, Col bits: 10, Row mask: 0xffff, Col mask: 0x3ff
[ 9.350371] ERROR LMC0 ECC: sec_err:8 ded_err:0
[ 9.350374] LMC0 ECC: Failing dimm: 0
[ 9.350377] LMC0 ECC: Failing rank: 0
[ 9.350379] LMC0 ECC: Failing bank: 6
[ 9.350382] LMC0 ECC: Failing row: 0xdd
[ 9.350385] LMC0 ECC: Failing column: 0x379a
[ 9.350388] LMC0 ECC: syndrome: 0xce
[ 9.350390] Failing Address: 0x0000000000dde458, Data: 0xcccccccccccccccc
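When a long console capture has to be checked for these two symptoms, a small script can help. The following is a minimal sketch, not a vendor tool: it counts boot banners (more than one in a single capture suggests repeated restarts) and collects the LMC ECC error records. The regular expressions and the function name summarize_console_log are assumptions modeled only on the sample output above; other boot or firmware versions may print these lines differently.

```python
import re
from collections import Counter

# Patterns modeled only on the sample console output in this notice
# (assumption: other versions may format these lines differently).
BOOT_BANNER = re.compile(r"^Boot \S+ \(Build time:")
ECC_SUMMARY = re.compile(r"LMC(\d+) ECC: sec_err:(\d+) ded_err:(\d+)")
ECC_FIELD = re.compile(r"LMC\d+ ECC: Failing (dimm|rank|bank|row|column): (\S+)")


def summarize_console_log(text):
    """Count boot banners and collect ECC error records from console text."""
    boots = 0
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if BOOT_BANNER.match(line):
            boots += 1
            continue
        m = ECC_SUMMARY.search(line)
        if m:
            # A summary line starts a new ECC record; values are kept as printed.
            records.append({
                "lmc": int(m.group(1)),
                "sec_err": int(m.group(2)),
                "ded_err": int(m.group(3)),
            })
            continue
        m = ECC_FIELD.search(line)
        if m and records:
            # Detail lines (dimm, rank, bank, row, column) attach to the latest record.
            # Base 0 accepts both decimal ("7") and hexadecimal ("0xff0b") values.
            records[-1][m.group(1)] = int(m.group(2), 0)
    banks = Counter(r["bank"] for r in records if "bank" in r)
    return {"boot_banners": boots, "ecc_records": records, "banks": banks}


SAMPLE = """\
Boot 1.2.2-eaf8aaa (Build time: Apr 21 2014 - 10:12:42)
DRAM: 4 GiB
Boot 1.2.2-eaf8aaa (Build time: Apr 21 2014 - 10:12:42)
DRAM: 4 GiB
[ 0.000000] ERROR LMC0 ECC: sec_err:8 ded_err:0
[ 0.000000] LMC0 ECC: Failing dimm: 0
[ 0.000000] LMC0 ECC: Failing bank: 7
[ 0.000000] LMC0 ECC: Failing row: 0xff0b
"""

if __name__ == "__main__":
    summary = summarize_console_log(SAMPLE)
    print("boot banners:", summary["boot_banners"])
    print("ECC records:", len(summary["ecc_records"]))
```

A capture with two or more boot banners, or with any ECC records, matches one of the symptoms described above and is a candidate for the troubleshooting steps in this notice.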
Troubleshooting suggestion:
If a card shows either of the preceding fault symptoms, the fault may be caused by poor contact between the memory module and its socket. In this case, perform the following steps to try to eliminate the poor contact:
Step 1: Remove the faulty card from the chassis and put it on a flat platform.
Step 2: Put on ESD gloves or an ESD wrist strap. Hold the memory module by the middle of its edge, where no components are mounted (as shown in Figure 2), and gently rock it up and down, perpendicular to the plane of the memory module (as shown in Figure 3). Keep the amplitude smaller than 5 mm to prevent damage to the memory module and socket.
Figure 2
Figure 3
Step 3: Hold both ends of the memory module and the socket between your index fingers and thumbs, and press the memory module firmly into the socket in the direction parallel to the plane of the memory module, as shown in Figure 4.
Figure 4
Step 4: Insert the faulty card into the chassis and power on the device.
If the fault is rectified and the device runs properly after the preceding steps, the poor contact has been eliminated, and the memory module will not suddenly lose contact again during subsequent operation.
If the fault persists after the preceding steps, it is recommended that you perform the following operations:
Step 1: While the faulty card is restarting repeatedly, press and hold Ctrl+T until the card resets and enters the memory self-check state, and then release the keys. After the memory self-check is complete, record the collected log for future troubleshooting.
Step 2: Record the customer name, device running duration, device serial number, and other common information.
Step 3: Start the DOA or RMA process for the faulty card.
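The information gathered in Steps 1 and 2 can be kept in one record, so that nothing is missing when the DOA or RMA process is started. The following is a minimal sketch only; the class name FaultReport, its field names, and the placeholder values are hypothetical and do not correspond to any vendor form.

```python
from dataclasses import dataclass, fields


@dataclass
class FaultReport:
    """Hypothetical record of the information listed in Steps 1 and 2."""
    customer_name: str = ""
    serial_number: str = ""
    running_duration: str = ""   # free text, as reported on site
    card_model: str = ""         # for example, one of the models in the applicable scope
    selfcheck_log: str = ""      # log collected after the Ctrl+T memory self-check

    def missing_fields(self):
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    report = FaultReport(
        customer_name="Example Customer",
        serial_number="SN-PLACEHOLDER",
        running_duration="about 6 months",
        card_model="M18010-CM",
    )
    print("still missing:", report.missing_fields())
```

Checking missing_fields() before submitting the request is one simple way to confirm that the self-check log and the device details have all been collected.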