T6 card ports down after encountering a fatal error
Applies to
- Dual-port 40/100G Ethernet T62100-CR card for MetroCluster interconnects
- 4-node or 8-node MetroCluster IP only (does not apply to 2-node MetroCluster IP)
- Any platform that supports T6 cards for MetroCluster
Issue
- AutoSupport is generated from the node:
HA Group Notification from cl01-n01 (PARTNER DOWN, TAKEOVER IMPOSSIBLE ) EMERGENCY
- Both ports of the 40/100G Ethernet T62100-CR card go down after encountering a fatal error:
Tue Mar 23 11:00:58 +0000 [MC2-N02: intr: netif.linkDown:info]: Ethernet e1a: Link down, check cable.
Tue Mar 23 11:00:58 +0000 [MC2-N02: intr: netif.linkDown:info]: Ethernet e1b: Link down, check cable.
Tue Mar 23 11:00:58 +0000 [MC2-N02: intr: netif.fatal.err:alert]: The network device in slot 1 encountered fatal error e1a/e1b.
- Node reboot might happen with the following error visible on console:
e5a/e5b (t6nex0): ! PL_PERR_CAUSE 0x19404 = 0x00000001, E 0x1fffe3ff, F 0xffffffff
e5a/e5b (t6nex0): ! [0x00000001] CIM
e5a/e5b (t6nex0): * PL_INT_CAUSE 0x1940c = 0x00100041, E 0x1fffe77d, F 0x00000000
e5a/e5b (t6nex0): * [0x00100000] MA
e5a/e5b (t6nex0): * [0x00000040] PL
e5a/e5b (t6nex0): * [0x00000001] CIM
e5a/e5b (t6nex0): ! MA_INT_CAUSE 0x77e0 = 0x00000002, E 0x00000007, F 0x00000006
e5a/e5b (t6nex0): ! MA_PARITY_ERROR_STATUS1 0x77f4 = 0x00020000, E 0x00000000, F 0xffffffff
e5a/e5b (t6nex0): ! PL_PL_INT_CAUSE 0x19430 = 0x00000018, E 0x00000010, F 0x00000010
e5a/e5b (t6nex0): ! [0x00000010] Fatal parity error
e5a/e5b (t6nex0): ? [0x00000008]
e5a/e5b (t6nex0): firmware reports adapter error: Crash (0xc014c010)
- After a node reboot, the ports may recover completely, or recover only briefly before going down again.
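
To check a saved EMS log for this signature, the event lines shown above can be scanned for netif.linkDown entries together with a netif.fatal.err alert. The following is a minimal Python sketch, not a NetApp tool: the function names (parse_ems_line, find_fatal_card_events) are hypothetical, and the line layout is assumed only from the sample messages in this article, so the pattern may need adjusting for other log sources.

```python
import re

# Assumed layout, taken from the sample lines in this article:
# "<timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<sev>\w+)\]: (?P<msg>.*)$"
)


def parse_ems_line(line):
    """Return a dict of fields for one EMS line, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def find_fatal_card_events(lines):
    """Collect link-down ports and fatal-error alerts from EMS lines.

    Returns (link_down, fatal), where link_down is a list of (node, port)
    tuples and fatal is a list of (node, message) tuples.
    """
    link_down = []
    fatal = []
    for line in lines:
        rec = parse_ems_line(line)
        if rec is None:
            continue  # console dumps and other non-EMS lines are skipped
        if rec["event"] == "netif.linkDown":
            port = re.search(r"Ethernet (\w+):", rec["msg"])
            if port:
                link_down.append((rec["node"], port.group(1)))
        elif rec["event"] == "netif.fatal.err":
            fatal.append((rec["node"], rec["msg"]))
    return link_down, fatal


# Sample lines copied from the Issue section above.
SAMPLE = [
    "Tue Mar 23 11:00:58 +0000 [MC2-N02: intr: netif.linkDown:info]: "
    "Ethernet e1a: Link down, check cable.",
    "Tue Mar 23 11:00:58 +0000 [MC2-N02: intr: netif.linkDown:info]: "
    "Ethernet e1b: Link down, check cable.",
    "Tue Mar 23 11:00:58 +0000 [MC2-N02: intr: netif.fatal.err:alert]: "
    "The network device in slot 1 encountered fatal error e1a/e1b.",
]

if __name__ == "__main__":
    down, fatal = find_fatal_card_events(SAMPLE)
    for node, port in down:
        print(f"{node}: port {port} link down")
    for node, msg in fatal:
        print(f"{node}: FATAL: {msg}")
```

A result in which both ports of the card appear as link down alongside a netif.fatal.err alert for the same node corresponds to the symptom described in this article; link-down events alone are more likely a cabling or switch issue.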
