HA Interconnect Link down on AFF-A300
Applies to
AFF-A300
Issue
- After the motherboard on the faulty node was replaced, the HA interconnect remained offline.
- The system showed repeated link flapping and eventually stayed down.
Output for system ha-interconnect status show:
Node A: Logical Link status is Down
Node B: Logical Link status is Down
NODE-A
slot 0: Interconnect HBA: Generic OFED Provider
Port Name: ic0a
GID: fe80:0000:0000:0000:0000:0000:0000:0104
Base LID: 0x104
Active MTU: 8192
slot 0: NTB Interconnect (PLX87b0)
Max HW Data Rate: PCIe Gen 3 x 8
HW Data Rate: PCIe Gen 1 x 0
SW Data Rate: PCIe Gen 1 x 0
Logical Link: Down <<<<<<
Port State: Enabled
NODE-B
slot 0: Interconnect HBA: Generic OFED Provider
Port Name: ic0a
GID: fe80:0000:0000:0000:0000:0000:0000:0105
Base LID: 0x105
Active MTU: 8192
slot 0: NTB Interconnect (PLX87b0)
Max HW Data Rate: PCIe Gen 3 x 8
HW Data Rate: PCIe Gen 1 x 8
SW Data Rate: PCIe Gen 3 x 0
Logical Link: Down <<<<<
Port State: Enabled
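The fields that matter in the output above are the Logical Link state and the negotiated PCIe rate compared with the maximum. The following is a minimal sketch, not a NetApp tool, of how a saved copy of that output could be checked. It assumes the text has the same layout as the excerpt in this article (a NODE-X heading followed by "Field: value" lines); the function names parse_ic_status, pcie_rate, and link_down_nodes are hypothetical.

```python
import re

# Abbreviated copy of the output shown in this article (assumed layout).
SAMPLE = """\
NODE-A
slot 0: NTB Interconnect (PLX87b0)
Max HW Data Rate: PCIe Gen 3 x 8
HW Data Rate: PCIe Gen 1 x 0
Logical Link: Down <<<<<<
NODE-B
slot 0: NTB Interconnect (PLX87b0)
Max HW Data Rate: PCIe Gen 3 x 8
HW Data Rate: PCIe Gen 1 x 8
Logical Link: Down <<<<<
"""


def parse_ic_status(text):
    """Group 'Field: value' lines under the NODE-X heading that precedes them."""
    nodes, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"NODE-[A-Z0-9]+", line):
            current = nodes.setdefault(line, {})
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            # Drop the "<<<<" annotation markers added in the article.
            current[key.strip()] = value.split("<")[0].strip()
    return nodes


def pcie_rate(value):
    """Return (generation, lane width) from a string like 'PCIe Gen 3 x 8'."""
    m = re.search(r"Gen (\d+) x (\d+)", value)
    return (int(m.group(1)), int(m.group(2))) if m else None


def link_down_nodes(nodes):
    """List nodes whose Logical Link field reads 'Down'."""
    return sorted(n for n, f in nodes.items() if f.get("Logical Link") == "Down")


if __name__ == "__main__":
    parsed = parse_ic_status(SAMPLE)
    print("Link down on:", link_down_nodes(parsed))
    for name, fields in parsed.items():
        print(name, "negotiated", pcie_rate(fields["HW Data Rate"]),
              "of max", pcie_rate(fields["Max HW Data Rate"]))
```

Comparing the negotiated rate against the maximum makes it easy to see that neither node trained the link at the advertised PCIe Gen 3 x 8.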
EMS Logs:
[?] Tue Sep 09 14:24:42 +0200 [NODE-A: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic0a link is down.
[?] Tue Sep 09 14:25:55 +0200 [NODE-A: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic0a link is up.
Or
[?] Mon Sep 15 19:00:00 +0200 [NODE-A: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 5438 minutes: links down
[?] Mon Sep 15 20:00:00 +0200 [NODE-A: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 5498 minutes: links down
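To gauge how often the link flapped and how long it has been down, the EMS lines above can be summarized from a saved log. This is a hedged sketch that assumes each line carries a "[node: process: event:severity]: message" block exactly as in the excerpts here; the names EMS_RE, parse_ems, and summarize are hypothetical, and other EMS formats may need a different pattern.

```python
import re

# Matches the "[NODE: process: event.name:severity]: message" part of a line.
EMS_RE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:]+): "
    r"(?P<event>[\w.]+):(?P<sev>\w+)\]: (?P<msg>.*)"
)

SAMPLE_LINES = [
    "[?] Tue Sep 09 14:24:42 +0200 [NODE-A: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic0a link is down.",
    "[?] Tue Sep 09 14:25:55 +0200 [NODE-A: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic0a link is up.",
    "[?] Mon Sep 15 19:00:00 +0200 [NODE-A: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 5438 minutes: links down",
    "[?] Mon Sep 15 20:00:00 +0200 [NODE-A: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 5498 minutes: links down",
]


def parse_ems(line):
    """Return the bracketed fields of one EMS line, or None if it does not match."""
    m = EMS_RE.search(line)
    return m.groupdict() if m else None


def summarize(lines):
    """Count link state changes and track the longest reported outage in minutes."""
    changes, max_down = 0, 0
    for line in lines:
        ev = parse_ems(line)
        if ev is None:
            continue
        if ev["event"] == "ic.linkStatusChange":
            changes += 1
        elif ev["event"] == "ic.HAInterconnectDown":
            m = re.search(r"down for (\d+) minutes", ev["msg"])
            if m:
                max_down = max(max_down, int(m.group(1)))
    return {"link_state_changes": changes, "max_minutes_down": max_down}


if __name__ == "__main__":
    print(summarize(SAMPLE_LINES))
```

A rising count of ic.linkStatusChange events points to flapping, while a growing minute count in ic.HAInterconnectDown indicates the link has stayed down.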
- A hard power cycle of the HA pair was performed by removing the controllers from the chassis.
- The HA pair recovered temporarily, but the link flapped and failed again.
- The motherboard on Node A was reseated with the partner node inserted in the chassis; the link status did not change.
- The motherboard on Node A was replaced with the partner node inserted in the chassis; the link status did not change.
- The motherboard on Node B was reseated with the partner node inserted in the chassis; the link status did not change.
