AFF A250 won't boot due to faulty drive
Applies to
- ONTAP 9
- AFF A250
Issue
- Node not booting
- The partner node reports disk inventory errors in the EMS logs:
3/6/2023 11:03:10 node-02 ERROR cf.disk.inventory.mismatch: Status of the disk 0n.10P1(66305030:54101010:00253845:00000002:500A0981:00000001:00000000:00000000:00000000:00000000)has recently changed or the node (node-01) is missing the disk.
- SP logs on the down node show errors for slot 10 of the shelf:
44 | OEM record ee | Device Bus: 112 Dev: 0 Fun: 0 (slot 10) Failed to train at max link speed/width, retraining cycle 0
- PCIe stealth link errors for the same device may also be logged:
Sun Jan 29 17:18:21 0100 [node-01: PCI Link: pcie.stealth.link.errors:debug]: params: {'pcie_link_errors': 'performance is not optimal for PCI Device 144d:a824 in slot 10 on Controller. Dv[a824](112,0,0) in slot 10: LinkCap(MaxLkSp(3),MaxLkWd(4),ASPM(0),L0(6),L1(2),SurpDn,DLAct,Port(10)), LinkCap(MaxLkSp(3),MaxLkWd(2),ASPM(0),L0(7),L1(6),Port(0)), LinkStatus(LkSp(3),LkWd(1),DLAct). '}
- The storage shelf output shows drive 10 with a link width of 1 instead of the expected 2:
[10 ] OK 8.0 8.0 8.0 8.0 2 2 1 0 0 0 0 0
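
The link-width comparison in the symptoms above can be sketched as a small script. This is an illustrative sketch only, not a NetApp tool: the function name `check_link_width` is hypothetical, and it assumes that `MaxLkWd` is the maximum link width each end of the link advertises and that `LkWd` under `LinkStatus` is the negotiated width, as the field names suggest. Because a PCIe link cannot train wider than its narrower end, the expected width is taken as the smaller of the two advertised values.

```python
import re

def check_link_width(message: str):
    """Compare the negotiated PCIe link width with the advertised maximum.

    Assumes the field layout seen in the pcie.stealth.link.errors sample
    (MaxLkWd = advertised maximum width, LkWd = negotiated width).
    Returns None if the expected fields are not present.
    """
    caps = [int(w) for w in re.findall(r"MaxLkWd\((\d+)\)", message)]
    status = re.search(r"LinkStatus\(LkSp\((\d+)\),LkWd\((\d+)\)", message)
    slot = re.search(r"in slot (\d+)", message)
    if not caps or status is None:
        return None
    expected = min(caps)            # the narrower end limits the link
    actual = int(status.group(2))   # negotiated width from LinkStatus
    return {
        "slot": int(slot.group(1)) if slot else None,
        "expected_width": expected,
        "actual_width": actual,
        "degraded": actual < expected,
    }

# Message text copied from the EMS event shown in this article.
SAMPLE = (
    "performance is not optimal for PCI Device 144d:a824 in slot 10 on Controller. "
    "Dv[a824](112,0,0) in slot 10: "
    "LinkCap(MaxLkSp(3),MaxLkWd(4),ASPM(0),L0(6),L1(2),SurpDn,DLAct,Port(10)), "
    "LinkCap(MaxLkSp(3),MaxLkWd(2),ASPM(0),L0(7),L1(6),Port(0)), "
    "LinkStatus(LkSp(3),LkWd(1),DLAct). "
)

result = check_link_width(SAMPLE)
print(result)
```

For the sample event, the script reports slot 10 with an expected width of 2 and an actual width of 1, which matches the link width observed for drive 10 in the storage shelf output.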