Node unable to complete takeover with "Local node missing partner disk"
Applies to
- AFF-A250
- NS224 NVMe Shelf
Issue
- EMS logs show the following errors:
[NODE-A: SKL cerror: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO2: RPT(100,0,0): PLX PCIE 9797 switch on Controller, PLX PCIE 9797 switch on Controller, Br[9797](102,0,0): RcvErr(P1(255)), Br[9797](102,1,0): BadTLP(2), BadDLLP(1021); Br[9797](102,1,0): DevStatus(Corr), CorrErr(Rcvr,BDLLP); '} <<<<< Multiple errors observed
...
[NODE-A: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 6 due to excessive errors. <<<<< Disk in SLOT 6 gets disabled
...
[NODE-A: config_thread: raid.rg.media_scrub.summary.media:notice]: params: {'errors': '0', 'rg': '/NODE_B_NVME_SSD_1/plex0/rg0', 'current': '.'}
[NODE-A: cfdisk_config: cf.disk.inventory.mismatch:error]: Status of the disk ?.? (66304C30:52500891:00253841:00000003:500A0981:00000002:00000000:00000000:00000000:00000000) has recently changed or the node (NODE-A) is missing the disk.
[NODE-A: cfdisk_config: cf.disk.invent.mismatchalt:alert]: Status of some of the disks has changed or the node (NODE-A) is missing 4 disks (detailed logs have been throttled).
[NODE-A: cfdisk_config: callhome.sfo.miscount:error]: Call home for HA GROUP ERROR: DISK/SHELF COUNT MISMATCH
- The STORAGE-DISK section of the AutoSupport reports that no Disk Qualification Package could be loaded:
Disk Qualification Package Details:
Package Date: Unable to load any package (Unknown error or Package may not be present)
Header Information
FileName = N/A
FileVersion = N/A
DriveRecordCount = N/A
AliasRecordCount = N/A
DeviceRecordCount = N/A
SystemRecordCount = N/A
- One node in the HA pair cannot see all disks. In the example below, Node A does not list the disk in slot 6, while Node B does:
Node A
slot 0: Virtual NVMe Host Adapter 0n
0 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
...
4 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
5 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
7 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
8 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
...
23 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
Shelf 0: NS224NSM8E Firmware rev. NSM8E A: 0120 NSM8E B: 0120
Node B
slot 0: Virtual NVMe Host Adapter 0n
0 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
1 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
...
4 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
5 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
6 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
7 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
8 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
...
23 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (S60LNE0N602930)
Shelf 0: NS224NSM8E Firmware rev. NSM8E A: 0120 NSM8E B: 0120
- Updating the Disk Qualification Package does not resolve the issue
- Reseating the disk in its slot does not resolve the issue
- Replacing the disk and the motherboard does not resolve the issue
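The symptoms above can be cross-checked with a short script. The following is a minimal sketch, not a NetApp tool. It assumes the EMS log and the complete, unabbreviated disk listing of each node have been saved as plain text in the format shown above. The function names `listed_slots` and `disabled_slots` are hypothetical helpers introduced for illustration.

```python
import re

# Matches disk lines such as "6 : NETAPP X4018S17331T9NTE NA53 ..."
# (assumption: every populated slot is printed in this form).
SLOT_LINE = re.compile(r"^\s*(\d+)\s*:\s*NETAPP\b")

# Matches the EMS event that reports a disabled NVMe PCIe link.
LINK_DISABLED = re.compile(r"nvme\.link\.disabled\.error.*?slot (\d+)")


def listed_slots(listing):
    """Return the set of slot numbers present in one node's disk listing."""
    slots = set()
    for line in listing.splitlines():
        match = SLOT_LINE.match(line)
        if match:
            slots.add(int(match.group(1)))
    return slots


def disabled_slots(ems_log):
    """Return the set of slots named in nvme.link.disabled.error events."""
    return {int(m.group(1)) for m in LINK_DISABLED.finditer(ems_log)}


# Shortened sample input modeled on the output shown in this article.
node_a = """slot 0: Virtual NVMe Host Adapter 0n
5 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
7 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
"""
node_b = """slot 0: Virtual NVMe Host Adapter 0n
5 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
6 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
7 : NETAPP X4018S17331T9NTE NA53 1831.1GB 4160B/sect (XXXXXXXXXXXXXX)
"""
ems = (
    "[NODE-A: kernel: nvme.link.disabled.error:error]: PCIe link disabled "
    "for NVMe SSD in slot 6 due to excessive errors."
)

# Slots that Node B sees but Node A does not.
missing_on_a = listed_slots(node_b) - listed_slots(node_a)
print(sorted(missing_on_a))          # [6]
print(sorted(disabled_slots(ems)))   # [6]
```

If the slot missing on one node matches a slot reported by `nvme.link.disabled.error`, the two symptoms point at the same PCIe link. Listings abbreviated with "..." would produce false differences, so the comparison is only meaningful on full output.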