Node is stuck in a boot loop due to MULTIPLE DISKS MISSING in the same shelf
Applies to
- ONTAP 9
- DS212
Issue
- The node reboots unexpectedly because a RAID group is missing multiple disks:
Thu Aug 08 14:30:23 +0900 [Node-01: config_thread: raid.config.filesystem.disk.missing:info]: File system Disk /aggr1_1/plex0/rg1/0a.01.8 Shelf 1 Bay 8 [NETAPP X380_WVAXE10TA07 NA00] S/N [VHGM7DAM] UID [5000CCA0:C822FB48:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] is missing.
Thu Aug 08 14:30:23 +0900 [Node-01: config_thread: raid.config.filesystem.disk.missing:info]: File system Disk /aggr1_1/plex0/rg1/0a.01.1 Shelf 1 Bay 1 [NETAPP X380_STATE10TA07 NA00] S/N [ZA2D9K68] UID [5000C500:CA4A429F:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] is missing.
Thu Aug 08 14:30:23 +0900 [Node-01: config_thread: raid.config.filesystem.disk.missing:info]: File system Disk /aggr1_1/plex0/rg1/0a.01.2 Shelf 1 Bay 2 [NETAPP X380_STATE10TA07 NA00] S/N [ZA2DASCT] UID [5000C500:CA579B2F:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] is missing.
Thu Aug 08 14:24:28 +0900 [Node-01: config_failed_disk: callhome.disks.missing:error]: Call home for MULTIPLE DISKS MISSING
RAID group /aggr1_1/plex0/rg1 (partial, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
tparity FAILED N/A 9324290/ -
dparity FAILED N/A 9324290/ -
parity FAILED N/A 9324290/ -
data FAILED N/A 9324290/ -
data FAILED N/A 9324290/ -
data FAILED N/A 9324290/ -
data FAILED N/A 9324290/ -
data FAILED N/A 9324290/ -
data 0a.02.7 0a 2 7 SA:B 0 FSAS 7200 9324290/19096145920 9342976/19134414848
data FAILED N/A 9324290/ -
data 0a.02.0 0a 2 0 SA:B 0 FSAS 7200 9324290/19096145920 9342976/19134414848
data 0a.02.1 0a 2 1 SA:B 0 FSAS 7200 9324290/19096145920 9342976/19134414848
data 0a.02.2 0a 2 2 SA:B 0 FSAS 7200 9324290/19096145920 9342976/19134414848
data 0a.02.3 0a 2 3 SA:B 0 FSAS 7200 9324290/19096145920 9342976/19134414848
data 0a.02.4 0a 2 4 SA:B 0 FSAS 7200 9324290/19096145920 9342976/19134414848
data 0a.02.5 0a 2 5 SA:B 0 FSAS 7200 9324290/19096145920 9342976/19134414848
data 0a.02.6 0a 2 6 SA:B 0 FSAS 7200 9324290/19096145920 9342976/19134414848
Raid group is missing 9 disks.
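To confirm that the missing disks reported in EMS sit in a single shelf, the events can be grouped by shelf number. The following is a minimal sketch, not a NetApp tool: the function name `missing_bays_by_shelf` and the regular expression are illustrative assumptions, and the sample lines are the events above truncated after the bay number.

```python
import re
from collections import defaultdict

# Assumed message layout, taken from the EMS lines shown above:
# "... raid.config.filesystem.disk.missing:info]: File system Disk <path> Shelf <n> Bay <n> ..."
MISSING_RE = re.compile(
    r"raid\.config\.filesystem\.disk\.missing.*?"
    r"Disk (?P<path>\S+) Shelf (?P<shelf>\d+) Bay (?P<bay>\d+)"
)

def missing_bays_by_shelf(lines):
    """Return {shelf: sorted bay numbers} for every missing-disk event found."""
    shelves = defaultdict(set)
    for line in lines:
        match = MISSING_RE.search(line)
        if match:
            shelves[int(match["shelf"])].add(int(match["bay"]))
    return {shelf: sorted(bays) for shelf, bays in shelves.items()}

# Sample events from this article, truncated after the bay number.
PREFIX = ("Thu Aug 08 14:30:23 +0900 [Node-01: config_thread: "
          "raid.config.filesystem.disk.missing:info]: File system Disk ")
SAMPLE = [
    PREFIX + "/aggr1_1/plex0/rg1/0a.01.8 Shelf 1 Bay 8",
    PREFIX + "/aggr1_1/plex0/rg1/0a.01.1 Shelf 1 Bay 1",
    PREFIX + "/aggr1_1/plex0/rg1/0a.01.2 Shelf 1 Bay 2",
]

if __name__ == "__main__":
    print(missing_bays_by_shelf(SAMPLE))
```

A result with a single key suggests that every missing disk is in the same shelf, which points toward a shelf-level cause rather than independent disk failures.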
- The chassis PSU reports DC undervoltage and then recovers:
Thu Aug 08 14:24:49 +0900 [Node-02: dsa_worker5: ses.status.psWarning:error]: DS212-12 (S/N SHxxxxxxxxxxx) shelf 1 on channel 0a power warning for Power supply 1: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom left.
Thu Aug 08 14:24:58 +0900 [Node-02: dsa_worker2: callhome.shlf.ps.fault:error]: Call home for SHELF POWER SUPPLY WARNING
Thu Aug 08 14:25:16 +0900 [Node-02: dsa_worker2: ses.status.volWarning:error]: DS212-12 (S/N SHxxxxxxxxxxx) shelf 1 on channel 0a voltage warning for Voltage sensor 1: internal communication error. This module is on the rear of the shelf on the lower left power supply.
Thu Aug 08 14:25:16 +0900 [Node-02: dsa_worker2: ses.status.volWarning:error]: DS212-12 (S/N SHxxxxxxxxxxx) shelf 1 on channel 0a voltage warning for Voltage sensor 2: internal communication error. This module is on the rear of the shelf on the lower left power supply.
Thu Aug 08 14:25:25 +0900 [Node-02: statd: monitor.shelf.warning:error]: Fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
Thu Aug 08 14:25:43 +0900 [Node-02: dsa_worker3: ses.status.psError:alert]: DS212-12 (S/N SHxxxxxxxxxxx) shelf 1 on channel 0a power error for Power supply 1: critical status; AC Fail. This module is on the rear of the shelf at the bottom left.
Thu Aug 08 14:25:43 +0900 [Node-02: dsa_worker3: callhome.shlf.power.intr:error]: Call home for SHELF POWER INTERRUPTED
Thu Aug 08 14:26:29 +0900 [Node-02: dsa_worker4: ses.status.psInfo:info]: DS212-12 (S/N SHxxxxxxxxxxx) shelf 1 on channel 0a power supply information for Power supply 1: normal status.
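The length of the power event can be estimated from the timestamps of the `ses.status.ps*` messages. This is a rough sketch under stated assumptions: the helper names `psu_events` and `seconds_until_recovery` are hypothetical, all events are assumed to fall on the same day, and the sample lines are the events above with the message text shortened.

```python
import re

# Assumed EMS line layout: "<day> <mon> <dd> HH:MM:SS <tz> [<node>: <thread>: <event>:<severity>]: ..."
EVENT_RE = re.compile(
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) [+-]\d{4} "
    r"\[[^:\]]+: [^:\]]+: (?P<event>[\w.]+):(?P<severity>\w+)\]"
)

def psu_events(lines):
    """Return (seconds_since_midnight, event_name) for each ses.status.ps* line."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match and match["event"].startswith("ses.status.ps"):
            seconds = int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
            events.append((seconds, match["event"]))
    return events

def seconds_until_recovery(lines):
    """Seconds from the first psWarning/psError to the next psInfo, or None."""
    start = None
    for seconds, event in psu_events(lines):
        if start is None and event in ("ses.status.psWarning", "ses.status.psError"):
            start = seconds
        elif start is not None and event == "ses.status.psInfo":
            return seconds - start
    return None

# Sample events from this article, message text shortened.
SAMPLE = [
    "Thu Aug 08 14:24:49 +0900 [Node-02: dsa_worker5: ses.status.psWarning:error]: DS212-12 shelf 1 on channel 0a power warning",
    "Thu Aug 08 14:25:43 +0900 [Node-02: dsa_worker3: ses.status.psError:alert]: DS212-12 shelf 1 on channel 0a power error",
    "Thu Aug 08 14:26:29 +0900 [Node-02: dsa_worker4: ses.status.psInfo:info]: DS212-12 shelf 1 on channel 0a power supply information",
]

if __name__ == "__main__":
    print(seconds_until_recovery(SAMPLE))
```

For the sample events, the power supply returns to normal status 100 seconds after the first warning, several minutes before the missing-disk events logged at 14:30:23.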
- All missing disks belong to the same shelf and are bypassed by IOM A or IOM B:
Enclosure Status: unrecoverable
Channel: 0a
Shelf: 1
Shelf Type: DS212-12
Product Serial Number: SHJxxxxxxxxxxxx
Module Type: IOM12
Disk Elements:
Element Status Status Bytes Status Descriptions
0 [Bay 0]: OK 01,00,00,00
1 [Bay 1]: OK 01,01,00,00
2 [Bay 2]: OK 01,02,00,00
3 [Bay 3]: OK 01,03,00,00
4 [Bay 4]: OK 01,04,00,00
5 [Bay 5]: OK 01,05,00,00
6 [Bay 6]: OK 01,06,00,00
7 [Bay 7]: NONCRITICAL 03,07,10,04 ENCLOSURE BYPASSED B, BYPASSED B
8 [Bay 8]: OK 01,08,00,00
9 [Bay 9]: NONCRITICAL 03,09,20,08 ENCLOSURE BYPASSED A, BYPASSED A
10 [Bay 10]: NONCRITICAL 03,0A,30,2C ENCLOSURE BYPASSED B, ENCLOSURE BYPASSED A, BYPASSED B, BYPASSED A, FAULT REQSTD
11 [Bay 11]: NONCRITICAL 03,0B,30,0C ENCLOSURE BYPASSED B, ENCLOSURE BYPASSED A, BYPASSED B, BYPASSED A
0 [Bay 0]: FF,00,00,00
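The bypass state of each bay can be read from the status descriptions in the disk element list. The sketch below is illustrative only: the function name `bypassed_paths` and the regular expressions are assumptions based on the layout of the output above, and lines that do not follow the element/status/bytes layout are skipped.

```python
import re

# Assumed row layout: "<element> [Bay <n>]: <STATUS> <b0,b1,b2,b3> <descriptions>"
ELEMENT_RE = re.compile(
    r"^\s*\d+ \[Bay (?P<bay>\d+)\]:\s+(?P<status>[A-Z]+)\s+"
    r"(?P<bytes>[0-9A-F]{2}(?:,[0-9A-F]{2}){3})\s*(?P<desc>.*)$"
)

def bypassed_paths(lines):
    """Return {bay: "A", "B", or "AB"} for every bay with a bypass description."""
    result = {}
    for line in lines:
        match = ELEMENT_RE.match(line)
        if not match:
            continue  # header lines or rows without a status word
        paths = set(re.findall(r"BYPASSED ([AB])\b", match["desc"]))
        if paths:
            result[int(match["bay"])] = "".join(sorted(paths))
    return result

# Rows copied from the enclosure status output in this article.
SAMPLE = [
    "6 [Bay 6]: OK 01,06,00,00",
    "7 [Bay 7]: NONCRITICAL 03,07,10,04 ENCLOSURE BYPASSED B, BYPASSED B",
    "8 [Bay 8]: OK 01,08,00,00",
    "9 [Bay 9]: NONCRITICAL 03,09,20,08 ENCLOSURE BYPASSED A, BYPASSED A",
    "10 [Bay 10]: NONCRITICAL 03,0A,30,2C ENCLOSURE BYPASSED B, ENCLOSURE BYPASSED A, BYPASSED B, BYPASSED A, FAULT REQSTD",
    "11 [Bay 11]: NONCRITICAL 03,0B,30,0C ENCLOSURE BYPASSED B, ENCLOSURE BYPASSED A, BYPASSED B, BYPASSED A",
    "0 [Bay 0]: FF,00,00,00",
]

if __name__ == "__main__":
    for bay, paths in sorted(bypassed_paths(SAMPLE).items()):
        print(f"Bay {bay}: bypassed on IOM {paths}")
```

In the sample rows, bays 7 and 9 are bypassed on one IOM each, while bays 10 and 11 are bypassed on both IOM A and IOM B.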