System experiences multidisk panic after PSU alerts
Applies to
- FAS8300
- DS212-12 shelf
Issue
- The system reports multiple ses.status.psWarning:error events against multiple shelves. Example:
[node01: dsa_worker3: ses.status.psWarning:error]: DS212-12 (S/N SHJHUXXXXXXXXXX) shelf 1 on channel 0a power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
[node01: dsa_worker1: ses.status.psWarning:error]: DS212-12 (S/N SHJHUXXXXXXXXXX) shelf 5 on channel 0a power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
[node01: dsa_worker3: callhome.shlf.power.intr:error]: Call home for SHELF POWER INTERRUPTED
- The environment monitor (env_mgr) reports monitor.chassisPowerSupply.degraded:notice events:
[node01: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 VOUT is Critical Low (0 mV)
[node01: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 IOUT is Critical Low (0 mA)
[node01: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 POUT is Critical Low (0 mW)
[node01: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 Curr IIN is Critical Low (0 mA)
[node01: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 Hot is Critical Low (0 C)
[node01: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 FAN is Critical Low (0 RPM)
[node01: env_mgr: monitor.chassisPowerSupply.off:notice]: Chassis power supply 2 off.
[node01: env_mgr: callhome.chassis.ps.degraded:error]: Call home for CHASSIS POWER SUPPLY DEGRADED: PS 2
- One of the SAS paths to the shelves becomes degraded because of the power degradation (the shelf is "lost" on that path).
- The HBA on the affected SAS path encounters I/O errors:
[node01: scsi_cmdblk_strthr_admin: scsi.cmd.selectionTimeout:error]: Disk device 0a.02.0: Adapter/target error: HA status 0x7: cdb 0x8f:00000001f3301000:00000400. Targeted device did not respond to requested I/O. I/O will be retried. Disk 0a.02.0 Shelf 2 Bay 0 [NETAPP X380_WVAXE10TA07 NA01] S/N [VHGSWVPM] UID [5000CCA0:C82B747C:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
[node01: scsi_cmdblk_strthr_admin: scsi.cmd.selectionTimeout:error]: Disk device 0a.02.1: Adapter/target error: HA status 0x7: cdb 0x28:983cde00:0078. Targeted device did not respond to requested I/O. I/O will be retried. Disk 0a.02.1 Shelf 2 Bay 1 [NETAPP X380_WVAXE10TA07 NA01] S/N [VHGST4XM] UID [5000CCA0:C82B3CE4:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
[node01: scsi_cmdblk_strthr_admin: scsi.cmd.selectionTimeout:error]: Disk device 0a.02.3: Adapter/target error: HA status 0x7: cdb 0x8f:00000001f3301800:00000400. Targeted device did not respond to requested I/O. I/O will be retried. Disk 0a.02.3 Shelf 2 Bay 3 [NETAPP X380_WVAXE10TA07 NA01] S/N [VHGSX4LM] UID [5000CCA0:C82B78CC:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
- After repeated timeouts against the disks on the affected path, the system declares the disks missing, resulting in a multidisk panic:
[node01: config_thread: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr aggr1_N1: raid volfsm, fatal multi-disk error.. Raid type - raid_tec Group name plex0/rg1 state NORMAL. 12 disks failed in the group. Disk 0a.02.1 Shelf 2 Bay 1 [NETAPP X380_WVAXE10TA07 NA01] S/N [VHGST4XM] UID [5000CCA0:C82B3CE4:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: no valid path to disk. Disk 0a.02.2 Shelf 2 Bay 2 [NETAPP X380_WVAXE10TA07 NA01] S/N [VHGST4UM] UID [5000CCA0:C82B3CD8:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist. Disk 0a.02.3 Shelf 2 Bay 3 [NETAPP X380_WVAXE10TA07 NA01] S/N [VHGSX4LM] UID [5000CCA0:C82B78CC:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist. Disk 0a.02.4 Shelf 2 Bay 4 [NETAPP X380_WVAXE10TA07 NA01] S/N [VHGS921M] UID [5000CCA0:C82A5A44:00000000:00000000:00000000:00000000:00000000:00000000:0000.
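The event chain in the Issue bullets (PSU alerts, then selection timeouts on the affected path, then the multidisk fatal error) can be checked against a saved EMS log with a short script. The following is a minimal Python sketch, not a NetApp tool or API. It assumes the log was exported as plain-text lines in chronological order and that each line starts with the "[node: thread: event.name:severity]" prefix seen in the examples. The names events, matches_signature and PSU_ALERTS are hypothetical, and treating those particular events as "PSU alerts" is an assumption of this sketch.

```python
import re
from collections import Counter

# EMS prefix as it appears in the examples above:
# "[node01: dsa_worker3: ses.status.psWarning:error]: ..."
EMS_PREFIX = re.compile(
    r"^\[(?P<node>[^:\]]+): (?P<thread>[^:\]]+): (?P<event>[\w.]+):(?P<severity>\w+)\]"
)

# Event names taken from the messages in this article. Which ones count as
# "PSU alerts" is an assumption made for this sketch.
PSU_ALERTS = {
    "ses.status.psWarning",
    "callhome.shlf.power.intr",
    "monitor.chassisPowerSupply.degraded",
    "monitor.chassisPowerSupply.off",
    "callhome.chassis.ps.degraded",
}
TIMEOUT = "scsi.cmd.selectionTimeout"
PANIC = "cf.multidisk.fatalProblem"


def events(lines):
    """Yield the EMS event name of every line that starts with the EMS prefix."""
    for line in lines:
        m = EMS_PREFIX.match(line)
        if m:
            yield m["event"]


def matches_signature(lines):
    """Return True if the lines contain a PSU alert, then a selection timeout,
    then the multidisk fatal error, in that order (lines assumed chronological)."""
    seen_psu = seen_timeout = False
    for event in events(lines):
        if event in PSU_ALERTS:
            seen_psu = True
        elif event == TIMEOUT and seen_psu:
            seen_timeout = True
        elif event == PANIC and seen_timeout:
            return True
    return False


# Abbreviated copies of messages quoted in this article.
sample = [
    "[node01: dsa_worker3: ses.status.psWarning:error]: DS212-12 (S/N SHJHUXXXXXXXXXX) shelf 1 on channel 0a power warning for Power supply 2: warning status; DC undervoltage.",
    "[node01: env_mgr: monitor.chassisPowerSupply.off:notice]: Chassis power supply 2 off.",
    "[node01: scsi_cmdblk_strthr_admin: scsi.cmd.selectionTimeout:error]: Disk device 0a.02.0: Adapter/target error: HA status 0x7",
    "[node01: config_thread: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over.",
]

print(matches_signature(sample))
print(Counter(events(sample)))
```

Because the check looks only at event names and their order, it can flag a log that merely resembles this pattern. The affected shelves, power supplies and disks still need to be confirmed from the full message text.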