Power outages cause service disruption without failover
Applies to
- ONTAP 9
- AFF and FAS systems
Issue
- Power outages cause service disruption
- Failover did not occur even though the HA configuration was operational
- Both nodes reported multi-disk panics (MDPs) at almost the same time (17:03:26 and 17:03:27)
Node 1 log:
Sat May 07 17:02:45 +0800 [Node 1: asd_asyncd_1a: sas.port.down:debug]: SAS port "1c" went down.
Sat May 07 17:02:45 +0800 [Node 1: asd_asyncd_0: sas.port.down:debug]: SAS port "0b" went down.
Sat May 07 17:02:45 +0800 [Node 1: asd_asyncd_0: sas.port.down:debug]: SAS port "0d" went down.
Sat May 07 17:03:27 +0800 [Node 1: config_thread: cf.multidisk.fatalProblem:info]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr ggr1xx: raid volfsm, fatal multi-disk error.. Raid type - raid_dp Group name plex0/rg1 state NORMAL. 8 disks failed in the group.
Sat May 07 17:08:57 +0800 [Node 1: send_boot_msg_thread: mgr.boot.reason_ok:notice]: System rebooted after power-on.
Sat May 07 17:08:57 +0800 [Node 1: send_boot_msg_thread: callhome.reboot.poweron:notice]: Call home for REBOOT (power on)
Node 2 log:
Sat May 07 17:02:45 +0800 [Node 2: asd_asyncd_1a: sas.port.down:debug]: SAS port "1c" went down.
Sat May 07 17:02:45 +0800 [Node 2: asd_asyncd_0: sas.port.down:debug]: SAS port "0b" went down.
Sat May 07 17:02:45 +0800 [Node 2: asd_asyncd_0: sas.port.down:debug]: SAS port "0d" went down.
Sat May 07 17:03:26 +0800 [Node 2: fmmbx_instanceWorker: cf.multidisk.fatalProblem:info]: Node encountered a multidisk error or other fatal error while waiting to be taken over. Permanent errors on all HA mailbox disks (while marshalling header).
Sat May 07 17:08:54 +0800 [Node 2: send_boot_msg_thread: mgr.boot.reason_ok:notice]: System rebooted after power-on.
Sat May 07 17:08:54 +0800 [fas8300_n02: send_boot_msg_thread: callhome.reboot.poweron:notice]: Call home for REBOOT (power on)
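In both excerpts, the SAS ports went down at the same second and each node logged cf.multidisk.fatalProblem about 40 seconds later. The following Python sketch shows one way to check that correlation across the two nodes' logs. It is a minimal illustration, not a NetApp tool. The line layout is inferred only from the excerpts above. The helper names (parse_ems_line, first_occurrence, sas_ports_down, reported_together), the 60-second window, and the placeholder year are all hypothetical. The year is needed only because the excerpts omit it, and it must be the same for both nodes.

```python
import re
from datetime import datetime, timedelta

# Line layout inferred from the excerpts in this article (an assumption, not a
# documented ONTAP format):
#   <Wkd> <Mon> <DD> <HH:MM:SS> <+ZZZZ> [<node>: <thread>: <event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^\w{3} (?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:\]]+): (?P<thread>[^:\]]+): (?P<event>[^:\]]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)$"
)
SAS_PORT_DOWN = re.compile(r'SAS port "(?P<port>[^"]+)" went down')

# Hypothetical placeholder: the excerpts omit the year. Any fixed leap year
# works for comparisons as long as both nodes' logs are parsed with it.
PLACEHOLDER_YEAR = 2000


def parse_ems_line(line):
    """Return (timestamp, event, message), or None if the line does not match."""
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{PLACEHOLDER_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S %z")
    return ts, m["event"], m["message"]


def first_occurrence(lines, event):
    """Timestamp of the first line reporting `event`, or None if absent."""
    for line in lines:
        parsed = parse_ems_line(line)
        if parsed is not None and parsed[1] == event:
            return parsed[0]
    return None


def sas_ports_down(lines):
    """Set of SAS port names reported by sas.port.down events."""
    ports = set()
    for line in lines:
        parsed = parse_ems_line(line)
        if parsed is not None and parsed[1] == "sas.port.down":
            m = SAS_PORT_DOWN.search(parsed[2])
            if m:
                ports.add(m["port"])
    return ports


def reported_together(node_logs, event, window=timedelta(seconds=60)):
    """True if every node logged `event` and the first occurrences fall within `window`."""
    times = [first_occurrence(lines, event) for lines in node_logs.values()]
    if not times or any(t is None for t in times):
        return False
    return max(times) - min(times) <= window
```

You would pass the two excerpts as lists of lines, for example `reported_together({"Node 1": node1_lines, "Node 2": node2_lines}, "cf.multidisk.fatalProblem")`. With the lines shown above, both nodes report the same set of downed ports ("1c", "0b", "0d"). Their fatalProblem events are one second apart. Neither node stayed up to take over the other.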