A300/FAS 8200: both PSUs show MULTIFAULT, but the system is alive
Applies to
- AFF-A300
- FAS 8200
- ONTAP 9.X
Issue
- Node1 has taken over Node2
- The service processor (SP) of Node2 cannot be reached; pinging the SP IP address times out.
- On the A300, both PSUs show MULTIFAULT, but Node1 is alive.
::> run -node * environment status
2 entries were acted on.
Node: node1
Sensor Name              State          Current    Critical     Warning     Warning    Critical
                                        Reading       Low         Low        High        High
-------------------------------------------------------------------------------------------------
PSU2                     MULTIFAULT
PSU1                     MULTIFAULT
EMS log:
[?] Sun Apr 30 09:56:55 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 Temperature is Unreadable
[?] Sun Apr 30 09:56:55 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 Current is Unreadable
[?] Sun Apr 30 09:56:55 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 Fan1 Speed is Unreadable
[?] Sun Apr 30 09:56:55 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 Fan1 Fault is Unreadable
[?] Sun Apr 30 09:56:55 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 Fan2 Speed is Unreadable
[?] Sun Apr 30 09:56:55 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 Fan2 Fault is Unreadable
[?] Sun Apr 30 09:56:55 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is
[?] Sun Apr 30 09:57:44 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 1 is degraded: PSU1 InPower Monitor is Unreadable
[?] Sun Apr 30 09:57:44 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 InPower Monitor is Unreadable
[?] Sun Apr 30 09:57:44 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 1 is degraded: PSU1 Temperature is Unreadable
[?] Sun Apr 30 09:57:44 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 1 is degraded: PSU1 Current is Unreadable
[?] Sun Apr 30 09:57:44 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 1 is degraded: PSU1 Fan1 Speed is Unreadable
- Further errors can be seen on the system:
CLTFLT:HA Group Notification from Node1 (CONTROLLER TAKEOVER COMPLETE AUTOMATIC - Communiction Error) ALERT
Fri Aug 09 09:20:44 +0200 [Node1: ctrl_hb_port_ic0a: ctrl.rdma.heartBeat:info]: HA interconnect: Missed heartbeat to <IP-address>.
Fri Aug 09 09:20:44 +0200 [Node1: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic0a link is down.
::> event log show -severity * -message-name callhome*
Time Node Severity Event
------------------- ---------------- ------------- ---------------------------
8/9/2024 10:00:14 Node1 ALERT callhome.hainterconnect.down: Call home for HA INTERCONNECT DOWN due to links down.
8/9/2024 09:35:33 Node1 ALERT callhome.hm.alert.critical: Call home for Health Monitor process cphm: CriticalFruMultiFaultAlert[PSQ0A2190600796].
8/9/2024 09:22:49 Node1 ERROR callhome.chassis.ps.degraded: Call home for CHASSIS POWER SUPPLY DEGRADED: PS 1
8/9/2024 09:21:55 Node1 ERROR callhome.chassis.power: Call home for CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU1, PSU2.
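When many monitor.chassisPowerSupply.degraded messages are logged at once, it can help to group the unreadable sensors by power supply. The following is a minimal Python sketch that does this for EMS lines saved to a list. The regular expression and the function name unreadable_sensors are assumptions inferred from the sample messages in this article, not an official ONTAP log grammar or tool.

```python
import re
from collections import defaultdict

# Pattern inferred from the EMS samples in this article (an assumption,
# not an official ONTAP message format definition).
PSU_DEGRADED = re.compile(
    r"monitor\.chassisPowerSupply\.degraded:\w+\]: "
    r"Chassis power supply (?P<psu>\d+) is degraded: "
    r"PSU\d+ (?P<sensor>.+?) is Unreadable"
)


def unreadable_sensors(lines):
    """Return {psu_number: [sensor names reported as Unreadable]}."""
    result = defaultdict(list)
    for line in lines:
        match = PSU_DEGRADED.search(line)
        if match:  # truncated or unrelated lines are skipped
            result[int(match.group("psu"))].append(match.group("sensor"))
    return dict(result)


# Sample lines copied from the EMS log above, including the truncated one.
SAMPLE = [
    "[?] Sun Apr 30 09:56:55 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 Temperature is Unreadable",
    "[?] Sun Apr 30 09:56:55 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is degraded: PSU2 Fan1 Speed is Unreadable",
    "[?] Sun Apr 30 09:56:55 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 2 is",
    "[?] Sun Apr 30 09:57:44 +0800 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 1 is degraded: PSU1 InPower Monitor is Unreadable",
]

if __name__ == "__main__":
    for psu, sensors in sorted(unreadable_sensors(SAMPLE).items()):
        print(f"PSU{psu}: {', '.join(sensors)}")
```

If every sensor on both PSUs is listed as Unreadable, as in the log above, that matches the symptom described in this article, where both PSUs show MULTIFAULT while the surviving node keeps running.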