NetApp Knowledge Base

AFF A900 node shuts down without panic string or error messages

Category:
aff-series
Specialty:
HW

Applies to

  • ONTAP 9
  • AFF A900
  • ASA A900
  • FAS9500

Issue

  • The node reboots without logging a panic string or any error messages.
  • The partner node initiates a takeover, and the following events are reported in the event logs:

[Cluster-01: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic6a link is down.
[Cluster-01: cf_fastTimeout: cf.ic.heartBeatFailed:error]: HA interconnect: Heartbeat failed.
[Cluster-01: ctrl_hb_port_ic6a: ctrl.rdma.heartBeat:info]: HA interconnect: Missed heartbeat to 192.0.1.5.

[Cluster-01: vifmgr: vifmgr.cluscheck.droppedall:alert]: Total packet loss when pinging from cluster lif Cluster-01_clus2 (node Cluster-01) to cluster lif Cluster-02_clus1 (node Cluster-02).

[Cluster-01: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
[Cluster-01: cf_main: cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER
[Cluster-01: cf_takeover: ha.takeover.stateChng:debug]: params: {'old_state': 'NOT_IN_TAKEOVER', 'new_state': 'IN_CFO_TAKEOVER'}
[Cluster-01: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started

  • The output of the BMC CLI command bmc status -d shows a CPU Catastrophic Error being asserted and de-asserted, followed by platform reset events:

Sep 15 01:53:36 BMCxxxx root: eventfifod 47586.00981(n): 171(0xc0ab) : CPU Catastrophic Error asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47586.00981(o): 171(0x90ab) : CPU Catastrophic Error de-asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47659.00887(n): 17(0xc011) : PCH Platform reset asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47659.00887(s): 22(0xe016) : LPC Bus reset asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47659.00887(s): 23(0xe017) : TPM Reset asserted
Sep 15 01:53:37 BMCxxxx root: eventfifod 47659.00887(s): 24(0xe018) : NIC0 Reset asserted
Sep 15 01:53:37 BMCxxxx root: eventfifod 47659.00887(s): 25(0xe019) : NIC1 Reset asserted
Sep 15 01:53:37 BMCxxxx root: eventfifod 47659.00887(s): 27(0xe01b) : NVME reset asserted
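
When reviewing a longer BMC log, it can help to pick out this signature automatically. The following is a minimal Python sketch, not a NetApp tool: it parses eventfifod lines in the format shown in the excerpt above and reports each CPU Catastrophic Error assertion that is later followed by a PCH Platform reset assertion. The regular expression, the field names (seq, flag, event_id, code), and the function names are assumptions made for illustration; the meaning of the (n), (o), and (s) markers is not described in the excerpt, so the sketch only records them.

```python
import re

# Matches eventfifod lines in the format shown in the excerpt above.
# Assumption: other BMC firmware versions may format these lines differently.
EVENT_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+root:\s+eventfifod\s+"
    r"(?P<seq>[\d.]+)\((?P<flag>\w)\):\s+"
    r"(?P<event_id>\d+)\((?P<code>0x[0-9a-fA-F]+)\)\s*:\s*"
    r"(?P<message>.+?)\s*$"
)

# Sample lines copied from the BMC output in this article.
SAMPLE = """\
Sep 15 01:53:36 BMCxxxx root: eventfifod 47586.00981(n): 171(0xc0ab) : CPU Catastrophic Error asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47586.00981(o): 171(0x90ab) : CPU Catastrophic Error de-asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47659.00887(n): 17(0xc011) : PCH Platform reset asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47659.00887(s): 22(0xe016) : LPC Bus reset asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47659.00887(s): 23(0xe017) : TPM Reset asserted
Sep 15 01:53:37 BMCxxxx root: eventfifod 47659.00887(s): 24(0xe018) : NIC0 Reset asserted
Sep 15 01:53:37 BMCxxxx root: eventfifod 47659.00887(s): 25(0xe019) : NIC1 Reset asserted
Sep 15 01:53:37 BMCxxxx root: eventfifod 47659.00887(s): 27(0xe01b) : NVME reset asserted
"""


def parse_events(text):
    """Return one dict per line that matches EVENT_RE; other lines are skipped."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


def find_catastrophic_resets(events):
    """Pair each 'CPU Catastrophic Error asserted' event with the next
    'PCH Platform reset asserted' event that follows it in the log."""
    pairs = []
    pending = None
    for event in events:
        message = event["message"]
        # Exact comparison, so "de-asserted" lines are never mistaken
        # for "asserted" lines.
        if message == "CPU Catastrophic Error asserted":
            pending = event
        elif message == "PCH Platform reset asserted" and pending is not None:
            pairs.append((pending, event))
            pending = None
    return pairs


if __name__ == "__main__":
    for cpu_event, reset_event in find_catastrophic_resets(parse_events(SAMPLE)):
        print(
            f"{cpu_event['timestamp']} {cpu_event['message']} ({cpu_event['code']}) "
            f"followed by {reset_event['message']} ({reset_event['code']})"
        )
```

To check a real log, pass the saved bmc status -d output to parse_events in place of SAMPLE. A match only indicates that the log contains the same sequence of events as this article; it does not by itself identify the cause.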

