NetApp Knowledge Base

Cluster network degraded alerts and takeover not possible on AFF A800

Category: aff-series
Specialty: hw

Applies to

  • AFF A800
  • AFF C800
  • X1146A T62100-CR

Issue

  • Receiving daily CLUSTER NETWORK DEGRADED alerts.

[cluster-01: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Total Packet Loss - Ping failures detected between cluster-01_clus2 ( 169.254.32.8 ) on cluster-01 and cluster-02_clus1 ( 169.254.99.167 ) on cluster-02

  • Receiving hourly HA interconnect down alerts.

6/21/2024 08:00:00  nodename     ERROR         ic.HAInterconnectDown: HA interconnect: Interconnect down for 93 minutes: link1 down
6/21/2024 07:00:00  nodename     ALERT         callhome.hainterconnect.down: Call home for HA INTERCONNECT DOWN due to link1 down.

  • The cluster also triggers alerts stating that takeover is disabled because the NVRAM log is unsynchronized.

[cluster-01: statd: cf.takeover.disabled:alert]: HA mode, but takeover of partner is disabled due to reason : unsynchronized log.

  • The EMS log also shows the following messages:

[cluster-01: nvmm_mirror_sync: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_LAYOUT_SYNCING is aborted because of reason NVPM_ERR_MSG_SEND_FAILED.
[cluster-01: nvmm_error: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_OFFLINE is aborted because of reason NVMM_ABORT_SYNCING_MIRROR.
[cluster-01: nvmm_helper: nvpm.state.changed:debug]: Node 1's NVPM state changed from "2" to "2".

  • These alerts begin after the following message is logged:

[cluster-01: intr: netif.fatal.err:alert]: The network device in slot 1 encountered fatal error e1a/e1b.

  • Service processor logs show:

e1a/e1b:Fatal parity error (0x10)
PL_PERR_CAUSE 0x00004000 PL_PERR_ENABLE 0x1fffe3ff
PCIE_INT_CAUSE 0x40002000
t6nex2: encountered fatal error, adapter stopped.
e1a/e1b:PCI DMA channel write request parity error (0x2000)
t6nex2: encountered fatal error, adapter stopped.
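
The symptom pattern above can be checked against saved log text. In the EMS log, netif.fatal.err on the slot 1 adapter is followed by cluster network degraded, HA interconnect down, NVMM mirror abort, and takeover disabled events. In the SP log, the adapter reports a fatal error and stops. The following is a minimal Python sketch, not a NetApp tool, and it rests on several assumptions. First, the logs are exported as plain text in the line formats quoted in this article. Second, EMS lines are in oldest-first order. Third, the helper names (event_name, ems_symptoms_after_trigger, sp_adapter_errors) are hypothetical. The sketch does not decode register values such as PL_PERR_CAUSE or PCIE_INT_CAUSE, because interpreting those bits would require vendor documentation.

```python
import re
from typing import List, Optional, Tuple

# Event names copied from the EMS messages quoted in this article.
TRIGGER_EVENT = "netif.fatal.err"
SYMPTOM_EVENTS = {
    "callhome.clus.net.degraded",
    "ic.HAInterconnectDown",
    "callhome.hainterconnect.down",
    "cf.takeover.disabled",
    "nvmm.mirror.aborting",
}

# Assumed EMS line shapes, based only on the examples in this article:
#   "[node: process: event.name:severity]: message"
#   "<date> <time>  <node>  SEVERITY  event.name: message"
BRACKET_RE = re.compile(r"\[[^:\]]+: [^:\]]+: ([\w.]+):\w+\]")
PLAIN_RE = re.compile(r"\s[A-Z]+\s+([A-Za-z]\w*(?:\.\w+)+):")

# Assumed SP line shape: "<ports>:<description> (0x<code>)"
SP_ERROR_RE = re.compile(r"^([\w/]+):(.+?)\s*\((0x[0-9a-fA-F]+)\)$")
SP_STOPPED = "encountered fatal error, adapter stopped"


def event_name(line: str) -> Optional[str]:
    """Extract the EMS event name from one log line, or return None."""
    m = BRACKET_RE.search(line) or PLAIN_RE.search(line)
    return m.group(1) if m else None


def ems_symptoms_after_trigger(lines: List[str]) -> Tuple[bool, List[str]]:
    """Report whether netif.fatal.err was seen and which known symptom
    events followed it. Assumes oldest-first ordering (hypothetical)."""
    trigger_seen = False
    symptoms: List[str] = []
    for line in lines:
        name = event_name(line)
        if name == TRIGGER_EVENT:
            trigger_seen = True
        elif trigger_seen and name in SYMPTOM_EVENTS:
            symptoms.append(name)
    return trigger_seen, symptoms


def sp_adapter_errors(lines: List[str]) -> Tuple[List[Tuple[str, str, int]], int]:
    """Collect (ports, description, code) for each SP error line and count
    'adapter stopped' lines. Register dump lines are skipped, not decoded."""
    errors: List[Tuple[str, str, int]] = []
    stops = 0
    for raw in lines:
        line = raw.strip()
        m = SP_ERROR_RE.match(line)
        if m:
            errors.append((m.group(1), m.group(2), int(m.group(3), 16)))
        elif SP_STOPPED in line:
            stops += 1
    return errors, stops


# Sample lines taken from this article; the EMS order is arranged oldest-first.
EMS_SAMPLE = [
    "[cluster-01: intr: netif.fatal.err:alert]: The network device in slot 1 encountered fatal error e1a/e1b.",
    "6/21/2024 07:00:00  nodename     ALERT         callhome.hainterconnect.down: Call home for HA INTERCONNECT DOWN due to link1 down.",
    "[cluster-01: statd: cf.takeover.disabled:alert]: HA mode, but takeover of partner is disabled due to reason : unsynchronized log.",
]

SP_SAMPLE = [
    "e1a/e1b:Fatal parity error (0x10)",
    "PL_PERR_CAUSE 0x00004000 PL_PERR_ENABLE 0x1fffe3ff",
    "PCIE_INT_CAUSE 0x40002000",
    "t6nex2: encountered fatal error, adapter stopped.",
    "e1a/e1b:PCI DMA channel write request parity error (0x2000)",
    "t6nex2: encountered fatal error, adapter stopped.",
]

if __name__ == "__main__":
    print(ems_symptoms_after_trigger(EMS_SAMPLE))
    print(sp_adapter_errors(SP_SAMPLE))
```

Matching on event names rather than message text keeps the check independent of node names and IP addresses, which differ on every system. A match only indicates that the log resembles the signature described here. It does not confirm the root cause.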

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.