Cluster network degraded alerts and takeover not possible on AFF A800
Applies to
- A800
- X1146A
Issue
- The cluster raises 'CLUSTER NETWORK DEGRADED' alerts daily:
[cluster-01: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Total Packet Loss - Ping failures detected between cluster-01_clus2 ( 169.254.32.8 ) on cluster-01 and cluster-02_clus1 ( 169.254.99.167 ) on cluster-02
- The cluster also raises alerts that takeover is disabled because the NVRAM logs are unsynchronized:
[cluster-01: statd: cf.takeover.disabled:alert]: HA mode, but takeover of partner is disabled due to reason : unsynchronized log.
- The EMS log contains the following messages:
[cluster-01: nvmm_mirror_sync: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_LAYOUT_SYNCING is aborted because of reason NVPM_ERR_MSG_SEND_FAILED.
[cluster-01: nvmm_error: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_OFFLINE is aborted because of reason NVMM_ABORT_SYNCING_MIRROR.
[cluster-01: nvmm_helper: nvpm.state.changed:debug]: Node 1's NVPM state changed from "2" to "2".
- These alerts begin after the following message is logged:
[cluster-01: intr: netif.fatal.err:alert]: The network device in slot 1 encountered fatal error e1a/e1b.
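The ordering described above (a netif.fatal.err event followed by the cluster network and takeover alerts) can be checked against exported EMS text with a small script. The following is a minimal sketch, not a NetApp tool: it assumes each line has the shape `[node: process: event.name:severity]: message` seen in the samples above, and the names `parse_ems_line` and `events_after_fatal` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed line shape, taken from the sample messages in this article:
#   [node: process: event.name:severity]: message text
# search() is used so that any text before the bracket (for example a
# timestamp, if the export includes one) is ignored.
EMS_LINE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: (?P<message>.*)$"
)


class EmsEvent(NamedTuple):
    node: str
    process: str
    event: str
    severity: str
    message: str


def parse_ems_line(line: str) -> Optional[EmsEvent]:
    """Split one EMS line into its fields, or return None if it does not match."""
    match = EMS_LINE.search(line.strip())
    return EmsEvent(**match.groupdict()) if match else None


def events_after_fatal(lines: Iterable[str],
                       trigger: str = "netif.fatal.err") -> List[str]:
    """Return the event names logged after the first trigger event, in order."""
    seen_trigger = False
    following: List[str] = []
    for line in lines:
        event = parse_ems_line(line)
        if event is None:
            continue  # skip lines that are not in the assumed format
        if seen_trigger:
            following.append(event.event)
        elif event.event == trigger:
            seen_trigger = True
    return following
```

If `events_after_fatal` returns a list containing `callhome.clus.net.degraded` and `cf.takeover.disabled`, the log matches the sequence described in this article. An empty list means either that the fatal error was never logged or that nothing followed it in the supplied lines.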