During cluster expansion from 2 nodes to 4 nodes, VMs rebooted
Applies to
- ONTAP 9
- Cisco Nexus 3232C
Issue
- VMs rebooted during cluster expansion from 2 nodes to 4 nodes (switchless-to-switched conversion).
- The procedure "Migrate from a two-node switchless cluster to a cluster with Cisco Nexus 3232C cluster switches" was being followed.
- After port e0a was connected to the new switch, a large number of CRC errors were reported on the port, which put the cluster network into a degraded state.
Before the port was moved to the switch:
-- interface e0a (244 days, 0 hours, 9 minutes, 21 seconds) --
RECEIVE
Total frames: 257g | Frames/second: 12224 | Total bytes: 276t
Bytes/second: 13104k | Total errors: 0 | Errors/minute: 0
Total discards: 0 | Discards/minute: 0 | Multi/broadcast: 1777k
Non-primary u/c: 0 | CRC errors: 0 | Runt frames: 0
After the port was moved to the switch:
-- interface e0a (244 days, 0 hours, 32 minutes, 51 seconds) --
RECEIVE
Total frames: 257g | Frames/second: 12223 | Total bytes: 276t
Bytes/second: 13107k | Total errors: 46094 | Errors/minute: 0
Total discards: 0 | Discards/minute: 0 | Multi/broadcast: 1777k
Non-primary u/c: 0 | CRC errors: 43246 | Runt frames: 0
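The two snapshots above can be compared mechanically. The following is a minimal Python sketch, not a NetApp tool: the helper names `parse_counters` and `uptime_seconds` are hypothetical, and it assumes the counter lines keep the `Name: value | Name: value` layout shown in this article. It computes how many errors accumulated between the two samples and over what interval.

```python
import re

def parse_counters(text):
    """Parse 'Name: value' pairs separated by '|' into a dict of strings."""
    counters = {}
    for line in text.splitlines():
        for field in line.split("|"):
            if ":" in field:
                name, value = field.split(":", 1)
                counters[name.strip()] = value.strip()
    return counters

def uptime_seconds(header):
    """Convert '(D days, H hours, M minutes, S seconds)' to seconds."""
    m = re.search(r"(\d+) days, (\d+) hours, (\d+) minutes, (\d+) seconds", header)
    d, h, mi, s = map(int, m.groups())
    return ((d * 24 + h) * 60 + mi) * 60 + s

# Header and counter lines copied from the two snapshots in this article.
before_header = "-- interface e0a (244 days, 0 hours, 9 minutes, 21 seconds) --"
after_header = "-- interface e0a (244 days, 0 hours, 32 minutes, 51 seconds) --"
before = parse_counters(
    "Bytes/second: 13104k | Total errors: 0 | Errors/minute: 0\n"
    "Non-primary u/c: 0 | CRC errors: 0 | Runt frames: 0"
)
after = parse_counters(
    "Bytes/second: 13107k | Total errors: 46094 | Errors/minute: 0\n"
    "Non-primary u/c: 0 | CRC errors: 43246 | Runt frames: 0"
)

elapsed = uptime_seconds(after_header) - uptime_seconds(before_header)
new_errors = int(after["Total errors"]) - int(before["Total errors"])
new_crc = int(after["CRC errors"]) - int(before["CRC errors"])

print(f"Interval between samples: {elapsed} s")
print(f"New receive errors: {new_errors}, of which CRC: {new_crc}")
```

All of the new errors appear in the roughly 23 minutes after the port was recabled, which points at the new link (cable, transceiver, or switch port) rather than at a pre-existing fault.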
EMS
[Node2-02: vifmgr: vifmgr.cluscheck.hwerrors:alert]: Port e0a on node tenali-02 is reporting a high number (at least 1 per 1000 packets) of observed hardware errors (CRC, length, alignment, dropped).
[Node2-02: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: CRC Errors Detected - High CRC errors detected on port e0a node Node2-02
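The EMS message quotes a threshold of at least 1 hardware error per 1000 packets. As a rough plausibility check, the Python sketch below estimates the error ratio from the counters in this article. It is an approximation under stated assumptions: the frame count is estimated from the reported frames/second value because the total frame counter is rounded in the output, the 1410-second interval is the difference between the two uptime values in the snapshots, and how ONTAP itself evaluates the ratio is not documented here.

```python
# Values taken from the snapshots in this article.
frames_per_second = 12224   # reported receive rate (assumed roughly constant)
interval_seconds = 1410     # 0h 32m 51s minus 0h 9m 21s
new_errors = 46094          # Total errors after the move (0 before)

# Assumption: traffic stayed near the reported rate for the whole interval.
estimated_frames = frames_per_second * interval_seconds
errors_per_1000 = new_errors / estimated_frames * 1000

THRESHOLD_PER_1000 = 1.0    # from the vifmgr.cluscheck.hwerrors message text

print(f"Estimated frames received: {estimated_frames}")
print(f"Estimated errors per 1000 frames: {errors_per_1000:.2f}")
print("Above EMS threshold" if errors_per_1000 >= THRESHOLD_PER_1000 else "Below EMS threshold")
```

Under these assumptions the estimated ratio is well above the quoted threshold, which is consistent with the alert being raised shortly after the port was moved.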
- VifMgr then went offline (out of quorum) on both nodes.
VifMgr
[kern_vifmgr:info:7049] [0x80c13b200] [FailoverMgr::localNodeDown] VifMgr on node tenali-01 is now out of quorum.
[kern_vifmgr:info:6866] [0x80c140700] [FailoverMgr::localNodeDown] VifMgr on node tenali-02 is now out of quorum.
