Node panics and no longer joins the cluster on reboot
Applies to
- FAS2750
- AFF-A250
- MCC-IP (MetroCluster IP)
- MetroCluster Switch RCF upgrade
Issue
- During an RCF upgrade of the switches in a MetroCluster IP (MCC-IP) configuration, one controller experiences a CLAM panic:
Aug 01 17:40:16 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 17:47:34 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 17:52:10 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 18:21:34 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0b on node Node-01 has gone down unexpectedly.
PANIC : Received PANIC packet from partner, receiving message is (Coredump and takeover initiated because Connectivity, Liveliness and Availability Monitor (CLAM) has determined this node is out of quorum.)
- After the reboot, cluster ports e0a and e0b are up, but the nodes are not healthy:
::*> network port show -role cluster
                                      Auto-Negot  Duplex     Speed (Mbps)
Node   Port   Role         Link MTU   Admin/Oper  Admin/Oper Admin/Oper
------ ------ ------------ ---- ----- ----------- ---------- ------------
node01
       e0a    cluster      up   9000  true/true   full/full  auto/10000
       e0b    cluster      up   9000  true/true   full/full  auto/10000
::> cluster show
Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
node01               false   true         false
node02               false   true         false
- storage failover show reports that the node is still waiting for its cluster applications to come online:
::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node01         node02         true     Connected to node02
node02         node01         true     Connected to node01.
                                       Waiting for cluster applications to
                                       come online on the local node.
                                       Offline applications: vldb, vifmgr,
                                       bcomd, crs, scsi blade, clam.
2 entries were displayed.
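When reviewing a longer EMS log, the sequence of vifmgr.clus.linkdown events that preceded the CLAM panic can be extracted programmatically. The following Python sketch is illustrative only: it assumes the EMS messages have been exported as plain text in the same format as the log lines shown above, and the helper names parse_linkdowns and ports_seen_down are hypothetical, not part of any NetApp tool.

```python
import re
from collections import Counter

# Assumption: EMS messages exported as plain text, one event per line,
# in the format shown in this article's event log.
EMS_LOG = """\
Aug 01 17:40:16 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 17:47:34 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 17:52:10 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 18:21:34 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0b on node Node-01 has gone down unexpectedly.
"""

# Captures the timestamp, the node name in the EMS prefix, and the cluster port.
LINKDOWN_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<node>[^:\]]+):vifmgr\.clus\.linkdown:\w+\]: "
    r"The cluster port (?P<port>\S+) on node"
)


def parse_linkdowns(text):
    """Hypothetical helper: return (timestamp, node, port) for each linkdown event."""
    events = []
    for line in text.splitlines():
        m = LINKDOWN_RE.match(line)
        if m:
            events.append((m["ts"], m["node"], m["port"]))
    return events


def ports_seen_down(events):
    """Hypothetical helper: count linkdown events per (node, port) pair."""
    return Counter((node, port) for _, node, port in events)


if __name__ == "__main__":
    events = parse_linkdowns(EMS_LOG)
    for ts, node, port in events:
        print(f"{ts}  {node}  {port} down")
    counts = ports_seen_down(events)
    # List, per node, every cluster port that logged at least one linkdown.
    for node in sorted({n for n, _ in counts}):
        down_ports = sorted(p for n, p in counts if n == node)
        print(f"{node}: ports with linkdown events: {', '.join(down_ports)}")
```

Lines that do not match the linkdown pattern, such as the PANIC message, are ignored, so the same functions can be run over a full log export.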