Cluster communication issue after node reboot due to CRCs on cluster port

Last updated

Nov 2, 2023

Category:: fas-systems

Specialty:: HW

Last Updated:: 11/2/2023, 2:19:28 PM

Applies to

AFF A700s
Switchless cluster

Issue

During a maintenance window, a node is rebooted manually.
After the node reboots, cluster communication issues appear, as shown in the following outputs.
The aggregates of the partner node are shown as unknown:

cluster::> aggr show

Info: Node cluster-NodeB that hosts aggregate cluster_DATA_AGGR is offline
      Node cluster-NodeB that hosts aggregate cluster_ROOT is offline

Aggregate          Size     Available Used% State   #Vols Nodes         RAID Status
------------------ -------- --------- ----- ------- ----- ------------- ---------------
clusterA_DATA_AGGR 44.24TB  11.31TB   74%   online  33    cluster-NodeA raid_dp, normal
clusterA_ROOT      992.7GB  48.09GB   95%   online  1     cluster-NodeA raid_dp, normal
clusterB_DATA_AGGR -        -         -     unknown -     cluster-NodeB -
clusterB_ROOT      -        -         -     unknown -     cluster-NodeB -
4 entries were displayed.

Cluster ports are up and appear healthy:

cluster::> port show

Node: cluster-NodeA
                                                  Speed(Mbps) Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status
--------- ------------ ---------------- ---- ---- ----------- --------
e0M       Default      -                up   1500 auto/1000   -
e0a       Cluster      -                up   9000 auto/40000  -
e0e       Cluster      -                up   9000 auto/40000  -
e0f       Default      -                up   9000 auto/10000  -
e0g       Default      -                up   9000 auto/10000  -
e0h       Default      -                up   9000 auto/10000  -
e0i       Default      -                up   9000 auto/10000  -
e5a       Default      -                up   9000 auto/40000  -
e5e       Default      -                down 9000 auto/auto   -
9 entries were displayed.

The cluster rings show as offline for the partner node:

cluster::> set diagnostic

Warning: These diagnostic commands are for use by NetApp personnel only. Do you want to continue? {y|n}: y

cluster::*> cluster ring show

Node          UnitName Epoch    DB Epoch DB Trnxs Master        Online
------------- -------- -------- -------- -------- ------------- ---------
cluster-NodeA mgmt     0        38       137123   -             offline
cluster-NodeA vldb     0        38       138432   -             offline
cluster-NodeA vifmgr   0        38       794750   -             offline
cluster-NodeA bcomd    0        38       80       -             offline
cluster-NodeA crs      0        38       1        -             offline
cluster-NodeB mgmt     40       40       446      cluster-NodeB master
cluster-NodeB vldb     40       40       187      cluster-NodeB master
cluster-NodeB vifmgr   40       40       60       cluster-NodeB master
cluster-NodeB bcomd    40       40       7        cluster-NodeB master
cluster-NodeB crs      40       40       1        cluster-NodeB master
10 entries were displayed.
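
When a captured copy of the ring output has to be checked across many units, a short script can list the units reported as offline per node. The following is only a minimal sketch and not a NetApp tool: it assumes the output was saved as text with one row per line, and the names RING_OUTPUT, ROW, and offline_units are hypothetical, chosen for illustration. The sample rows are a subset of the output above.

```python
import re

# Sample rows taken from the "cluster ring show" output above (assumed one row per line).
RING_OUTPUT = """\
Node UnitName Epoch DB Epoch DB Trnxs Master Online
cluster-NodeA mgmt 0 38 137123 - offline
cluster-NodeA vldb 0 38 138432 - offline
cluster-NodeB mgmt 40 40 446 cluster-NodeB master
cluster-NodeB vldb 40 40 187 cluster-NodeB master
"""

# Seven whitespace-separated columns; the three numeric columns keep
# header and separator lines from matching.
ROW = re.compile(
    r"^(?P<node>\S+)\s+(?P<unit>\S+)\s+(?P<epoch>\d+)\s+(?P<db_epoch>\d+)\s+"
    r"(?P<trnxs>\d+)\s+(?P<master>\S+)\s+(?P<online>\S+)$"
)

def offline_units(text):
    """Return {node: [unit, ...]} for rows whose Online column is 'offline'."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match and match.group("online") == "offline":
            result.setdefault(match.group("node"), []).append(match.group("unit"))
    return result

print(offline_units(RING_OUTPUT))
```

With the sample rows, only cluster-NodeA is reported, with its mgmt and vldb units.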

The node reports cluster health as false:

cluster::> cluster show

Node                  Health  Eligibility
--------------------- ------- ------------
cluster-NodeA         false   true
cluster-NodeB         true    true

Warning: Cluster HA is not working correctly. Make sure that both nodes are healthy by using the "cluster show" command; then reconfigure cluster HA to correct the configuration. Check the output of "cluster ha show" following the reconfiguration to verify node health. If reconfiguring cluster HA does not resolve the issue, contact technical support for assistance.
2 entries were displayed.
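
The same check can be applied to saved "cluster show" output to pick out nodes whose Health column is false. This is a hedged sketch under the same assumptions as before: the text has one row per line, and CLUSTER_SHOW and unhealthy_nodes are hypothetical names used only for this example.

```python
# Rows taken from the "cluster show" output above (assumed one row per line).
CLUSTER_SHOW = """\
Node                  Health  Eligibility
--------------------- ------- ------------
cluster-NodeA         false   true
cluster-NodeB         true    true
"""

BOOL_WORDS = ("true", "false")

def unhealthy_nodes(text):
    """Return the names of nodes whose Health column reads 'false'."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three fields: node name, health, eligibility.
        if len(fields) == 3 and fields[1] in BOOL_WORDS and fields[2] in BOOL_WORDS:
            if fields[1] == "false":
                nodes.append(fields[0])
    return nodes

print(unhealthy_nodes(CLUSTER_SHOW))
```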