Cluster join can fail due to incorrect cabling with BES-53248

Last updated

Aug 15, 2024

Category:: fabric-interconnect-and-management-switches

Specialty:: hw

Last Updated:: 8/15/2024, 9:54:16 AM

Applies to

Cluster Network Switch (CNS) Broadcom BES-53248
Switched cluster with cluster ports running at different speeds: 10G and 40G

Issue

Joining new nodes from a different platform to an existing cluster fails. Example:

Cluster network RPC communication test from local address 169.254.99.130 to 169.254.247.154

Error: Cluster network RPC communication test from local address 169.254.99.130 to 169.254.247.154 failed with subsequent larger RPC request of size 1024 where size 0 succeeded. Possible MTU mismatch on cluster network ports or network switch. 

Reason: f_echo_1: RPC: Timed out; netid=tcp fd=253 TO=5.0s TT=5.001s O=1076b I=0b CN=275/2 VSID=-3 169.254.99.130:40455 <-> 169.254.247.154:7815. Verify the network configuration.
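When several such failures need to be compared, the endpoints and request sizes can be pulled out of the message text. The following is a minimal sketch, not an ONTAP tool: the function name parse_rpc_failure and the regular expression are illustrative assumptions based only on the wording of the error shown above.

```python
import re

# Assumption: the pattern mirrors the wording of the error quoted in this
# article; other ONTAP releases may word the message differently.
RPC_FAILURE = re.compile(
    r"from local address (?P<local>\d+\.\d+\.\d+\.\d+) "
    r"to (?P<remote>\d+\.\d+\.\d+\.\d+) failed with subsequent larger "
    r"RPC request of size (?P<failed>\d+) where size (?P<ok>\d+) succeeded"
)


def parse_rpc_failure(message):
    """Return endpoints and request sizes from the error text, or None."""
    match = RPC_FAILURE.search(message)
    if match is None:
        return None
    return {
        "local": match["local"],
        "remote": match["remote"],
        "failed_size": int(match["failed"]),
        "ok_size": int(match["ok"]),
    }


sample_error = (
    "Error: Cluster network RPC communication test from local address "
    "169.254.99.130 to 169.254.247.154 failed with subsequent larger RPC "
    "request of size 1024 where size 0 succeeded."
)
print(parse_rpc_failure(sample_error))
```

A zero-size request that succeeds while a 1024-byte request times out indicates that the path is reachable but larger payloads are not getting through, which is why the message points at the cluster ports or the switch.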

ONTAP Event Management System (EMS) messages report a ClusterSwitchConnectivity_Alert. Example:

[node_name: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process cshm: ClusterSwitchConnectivity_Alert[node_name].
[node_name: cshmd: hm.alert.raised:alert]: Alert Id = ClusterSwitchConnectivity_Alert , Alerting Resource = node_name raised by monitor ethernet-switch

The CNS logs show link-down events for ports in the same port group:

switch_name TRAPMGR[trapTask]: traputil.c(753) 34807 %% NOTE Link Down: 0/5, Reason Code: 0x62
<189> switch_name TRAPMGR[trapTask]: traputil.c(753) 34806 %% NOTE Link Down: 0/6, Reason Code: 0x62
<189> switch_name TRAPMGR[trapTask]: traputil.c(753) 34799 %% NOTE SFP inserted in 0/7
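To see at a glance whether the link-down events cluster in one port group, the log can be scanned and the affected ports bucketed. This is a hedged sketch only: the helper name link_downs_by_group is hypothetical, and the default group size of 4 with groups starting at port 1 (1-4, 5-8, and so on) is an assumption not stated in this article, so verify it against the switch documentation.

```python
import re
from collections import defaultdict

# Matches entries such as "Link Down: 0/5" from the switch log shown above.
LINK_DOWN = re.compile(r"Link Down: (?P<unit>\d+)/(?P<port>\d+)")


def link_downs_by_group(log_text, group_size=4):
    """Bucket link-down ports by port group.

    Assumption: ports are grouped consecutively starting at port 1,
    group_size ports per group. Adjust group_size for the actual switch.
    """
    groups = defaultdict(set)
    for match in LINK_DOWN.finditer(log_text):
        port = int(match["port"])
        first = (port - 1) // group_size * group_size + 1
        groups[(first, first + group_size - 1)].add(port)
    return {group: sorted(ports) for group, ports in groups.items()}


sample_log = """\
switch_name TRAPMGR[trapTask]: traputil.c(753) 34807 %% NOTE Link Down: 0/5, Reason Code: 0x62
<189> switch_name TRAPMGR[trapTask]: traputil.c(753) 34806 %% NOTE Link Down: 0/6, Reason Code: 0x62
<189> switch_name TRAPMGR[trapTask]: traputil.c(753) 34799 %% NOTE SFP inserted in 0/7
"""
print(link_downs_by_group(sample_log))  # {(5, 8): [5, 6]}
```

Under that assumption, ports 0/5 and 0/6 fall in the same group as port 0/7, where the SFP was inserted.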