Cluster join can fail due to incorrect cabling with BES-53248
Applies to
- Cluster Network Switch (CNS) Broadcom BES-53248
- Switched cluster with cluster ports running at different speeds: 10G and 40G
Issue
- Joining new nodes from a different platform to an existing cluster fails with an error. Example:
Cluster network RPC communication test from local address 169.254.99.130 to 169.254.247.154
Error: Cluster network RPC communication test from local address 169.254.99.130 to 169.254.247.154 failed with subsequent larger RPC request of size 1024 where size 0 succeeded. Possible MTU mismatch on cluster network ports or network switch.
Reason: f_echo_1: RPC: Timed out; netid=tcp fd=253 TO=5.0s TT=5.001s O=1076b I=0b CN=275/2 VSID=-3 169.254.99.130:40455 <-> 169.254.247.154:7815. Verify the network configuration.
- ONTAP Event Messages (EMS) report a ClusterSwitchConnectivity_Alert. Example:
[node_name: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process cshm: ClusterSwitchConnectivity_Alert[node_name].
[node_name: cshmd: hm.alert.raised:alert]: Alert Id = ClusterSwitchConnectivity_Alert , Alerting Resource = node_name raised by monitor ethernet-switch
- The CNS logs show link-down events on ports in the same port group:
switch_name TRAPMGR[trapTask]: traputil.c(753) 34807 %% NOTE Link Down: 0/5, Reason Code: 0x62 <189>
switch_name TRAPMGR[trapTask]: traputil.c(753) 34806 %% NOTE Link Down: 0/6, Reason Code: 0x62 <189>
switch_name TRAPMGR[trapTask]: traputil.c(753) 34799 %% NOTE SFP inserted in 0/7
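To check whether link-down events cluster in one port group, the switch log lines above can be grouped by port. The following is a minimal sketch, not a NetApp or Broadcom tool. It assumes that ports are organized in consecutive groups of four (1-4, 5-8, and so on), which is consistent with ports 0/5 to 0/7 in the example but should be verified against the switch documentation. The names LINK_DOWN, GROUP_SIZE, port_group, and link_downs_by_group are hypothetical.

```python
import re
from collections import defaultdict

# Matches log lines such as:
#   "... %% NOTE Link Down: 0/5, Reason Code: 0x62 <189>"
LINK_DOWN = re.compile(r"Link Down: (\d+)/(\d+)")

# Assumption: ports form consecutive groups of four (1-4, 5-8, ...).
GROUP_SIZE = 4


def port_group(port, size=GROUP_SIZE):
    """Return (first_port, last_port) of the assumed group containing port."""
    first = ((port - 1) // size) * size + 1
    return (first, first + size - 1)


def link_downs_by_group(lines):
    """Map each assumed port group to the ports that logged a Link Down."""
    groups = defaultdict(list)
    for line in lines:
        match = LINK_DOWN.search(line)
        if match:
            port = int(match.group(2))
            groups[port_group(port)].append(port)
    return dict(groups)


# Log lines copied from the example above.
log = [
    "switch_name TRAPMGR[trapTask]: traputil.c(753) 34807 %% NOTE Link Down: 0/5, Reason Code: 0x62 <189>",
    "switch_name TRAPMGR[trapTask]: traputil.c(753) 34806 %% NOTE Link Down: 0/6, Reason Code: 0x62 <189>",
    "switch_name TRAPMGR[trapTask]: traputil.c(753) 34799 %% NOTE SFP inserted in 0/7",
]

for (first, last), ports in sorted(link_downs_by_group(log).items()):
    print(f"ports {first}-{last}: link down on {sorted(ports)}")
```

With the three example lines, both link-down events fall into the assumed 5-8 group, the same group in which the SFP was inserted on port 0/7.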