ClusterSwitchConfig_Alert reported in system health alert for a nonexistent switch
Applies to
- ONTAP 9
- Cluster Switch Health Monitor (CSHM) AutoSupport message
Issue
- The Switch-Health subsystem is reported as degraded:
cluster1::> system health subsystem show
Subsystem         Health
----------------- ------------------
SAS-connect       ok
Environment       ok
Memory            ok
Service-Processor ok
Switch-Health     degraded
CIFS-NDO          ok
Motherboard       ok
IO                ok
MetroCluster      ok
MetroCluster_Node ok
FHM-Switch        ok
FHM-Bridge        ok
SAS-connect_Cluster ok
- Example output of system health alert show:
              Node: node_name1
          Resource: node_name2
          Severity: Major
   Indication Time: Fri Aug 13 23:03:53 2021
          Suppress: false
       Acknowledge: false
    Probable Cause: One or more nodes are not connected to both cluster
                    switches.
   Possible Effect: If one cluster switch fails, "node_name2" might lose
                    access to the cluster.
Corrective Actions: Ensure the switch "no_switch_name1" is connected
                    to the node "node_name2".

              Node: node_name1
          Resource: no_switch_name1
          Severity: Major
   Indication Time: Fri Aug 13 23:29:53 2021
          Suppress: false
       Acknowledge: false
    Probable Cause: Cluster switch "no_switch_name1" with IP address
                    "123.123.123.123" is not reachable via SNMP. Incorrect
                    SNMP community string might be configured on the
                    cluster switch.
   Possible Effect: Cluster switch communication problems and
                    accessibility issues.
Corrective Actions: Check the SNMP community string on the cluster switch
                    to verify the expected community string is configured.
                    Use the "system cluster-switch show -snmp-config"
                    command to view the expected community string.
- The switch name and IP address in the alert do not belong to any cluster switch, or the alert is reported on a switchless cluster.
- EMS log example:
Fri Aug 13 23:05:50 +0100 [node_name1: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process cshm: ClusterSwitchConfig_Alert[node_name2].
Fri Aug 13 23:30:51 +0100 [node_name1: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process cshm: SwitchCommunityString_Alert[no_switch_name1].
Fri Aug 13 23:03:54 +0100 [node_name1: cshmd: hm.alert.raised:alert]: Alert Id = ClusterSwitchConfig_Alert , Alerting Resource = node_name2 raised by monitor cluster-switch
Fri Aug 13 23:29:53 +0100 [node_name1: cshmd: hm.alert.raised:alert]: Alert Id = SwitchCommunityString_Alert , Alerting Resource = no_switch_name1 raised by monitor cluster-switch
Fri Aug 13 23:29:53 +0100 [node_name1: cshmd: hm.alert.raised:alert]: Alert Id = SwitchCommunityString_Alert , Alerting Resource = no_switch_name2 raised by monitor cluster-switch
- The following command reports the correct cluster network switches:
network port show -node * -role cluster -fields remote-device-id
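When many EMS entries are involved, it can help to list which alerting resources do not match the switches that the cluster ports actually report. The following Python is a minimal sketch of that comparison, not a NetApp tool. It assumes the hm.alert.raised lines have the same layout as the EMS log example above and that the real switch names were copied by hand from the remote-device-id field. The function names and the switch names "cs1" and "cs2" are hypothetical placeholders.

```python
import re

# Matches the hm.alert.raised lines shown in the EMS log example.
# The layout is assumed from those sample lines only.
RAISED = re.compile(
    r"hm\.alert\.raised:alert\]: Alert Id = (?P<alert>\S+)\s*,\s*"
    r"Alerting Resource = (?P<resource>\S+) raised by monitor (?P<monitor>\S+)"
)

SAMPLE = [
    "Fri Aug 13 23:05:50 +0100 [node_name1: mgwd: callhome.hm.alert.major:alert]: "
    "Call home for Health Monitor process cshm: ClusterSwitchConfig_Alert[node_name2].",
    "Fri Aug 13 23:03:54 +0100 [node_name1: cshmd: hm.alert.raised:alert]: "
    "Alert Id = ClusterSwitchConfig_Alert , Alerting Resource = node_name2 "
    "raised by monitor cluster-switch",
    "Fri Aug 13 23:29:53 +0100 [node_name1: cshmd: hm.alert.raised:alert]: "
    "Alert Id = SwitchCommunityString_Alert , Alerting Resource = no_switch_name1 "
    "raised by monitor cluster-switch",
    "Fri Aug 13 23:29:53 +0100 [node_name1: cshmd: hm.alert.raised:alert]: "
    "Alert Id = SwitchCommunityString_Alert , Alerting Resource = no_switch_name2 "
    "raised by monitor cluster-switch",
]


def raised_alerts(lines):
    """Return (alert_id, resource) pairs from hm.alert.raised EMS lines."""
    pairs = []
    for line in lines:
        match = RAISED.search(line)
        if match:
            pairs.append((match.group("alert"), match.group("resource")))
    return pairs


def unknown_switch_alerts(lines, known_switches):
    """Return resources of SwitchCommunityString_Alert that are not known switches."""
    return [
        resource
        for alert, resource in raised_alerts(lines)
        if alert == "SwitchCommunityString_Alert" and resource not in known_switches
    ]


if __name__ == "__main__":
    # "cs1" and "cs2" are placeholder names standing in for the values
    # reported in the remote-device-id field.
    for name in unknown_switch_alerts(SAMPLE, {"cs1", "cs2"}):
        print(name)
```

Any name this prints is an alerting resource that does not correspond to a switch seen on the cluster ports, which matches the symptom described in this article.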