CSHM: ClusterSwitchConfig_Alert: System health degraded due to wrong Cluster Switch cabling



Category: fabric-interconnect-and-management-switches

Specialty: hw


Applies to

ONTAP 9
Cluster Switch Health Monitor (CSHM) AutoSupport message

Issue

System health is degraded:

cluster::> system health status show
Status
---------------
degraded

The Switch-Health subsystem is degraded:

cluster::> system health subsystem show
Subsystem         Health
----------------- ------------------
SAS-connect       ok
Environment       ok
Memory            ok
Service-Processor ok
Switch-Health     degraded
CIFS-NDO          ok
Motherboard       ok
IO                ok
MetroCluster      ok
MetroCluster_Node ok
FHM-Switch        ok
FHM-Bridge        ok
SAS-connect_Cluster ok
13 entries were displayed.
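When the same check is run across many clusters, it can help to flag the affected subsystem automatically. The following is a minimal Python sketch, not a NetApp tool: it assumes the "system health subsystem show" output has already been captured as plain text (for example, from an SSH session) in the two-column layout shown above, and the function name degraded_subsystems is hypothetical.

```python
def degraded_subsystems(output: str) -> list[tuple[str, str]]:
    """Return (subsystem, health) pairs whose health is not "ok".

    Hypothetical helper; assumes the two-column layout of
    "system health subsystem show" shown in this article.
    """
    rows = []
    in_table = False
    for raw in output.splitlines():
        line = raw.strip()
        if not in_table:
            # Data rows begin after the dashed separator under the header.
            in_table = line.startswith("-")
            continue
        parts = line.split()
        # Data rows have exactly two columns; blank lines and the
        # "13 entries were displayed." summary line do not.
        if len(parts) == 2 and parts[1] != "ok":
            rows.append((parts[0], parts[1]))
    return rows


if __name__ == "__main__":
    sample = """cluster::> system health subsystem show
Subsystem         Health
----------------- ------------------
SAS-connect       ok
Switch-Health     degraded
FHM-Switch        ok
13 entries were displayed."""
    print(degraded_subsystems(sample))  # [('Switch-Health', 'degraded')]
```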

The following health alerts are logged:

::> system health alert show
Node: node01
Resource: node01
Severity: Major
Indication Time: Mon Oct 07 13:19:08 2019
Suppress: false
Acknowledge: false
Probable Cause: One or more nodes are not connected to both cluster switches.
Possible Effect: If one cluster switch fails, "node01" might lose access to the cluster.
Corrective Actions: Ensure the switch "switch02" is connected to the node "node01".

Node: node01
Resource: node02
Severity: Major
Indication Time: Mon Oct 07 13:19:08 2019
Suppress: false
Acknowledge: false
Probable Cause: One or more nodes are not connected to both cluster switches.
Possible Effect: If one cluster switch fails, "node02" might lose access to the cluster.
Corrective Actions: Ensure the switch "switch01" is connected to the node "node02".

2 entries were displayed.

Another example of this alert, showing its Alert ID:

::> system health alert show
Node: Node-01
Alert ID: ClusterSwitchConnectivity_Alert
Resource: Node01
Severity: Major
Indication Time: Sat Feb 17 02:00:36 2024
Suppress: false
Acknowledge: false
Probable Cause: One or more nodes are not connected to both cluster switches.
Possible Effect: If one cluster switch fails, Node01 might lose access to the cluster.
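Each alert of this kind names, in its Corrective Actions field, the cluster switch that the affected node cannot reach. The minimal Python sketch below collects those node/switch pairs from captured "system health alert show" output so the missing cables can be checked in one pass. It assumes one "Field: value" pair per line and that every record starts with a "Node:" field, as in the output above; parse_alerts and missing_links are hypothetical names, and the regular expression matches only the corrective-action wording shown in this article.

```python
import re

# Matches the corrective-action wording shown above, for example:
#   Ensure the switch "switch02" is connected to the node "node01".
LINK_RE = re.compile(
    r'Ensure the switch "([^"]+)" is connected to the node "([^"]+)"'
)


def parse_alerts(output: str) -> list[dict[str, str]]:
    """Split captured alert output into one dict per alert record."""
    alerts: list[dict[str, str]] = []
    for raw in output.splitlines():
        if "::>" in raw:
            continue  # skip the CLI prompt and command line
        key, sep, value = raw.partition(":")
        key = key.strip()
        if not sep or not key:
            continue  # blank lines and "N entries were displayed."
        if key == "Node":
            alerts.append({})  # a new alert record begins
        if alerts:
            alerts[-1][key] = value.strip()
    return alerts


def missing_links(alerts: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Return (node, switch) pairs named in the Corrective Actions field."""
    links = []
    for alert in alerts:
        match = LINK_RE.search(alert.get("Corrective Actions", ""))
        if match:
            switch, node = match.groups()
            links.append((node, switch))
    return links


if __name__ == "__main__":
    sample = """::> system health alert show
Node: node01
Resource: node01
Severity: Major
Corrective Actions: Ensure the switch "switch02" is connected to the node "node01".

Node: node01
Resource: node02
Severity: Major
Corrective Actions: Ensure the switch "switch01" is connected to the node "node02".
2 entries were displayed."""
    for node, switch in missing_links(parse_alerts(sample)):
        print(f"check cabling: {node} <-> {switch}")
```

Note that the Node field identifies the node reporting the alert, while the node named in the corrective action matches the Resource field, which is why the sketch reads the pair from the Corrective Actions text rather than from Node.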

You may also see the following health alert:

cluster::> system health alert show
Node: cluster-01
Resource: Ethernet1/1
Severity: Minor
Indication Time: Wed Aug 25 05:23:27 2021
Suppress: false
Acknowledge: false
Probable Cause: MTU value "1500" on port "e0e" of node "cluster-01" is improperly set. It should be 9000.
Possible Effect: Received Ethernet packets that are larger than the configured MTU are dropped, causing data transfer issues.
Corrective Actions: modify the MTU using the command "network port broadcast-domain modify -ipspace Cluster -broadcast-domain Cluster -mtu <MTU>". To find out the broadcast domain name of the port, execute the command "network port broadcast-domain show".
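The Probable Cause text of this alert carries both the configured and the expected MTU, and the Corrective Actions text gives a command template. A minimal Python sketch that pulls those values out and fills in the <MTU> placeholder is shown below. mtu_fix is a hypothetical helper, the regular expression matches only the wording shown in this alert, and the Cluster IPspace and broadcast-domain names are taken from the alert's own command; confirm the broadcast domain with "network port broadcast-domain show" before running the resulting command.

```python
import re
from typing import Optional

# Matches the Probable Cause wording shown in the alert above.
MTU_RE = re.compile(
    r'MTU value "(\d+)" on port "([^"]+)" of node "([^"]+)" '
    r'is improperly set\. It should be (\d+)\.'
)


def mtu_fix(probable_cause: str) -> Optional[dict]:
    """Extract MTU details and build the suggested corrective command."""
    match = MTU_RE.search(probable_cause)
    if match is None:
        return None
    current, port, node, expected = match.groups()
    return {
        "node": node,
        "port": port,
        "current": int(current),
        "expected": int(expected),
        # Template from the alert's Corrective Actions, with <MTU> filled in.
        "command": (
            "network port broadcast-domain modify -ipspace Cluster "
            f"-broadcast-domain Cluster -mtu {expected}"
        ),
    }


if __name__ == "__main__":
    cause = ('MTU value "1500" on port "e0e" of node "cluster-01" '
             'is improperly set. It should be 9000.')
    # prints: network port broadcast-domain modify -ipspace Cluster -broadcast-domain Cluster -mtu 9000
    print(mtu_fix(cause)["command"])
```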

AutoSupport reports:

HMSCSA:HA Group Notification from cluster1-03 (Health Monitor process cshm: ClusterSwitchConfig_Alert[cluster1-03]) ERROR