What alerts and components of a switch are monitored by CSHM?
Applies to
- ONTAP 9
- Cluster Switch Health Monitor (CSHM) AutoSupport message
Answer
What is CSHM?
The Cluster Switch Health Monitor (CSHM) is a proactive, on-box monitoring system that checks the health of cluster and management switches through the system health monitoring framework. It checks that the switches are functioning correctly and alerts administrators to potential issues.
Components Monitored by CSHM:
- Environmental Subsystem:
  - Fan
  - Temperature
  - Power Supply Unit (PSU)
  - Voltage
- Port Status:
  - Inter-Switch Link (ISL) ports
  - Node ports
- Interface Configuration:
  - Speed
  - Duplex
  - Interface Counters
- Software Configuration:
  - Switch software version
Alerts Monitored:
Cluster Switch Temperature Alerts:
- Alerts if the switch temperature exceeds the minor threshold
- Alerts if the switch temperature exceeds the major threshold
- Alerts if the switch temperature status cannot be read
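The three temperature conditions above amount to a simple threshold check. The following is a minimal sketch of that logic, not CSHM's actual implementation: the function name `classify_temperature` and the default threshold values are hypothetical, since the real thresholds depend on the switch.

```python
from typing import Optional


def classify_temperature(reading_c: Optional[float],
                         minor_c: float = 60.0,
                         major_c: float = 75.0) -> str:
    """Classify a switch temperature reading in degrees Celsius.

    The default thresholds are placeholders for illustration only.
    A reading of None stands for a status that could not be read.
    """
    if reading_c is None:
        return "unreadable"
    if reading_c > major_c:
        return "major"
    if reading_c > minor_c:
        return "minor"
    return "ok"
```

The major threshold is checked first so that a reading above both thresholds is reported at the higher severity.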
Cluster Switch Fan Alerts:
- Alerts if the switch fan sensor status is failed
- Alerts if the switch fan sensor status is not operational
Cluster Switch Power Alerts:
- Alerts for a PSU failure in the switch
- Alerts for a non-operational PSU in the switch
Cluster Switch Interface Alerts:
- Alerts for a non-working ISL port
- Alerts for an incorrect duplex setting on an interface
Alerts Triggered by the CSHM Subsystem:
- UnsupportedSwitch_Alert
- ClusterSwitchConfig_Alert
- ClusterSwitchMissing_Alert
- ClusterIfIslDownWarn_Alert
- SwitchCommunityString_Alert
- SwitchIfIslDownWarn_Alert
- SwitchIfOutErrorsWarn_Alert
- SwitchIfInErrorsWarn_Alert
- SwitchLinkDiscoveryProtocol_Alert
- SwitchFanFail_Alert
- SwitchFanNotPresent_Alert
- SwitchFanNotOperational_Alert
- SwitchPsuFanNotOperational_Alert
- SwitchPowerFail_Alert
- SwitchPowerNotPresent_Alert
- SwitchPowerNotOperational_Alert
- SwitchSNMPCommunication_Alert
- SwitchTemperatureWarn_Alert
- SwitchTemperatureNotOperational_Alert
- SwitchEndOfSupport_Alert
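When triaging, it can help to group these alert IDs by the component they name. The sketch below does this purely from the ID prefixes in the list above. The function names `categorize` and `group_alerts` and the category labels are illustrative assumptions, not part of ONTAP.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# (prefix, category) pairs checked in order; the first match wins.
# The category labels are illustrative, not ONTAP terminology.
PREFIX_CATEGORIES = [
    ("SwitchPsuFan", "power"),  # assumption: a PSU fan is treated as part of the PSU
    ("SwitchFan", "fan"),
    ("SwitchPower", "power"),
    ("SwitchTemperature", "temperature"),
    ("SwitchIf", "interface"),
    ("ClusterIf", "interface"),
]


def categorize(alert_id: str) -> str:
    """Return the component category suggested by an alert ID's prefix."""
    for prefix, category in PREFIX_CATEGORIES:
        if alert_id.startswith(prefix):
            return category
    return "other"


def group_alerts(alert_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group alert IDs by category, preserving input order within each group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for alert_id in alert_ids:
        groups[categorize(alert_id)].append(alert_id)
    return dict(groups)
```

IDs with no recognised prefix, such as `SwitchSNMPCommunication_Alert`, fall into the `"other"` group rather than being dropped.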
By monitoring these components and triggering alerts, CSHM helps maintain the health and performance of your network infrastructure.
