CONTAP-586439: Repetitive false positive cluster switch SwitchSNMPCommunication_Alert
Issue
- ONTAP reports a system health alert regarding failed SNMP communication for both cluster network switches at the same time:
Node: MyCluster-07
Alert ID: SwitchSNMPCommunication_Alert
Resource: MyClusterSwitch01
Severity: Major
Indication Time: Wed Jul 23 10:48:14 2025
Suppress: false
Acknowledge: false
Probable Cause: SNMP communication from the node to the ethernet
switch has failed repeatedly. Invalid SNMP settings
are configured with ONTAP Switch Health Monitoring or
on the Ethernet switch.
Possible Effect: Ethernet switch communication problems and
accessibility issues.
Corrective Actions: 1) Check the SNMPv2c community or SNMPv3 username on the Ethernet switch to verify
that the expected community string or username is configured.
To view the expected community string or username, run the "system switch ethernet show -snmp-config" command.
2) (SNMPv3) Verify that the SNMPv3 credentials are present within ONTAP.
To view the established SNMP logins, run the "security login show -application snmp" command.
If a custom engine-id was provided for the SNMPv3 user,
ensure it is same as that of the remote switch.
Node: MyCluster-07
Alert ID: SwitchSNMPCommunication_Alert
Resource: MyClusterSwitch02
Severity: Major
Indication Time: Wed Jul 23 10:48:14 2025
Suppress: false
Acknowledge: false
Probable Cause: SNMP communication from the node to the ethernet
switch has failed repeatedly. Invalid SNMP settings
are configured with ONTAP Switch Health Monitoring or
on the Ethernet switch.
Possible Effect: Ethernet switch communication problems and
accessibility issues.
Corrective Actions: 1) Check the SNMPv2c community or SNMPv3 username on the Ethernet switch to verify
that the expected community string or username is configured.
To view the expected community string or username, run the "system switch ethernet show -snmp-config" command.
2) (SNMPv3) Verify that the SNMPv3 credentials are present within ONTAP.
To view the established SNMP logins, run the "security login show -application snmp" command.
If a custom engine-id was provided for the SNMPv3 user,
ensure it is same as that of the remote switch.
- Manual verification of SNMP communication with the cluster network switches shows no problems
- The ONTAP cluster switch health monitor can also raise other alerts that are not communication related, but manual validation always proves them to be false positives, for example:
SwitchFanNotPresent_Alert
- The alerts clear automatically without human intervention after a while (usually within minutes), for example:
::> event log show
Mon Sep 29 10:26:33 +0200 [whacko-07: cshmd: hm.alert.raised:alert]: Alert Id = SwitchSNMPCommunication_Alert , Alerting Resource = MyClusterSwitch01 raised by monitor ethernet-switch
Mon Sep 29 10:26:33 +0200 [whacko-07: cshmd: hm.alert.raised:alert]: Alert Id = SwitchSNMPCommunication_Alert , Alerting Resource = MyClusterSwitch02 raised by monitor ethernet-switch
Mon Sep 29 10:31:35 +0200 [whacko-07: cshmd: hm.alert.cleared:notice]: Alert Id = SwitchSNMPCommunication_Alert , Alerting Resource = MyClusterSwitch01 cleared by monitor ethernet-switch
Mon Sep 29 10:31:35 +0200 [whacko-07: cshmd: hm.alert.cleared:notice]: Alert Id = SwitchSNMPCommunication_Alert , Alerting Resource = MyClusterSwitch02 cleared by monitor ethernet-switch
- Before the cshmd alert is raised, and until before it clears again automatically, the messages.log file of the cluster node that runs the cshmd process shows a repetitive log sequence:
Sun Sep 28 2025 21:17:06 +02:00 [kern_cshm:info:27384] [Sep 28 21:17:06]: 0x80b12e700: 0: ERR: util::VserverContext: call_rpc 804: mgwd_vserver_info_1 failed with error: RPC: Unable to send
Sun Sep 28 2025 21:17:06 +02:00 [kern_cshm:info:27384] [Sep 28 21:17:06]: 0x80b12e700: 0: ERR: SNMP::Server: Cannot set Cserver context, 259
Sun Sep 28 2025 21:17:06 +02:00 [kern_cshm:info:27384] [Sep 28 21:17:06]: 0x80b12e700: 0: ERR: SNMP::Server: src/snmp/snmp/Context.h 55: VserverContext : Failed to get vserverID from SNMP thread
Sun Sep 28 2025 21:17:06 +02:00 [kern_cshm:info:27384] [Sep 28 21:17:06]: 0x80b12e700: 0: ERR: SNMP::Server: src/snmp/snmp/Context.h 55: VserverContext : Failed to get vserverID from SNMP thread
Sun Sep 28 2025 21:17:06 +02:00 [kern_cshm:info:27384] [Sep 28 21:17:06]: 0x80b12e700: 0: ERR: SNMP: src/snmp/applications/Applications.cpp 68: VserverContext : Failed to get vserverID from SNMP thread
- When the alert is cleared using the "system health alert delete" command, it is raised again immediately, unless the messages.log signature above has already stopped on its own
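
To compare collected logs against the symptoms above, the following is a minimal Python sketch and not a NetApp tool. It assumes the "event log show" output and the messages.log content have been saved as text in the formats shown above. The function names alert_windows and count_signature_lines are hypothetical, and the year must be supplied by the caller because the event log lines shown do not include it.

```python
import re
from datetime import datetime

# Matches hm.alert.raised / hm.alert.cleared lines as printed by "event log show"
# in the excerpt above. Other EMS formats are not handled (assumption).
EVENT_RE = re.compile(
    r"^\w{3} (?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[[^:\]]+: cshmd: hm\.alert\.(?P<state>raised|cleared):\w+\]: "
    r"Alert Id = (?P<alert>\S+)\s*,\s*Alerting Resource = (?P<resource>\S+)"
)

# Substrings taken from the messages.log sequence quoted in this article.
SIGNATURE_MARKERS = (
    "mgwd_vserver_info_1 failed with error: RPC: Unable to send",
    "Cannot set Cserver context",
    "Failed to get vserverID from SNMP thread",
)


def alert_windows(event_lines, year):
    """Pair each raised alert with its next cleared event.

    Returns a list of (alert_id, resource, seconds_open) tuples.
    """
    open_alerts = {}
    windows = []
    for line in event_lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # not a cshmd alert line
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S %z")
        key = (m["alert"], m["resource"])
        if m["state"] == "raised":
            open_alerts[key] = when
        elif key in open_alerts:
            raised = open_alerts.pop(key)
            windows.append((key[0], key[1], (when - raised).total_seconds()))
    return windows


def count_signature_lines(messages_lines):
    """Count kern_cshm lines that contain one of the signature substrings."""
    return sum(
        1
        for line in messages_lines
        if "kern_cshm" in line
        and any(marker in line for marker in SIGNATURE_MARKERS)
    )
```

For the event log excerpt above, alert_windows(lines, 2025) reports both switch alerts as open for 302 seconds. A short open time on both switches at once, together with a non-zero count_signature_lines result for the same period, is consistent with the behavior described in this article.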
