"Cluster monitoring failed" or "Cluster not reachable" alert randomly received from Active IQ Unified Manager
Applies to
- Active IQ Unified Manager 9.6, 9.7 (UM)
- OnCommand Unified Manager 6.x/7.x (UM)
Issue
"Cluster monitoring failed"
alert is received for random clusters at random times. However, acquisition succeeds most of the time and no performance data gap is observed.
- "Cluster cannot be reached" email alerts are issued intermittently and become obsolete after approximately 15 minutes.
Errors similar to the following appear in au.log:
2020-03-19 06:35:07,290 ERROR [pool-3-thread-956] c.o.s.a.d.n.NetAppOCIEArchivePerformancePackage (NetAppOCIEArchivePerformancePackage.java:307) - Failed to get archive file names from zapi. java.net.ConnectException: Connection timed out (Connection timed out)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
[...]
... 20 more
Wrapped by: com.onaro.sanscreen.acquisition.framework.datasource.DataSourceErrorException: Communication problem with the cluster: <cluster_ip>
at com.onaro.sanscreen.acquisition.framework.datasource.DataSourceErrorException.createWithEnhanced(DataSourceErrorException.java:73) ~[au-framework.jar:9.6.0-2019.06.J5087]
[...]
ocumserver.log
[com.netapp.ipc.jms.OCIE_Events] OCIE JMS notification message received: {WarningCount=0, DatasourceName=x.x.x.x, DatasourceID=12,
Error0_ClusterManagementIP=x.x.x.x, PackageName=netappfoundation, TotalReportTime=569, PollStartTime=1591613772703, ErrorCount=1,
Success=false, DurationTime=23248, Error0_Message=Failed to connect to the cluster., TotalZAPITime=-1, NotificationType=PACKAGE_COMPLETED, Error0_Type=NETWORK_ACCESS_FAILURE, UpdateTime=1591613796437, Error0_Port=443, MessageType=PACKAGE_NOTIFICATION,
Error0_Zapi=service-processor-get}
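
To confirm that an intermittent alert matches this pattern, the fields of the ocumserver.log notification above can be pulled apart and checked for Success=false together with Error0_Type=NETWORK_ACCESS_FAILURE. The following is a minimal Python sketch, not a NetApp tool: the function names are hypothetical, and it assumes the payload is a single {...} group of comma-separated key=value pairs whose values contain no commas, as in the sample above.

```python
import re


def parse_ocie_notification(line):
    """Return the key=value fields found inside the {...} payload of a log entry.

    Assumption: values never contain a comma, which holds for the sample entry.
    """
    match = re.search(r"\{(.*)\}", line, re.DOTALL)
    if match is None:
        return {}
    fields = {}
    for item in match.group(1).split(","):
        # Split each pair on the first "=" only, so values may contain "=".
        key, sep, value = item.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


def is_network_access_failure(fields):
    """True when the poll failed and the first error is a network access failure."""
    return (
        fields.get("Success") == "false"
        and fields.get("Error0_Type") == "NETWORK_ACCESS_FAILURE"
    )


# Sample entry copied from the ocumserver.log excerpt above.
sample = (
    "[com.netapp.ipc.jms.OCIE_Events] OCIE JMS notification message received: "
    "{WarningCount=0, DatasourceName=x.x.x.x, DatasourceID=12,\n"
    "Error0_ClusterManagementIP=x.x.x.x, PackageName=netappfoundation, "
    "TotalReportTime=569, PollStartTime=1591613772703, ErrorCount=1,\n"
    "Success=false, DurationTime=23248, "
    "Error0_Message=Failed to connect to the cluster., TotalZAPITime=-1, "
    "NotificationType=PACKAGE_COMPLETED, Error0_Type=NETWORK_ACCESS_FAILURE, "
    "UpdateTime=1591613796437, Error0_Port=443, MessageType=PACKAGE_NOTIFICATION,\n"
    "Error0_Zapi=service-processor-get}"
)

fields = parse_ocie_notification(sample)
if is_network_access_failure(fields):
    print(fields["DatasourceName"], fields["Error0_Port"], fields["Error0_Zapi"])
```

Collecting DatasourceName, Error0_Port and Error0_Zapi across many such entries can help show whether the failures cluster around one datasource or one time window.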