AIQ Unified Manager 'Cluster monitoring failed' or 'Cluster not reachable' alert due to network lag with ONTAP cluster
Applies to
- Active IQ Unified Manager 9.6+ (UM)
- OnCommand Unified Manager 6.x/7.x/9.x (UM)
- All OS Platforms
Issue
- A 'Cluster monitoring failed' alert is received for random clusters at random times
- 'Cluster cannot be reached' email alerts are issued intermittently and become obsolete after subsequent polling cycles
- au.log:
2020-03-19 06:35:07,290 ERROR [pool-3-thread-956] c.o.s.a.d.n.NetAppOCIEArchivePerformancePackage (NetAppOCIEArchivePerformancePackage.java:307) - Failed to get archive file names from zapi. java.net.ConnectException: Connection timed out (Connection timed out)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
[...]
... 20 more
Wrapped by: com.onaro.sanscreen.acquisition.framework.datasource.DataSourceErrorException: Communication problem with the cluster: <cluster_ip>
at com.onaro.sanscreen.acquisition.framework.datasource.DataSourceErrorException.createWithEnhanced
DataSourceErrorException.java:73) ~[au-framework.jar:9.6.0-2019.06.J5087]
[...]
- ocumserver.log:
[com.netapp.ipc.jms.OCIE_Events] OCIE JMS notification message received: {WarningCount=0, DatasourceName=x.x.x.x, DatasourceID=12,
Error0_ClusterManagementIP=x.x.x.x, PackageName=netappfoundation, TotalReportTime=569, PollStartTime=1591613772703, ErrorCount=1,
Success=false, DurationTime=23248, Error0_Message=Failed to connect to the cluster., TotalZAPITime=-1, NotificationType=PACKAGE_COMPLETED, Error0_Type=NETWORK_ACCESS_FAILURE, UpdateTime=1591613796437, Error0_Port=443, MessageType=PACKAGE_NOTIFICATION,
Error0_Zapi=service-processor-get}
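
When many of these notifications have to be reviewed, it can help to split the key=value payload into fields and check the error type programmatically. The following is a minimal sketch, not a NetApp tool: the function names `parse_notification` and `is_network_failure` are hypothetical, and it assumes the payload is a single brace-delimited list of comma-separated key=value pairs whose values contain no commas, as in the excerpt above.

```python
import re

# Sample payload copied from the ocumserver.log excerpt above
# (addresses are masked as x.x.x.x, exactly as in the source).
SAMPLE = (
    "[com.netapp.ipc.jms.OCIE_Events] OCIE JMS notification message received: "
    "{WarningCount=0, DatasourceName=x.x.x.x, DatasourceID=12, "
    "Error0_ClusterManagementIP=x.x.x.x, PackageName=netappfoundation, "
    "TotalReportTime=569, PollStartTime=1591613772703, ErrorCount=1, "
    "Success=false, DurationTime=23248, "
    "Error0_Message=Failed to connect to the cluster., TotalZAPITime=-1, "
    "NotificationType=PACKAGE_COMPLETED, Error0_Type=NETWORK_ACCESS_FAILURE, "
    "UpdateTime=1591613796437, Error0_Port=443, "
    "MessageType=PACKAGE_NOTIFICATION, Error0_Zapi=service-processor-get}"
)


def parse_notification(line):
    """Return the key=value pairs found between the outermost braces.

    Assumption: values never contain a comma. A fragment without '='
    is skipped rather than guessed at.
    """
    start = line.find("{")
    end = line.rfind("}")
    if start == -1 or end == -1 or end < start:
        return {}
    fields = {}
    for part in re.split(r",\s*", line[start + 1:end]):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_network_failure(fields):
    """True when the poll failed and any ErrorN_Type is NETWORK_ACCESS_FAILURE."""
    if fields.get("Success") != "false":
        return False
    return any(
        key.startswith("Error") and key.endswith("_Type")
        and value == "NETWORK_ACCESS_FAILURE"
        for key, value in fields.items()
    )


if __name__ == "__main__":
    parsed = parse_notification(SAMPLE)
    print(parsed.get("DatasourceName"), parsed.get("Error0_Zapi"),
          is_network_failure(parsed))
```

Fields such as `PollStartTime`, `DurationTime`, and `Error0_Zapi` can then be compared across several notifications to see whether the failures cluster around particular times or ZAPI calls.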