All collectors run into Error status with Error Code AGENT008 at the same time
Applies to
- Data Infrastructure Insights (DII) (formerly Cloud Insights)
- Storage Workload Security
Issue
- Alert emails from accounts@service.cloudinsights.netapp.com are sent:
- subject:
Critical Health Alert: Storage Workload Security Data Collector '<Collector Name>' is disconnected
Description: SVM Data Collector '<Collector Name>' is disconnected. The SVM is not monitored and protected.
Error: Failed to determine the health of the collector within 2 retries, try restarting the collector again(Error Code: AGENT008)
- subject:
Warning Health Alert: Storage Workload Security User Directory Collector '<Collector Name>' is disconnected
Description: User Directory Collector '<Collector Name>' is disconnected. Users' information is not updated.
Error: Failed to determine the health of the collector within 2 retries, try restarting the collector again(Error Code: AGENT008)
- All collectors listed on the following pages are in Error status:
- Workload Security > Collectors > Data Collectors
- Workload Security > Collectors > User Directory Collectors
with the message:
Failed to determine the health of the collector within 2 retries, try restarting the collector again(Error Code: AGENT008)
- agent.log shows that the agent fails to get the status of each collector with certificate_unknown, then removes the collector from monitoring:
[ERROR] [prod] [<TENANT_ID>] [<AGENT_UUID>] [agent-AgentDataSourceStateManagerActor] - Failed to get state of <DATASOURCE_UUID>, reason: NotAfter: <TIMESTAMP>
..
[ERROR] [prod] [<TENANT_ID>] [<AGENT_UUID>] [agent-AgentDataSourceStateManagerActor] - Failed to get state of <DATASOURCE_UUID>, reason: Received fatal alert: certificate_unknown
..
[INFO] [prod] [<TENANT_ID>] [<AGENT_UUID>] [agent-AgentDataSourceStateManagerActor] - Removed collector: <DATASOURCE_UUID> from monitoring
..
[INFO] [prod] [<TENANT_ID>] [<AGENT_UUID>] [agent-AgentDataSourceStateManagerActor] - All collector health status has been updated- stateMap: [Map(<DATASOURCE_UUID> -> error)], statusMap: [Map(<DATASOURCE_UUID> -> Failed to determine the health of the collector within 2 retries, try restarting the collector again(Error Code: AGENT008))]
..
[WARN] [prod] [<TENANT_ID>] [<AGENT_UUID>] [agent-AgentDataSourceJvm] - Skipped Refresh Jwt as the datasource <DATASOURCE_UUID> is not running
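
To check whether a given agent.log matches this symptom, the excerpts above can be scanned for automatically. The following is a minimal sketch, not a NetApp tool: the function names `summarize` and `matches_certificate_symptom` are hypothetical, and the patterns assume the message text appears exactly as in the excerpts above (any extra prefix such as a timestamp is tolerated).

```python
import re
from collections import defaultdict

# Patterns copied from the agent.log excerpts in this article. search() is
# used rather than match() so that leading fields on a line are ignored.
FAILED = re.compile(r"Failed to get state of (?P<ds>[^,\s]+), reason: (?P<reason>.*)$")
REMOVED = re.compile(r"Removed collector: (?P<ds>\S+) from monitoring")


def summarize(lines):
    """Group failure reasons and removal events by datasource ID.

    Returns {datasource_id: {"reasons": [str, ...], "removed": bool}}.
    """
    summary = defaultdict(lambda: {"reasons": [], "removed": False})
    for line in lines:
        line = line.rstrip("\r\n")
        failed = FAILED.search(line)
        if failed:
            summary[failed.group("ds")]["reasons"].append(failed.group("reason").strip())
            continue
        removed = REMOVED.search(line)
        if removed:
            summary[removed.group("ds")]["removed"] = True
    return dict(summary)


def matches_certificate_symptom(entry):
    """True if the collector hit certificate_unknown and was then removed."""
    hit_cert_alert = any("certificate_unknown" in r for r in entry["reasons"])
    return hit_cert_alert and entry["removed"]
```

One way to use it is to open the log file, pass the file object to `summarize`, and print each datasource ID for which `matches_certificate_symptom` returns true. The location of agent.log depends on the agent installation and is not assumed here.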