Element cluster fails to report to ActiveIQ due to an out-of-memory condition in the mNode AIQ collector service
Applies to
- HCI or AFA clusters running Element Software and configured to report to ActiveIQ
- Element management node (mNode) version 11.3 or later running management services version 2.15.28
- Hybrid Cloud Control (HCC) version 2.15
Issue
Symptoms can vary across the storage cluster (AFA or HCI storage nodes running Element Software) and HCI compute nodes.
For storage clusters:
- the storage cluster is no longer reporting to ActiveIQ
For compute nodes:
- the compute nodes are no longer reporting to ActiveIQ
- the compute nodes may not appear in Hybrid Cloud Control (HCC) on the mNode
The common factor in both cases is an out-of-memory error in the container logs for the mnode-svc-aiq-collector service on the mNode.
In /var/log/syslog:
Memory cgroup out of memory
Task in /docker/af508468f78f4d8fd1811193b19eeecc5da3e3bcb1e64f3835b600976974f257 killed as a result of limit
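To check an mNode for this condition, the log lines above can be searched for programmatically. The following is a minimal Python sketch, assuming the kernel messages appear in /var/log/syslog with the same wording as the excerpt above (the exact text can vary between kernel versions). The helpers find_oom_killed_containers and scan_syslog are hypothetical names used for illustration; they are not part of any NetApp tool. The sketch collects the Docker container IDs named in "Task in /docker/<id> killed as a result of limit" lines.

```python
import re
from typing import Iterable, List

# Pattern taken from the syslog excerpt in this article; the kernel wording
# may differ on other kernel versions (assumption).
KILLED_TASK = re.compile(r"Task in /docker/([0-9a-f]+) killed as a result of limit")


def find_oom_killed_containers(lines: Iterable[str]) -> List[str]:
    """Return Docker container IDs from cgroup OOM kill lines.

    IDs are returned in first-seen order, without duplicates. Any syslog
    prefix (timestamp, hostname, "kernel:") before the message is ignored
    because re.search matches anywhere in the line.
    """
    seen = set()
    ids: List[str] = []
    for line in lines:
        match = KILLED_TASK.search(line)
        if match:
            container_id = match.group(1)
            if container_id not in seen:
                seen.add(container_id)
                ids.append(container_id)
    return ids


def scan_syslog(path: str = "/var/log/syslog") -> List[str]:
    """Hypothetical convenience wrapper that scans a syslog file on disk."""
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(path, errors="replace") as fh:
        return find_oom_killed_containers(fh)
```

The returned values are full 64-character container IDs. If the killed container still exists, its ID can be compared against the output of `docker ps --no-trunc` on the mNode to confirm whether it belongs to the mnode-svc-aiq-collector service.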