- The System Event Log (SEL) at the BMC/IPMI shows a Correctable ECC alert at the time of the event:
<Event ID> | <DATE> | <TIME> | Memory | Correctable ECC (@DIMM<ID>(CPU<#>)) | Asserted
- NetApp SolidFire ActiveIQ and the cluster UI show that all block and metadata services of the node are unresponsive shortly after the SEL entry:
unresponsiveService - A block service is not responding. (This alert is shown for each block drive.)
unresponsiveService - A metadata service is not responding.
- Unhealthy service alerts are shown shortly after the unresponsive service alerts:
blockServiceUnhealthy - A block service is unhealthy and SolidFire is attempting to migrate data away from it.
sliceServiceUnhealthy - A metadata service is unhealthy and SolidFire is attempting to migrate data away from it.
- All drives of the affected node become available after a while (NetApp SolidFire ActiveIQ and cluster UI):
driveAvailable - Node ID X has Y available drive(s).
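
To confirm the first symptom quickly, the SEL text can be filtered for asserted Correctable ECC entries. The following is a minimal sketch, not a NetApp tool. It assumes the SEL has been exported as pipe-delimited text in the field order shown above. The function name find_correctable_ecc and the sample entries are hypothetical and serve only as illustration.

```python
import re

# Matches the DIMM/CPU detail in the shape "(@DIMM<ID>(CPU<#>))" shown above.
DIMM_RE = re.compile(r"@DIMM(?P<dimm>\w+)\(CPU(?P<cpu>\d+)\)")


def find_correctable_ecc(sel_text):
    """Return asserted Correctable ECC memory events from pipe-delimited SEL text."""
    events = []
    for line in sel_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 6:
            continue  # not a complete SEL entry
        event_id, date, time, sensor, description, state = fields[:6]
        if sensor != "Memory" or state != "Asserted":
            continue
        if not description.startswith("Correctable ECC"):
            continue
        match = DIMM_RE.search(description)
        events.append({
            "event_id": event_id,
            "date": date,
            "time": time,
            "dimm": match.group("dimm") if match else None,
            "cpu": match.group("cpu") if match else None,
        })
    return events


if __name__ == "__main__":
    # Hypothetical sample entries, made up for illustration only.
    sample = "\n".join([
        "1a | 01/02/2024 | 03:04:05 | Memory | Correctable ECC (@DIMMA1(CPU1)) | Asserted",
        "1b | 01/02/2024 | 03:04:09 | Fan | Lower Critical going low | Asserted",
    ])
    for event in find_correctable_ecc(sample):
        print(event)
```

The timestamp of each match can then be compared with the times of the unresponsiveService alerts to check that they follow the SEL entry.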