NetApp Knowledge Base

Why did we receive an email notification without seeing obvious errors in Active IQ?

Category:
element-software
Specialty:
solidfire

Applies to

  • NetApp Element software
  • NetApp SolidFire Active IQ

Answer

When cluster alerts occur at the same time as a cluster master migration, the alerts may not be sent to Active IQ immediately, especially if they relate to the releasing cluster master. These alerts remain in a stale state until the next cluster master migration, at which point all alerts are scrubbed. This happens even if an alert has been resolved in the meantime.
 
After the next cluster master migration, both resolved and unresolved stale alerts are still treated as "new", and a notification email is sent for them regardless of their resolution status.
 
  • To find the alerts in Active IQ, go to Reporting > Errors
    • For unresolved alerts: sort on Date
    • For resolved alerts: sort on Resolution Time
    • Alternatively, filter on the Alert ID from the email
       
  • To identify a cluster master migration:
    • In Active IQ, go to Reporting > Events
    • Filter for clusterMasterEvent
    • Note: The event list only keeps track of the last 10 000 events. If the relevant events have already been overwritten, NetApp Support can still trace further back in time using the storage logs.
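
If the event list is exported for offline review, the check above can be approximated with a short script. The following Python sketch is only an illustration: the record keys `type` and `time`, the helper names, and the one-hour window are assumptions made for this example and do not reflect a documented Active IQ export format.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2021-07-21T19:51:40Z' as UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def migrations_near(events, moment, window_seconds=3600):
    """Return clusterMasterEvent records within window_seconds of moment.

    `events` is a list of dicts with the hypothetical keys 'type' and 'time'.
    """
    hits = []
    for event in events:
        if event["type"] != "clusterMasterEvent":
            continue
        delta = abs((parse_utc(event["time"]) - moment).total_seconds())
        if delta <= window_seconds:
            hits.append(event)
    return hits


# Made-up sample records, for illustration only.
sample_events = [
    {"type": "clusterMasterEvent", "time": "2021-07-21T19:51:40Z"},
    {"type": "apiEvent", "time": "2021-07-21T19:51:45Z"},
    {"type": "clusterMasterEvent", "time": "2021-07-01T08:00:00Z"},
]

notification_time = parse_utc("2021-07-21T19:51:51Z")
for event in migrations_near(sample_events, notification_time):
    print(event["time"], event["type"])
```

A cluster master migration found shortly before the Notification Time of the email is consistent with the delayed-notification behaviour described in this article.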

Please keep in mind that this article only explains the behaviour of the delayed email and does not address the origin of the alert. Further investigation may still be required to determine what triggered the alert.

This article also applies only to isolated instances. If the alert emails keep recurring, please see the article "Repeated alerts from SolidFire cluster which have been resolved".

Additional Information

Example of notification mail:

Alert ID: <#>
Severity: error 
Cluster: <CLUSTER NAME>
Occurrence Time: 2021-07-06 15:42:30 UTC
Notification Time: 2021-07-21 19:51:51 UTC
  --->> Substantial difference between the time stamps

clusterFaultID: 102
Additional Detail:

clusterFaultID: 102
nodeHardwareFaultID: 593
code: nodeOffline
details: The SolidFire Application cannot communicate with Storage node having node ID 1.
severity: error
date: 2021-07-06T15:42:53.164604Z
resolved: true
type: node
nodeID: 1
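
The gap between the two time stamps in the example can also be computed programmatically. The Python sketch below assumes the email body uses the "Occurrence Time" and "Notification Time" labels and the time format shown in the example above. The function name and the sample text are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Time format as it appears in the example notification mail.
EMAIL_TIME_FORMAT = "%Y-%m-%d %H:%M:%S UTC"


def notification_delay(email_text: str) -> timedelta:
    """Return Notification Time minus Occurrence Time from an alert email body."""
    times = {}
    for line in email_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("Occurrence Time", "Notification Time"):
            parsed = datetime.strptime(value.strip(), EMAIL_TIME_FORMAT)
            times[key] = parsed.replace(tzinfo=timezone.utc)
    return times["Notification Time"] - times["Occurrence Time"]


# Excerpt of the example notification mail.
sample_email = """\
Severity: error
Occurrence Time: 2021-07-06 15:42:30 UTC
Notification Time: 2021-07-21 19:51:51 UTC
"""

print(notification_delay(sample_email))  # prints "15 days, 4:09:21"
```

A delay of this size, combined with `resolved: true` in the alert details, matches the pattern of a stale alert that was only sent after a later cluster master migration.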