
How to identify ONTAP EMS events for subscription in Active IQ Unified Manager

Applies to

  • ONTAP 9.3.x and above
  • ONTAP EMS
  • OnCommand Unified Manager 7.x - 9.5.x
  • Active IQ Unified Manager 9.6.x and above

Description

OnCommand Unified Manager was rebranded as Active IQ Unified Manager as of the 9.6 release. In this article, "Unified Manager" refers to both OnCommand Unified Manager 7.x - 9.5.x and Active IQ Unified Manager 9.6.x and above.

Starting with the 7.0 release, Unified Manager can subscribe to specific ONTAP EMS events. Unified Manager then serves as a central location that receives critical EMS events and notifies administrators through alerts configured against those events. Centralizing monitoring and management in this way simplifies the configuration of critical EMS events and of the alert notifications based on them.

Procedure

Querying for EMS events on Data ONTAP 9.x

Unified Manager requires a specific text string as input to configure an EMS Subscription. This text string is the EMS Message Name. The Message Name may be determined by issuing clustershell (CLI) commands on a running cluster to query the EMS Catalog directly.

Prerequisites:
  • You must have a role that allows clustershell (CLI) access (via an SSH client) to run the event* commands for the ONTAP releases supported by Unified Manager.
  • All example commands are run from the clustershell and require no elevated privileges.

To simplify command entry and minimize typographical errors, use tab completion after entering part of a text string.

CAUTION

Do NOT use events marked as deprecated for EMS Subscription configuration in Unified Manager or direct configuration on a cluster.  Deprecated events are subject to removal at any time.

What are the severities and severity definitions?

ONTAP 9.3+
clu93::> event catalog show -severity ?
  EMERGENCY                   Disruption
  ALERT                       Single point of failure
  ERROR                       Degradation
  NOTICE                      Information
  INFORMATIONAL               Information
  DEBUG                       Debug information
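
When catalog output is post-processed in a script, the severities listed above can be treated as an ordered scale. The following is a minimal Python sketch, assuming the listing order runs from most to least severe. The at_least helper and the sample dictionary are illustrative names, not part of ONTAP or Unified Manager.

```python
# Severity names as listed by `event catalog show -severity ?` on ONTAP 9.3+.
# Assumption: the listing order runs from most severe to least severe.
SEVERITIES = ["EMERGENCY", "ALERT", "ERROR", "NOTICE", "INFORMATIONAL", "DEBUG"]


def at_least(severity, threshold):
    """Return True if `severity` is as severe as `threshold`, or more severe."""
    return SEVERITIES.index(severity.upper()) <= SEVERITIES.index(threshold.upper())


# Message names and severities copied from the catalog listings in this article.
events = {
    "callhome.shlf.fan": "EMERGENCY",
    "callhome.shlf.fault": "ERROR",
    "snapmirror.block.on.reconstruct": "NOTICE",
}

# Keep only the events that are ERROR or more severe.
candidates = sorted(name for name, sev in events.items() if at_least(sev, "ERROR"))
print(candidates)
```

A threshold of ERROR is only an example; choose the level that matches the events you want Unified Manager to alert on.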

Which events result in an AutoSupport message ('callhome' events, whose destination is 'asup') for shelf faults? The example below finds them with the wildcard search callhome.shlf*. These are additional high-severity events of interest (see the EMS Configuration Express Guide). Note the message name in the first column of the output below: this message name is the string required when configuring the Unified Manager EMS Subscription feature. Wildcards may be used to search for events.

ONTAP 9.3+
clu93::> event catalog show -message-name callhome.shlf*   
Message                          Severity         SNMP Trap Type
-------------------------------- ---------------- -----------------
callhome.shlf.fan                EMERGENCY        Severity-based
callhome.shlf.fan.warn           ERROR            Severity-based
callhome.shlf.fault              ERROR            Severity-based
callhome.shlf.overtemp           ERROR            Severity-based
callhome.shlf.power.intr         ERROR            Severity-based
callhome.shlf.ps.fault           ERROR            Severity-based
6 entries were displayed.
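
If the tabular output is captured to a text file (for example, from an SSH session log), the message names can be extracted with a short script. This is a sketch only: parse_catalog_table is a hypothetical helper, and it assumes the three-column layout shown in this article, including the case where a long message name is printed on a line of its own with its severity on the next line.

```python
SEVERITIES = {"EMERGENCY", "ALERT", "ERROR", "NOTICE", "INFORMATIONAL", "DEBUG"}


def parse_catalog_table(text):
    """Return (message_name, severity) pairs from tabular catalog output."""
    rows = []
    pending = None   # long message name waiting for its severity on the next line
    in_body = False  # becomes True once the dashed separator line has been seen
    for line in text.splitlines():
        if not in_body:
            in_body = line.startswith("---")
            continue
        fields = line.split()
        if not fields or line.rstrip().endswith("displayed."):
            continue  # skip blank lines and the "N entries were displayed." footer
        if fields[0] in SEVERITIES and pending is not None:
            rows.append((pending, fields[0]))  # second half of a wrapped row
            pending = None
        elif len(fields) >= 2 and fields[1] in SEVERITIES:
            rows.append((fields[0], fields[1]))  # ordinary single-line row
        else:
            pending = fields[0]  # name too long for the column; severity follows
    return rows


# Sample combining rows copied from two listings in this article.
SAMPLE = """\
Message                          Severity         SNMP Trap Type
-------------------------------- ---------------- -----------------
callhome.shlf.fan                EMERGENCY        Severity-based
callhome.shlf.fan.warn           ERROR            Severity-based
snapmirror.status.dstUpdateSnapErr
                                 ERROR            Severity-based
3 entries were displayed.
"""

for name, severity in parse_catalog_table(SAMPLE):
    print(name, severity)
```

The first element of each pair is the string to enter when configuring an EMS Subscription in Unified Manager.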

Run event catalog show -message-name with the full message name to view more information about the EMS event.

ONTAP 9.3+
clu93::> event catalog show -message-name callhome.shlf.fan     

     Message Name: callhome.shlf.fan
         Severity: EMERGENCY
      Description: This message occurs when the system detects faulty hardware on the disk shelf, such as a fan, power supply unit (PSU), or failing temperature sensor. The problem might be environmental (temperature or faulty power) or hardware-related. If your system is configured to do so, it generates and transmits an AutoSupport (or 'call home') message to NetApp technical support and to the configured destinations. Successful delivery of an AutoSupport message significantly improves problem determination and resolution.
Corrective Action: Evaluate the environment in which your system is operating and identify whether the problem is environmental or hardware-related. Your system should be in a room with an operating temperature of 18C to 24C (65F to 75F). If faulty hardware caused the error, such as a bad temperature sensor or a broken fan, replace the faulty part as soon as possible. If you need assistance, contact NetApp technical support.
   SNMP Trap Type: Severity-based
    Is Deprecated: false   
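
Because deprecated events must not be used for EMS Subscriptions, it can be useful to check the Is Deprecated field programmatically when reviewing many events. The sketch below parses the detailed view into a dictionary; parse_catalog_detail and is_subscribable are hypothetical helper names, the field labels are taken from the output above, and the Description in the sample is shortened for brevity.

```python
# Field labels as they appear in the detailed `event catalog show` output above.
LABELS = (
    "Message Name", "Severity", "Description",
    "Corrective Action", "SNMP Trap Type", "Is Deprecated",
)


def parse_catalog_detail(text):
    """Parse the detailed view of one EMS event into a dict keyed by label."""
    fields = {}
    current = None
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() in LABELS:
            current = label.strip()
            fields[current] = value.strip()
        elif current and line.strip():
            # Continuation of a long value that wrapped onto another line.
            fields[current] += " " + line.strip()
    return fields


def is_subscribable(fields):
    """Return True only when the event is explicitly marked as not deprecated."""
    return fields.get("Is Deprecated", "").lower() == "false"


SAMPLE = """\
     Message Name: callhome.shlf.fan
         Severity: EMERGENCY
      Description: This message occurs when the system detects faulty hardware on the disk shelf.
   SNMP Trap Type: Severity-based
    Is Deprecated: false
"""

event = parse_catalog_detail(SAMPLE)
print(event["Message Name"], event["Severity"], is_subscribable(event))
```

Treating a missing Is Deprecated field as "not subscribable" is a deliberately conservative choice in this sketch.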

What about another event? For example, suppose SnapMirror backups are critical to the business and monitoring their status is important.

ONTAP 9.3+
clu93::> event catalog show -message-name snapmirror*
Message                          Severity         SNMP Trap Type
-------------------------------- ---------------- -----------------
snapmirror.block.on.reconstruct  NOTICE           Severity-based
snapmirror.block.reconstructErr  ERROR            Severity-based
snapmirror.conf.depre.cpsync     ERROR            Severity-based
snapmirror.conf.full             ERROR            Severity-based
snapmirror.conf.invalidStr       ERROR            Severity-based
snapmirror.conf.obsolete.nvsync  ERROR            Severity-based

The 'status' items are the events the business needs to monitor, specifically for errors in SnapMirror updates:

ONTAP 9.3+
clu93::> event catalog show -message-name snapmirror.status*
Message                          Severity         SNMP Trap Type
-------------------------------- ---------------- -----------------
snapmirror.status.dstUpdateSnapErr 
                                 ERROR            Severity-based
snapmirror.status.illegalSrcPath ERROR            Severity-based
snapmirror.status.noBaseSnapshot ERROR            Severity-based
snapmirror.status.updateStatusErr 
                                 ERROR            Severity-based
4 entries were displayed.

The details of the snapmirror.status.updateStatusErr event confirm when the event is triggered and suggest a possible corrective action if the issue is encountered:

ONTAP 9.3+
clu93::> event catalog show -message-name snapmirror.status.updateStatusErr 

     Message Name: snapmirror.status.updateStatusErr
         Severity: ERROR
      Description: This event is generated when Data ONTAP cannot update the ONTAP system registry with upgraded snapmirror status information. Insufficient disk space on the root volume is the most common reason for this failure.
Corrective Action: Check if the root volume is out of disk space by issuing the 'df' command from the appliance CLI. If the root volume is full, either free up space or add more disks on the volume.
   SNMP Trap Type: Severity-based
    Is Deprecated: false
 

Inspecting a Data ONTAP 9.x EMS Catalog File

There are three ways to inspect the EMS Catalog:

  1. File-based, using search capabilities in the application that opens the file. Every running ONTAP cluster has a copy of the EMS Catalog, located on a cluster node: /etc/ems/ems_catalog.ems.  This ems_catalog.ems file may be downloaded and opened in any text editor (the file format is XML) for inspection. The XML files are also available in this article.  Refer to the 'Additional Information' section below to download the EMS Catalog, organized by ONTAP release.
  2. EMS Reference on docs.netapp.com. You can view specific event types, such as "callhome", or export specific event types to PDF.
  3. Review the EMS Catalog documentation (PDF format). The EMS Catalog PDF files are available at the Documentation by Product Library: ONTAP 9 in the "More Resources" section, or by visiting a release-specific link below:

CAUTION

Do NOT use events marked as deprecated for EMS Subscription configuration in Unified Manager or direct configuration on a cluster.  Deprecated events are subject to removal at any time.  For ONTAP 9.0, do NOT use the INFORMATIONAL severity class as it has been deprecated.
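
For the file-based option in the list above, a downloaded copy of the catalog can also be searched with a script instead of a text editor. The sketch below makes no assumption about the XML schema, which this article does not describe: it scans the raw text for dotted names that begin with a given prefix. The find_message_names helper and the sample markup are hypothetical and only illustrate the search.

```python
import re


def find_message_names(catalog_text, prefix):
    """Return the distinct dotted names in `catalog_text` that start with `prefix`."""
    # The lookbehind rejects matches that begin in the middle of a longer name.
    pattern = re.compile(r"(?<![\w.])" + re.escape(prefix) + r"[\w.]*")
    return sorted({m.group(0).rstrip(".") for m in pattern.finditer(catalog_text)})


# Hypothetical markup for illustration; the real catalog layout may differ.
SAMPLE = """
<event name="callhome.shlf.fan" severity="EMERGENCY"/>
<event name="callhome.shlf.fault" severity="ERROR"/>
<event name="snapmirror.conf.full" severity="ERROR"/>
"""

print(find_message_names(SAMPLE, "callhome.shlf"))

# With a downloaded catalog file, the text could be loaded like this:
#   from pathlib import Path
#   text = Path("ems_catalog.ems").read_text(errors="replace")
```

This search does not check whether an event is deprecated, so confirm each candidate with event catalog show on a running cluster before subscribing to it.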

EMS Catalog files, XML format:

 

 

