NetApp Knowledge Base

How to troubleshoot correctable memory errors on FAS and AFF systems

Category:
ontap-9
Specialty:
hw

Applies to

  • ONTAP 9
  • Data ONTAP 8
  • AFF / FAS platforms
  • DIMM Replacement Guide

Answer

Check Active IQ to see whether correctable ECC (CECC) memory errors affect your systems.

Choose the appropriate guide based on platform and ONTAP version.

Each entry below lists: Platform, System or NVRAM, ONTAP Version, Guide.

Platform:
  • AFF A900 / FAS9500
  • AFF A800
  • AFF A700s
  • AFF A700 / FAS9000
  • AFF A400 / FAS8300 / FAS8700
  • AFF A300 / FAS8200
  • AFF A250 / FAS500f
  • AFF A220 / AFF C190 / FAS27x0
  • AFF A200 / FAS26x0
  • AFF80x0 / FAS80x0

For the platforms above:

  • System or NVRAM: System DIMM
    ONTAP Version: 9.1P18 and later P releases; 9.3P11 and later P releases; 9.4P6 and later P releases; 9.5 and later major releases
    Guide: Correctable memory errors in ONTAP with dynamic thresholds
  • System or NVRAM: System DIMM
    ONTAP Version: 9.1P17 and earlier P releases; 9.2 all P releases; 9.3P10 and earlier P releases; 9.4P5 and earlier P releases
    Guide: Correctable memory errors reporting in ONTAP versions with static thresholds
  • System or NVRAM: NVRAM DIMM
    ONTAP Version: 9.1 and higher
    Guide: Correctable memory errors on NVRAM DIMMs in ONTAP

Platform:
  • FAS25x0
  • FAS22x0
  • V / FAS32x0
  • V / FAS62x0

For the platforms above:

  • System or NVRAM: System or NVRAM
    ONTAP Version: 9.1 and higher
    Guide: Correctable memory errors on 62XX, 32XX, 25XX, and 22XX systems in ONTAP

Platform:
  • FAS80x0
  • FAS25x0
  • FAS22x0
  • V / FAS32x0
  • V / FAS62x0

For the platforms above:

  • System or NVRAM: System or NVRAM
    ONTAP Version: Data ONTAP 8 7-Mode
    Guide: Correctable memory errors on Data ONTAP 8
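
The System DIMM version ranges listed above can be expressed as a small lookup. The following Python sketch is illustrative only and is not a NetApp tool: the function name, the constant names, and the accepted version string format (for example "9.3P11", "9.5", or "9.10.1P3") are assumptions made for this example, and a release with no P suffix is treated as patch level 0. It applies only to System DIMMs on the first platform group (AFF A900 / FAS9500 through AFF80x0 / FAS80x0).

```python
import re

DYNAMIC_GUIDE = "Correctable memory errors in ONTAP with dynamic thresholds"
STATIC_GUIDE = "Correctable memory errors reporting in ONTAP versions with static thresholds"

# First P release that uses dynamic thresholds for each ONTAP 9 minor release,
# taken from the version ranges above. None means every P release is static.
FIRST_DYNAMIC_PATCH = {1: 18, 2: None, 3: 11, 4: 6}

# Assumption: version strings look like "9.3P11", "9.5", or "9.10.1P3".
VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)(?:\.\d+)?(?:P(\d+))?")


def system_dimm_guide(version):
    """Return the guide title that covers System DIMMs for an ONTAP 9 version."""
    match = VERSION_PATTERN.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized ONTAP version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Assumption: a release without a P suffix counts as patch level 0.
    patch = int(match.group(3)) if match.group(3) else 0
    if major != 9 or minor < 1:
        raise ValueError(f"version not covered by the System DIMM ranges: {version!r}")
    if minor >= 5:
        # 9.5 and later major releases use dynamic thresholds.
        return DYNAMIC_GUIDE
    first_dynamic = FIRST_DYNAMIC_PATCH[minor]
    if first_dynamic is not None and patch >= first_dynamic:
        return DYNAMIC_GUIDE
    return STATIC_GUIDE


if __name__ == "__main__":
    for v in ("9.1P17", "9.3P11", "9.2P4", "9.5"):
        print(v, "->", system_dimm_guide(v))
```

For NVRAM DIMMs, the older FAS platforms, and Data ONTAP 8 7-Mode, the guide does not depend on the P release, so no version comparison is needed there.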

 

 

Additional Information

Notes:

  • DIMMs reporting correctable ECC errors should NOT be replaced solely because correctable ECC errors appear in EMS logs, or because the “CriticalCECCCountMemErrAlert” system event and AutoSupport messages are seen.
  • NetApp storage systems utilize error-correcting code (ECC) memory modules (DIMMs) for both main system memory and NVRAM/NVMEM subsystems. When possible, memory errors are corrected in-flight by the memory subsystem hardware with little to no impact on system performance.
    • Previously, ONTAP running on AFF/FAS storage systems used a long-standing policy of alerting the system administrator to “excessive” CECC memory errors, based on a threshold of 500 errors since the last reboot of the system.
    • After extensive analysis of correctable ECC (CECC) memory errors by NetApp and its hardware component vendors, it was determined that CECC memory errors are typically not a good predictor of a system disruption due to uncorrectable ECC (UECC) memory errors, especially with the latest generations of memory controllers and dynamic random-access memory (DRAM).
    • Additionally, the CPU cycles used to monitor, log, and correct large numbers of memory errors have a negligible impact on system performance.
  • As a result, NetApp changed the monitoring algorithm for CECC memory errors used by ONTAP on many currently-supported AFF/FAS systems to a dynamic monitoring algorithm, with much higher thresholds configured to trigger the “CriticalCECCCountMemErrAlert” controller Health Monitor alert and corresponding "Health Monitor" AutoSupport message.
    • Alerts triggered under the older policy can be considered false positives and should not be taken as an indication for memory replacement, because replacing the memory would result in unnecessary hardware maintenance with no tangible benefit.
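
To make the older policy concrete, the sketch below models a cumulative CECC counter that resets at reboot and is compared against the fixed threshold of 500 errors mentioned above. It is a hypothetical illustration, not ONTAP code: the class and method names are invented for this example, and the exact comparison (alerting once the count reaches 500) is an assumption. The dynamic algorithm is not modeled because its thresholds are not given here.

```python
LEGACY_CECC_THRESHOLD = 500  # errors since the last reboot, per the note above


class LegacyCeccCounter:
    """Hypothetical model of a CECC counter under the older static policy."""

    def __init__(self):
        self.count = 0

    def record(self, errors=1):
        # Correctable errors accumulate for as long as the system stays up.
        self.count += errors

    def reboot(self):
        # The older policy counted errors since the last reboot.
        self.count = 0

    def alert(self):
        # Assumption: the alert fires once the count reaches the threshold.
        return self.count >= LEGACY_CECC_THRESHOLD


if __name__ == "__main__":
    counter = LegacyCeccCounter()
    counter.record(499)
    print(counter.alert())  # below the threshold
    counter.record(1)
    print(counter.alert())  # threshold reached
    counter.reboot()
    print(counter.alert())  # counter cleared by the reboot
```

An alert produced by a check of this kind is, per the notes above, not by itself a reason to replace a DIMM.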

 

  • On versions of ONTAP that use the dynamic algorithm, CECC memory errors continue to be periodically logged in ONTAP event logs. However, they are no longer relevant in determining the need for DIMM replacement.
  • Correctable ECC errors are not an indicator that an uncorrectable ECC error will occur. Should an uncorrectable memory error occur, it will cause a system disruption (panic). If a system disruption occurs, the panic message will call out the DIMM or DIMMs where the uncorrectable error occurred. For further information see:
  • Recent BIOS/LOADER releases for currently shipping ONTAP platforms contain memory management enhancements. These updates improve resiliency to uncorrectable ECC errors and reduce the scenarios in which DIMMs can be mapped out during boot, such as those described in Bugs 1195242, 1195243, or 1195423. If your BIOS version is not the latest available for your AFF or FAS system, NetApp recommends updating the BIOS to the latest version. Find the latest BIOS/LOADER version for your systems on the System Firmware & Diagnostics Download page.
  • JEDEC-standard NVDIMM modules are used in the following platforms:
    • AFF A800, AFF A400, AFF A320
    • FAS8700, FAS8300