NetApp Knowledge Base

NetApp Element software may misreport memory errors and result in a cluster fault for memoryEccThreshold on MemCtlr0

Applies to

  • NetApp Element software 12.0 and 12.2
  • NetApp SolidFire SF-Series product line
  • NetApp H-series storage nodes

Issue

  • NetApp Element software may misreport correctable errors on DIMMs as correctable errors on the node's memory controller
  • The default settings for ECC errors on a node's memory controller are overly aggressive, so even a single error results in a persistent cluster fault of Error severity
  • The following cluster fault is shown in NetApp SolidFire Active IQ and the cluster UI:
    • Error Code: memoryEccThreshold
    • Details: Correctable ECC memory error count crossed threshold on Memory controller: MemCtlr0
  • The node's BMC system event log (SEL) actually reports the error(s) on a DIMM at the same time as the cluster fault(s):
    • [Information]   [Memory Error]   [Memory]            Correctable ECC (CPU_A0) - Asserted
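
The comparison described above can be sketched in Python as a quick triage aid. This is a minimal sketch, not a NetApp tool: the helper names (`parse_sel_line`, `fault_controller`, `is_dimm_correctable_ecc`) are hypothetical, and the regular expressions are modeled only on the single SEL line and the single fault Details string quoted in this article, so real SEL exports may need a different pattern.

```python
import re

# Assumption: SEL lines follow the layout of the one sample in this article:
# [severity]   [event]   [sensor]   description (location) - state
SEL_PATTERN = re.compile(
    r"\[(?P<severity>[^\]]+)\]\s+"
    r"\[(?P<event>[^\]]+)\]\s+"
    r"\[(?P<sensor>[^\]]+)\]\s+"
    r"(?P<description>.+?)\s+\((?P<location>[^)]+)\)\s+-\s+(?P<state>\w+)"
)

# Assumption: the fault Details string ends with "Memory controller: <name>".
FAULT_PATTERN = re.compile(r"Memory controller:\s*(?P<controller>\w+)")


def parse_sel_line(line):
    """Return the fields of one SEL line as a dict, or None if it does not match."""
    match = SEL_PATTERN.search(line)
    return match.groupdict() if match else None


def fault_controller(details):
    """Return the memory controller named in a memoryEccThreshold Details string."""
    match = FAULT_PATTERN.search(details)
    return match.group("controller") if match else None


def is_dimm_correctable_ecc(entry):
    """True if a parsed SEL entry is an asserted correctable ECC memory error."""
    return (
        entry is not None
        and entry["event"] == "Memory Error"
        and entry["description"].startswith("Correctable ECC")
        and entry["state"] == "Asserted"
    )


# Sample strings copied from this article.
sel_line = "[Information]   [Memory Error]   [Memory]            Correctable ECC (CPU_A0) - Asserted"
details = ("Correctable ECC memory error count crossed threshold "
           "on Memory controller: MemCtlr0")

entry = parse_sel_line(sel_line)
if is_dimm_correctable_ecc(entry):
    print("Cluster fault names:", fault_controller(details))
    print("SEL location:", entry["location"])
```

If the cluster fault names a memory controller while the SEL entry logged at the same time points to a DIMM location, that matches the misreporting pattern described in this article.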

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.