NetApp Knowledge Base

NetApp Element software may misreport memory errors and result in a cluster fault for memoryEccThreshold on MemCtlr0

Applies to

  • NetApp Element software 12.0 and 12.2
  • NetApp SolidFire SF-Series product line
  • NetApp H-series storage nodes

Issue

  • NetApp Element software may misreport correctable ECC errors on a DIMM as correctable errors on the node's memory controller
  • The default settings for ECC errors on a node's memory controller are overly aggressive, so even a single error raises a persistent cluster fault of error severity
  • The following cluster fault is shown in NetApp SolidFire Active IQ and the cluster UI:
    • Error Code: memoryEccThreshold
    • Details: Correctable ECC memory error count crossed threshold on Memory controller: MemCtlr0
  • The node's BMC system event log (SEL) shows that the error(s) actually occurred on a DIMM, logged at the same time as the cluster fault(s):
    • [Information]   [Memory Error]   [Memory]            Correctable ECC (CPU_A0) - Asserted
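
The symptom above can be checked by comparing the two records side by side. The following is a minimal sketch, assuming the SEL has been exported as text in the bracketed layout shown above and that the cluster fault is available as a dictionary with `code` and `details` keys mirroring the labels in the fault display. The function names, the regular expression, and the dictionary shape are illustrative assumptions, not part of any NetApp tool or API.

```python
import re

# Assumed layout, taken from the sample SEL line in this article:
# [severity]   [event type]   [sensor]   description (location) - state
SEL_PATTERN = re.compile(
    r"\[(?P<severity>[^\]]+)\]\s+"
    r"\[(?P<event>[^\]]+)\]\s+"
    r"\[(?P<sensor>[^\]]+)\]\s+"
    r"(?P<description>.+?)\s+"
    r"\((?P<location>[^)]+)\)\s+-\s+"
    r"(?P<state>\w+)"
)


def parse_sel_line(line):
    """Return the fields of one SEL line as a dict, or None if it does not match."""
    match = SEL_PATTERN.search(line)
    return match.groupdict() if match else None


def is_dimm_correctable_ecc(entry):
    """True if a parsed SEL entry is a correctable ECC memory event."""
    return (
        entry is not None
        and entry["event"] == "Memory Error"
        and entry["description"].startswith("Correctable ECC")
    )


def fault_blames_memory_controller(fault):
    """True if a fault dict (hypothetical shape) matches the fault in this article."""
    return (
        fault.get("code") == "memoryEccThreshold"
        and "Memory controller" in fault.get("details", "")
    )


def looks_like_misreport(fault, sel_lines):
    """Return DIMM locations from the SEL if the fault names a memory controller
    while the SEL records correctable ECC events against DIMMs; else an empty list."""
    if not fault_blames_memory_controller(fault):
        return []
    entries = (parse_sel_line(line) for line in sel_lines)
    return [e["location"] for e in entries if is_dimm_correctable_ecc(e)]


if __name__ == "__main__":
    sample_fault = {
        "code": "memoryEccThreshold",
        "details": "Correctable ECC memory error count crossed threshold "
                   "on Memory controller: MemCtlr0",
    }
    sample_sel = [
        "[Information]   [Memory Error]   [Memory]            "
        "Correctable ECC (CPU_A0) - Asserted",
    ]
    print(looks_like_misreport(sample_fault, sample_sel))
```

A non-empty result only suggests that the fault and the SEL entries fit the pattern described here. The sketch does not compare timestamps, so confirming that the events occurred at the same time still requires checking the SEL and the fault dates manually.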

 
