NetApp Knowledge Base

Uncorrectable memory error on an AFF / FAS system that does not support PPR

Category: fas-systems
Specialty: hw
Last Updated: 11/29/2024, 8:03:35 AM

Applies to

  • ONTAP 9
  • Platforms:
      • AFF A320
      • AFF A300 / FAS8200
      • AFF A250 / AFF C250 / FAS500f
      • AFF A220 / AFF C190 / FAS27x0
      • AFF A200 / FAS26x0
      • AFF / FAS80x0
      • FAS22x0 / FAS25x0
      • FAS32x0 / FAS62x0

Issue

  • Controller panics and reboots with a DIMM error:

PANIC: ECC error at DIMM-18: 2C-0F-2007-2664E6BE,ADDR 0x180a048b40,(Node(1), Memory controller(1), CH(3), DIMM(0), Rank(0), Bank Group(1), Bank(0x0), Row(0xb8b1), Col(0x2f8), Uncorrectable Machine Check Error at CPU21.

  • EMS log:

cf_hwassist: cf.hwassist.takeoverTrapRecv:debug]: hw_assist: Received takeover hw_assist alert from partner(node02), system_down because dimm_uecc_error.

  • SP/BMC event log (events all):

ECC error at DIMM-2: 2C-0F-1910-20FE7F16,ADDR 0x27fce6000,(Node(0), Memory controller(0), CH(1), DIMM(0), Rank(0), Bank Group(0), Bank(0x0), Row(0x0), Col(0x0)), devtag(0x3f), correrr(0x0) Uncorrectable Machine Check Error at CPU9. BDWL_HA0 Error: STATUS<0xfe00000000010091>(Val,OverF,UnCor,Enable,MiscV,AddrV,PCC,CorrSts(0),CorrCnt(0),ExtErr(0x1),ErrCode(Channel 1, Read),ErrCode(0x91)),MISC<0x00000000406aea86>(HaDbBank(0),PE(0),ReqOpcode(0x2),RNID(0),RTID(0x35),HTID(0x75))
Requesting SP to power cycle the filer to attempt to clear DRAM UECC

[IPMI Event.critical]: DIMM UECC Fatal Error detected by Storage OS
[Trap Event.critical]: hwassist dimm_uecc_error (32)
[Trap Event.critical]: SNMP dimm_uecc_error (32)
[IPMI Event.critical]: System power cycle
[IPMI.notice]: 08e8 | 02 | EVT: 015000ad | P3V3 | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 3.027
[IPMI.notice]: 08e9 | 02 | EVT: 015200a9 | P3V3 | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 2.957
[IPMI.notice]: 08ea | 02 | EVT: 0300ffff | Power_Good | Assertion Event, "State Deasserted"
[IPMI.notice]: 08eb | 02 | EVT: 015006af | P12V | Assertion Event, "Lower Non-critical going low " | Reading: 0.372 | Threshold: 10.850
[IPMI.notice]: 08ec | 02 | EVT: 015206aa | P12V | Assertion Event, "Lower Critical going low " | Reading: 0.372 | Threshold: 10.540
[BMC.critical]: Filer Reboots
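
When triaging messages like the ones above, it can help to pull the DIMM slot, physical address, and memory topology out of the ECC error line, and the sensor readings out of the IPMI lines. The following is a minimal Python sketch, not a NetApp tool: the regular expressions, the function names `parse_ecc_error` and `parse_ipmi_sensor_event`, and the field name `ident` (for the string that follows the DIMM slot) are all assumptions inferred from the sample messages in this article, so other platforms or ONTAP versions may format these lines differently.

```python
import re

# All patterns below are assumptions inferred from the sample messages in
# this article; they are not an official ONTAP or SP log grammar.
ECC_HEAD = re.compile(
    r"ECC error at (?P<dimm>DIMM-\d+): (?P<ident>[0-9A-Fa-f-]+),\s*"
    r"ADDR (?P<addr>0x[0-9A-Fa-f]+)"
)
# \b keeps "Bank" from matching inside unrelated fields such as "HaDbBank(0)".
LOCATION = re.compile(
    r"\b(Node|Memory controller|CH|DIMM|Rank|Bank Group|Bank|Row|Col)"
    r"\((0x[0-9A-Fa-f]+|\d+)\)"
)
CPU = re.compile(r"Uncorrectable Machine Check Error at CPU(\d+)")


def parse_ecc_error(line):
    """Return the DIMM details from an 'ECC error at DIMM-N' line, or None."""
    head = ECC_HEAD.search(line)
    if head is None:
        return None
    result = {
        "dimm": head["dimm"],
        "ident": head["ident"],  # hypothetical name for the string after the slot
        "addr": int(head["addr"], 16),
        "location": {},
    }
    # Only look at the text after the address, and keep the first value seen
    # for each field name.
    for name, value in LOCATION.findall(line[head.end():]):
        number = int(value, 16) if value.startswith("0x") else int(value)
        result["location"].setdefault(name, number)
    cpu = CPU.search(line)
    result["cpu"] = int(cpu.group(1)) if cpu else None
    return result


def parse_ipmi_sensor_event(line):
    """Return the fields of a pipe-separated '[IPMI...]' sensor line, or None."""
    prefix, sep, rest = line.partition("]: ")
    if not sep or not prefix.startswith("[IPMI"):
        return None
    parts = [p.strip() for p in rest.split("|")]
    if len(parts) < 5 or not parts[2].startswith("EVT:"):
        return None
    event = {
        "record": parts[0],
        "evt": parts[2][len("EVT:"):].strip(),
        "sensor": parts[3],
        "description": parts[4],
    }
    # Optional trailing fields such as "Reading: 0.000" and "Threshold: 3.027".
    for extra in parts[5:]:
        key, _, value = extra.partition(":")
        event[key.strip().lower()] = float(value)
    return event
```

Passing the panic string to `parse_ecc_error` yields the slot (`DIMM-18`), the address as an integer, and a `location` dictionary keyed by `Node`, `Memory controller`, `CH`, and so on. `parse_ipmi_sensor_event` returns `None` for lines that are not pipe-separated sensor records, such as the `[IPMI Event.critical]` lines, so it can be applied to every line of the log.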

 

 

 

