NetApp Knowledgebase

AFF A700s CECC: Correctable Machine Check Errors being reported against wrong DIMM

Category:
aff-series
Specialty:
hw

Applies to

  • AFF A700s
  • All Flash FAS

Issue

The CECC error continues to be reported against the same DIMM even after the DIMM has been replaced.

The system health alert show command reports errors similar to the following on the cluster:

Node                  xxxxxx
Monitor               controller
Alert ID              CriticalCECCCountMemErrAlert
Alerting Resource     DIMM-x
Subsystem             Memory
Indication Time       Tue Oct 09 12:24:36 2018
Perceived Severity    Critical
Probable Cause        DIMM_Degraded
Description           The DIMM has degraded, leading to memory errors.

The following are corrective actions:

1. Contact technical support to obtain a new DIMM of the same specification
2. If possible, perform a takeover of this node and bring the node down for maintenance
3. Refer to the DIMM replacement guide for your given hardware platform to replace the DIMM
4. Bring the storage system online

Possible Effect:
Memory issues can lead to a catastrophic system panic, which can lead to data downtime on the node.
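When many nodes are being checked, it can help to turn the alert text above into structured fields. The following is a minimal sketch, assuming the alert has been captured as plain text in the label-then-value layout shown above. The names FIELD_LABELS, parse_alert and is_cecc_dimm_alert are hypothetical helpers for illustration and are not part of ONTAP.

```python
# Hypothetical helper: parse captured "system health alert show" text.
# Assumes each field sits on its own line as "<label><spaces or colon><value>".

FIELD_LABELS = (
    "Alerting Resource",
    "Alert ID",
    "Indication Time",
    "Perceived Severity",
    "Probable Cause",
    "Description",
    "Subsystem",
    "Monitor",
    "Node",
)


def parse_alert(text):
    """Return a dict mapping each known label to its value."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        for label in FIELD_LABELS:
            rest = line[len(label):]
            # The label must be followed by a separator, so free-text lines
            # that merely begin with the same letters are skipped.
            if line.startswith(label) and rest[:1] in (" ", "\t", ":"):
                # Keep the first occurrence of each label only.
                fields.setdefault(label, rest.lstrip(" \t:").strip())
                break
    return fields


def is_cecc_dimm_alert(fields):
    """True when the parsed alert is the critical CECC alert on a DIMM."""
    return (
        fields.get("Alert ID") == "CriticalCECCCountMemErrAlert"
        and fields.get("Alerting Resource", "").startswith("DIMM")
    )
```

Comparing the Alerting Resource value before and after a DIMM replacement shows whether the alert is still raised against the same slot.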


The EMS log displays a message similar to the following, reporting the CECC error against the specific DIMM:

[?] Tue Oct 09 12:24:36 IST [xxxx: mgwd: callhome.hm.alert.critical:alert]: Call home for Health Monitor process nphm: CriticalCECCCountMemErrAlert[DIMM-x].
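To track which DIMM the alert names over time, the EMS line can be split into its parts. The sketch below assumes the bracketed "[node: process: event:severity]: message" layout seen in the example above. The regular expressions and the parse_ems_line name are illustrative assumptions, not a NetApp-provided parser.

```python
import re
from typing import Optional

# Matches the "[node: process: event:severity]: message" portion of the line.
EMS_PATTERN = re.compile(
    r"\[(?P<node>[^:\]]+):\s*(?P<process>[^:\]]+):\s*"
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]:\s*(?P<message>.*)$"
)

# Matches a trailing "AlertName[resource]." inside the message text.
ALERT_PATTERN = re.compile(r"(?P<alert_id>\w+)\[(?P<resource>[^\]]+)\]\.?\s*$")


def parse_ems_line(line: str) -> Optional[dict]:
    """Return the parsed fields of one EMS line, or None if it does not match."""
    match = EMS_PATTERN.search(line)
    if match is None:
        return None
    fields = {key: value.strip() for key, value in match.groupdict().items()}
    alert = ALERT_PATTERN.search(fields["message"])
    if alert is not None:
        # Adds "alert_id" and "resource" when the message ends with Alert[resource].
        fields.update(alert.groupdict())
    return fields
```

Collecting the resource value from every callhome.hm.alert.critical line logged after a replacement shows whether the same DIMM identifier is still being reported.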

Normally, the suggested action is to replace the reported DIMM.
However, the cluster might continue to report errors against the same DIMM even after the replacement.

 

 

 
