NetApp Knowledge Base

CriticalCECCCountMemErrAlert and BootDimmDisableAlert observed in AFF A1K

Category:
aff-series
Specialty:
hw

Applies to

  • AFF A1K
  • System DIMM modules

Issue

  • ONTAP triggers a CriticalCECCCountMemErrAlert against one DIMM module, logged in EMS as follows:

[CLUSTER-01: mgwd: callhome.hm.alert.critical:alert]: Call home for Health Monitor process nphm: CriticalCECCCountMemErrAlert[DIMM-32].

  • The output of the command ::*> memory dimm show -node <node_name> shows a single DIMM as "degraded":

::*> memory dimm show -node CLUSTER-01
  (system controller memory dimm show)
              DIMM     UECC  CECC  Alert    CPU           Slot          Failure
Node          Name    Count Count Method Socket Channel Number Status    Reason
------------- ------- ----- ----- ------ ------ ------- ------ ------- --------
NAS3_APP_A
              DIMM-1      0     0 bucket      1       7      0 ok          none
              ...
              ...
              DIMM-32     0 151597 bucket     0       3      0 degraded    none <<<<<<<
16 entries were displayed.
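
When the command output is captured to a text file, the rows that are not "ok" can be picked out with a short script. The following is a minimal Python sketch, not a NetApp tool: the column order is assumed purely from the sample output above, and the names ROW_RE and non_ok_dimms are hypothetical.

```python
import re

# Assumed row layout, inferred only from the sample output above:
# name, UECC count, CECC count, alert method, CPU socket, channel, slot, status, reason.
ROW_RE = re.compile(
    r"^\s*(?P<name>DIMM-\d+)\s+(?P<uecc>\d+)\s+(?P<cecc>\d+)\s+(?P<method>\S+)"
    r"\s+(?P<socket>\d+)\s+(?P<channel>\d+)\s+(?P<slot>\d+)"
    r"\s+(?P<status>\S+)\s+(?P<reason>\S+)"
)


def non_ok_dimms(output):
    """Return (name, cecc_count, status) for every DIMM row whose status is not 'ok'."""
    hits = []
    for line in output.splitlines():
        m = ROW_RE.match(line)
        if m and m["status"] != "ok":
            hits.append((m["name"], int(m["cecc"]), m["status"]))
    return hits


# Rows copied from the excerpt above; the "..." line is ignored by the pattern.
SAMPLE = """\
              DIMM-1      0     0 bucket      1       7      0 ok          none
              ...
              DIMM-32     0 151597 bucket     0       3      0 degraded    none <<<<<<<
"""

print(non_ok_dimms(SAMPLE))
```

Whitespace-separated matching is used instead of fixed column offsets because a large CECC count shifts the columns, as the DIMM-32 row shows.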

  • Replacing the affected DIMM does not fix the issue:
    • The DIMM is reported as failed during the boot sequence
    • An additional DIMM is reported as failed
    • Multiple DIMM modules are disabled

DIMM in slot 1 is disabled
DIMM in slot 5 is disabled
DIMM in slot 7 is disabled
DIMM in slot 12 is disabled
DIMM in slot 14 is disabled
DIMM in slot 16 is disabled
DIMM in slot 17 is disabled
DIMM in slot 21 is disabled
DIMM in slot 23 is disabled
DIMM in slot 28 is disabled
DIMM in slot 30 failed <<<<<< New failed
DIMM in slot 32 failed
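
To summarize a long list of such boot messages, the slots can be grouped by state. This is a minimal Python sketch that assumes the two message forms shown above ("is disabled" and "failed") are the only ones of interest; SLOT_RE and group_slots are hypothetical names.

```python
import re

# Matches the two message forms seen in the excerpt above (an assumption,
# other wordings may exist): "DIMM in slot N is disabled" and "DIMM in slot N failed".
SLOT_RE = re.compile(r"DIMM in slot (\d+) (?:is )?(disabled|failed)")


def group_slots(console_text):
    """Group slot numbers by reported state, preserving the order of appearance."""
    groups = {"disabled": [], "failed": []}
    for slot, state in SLOT_RE.findall(console_text):
        groups[state].append(int(slot))
    return groups


# A shortened copy of the excerpt above, including the annotated line.
SAMPLE = """\
DIMM in slot 1 is disabled
DIMM in slot 28 is disabled
DIMM in slot 30 failed <<<<<< New failed
DIMM in slot 32 failed
"""

print(group_slots(SAMPLE))
```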

  • During the boot sequence, the following error is observed:

Apr 13 21:59:46 [CLUSTER-01:platform.reducedMemory:ALERT]: System memory (255 GB) is less than expected (1024 GB). Check DIMMs slots 1, 5, 7, 12, 14, 16, 17, 21, 23, 28, 30, 32.
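
The numbers in this message can be extracted to quantify the shortfall. The Python sketch below assumes the wording of the platform.reducedMemory message is exactly as in the sample above; MEM_RE and parse_reduced_memory are hypothetical names.

```python
import re

# Pattern based solely on the sample message above; other ONTAP releases
# may word this message differently (assumption).
MEM_RE = re.compile(
    r"System memory \((?P<actual>\d+) GB\) is less than expected "
    r"\((?P<expected>\d+) GB\)\. Check DIMMs slots (?P<slots>[\d, ]+)"
)


def parse_reduced_memory(line):
    """Return actual/expected/missing memory in GB and the listed slots, or None."""
    m = MEM_RE.search(line)
    if not m:
        return None
    actual, expected = int(m["actual"]), int(m["expected"])
    slots = [int(s) for s in m["slots"].split(",")]
    return {
        "actual_gb": actual,
        "expected_gb": expected,
        "missing_gb": expected - actual,
        "slots": slots,
    }


SAMPLE = (
    "Apr 13 21:59:46 [CLUSTER-01:platform.reducedMemory:ALERT]: System memory "
    "(255 GB) is less than expected (1024 GB). Check DIMMs slots "
    "1, 5, 7, 12, 14, 16, 17, 21, 23, 28, 30, 32."
)

info = parse_reduced_memory(SAMPLE)
print(info["missing_gb"], len(info["slots"]))
```

For the sample message this gives 769 GB of missing memory across 12 listed slots, and those slots are the same ones reported as disabled or failed in the boot messages above.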

  • Swapping the DIMM modules to different slots does not solve the issue:

Initializing System Memory ...
DIMM:32 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x46 / 0x03
Complete channel mapped out.
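
When comparing boot logs taken before and after a DIMM swap, it can help to extract the mapped-out DIMM number and error codes from each log. The Python sketch below assumes the message format of the sample line above and does not interpret the codes; MRC_RE and parse_mapped_out are hypothetical names.

```python
import re

# Pattern based only on the sample BIOS line above (assumption).
MRC_RE = re.compile(
    r"DIMM:(?P<dimm>\d+) mapped out\..*?"
    r"Major / Minor Error Code: (?P<major>0x[0-9A-Fa-f]+) / (?P<minor>0x[0-9A-Fa-f]+)"
)


def parse_mapped_out(console_text):
    """Return one (dimm_number, major_code, minor_code) tuple per mapped-out line."""
    return [
        (int(m["dimm"]), int(m["major"], 16), int(m["minor"], 16))
        for m in MRC_RE.finditer(console_text)
    ]


SAMPLE = """\
Initializing System Memory ...
DIMM:32 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x46 / 0x03
Complete channel mapped out.
"""

for dimm, major, minor in parse_mapped_out(SAMPLE):
    # Print the codes back in hex so they match the console format.
    print(f"DIMM {dimm}: major=0x{major:02X} minor=0x{minor:02X}")
```

If the same DIMM number is mapped out after the modules have been moved, the result points at the slot or channel rather than at a specific module, which is consistent with the behavior described above.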

  • The system is able to boot, but a new "BootDimmDisableAlert" is triggered for each of the disabled DIMMs

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.