NetApp Knowledge Base

Poor performance and high CPU usage in a single node due to a degraded DIMM

Category: ontap-9
Specialty: perf

Applies to

  • ONTAP 9
  • AFF A400

Issue

  • High CPU utilization causes poor performance on one node.
  • High write latency on a data aggregate, reported as long consistency points (CPs). Example:

Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
7/24/2023 18:33:25  node_name        ERROR         wafl.cp.toolong: Aggregate aggr_name experienced a long CP.
7/24/2023 18:15:22  node_name        ERROR         wafl.cp.toolong: Aggregate aggr_name experienced a long CP.

  • The node reboots after a panic and generates a core dump. Example panic string:

"process on cpu17 hung (telnet_0) for 5001 milliseconds! in SK process telnet_0 on release 9.10.1P12 (C"

  • Correctable ECC errors are logged for a DIMM. Example:

Number of correctable ECC since boot 60362216: Information about Correctable ECC: ECC error at DIMM-xx: CE-03-2106-18AEE039,ADDR 0x5959b3100,(Node(1), Memory controller(0), CH(0), DIMM(0), Rank(0), Bank Group(2), Bank(0x0), Row(0x52ad), Col(0x2c0))
Correctable Machine Check Error at CPU17 McBank7. SKL_IMC0 Error: STATUS<0xcc10000001010090> (...)

Number of correctable ECC since boot 60427752: Information about Correctable ECC: ECC error at DIMM-xx: CE-03-2106-18AEE039,ADDR 0x8698e9d00,(Node(1), Memory controller(0), CH(0), DIMM(0), Rank(1), Bank Group(0), Bank(0x0), Row(0x7d3f), Col(0x70))
Correctable Machine Check Error at CPU13 McBank7. SKL_IMC0 Error: STATUS<0xcc10000001010090> (...)

  • A memory error alert is triggered for that DIMM. Example:

[node_name: mgwd: callhome.hm.alert.critical:debug]: Call home for Health Monitor process nphm: CriticalCECCCountMemErrAlert[DIMM-xx].
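
The correctable ECC entries above carry a running "since boot" counter and the physical location of each error, so comparing entries shows how fast the counter grows and whether the errors stay on one DIMM. The following is a minimal Python sketch of that comparison, not a NetApp tool. The regular expression is inferred only from the two sample lines in this article, and the names `EccRecord`, `parse_ecc_lines`, and `ecc_growth` are hypothetical; other ONTAP releases or platforms may format these lines differently.

```python
import re
from typing import List, NamedTuple

# Sample input: the two correctable ECC entries quoted in this article.
SAMPLE_LOG = """\
Number of correctable ECC since boot 60362216: Information about Correctable ECC: ECC error at DIMM-xx: CE-03-2106-18AEE039,ADDR 0x5959b3100,(Node(1), Memory controller(0), CH(0), DIMM(0), Rank(0), Bank Group(2), Bank(0x0), Row(0x52ad), Col(0x2c0))
Correctable Machine Check Error at CPU17 McBank7. SKL_IMC0 Error: STATUS<0xcc10000001010090> (...)
Number of correctable ECC since boot 60427752: Information about Correctable ECC: ECC error at DIMM-xx: CE-03-2106-18AEE039,ADDR 0x8698e9d00,(Node(1), Memory controller(0), CH(0), DIMM(0), Rank(1), Bank Group(0), Bank(0x0), Row(0x7d3f), Col(0x70))
Correctable Machine Check Error at CPU13 McBank7. SKL_IMC0 Error: STATUS<0xcc10000001010090> (...)
"""


class EccRecord(NamedTuple):
    """One parsed correctable ECC entry (hypothetical structure)."""
    count_since_boot: int
    dimm: str
    rank: int
    row: int


# Assumption: field order and wording match the sample lines above.
ECC_RE = re.compile(
    r"Number of correctable ECC since boot (?P<count>\d+):"
    r".*?ECC error at (?P<dimm>DIMM-\w+):"
    r".*?Rank\((?P<rank>\d+)\)"
    r".*?Row\((?P<row>0x[0-9a-fA-F]+)\)"
)


def parse_ecc_lines(text: str) -> List[EccRecord]:
    """Return one EccRecord per matching line; other lines are skipped."""
    records = []
    for line in text.splitlines():
        m = ECC_RE.search(line)
        if m:
            records.append(
                EccRecord(
                    count_since_boot=int(m["count"]),
                    dimm=m["dimm"],
                    rank=int(m["rank"]),
                    row=int(m["row"], 16),
                )
            )
    return records


def ecc_growth(records: List[EccRecord]) -> int:
    """Counter increase between the first and last parsed entries."""
    if len(records) < 2:
        return 0
    return records[-1].count_since_boot - records[0].count_since_boot


if __name__ == "__main__":
    recs = parse_ecc_lines(SAMPLE_LOG)
    print({r.dimm for r in recs}, ecc_growth(recs))
```

Run against the two sample entries, this reports a single DIMM label and a counter increase of 65536 between the entries. The excerpts carry no timestamps, so an error rate cannot be derived from them alone.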
