NetApp Knowledge Base

Flash Cache does not insert and user slowness is seen due to high disk utilization

Category: fas-systems
Specialty: perf
Last Updated: 9/2/2024, 12:27:33 PM

Applies to

  • ONTAP 9
  • All FAS systems with NVMe-based Flash Cache

Issue

  • Flash Cache inserts may have stopped on one or both nodes in an HA pair
    • If inserts stop:
      • Disk utilization is at or near 100%
      • Latencies increase from a normal 0-5 ms to 20 ms or more
    • Note: Inserts do not always stop. In some cases they continue, but the Insertq_save error shown below is suppressed a very large number of times

Example: node2 shows zero inserts (Insert column) in the output of the advanced-privilege command stats show -p flexscale-access

Cluster::> set -privilege advanced
Cluster::*> node run -node * stats show -p flexscale-access
Node: node1
 Cache                                               Reads       Writes      Disk Reads
 Usage    Hit   Meta   Miss Hit Evict Inval Insert Chain Blocks Chain Blocks  Replaced
     %     /s     /s     /s   %    /s    /s     /s    /s     /s    /s     /s        /s
    88    873     91    427  67   194     0   1426   430    895    22   1432       430
    88    711    254    480  59     0   884      0   418    732     0      0       418
    88    534    106    404  56     0     0      0   247    542     0      0       247
    88    640    100    528  54    88     0    766   279    657    12    793       279
^c
Node: node2
 Cache                                               Reads       Writes      Disk Reads
 Usage    Hit   Meta   Miss Hit Evict Inval Insert Chain Blocks Chain Blocks  Replaced
     %     /s     /s     /s   %    /s    /s     /s    /s     /s    /s     /s        /s
    63   1653      7   6481  20     0     9      0     0      0     0      0       266
    63   3902     23   7859  33     0     0      0     0      0     0      0       428
    63   2331     12   6462  26     0     0      0     0      0     0      0       330
    63   1130      6   7013  13     0    23      0     0      0     0      0       171
    63   1684      5   8037  17     0     0      0     0      0     0      0       246
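
The check performed by eye on the output above can be sketched in Python. This is a minimal illustration, not a NetApp tool: the function name zero_insert_nodes and the min_samples parameter are hypothetical, and the parser assumes the 13-column layout shown in this article, with Insert/s as the eighth column. Other ONTAP releases may print a different layout.

```python
import re

# A few rows copied from the example output in this article.
SAMPLE = """\
Node: node1
 Usage    Hit   Meta   Miss Hit Evict Inval Insert Chain Blocks Chain Blocks  Replaced
     %     /s     /s     /s   %    /s    /s     /s    /s     /s    /s     /s        /s
    88    873     91    427  67   194     0   1426   430    895    22   1432       430
    88    711    254    480  59     0   884      0   418    732     0      0       418
    88    640    100    528  54    88     0    766   279    657    12    793       279
Node: node2
 Usage    Hit   Meta   Miss Hit Evict Inval Insert Chain Blocks Chain Blocks  Replaced
     %     /s     /s     /s   %    /s    /s     /s    /s     /s    /s     /s        /s
    63   1653      7   6481  20     0     9      0     0      0     0      0       266
    63   3902     23   7859  33     0     0      0     0      0     0      0       428
    63   2331     12   6462  26     0     0      0     0      0     0      0       330
"""

INSERT_COLUMN = 7   # zero-based index of Insert/s (assumed from the layout above)
COLUMN_COUNT = 13   # number of columns in each sample row (assumed)


def zero_insert_nodes(stats_text, min_samples=3):
    """Return the nodes whose every sample row reports 0 inserts per second."""
    inserts = {}
    node = None
    for line in stats_text.splitlines():
        match = re.match(r"\s*Node:\s*(\S+)", line)
        if match:
            node = match.group(1)
            inserts.setdefault(node, [])
            continue
        fields = line.split()
        # Header and unit rows also have 13 fields but are not all digits.
        if node and len(fields) == COLUMN_COUNT and all(f.isdigit() for f in fields):
            inserts[node].append(int(fields[INSERT_COLUMN]))
    return [name for name, values in inserts.items()
            if len(values) >= min_samples and not any(values)]


print(zero_insert_nodes(SAMPLE))
```

A node such as node1, where some samples show 0 inserts but others do not, is not flagged; only a node with no inserts across all collected samples is reported. Requiring a few samples (min_samples) is a design choice of this sketch to avoid flagging a single idle interval.
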
  • Multiple WAFL external cache I/O write error events are logged
    • These messages appear in the event log and are suppressed tens of thousands to millions of times in a 10-minute window
    • Hundreds of thousands or millions of suppressions in 10 minutes is the signature of this issue; lower counts are normal and may be ignored
Wed Aug 10 03:03:37 UTC [node2: wafl_exempt08: ems.engine.suppressed:debug]: 
    Event 'extCache.io.writeError' suppressed 1638001 times in last 610 seconds.
Wed Aug 10 03:03:37 UTC [node2: wafl_exempt06: extCache.io.writeError:notice]: 
    WAFL external cache I/O write error: Insertq_save: unable to set up context chain, code 0.
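
The suppression count can be extracted from saved EMS log text and normalized to a 10-minute window, as in the following Python sketch. The function names and the threshold of 100,000 are assumptions made for illustration; the article only says that hundreds of thousands or millions of suppressions in 10 minutes is the signature. The regular expression is written against the message format shown above and may need adjusting for other releases.

```python
import re

# Log excerpt copied from this article.
LOG = """\
Wed Aug 10 03:03:37 UTC [node2: wafl_exempt08: ems.engine.suppressed:debug]:
    Event 'extCache.io.writeError' suppressed 1638001 times in last 610 seconds.
Wed Aug 10 03:03:37 UTC [node2: wafl_exempt06: extCache.io.writeError:notice]:
    WAFL external cache I/O write error: Insertq_save: unable to set up context chain, code 0.
"""

SUPPRESSED = re.compile(
    r"Event '(?P<event>[\w.]+)' suppressed (?P<count>\d+) times "
    r"in last (?P<seconds>\d+) seconds"
)


def suppressions_per_10_min(log_text, event="extCache.io.writeError"):
    """Return each suppression count for `event`, scaled to 600 seconds."""
    rates = []
    for match in SUPPRESSED.finditer(log_text):
        seconds = int(match.group("seconds"))
        if match.group("event") == event and seconds > 0:
            rates.append(int(match.group("count")) * 600 / seconds)
    return rates


def matches_signature(log_text, threshold=100_000):
    """True if any window reaches the (assumed) suppression threshold."""
    return any(rate >= threshold for rate in suppressions_per_10_min(log_text))


print(matches_signature(LOG))
```

Scaling to 600 seconds is needed because the reporting window in the log (610 seconds in the example) is not exactly 10 minutes.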

  • The card state is ok in sysconfig -a, which reports no failures
	slot 1: Flash Cache NVMe
		Serial Number:      ZEP00GPR
		Part Number:        119-00329
		Hardware Revision:  A0
		Firmware Version:   NA03
		Model Name:         X3311A
		Capacity:           1024 GB
		State:              ok
  • The system may panic with a panic string such as:

Panic_Message: received completion for unknown cmd in process irq287: nvme1 on release 9.8P14 (C)

 

  • After a panic, the Flash Cache card may eventually show as failed:
slot 3: Flash Cache NVMe
  Serial Number:      1234567890
  Part Number:        119-00329
  Hardware Revision:  A0
  Firmware Version:   NA03
  Firmware File:      X3311_S000PM963NVM
  Model Name:         X3311A
  Capacity:           1024 GB
  State:              failed
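
When sysconfig -a output has been saved from several nodes, the Flash Cache state per slot can be pulled out with a short Python sketch like the one below. The function name flash_cache_states is hypothetical, and the parser assumes the "slot N: Flash Cache NVMe" and "State:" lines appear as in the excerpts in this article. As noted above, a state of ok does not rule out this issue; it only shows that the card has not yet been marked failed.

```python
import re

# Excerpts copied from this article (slot 1 is ok, slot 3 is failed).
SYSCONFIG = """\
slot 1: Flash Cache NVMe
    Serial Number:      ZEP00GPR
    Model Name:         X3311A
    State:              ok
slot 3: Flash Cache NVMe
    Serial Number:      1234567890
    Model Name:         X3311A
    State:              failed
"""


def flash_cache_states(sysconfig_text):
    """Map slot number to the State value of each Flash Cache card."""
    states = {}
    slot = None
    for line in sysconfig_text.splitlines():
        match = re.match(r"\s*slot (\d+):\s*(.*)", line)
        if match:
            # Track only slots that hold a Flash Cache card.
            slot = int(match.group(1)) if "Flash Cache" in match.group(2) else None
            continue
        match = re.match(r"\s*State:\s*(\S+)", line)
        if match and slot is not None:
            states[slot] = match.group(1)
    return states


print(flash_cache_states(SYSCONFIG))
```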

 
